Intelligent Systems AI Forum, Fall 2006

Friday, September 1, 2006

Time: 12:00 noon

Place: 5317 Sennott Square

Speaker: Yun Niu, University of Toronto

Title: Analysis of Semantic Classes: Toward Non-Factoid Question Answering

Abstract: A question answering (QA) system aims to provide accurate and concise answers to natural language questions. Current research in QA covers two major types: factoid and non-factoid QA. Factoid questions usually have named entities as answers. In contrast, answers to non-factoid questions often consist of multiple pieces of information from multiple sources. Much progress has been made in answering factoid questions, while research on non-factoid QA is still at an early stage. In this talk, I'll discuss our work in non-factoid QA.
QA in medicine is a good context for investigating non-factoid QA as answers to clinical questions are often obtained by combining multiple pieces of relevant information. We propose "semantic class analysis" as the organizing principle in medical QA. This talk will address the problem of analyzing some important properties of semantic classes -- cores and polarity. The role of polarity in identifying relevant information in answer construction will also be discussed.


Friday, September 15, 2006

Time: 12:00 noon

Place: 5317 Sennott Square

Speaker: Ellen Campana

Title: Cognitive Load and Spoken Interface Design: Comparing Natural and Standardized Approaches to the Generation of Referring Expressions

Abstract: Human language capabilities are both context-dependent and flexible. On the one hand, psycholinguistic evidence suggests that listeners naturally and rapidly integrate elements of the visual, discourse-level, and social context with incoming speech, using these elements to improve the speed at which they identify the intended referents of referring expressions. On the other hand, research in human-human interaction has also shown that listeners are flexible in their use of language in that they are able to adapt to speaker-dependent patterns, and that they are capable of establishing and using new referring expressions and sub-languages for specific domains. In the spoken language interface literature, these two sets of findings have been used to support two different approaches to interface design, which I call the "natural" approach and the "standardized" approach. The natural approach argues that in order to be easy to use, such interfaces should approximate human-human interaction as closely as possible, including context-dependent generation and understanding of referring expressions. The standardized approach argues that instead systems should take advantage of human abilities to learn and adapt while minimizing computational complexity. Thus, users should be exposed to and use consistent, non-context-dependent referring expressions so that the systems will be easier to learn. There is little direct empirical evidence examining which of these design approaches results in less cognitive load on the part of system users. In this talk I will describe the results from my research applying a classic tool of cognitive psychology, the dual-task paradigm, to spoken interface evaluation with the goal of comparing the two approaches directly. Specifically, I examine natural and standardized design approaches with respect to the role of discourse context in user comprehension / system generation of referring expressions.


Friday, September 22, 2006

Time: 12:00 noon

Place: 5317 Sennott Square

Speaker: Nigel Ward (University of Texas at El Paso)

Title: Learning to Show You're Listening: A Trainer for Back-Channeling in Arabic

Abstract: Good listeners generally produce back-channel feedback, and do so in a language-appropriate way. Second language learners often lack this skill. We present a training sequence which enables learners to acquire a basic Arabic back-channel skill, namely, that of producing feedback immediately after the speaker produces a sharp pitch downslope. This training sequence includes an explanation, audio examples, the use of visual signals to highlight occurrences of the pitch downslope, auditory and visual feedback on learners' attempts to produce the cue themselves, and feedback on the learners' performance as they play the role of an attentive listener in response to one side of a pre-recorded dialog. Preliminary experiments suggest that this allows some learners to acquire this behavior. The talk will also touch on the role of back-channels in various types of dialog, methods for the discovery and quantification of dialog-relevant prosodic cues, potential cross-cultural misunderstandings of prosodic signals, the interplay between meta-communication and the communication of content, and ways to quantify the value of good turn-taking relative to other dialog skills.


Friday, September 29, 2006

Time: 12:00 noon

Place: 5317 Sennott Square

Speaker1: Wendy Chapman

Title: Identifying Cases of Acute Lower Respiratory Syndrome from Free-text ED Reports

Abstract: Syndromic surveillance systems often monitor ICD-9 discharge diagnoses or free-text chief complaints to look for respiratory outbreaks. I will describe our in-progress development of a system called SySTR (Syndromic Surveillance from Textual Reports) that identifies relevant patients from dictated emergency department (ED) reports. SySTR comprises three modules: 1) a classifier learned from manually annotated features in ED reports; 2) an NLP module that identifies respiratory-related features from the reports; and 3) NLP modules for identifying contextual information such as negation and temporality and for using that information to determine the values for the features used by the classifier.

Speaker2: Rosta Farzan

Title: Social Navigation Support in a Course Recommendation System

Abstract: The volume of course-related information available to students is rapidly increasing. This abundance of information has created the need to help students find, organize, and use resources that match their individual goals, interests, and current knowledge. Our system, CourseAgent, presented in this paper, is an adaptive community-based hypermedia system, which provides social navigation course recommendations based on students' assessment of course relevance to their career goals. CourseAgent obtains students' explicit feedback as part of their natural interactivity with the system. This work presents our approach to eliciting explicit student feedback and then evaluates this approach.


Friday, October 6, 2006

Time: 12:00 noon

Place: 5317 Sennott Square

Speaker1: Min Chi

Title: The Impact of Explicit Problem-Solving Strategy Instruction in Probability on Learning Physics through Intelligent Tutoring Systems

Abstract: In this article, we explored how learning a problem-solving strategy impacted initial and future learning through Intelligent Tutoring Systems (ITS). We present data from a study in which students learned two unrelated deductive domains, probability and physics. During the probability instruction, half the students (experimental group) were trained on an ITS that taught an explicit problem-solving strategy while the other half (control group) were trained on another ITS without any explicit strategy instruction. During the subsequent physics instruction, both groups were trained on the same ITS, which did not teach any explicit strategy. The superiority of the initial explicit strategy instruction was found in both domains: the experimental group performed significantly better than the control group in both probability and physics. Thus, we argue that explicit problem-solving strategy instruction in probability prepared students for future learning in physics through ITSs.

Speaker2: Kurt VanLehn


Friday, November 3, 2006

Time: 12:00 noon

Place: 5317 Sennott Square

Speaker1: Behrang Mohit

Title: Intelligent Usage of Human Knowledge in Machine Translation

Abstract: A machine translation (MT) system encounters varying levels of difficulty when translating different segments of a sentence. We propose a system that segments a source language sentence and predicts (classifies) which phrases are difficult or easy to translate. These easy-difficult classes are system-dependent notions and are determined by machine translation quality (judged by BLEU score). Through an oracle study, we use human translation for the difficult phrases. We hypothesize that such a pre-translation step improves the translation quality even for the segments that are not translated by humans. Our phrase classifier uses syntactic and lexical features of the source language and also features from the underlying MT system. The results of our studies support our hypothesis, and we achieve promising accuracy for our easy-difficult phrase classifier.

Speaker2: Amruta Purandare

Title: Humor: Prosody Analysis and Automatic Recognition for FRIENDS

Abstract: We analyze humorous spoken conversations from a classic comedy television show, FRIENDS, by examining acoustic-prosodic and linguistic features and their utility in automatic humor recognition. Using a simple annotation scheme, we automatically label speaker turns in our corpus that are followed by "laughs" as Humorous, and the rest as Non-Humorous. Our humor-prosody analysis reveals significant differences in prosodic characteristics (such as pitch, tempo, and energy) of humorous and non-humorous speech. Humor recognition was carried out using standard supervised learning classifiers, and showed promising results significantly above the baseline.


Friday, November 17, 2006

Time: 12:00 noon

Place: 5317 Sennott Square

Speaker: Michael Ringenberg

Title: Scaffolding Problem Solving with Embedded Examples to Promote Deep Learning

Abstract: This study compared the relative utility of an intelligent tutoring system that uses procedure-based hints with a version that uses worked-out examples. The system, Andes, taught college-level physics. In order to test which strategy produced better gains in competence, two versions of Andes were used: one offered participants graded hints and the other offered annotated, worked-out examples in response to their help requests. We found that providing examples was at least as effective as the hint sequences and was more efficient in terms of the number of problems it took to obtain the same level of mastery.

Speaker: Janyce Wiebe

Title: Word Sense and Subjectivity

Abstract: Subjectivity and meaning are both important properties of language. This paper explores their interaction and presents empirical evidence in support of the hypotheses that (1) subjectivity is a property that can be associated with word senses, and (2) word sense disambiguation can directly benefit from subjectivity annotations.


Friday, December 1, 2006

Time: 12:00 noon

Place: 5317 Sennott Square

Speaker1: Tomas Singliar

Title: Online Temporal Clustering for Outbreak Detection

Abstract: We present Cluster Onset Detection (COD), a novel algorithm to aid in the detection of epidemic outbreaks. COD employs unsupervised learning techniques in an online setting to partition the population into subgroups, thus increasing the ability to make a detection over the population as a whole. The population divides into groups that are susceptible to different attack types. COD attempts to detect a cluster made up primarily of infected hosts. We argue that this technique is largely complementary to the existing methods for outbreak detection and can generally be combined with one or more of them. We show empirical results applying COD to the problem of detecting a worm attack on a system of networked computers, and show that this method results in an approximately 40% lower infection rate, at a false positive rate of 1 per week, than the best previously reported results on this data set, which were achieved using an HMM customized for the outbreak detection task.

Speaker2: Marek Druzdzel

Title: Dynamic Weighting A* Search-Based MAP Algorithm for Bayesian Networks

Abstract: In this talk I will present the Dynamic Weighting A* (DWA*) search algorithm for solving MAP problems in Bayesian networks. By exploiting asymmetries in the joint distributions over random variables, the algorithm is able to greatly reduce the search space and offer excellent performance in terms of both accuracy and efficiency. This is joint work with Xiaoxun Sun and Changhe Yuan.


Friday, December 8, 2006

Time: 2:00 pm

Place: 5313 Sennott Square

Speaker: Raymond Mooney (University of Texas at Austin)

Bio: Raymond J. Mooney is a Professor in the Department of Computer Sciences at the University of Texas at Austin. He received his Ph.D. in 1988 from the University of Illinois at Urbana/Champaign. He is an author of over 100 published research papers, primarily in the area of machine learning. He was program co-chair of the 2006 National Conference on Artificial Intelligence, general chair of the 2005 joint Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, co-chair of the 1990 International Conference on Machine Learning, a recipient of the Best Research Paper Award at the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, a former editor of the Machine Learning journal, and a Fellow of the American Association for Artificial Intelligence. His recent research has focused on learning for natural-language processing, text mining, statistical relational learning, transfer learning, active learning, semi-supervised learning, bioinformatics, and autonomic computing.

Title: Learning to Extract Proteins and their Interactions from Biomedical Text

Abstract: Automatically extracting information from biomedical text holds the promise of easily consolidating large amounts of biological knowledge in computer-accessible form. This strategy is particularly attractive for extracting data on human genes from the 11 million abstracts in Medline. We have developed and evaluated a variety of learned information-extraction systems for identifying human proteins and their interactions in Medline abstracts. We will present our current best results on identifying names of human proteins using Conditional Random Fields and Relational Markov Networks. We will also present our current best results on identifying interactions between proteins using a Support Vector Machine with an underlying string kernel. Finally, we will summarize results from a recent large-scale application of our techniques, in which we mined 753,459 Medline abstracts to extract a database of 6,580 interactions between 3,737 human proteins. By merging this extracted data with existing databases, we have constructed what is, to our knowledge, the largest database of known human protein interactions, containing 31,609 interactions among 7,748 proteins.