Automatic Detection of Negative and Positive User States in Spoken Dialogue Systems

Diane Litman

We are currently building an intelligent tutoring spoken dialogue system that uses spoken and natural language processing to monitor and respond to students' emotional states. This talk presents an empirical study of how useful a set of speech-based, text-based, and contextual features is for automatically predicting student state. We first annotate student turns in a corpus of human-human spoken tutoring dialogues for negative, neutral, and positive emotions. We then automatically extract features representing acoustic and prosodic information from the student speech signal, and linguistic information from the associated transcriptions. We compare the results of a variety of machine learning experiments that use different feature sets to predict the annotated emotions. Our best-performing feature set combines acoustic-prosodic, lexical, syntactic, conversational, and contextual features. It yields a prediction accuracy of 84.75%, a 44% relative reduction in error compared with a baseline.
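The relative error reduction compares error rates (1 minus accuracy), not accuracies: it is the fraction of the baseline's errors that the model eliminates. The minimal sketch below shows that arithmetic and inverts it to recover the baseline accuracy implied by the two reported figures. The function names are illustrative only, and because the reported 44% is rounded, the implied baseline (roughly 72.8%) is approximate rather than a number stated in the abstract.

```python
def relative_error_reduction(model_acc: float, baseline_acc: float) -> float:
    """Fraction of the baseline's errors removed by the model."""
    baseline_err = 1.0 - baseline_acc
    model_err = 1.0 - model_acc
    return (baseline_err - model_err) / baseline_err


def implied_baseline_accuracy(model_acc: float, reduction: float) -> float:
    """Invert the formula: baseline error = model error / (1 - reduction)."""
    model_err = 1.0 - model_acc
    return 1.0 - model_err / (1.0 - reduction)


if __name__ == "__main__":
    # 84.75% accuracy means a 15.25% error rate; a 44% relative reduction
    # implies the baseline made about 15.25 / 0.56, or roughly 27.2%, errors.
    base = implied_baseline_accuracy(0.8475, 0.44)
    print(f"implied baseline accuracy: {base:.4f}")
    print(f"check reduction: {relative_error_reduction(0.8475, base):.2f}")
```

Stating the gain as an error reduction rather than an accuracy difference makes improvements comparable across baselines of different strength.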