Budgeted Learning of Naive-Bayes Classifiers

Russell Greiner

Abstract

There is often a cost associated with acquiring training data. We consider the situation where the learner, with a fixed budget, can only see data that it has 'purchased' during training. Our first results deal with the simplest such problem: We are given a set of independent systems each of which, when queried, returns a reward, drawn from its specific Bernoulli distribution (think "tossing a coin"). Our goal is to identify which of these systems has the largest expected reward. To do this, we sequentially specify which system to query, subject to the constratint that we can only perform a fixed total number of queries. After explaining how this task differs from many related ideas, including active learning and standard bandit problems, we address the computational complexity of the problem, propose a number of algorithms, and report both the approximation properties of these algorithms and their empirical performance on various problem instances. Our results show that some novel algorithms (including "biased-robin") can significantly out-perform the obvious algorithms, such as round robin and random. We next use these results to learn the parameters of a naive-bayes classifier, subject to the same type of budget constraints. We again consider a number of algorithms, both familiar and novel, and again show empirically that some of our novel algorithms can often out-perform the standard ones.

See http://www.cs.ualberta.ca/~madani/BudgetedLearningPage.html. This is joint work with Dan Lizotte and Omid Madani.

Speaker Bio

After earning a PhD from Stanford, Russell Greiner worked in both academic and industrial research before settling at the University of Alberta, where he is now a Professor in Computing Science and the founding Director of the Alberta Ingenuity Centre for Machine Learning. He is a Program Chair for the 2004 "Int'l Conference on Machine Learning", an Editor-in-Chief for "Computational Intelligence", and serves on the editorial boards of a number of other journals. He has published over 90 refereed papers and patents, most in the areas of machine learning and knowledge representation. The main foci of his current work are (1) bioinformatics and medical informatics; (2) learning effective probabilistic models and (3) formal foundations of learnability.