Hastie, T.; Tibshirani, R.; and Friedman, J. 2003. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York, NY: Springer-Verlag.
Kleinberg, J. M. 1999. Authoritative sources in a hyperlinked environment. Journal of the ACM 46(5):604-632.
Berry, M. W.; Drmac, Z.; and Jessup, E. R. 1999. Matrices, vector spaces, and information retrieval. SIAM Review 41(2):335-362.
Borodin, A.; Roberts, G. O.; Rosenthal, J. S.; and Tsaparas, P. 2001. Finding authorities and hubs from link structures on the World Wide Web. In Proceedings of the 10th International Conference on World Wide Web, 415-429.
Roweis, S. 1998. EM algorithms for PCA and SPCA. In Proceedings of the Neural Information Processing Systems, 626-632.
Tipping, M. E., and Bishop, C. M. 1998. Mixtures of probabilistic principal component analysers. Technical report NCRG/97/003, Microsoft Research, July 1998.
Tipping, M. E., and Bishop, C. M. 1999. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B 61(3):611-622.
Hofmann, T. 1999. Probabilistic latent semantic analysis. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence.
Hofmann, T. 1999. Probabilistic latent semantic indexing. In Proceedings of the 22nd International Conference on Research and Development in Information Retrieval, 50-57.
Cohn, D., and Chang, H. 2000. Learning to probabilistically identify authoritative documents. In Proceedings of the 17th International Conference on Machine Learning, 167-174.
Cohn, D., and Hofmann, T. 2001. The missing link - A probabilistic model of document content and hypertext connectivity. In Advances in Neural Information Processing Systems 13:430-436.
Scholkopf, B. 2000. The kernel trick for distances. Technical report MSR-TR-2000-51, Microsoft Research, May 2000.
Fasel, I. 2001. Scholkopf, Smola and Muller: Kernel PCA. Technical report.
Bach, F. R., and Jordan, M. I. 2001. Kernel independent component analysis. Technical report UCB/CSD-01-1166, University of California, November 2001.
Jordan, M. I.; Ghahramani, Z.; Jaakkola, T. S.; and Saul, L. K. 1999. An introduction to variational methods for graphical models. Learning in Graphical Models. Cambridge, MA: MIT Press.
Jaakkola, T. 2000. Tutorial on variational approximation methods. In Advanced Mean Field Methods: Theory and Practice. Cambridge, MA: MIT Press.
Lu, X.; Hauskrecht, M.; and Day, R. S. 2002. Variational Bayesian learning of the cooperative vector quantizer model. Part I: The theory. Technical report CBMI-02-181, Center for Biomedical Informatics, University of Pittsburgh, 2002.
Andrieu, C.; de Freitas, N.; Doucet, A.; and Jordan, M. I. 2003. An introduction to MCMC for machine learning. Machine Learning 50:5-43.
Burges, C. J. C. 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2:121-167.
Smola, A. J., and Scholkopf, B. 1998. A tutorial on support vector regression. NeuroCOLT2 Technical report NC2-TR-1998-30, October 1998.
Schapire, R. E. 1990. The strength of weak learnability. Machine Learning 5(2):197-227.
Schapire, R. E.; Freund, Y.; Bartlett, P.; and Lee, W. S. 1998. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics 26(5):1651-1686.
Freund, Y., and Schapire, R. E. 1999. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence 14(5):771-780.
Friedman, J.; Hastie, T.; and Tibshirani, R. 1999. Additive logistic regression: A statistical view of boosting. Technical report, Stanford University, November 1999.
Freund, Y., and Schapire, R. E. 2000. Discussion of the paper "Boosting the margin: A new explanation for the effectiveness of voting methods". Technical report, AT&T, January 2000.
Schapire, R. E. 2002. The boosting approach to machine learning: An overview. MSRI Workshop on Nonlinear Estimation and Classification.
Russell, S., and Norvig, P. 1995. Artificial Intelligence: A Modern Approach, chapters 14-17. Upper Saddle River, NJ: Prentice-Hall.
Jordan, M. Introduction to Graphical Models, Chapter 2: Basic concepts - joint probabilities and conditional independence.
Jordan, M. Introduction to Graphical Models, Chapter 15: Markov properties.
Charniak, E. 1991. Bayesian networks without tears. AI Magazine 12:50-63.
Heckerman, D. 1995. A tutorial on learning with Bayesian networks. Technical report MSR-TR-95-06, Microsoft Research, March 1995.
Buntine, W. 1996. A guide to the literature on learning probabilistic networks from data. IEEE Transactions on Knowledge and Data Engineering 8(2):195-210.
Jordan, M. Introduction to Graphical Models, Chapter 3: Basic concepts - the elimination algorithm.
Jordan, M. Introduction to Graphical Models, Chapter 16: Basic concepts - the junction tree algorithm.
Dechter, R. 1996. Bucket elimination: A unifying framework for probabilistic inference. In Uncertainty in Artificial Intelligence, 211-219.
Huang, C., and Darwiche, A. 1996. Inference in belief networks: A procedural guide. International Journal of Approximate Reasoning 15:225-263.
Neapolitan, R. E. 2003. Learning Bayesian Networks, Chapter 4.2: Approximate inference. Upper Saddle River, NJ: Prentice Hall.
Jordan, M. Introduction to Graphical Models, Chapter 8: Completely observed graphical models.
Jordan, M. Introduction to Graphical Models, Chapter 10: The EM algorithm.
Dempster, A. P.; Laird, N. M.; and Rubin, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39(1):1-38.
Thiesson, B.; Meek, C.; and Heckerman, D. 2001. Accelerating EM for large databases. Machine Learning 45:279-299.
Eskin, E. 2000. Anomaly detection over noisy data using learned probability distributions. In Proceedings of the 2000 International Conference on Machine Learning.
Eskin, E. 2000. Detecting errors within a corpus using anomaly detection. In Proceedings of the 2000 North American Chapter of the Association for Computational Linguistics.
Miller, C. J.; Genovese, C.; Nichol, R. C.; Wasserman, L.; Connolly, A.; Reichart, D.; Hopkins, A.; Schneider, J.; and Moore, A. 2001. Controlling the false discovery rate in astrophysical data analysis. Technical report, Carnegie Mellon University, July 2001.
Wong, W. K.; Moore, A. M.; Cooper, G. F.; and Wagner, M. M. 2002. Rule-based anomaly pattern detection for detecting disease outbreaks. In Proceedings of the 18th National Conference on Artificial Intelligence, 217-223.
Wong, W. K.; Moore, A. M.; Cooper, G. F.; and Wagner, M. M. 2003. Bayesian network anomaly pattern detection for disease outbreaks. In Proceedings of the 20th International Conference on Machine Learning.
Dickinson, M., and Meurers, W. D. 2003. Detecting errors in part-of-speech annotation. In Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics, 107-114.
Russell, S., and Norvig, P. 1995. Artificial Intelligence: A Modern Approach, chapters 11-13. Upper Saddle River, NJ: Prentice-Hall.
Sutton, R. S., and Barto, A. G. 1999. Reinforcement Learning. Cambridge, MA: MIT Press.
Littman, M. L. 1996. Algorithms for Sequential Decision Making. Ph.D. thesis, Brown University, chapters 1-2.
Hansen, E. A. 1998. Solving POMDPs by searching in policy space. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, 211-219.
Kaelbling, L. P.; Littman, M. L.; and Cassandra, A. R. 1998. Planning and acting in partially observable stochastic domains. Artificial Intelligence 101:99-134.
Hauskrecht, M.; Meuleau, N.; Boutilier, C.; Kaelbling, L. P.; and Dean, T. 1998. Hierarchical solution of Markov decision processes using macro-actions. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, 220-229.
Boutilier, C.; Dean, T.; and Hanks, S. 1999. Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research 11:1-94.
Meuleau, N.; Kim, K.; Kaelbling, L. P.; and Cassandra, A. R. 1999. Solving POMDPs by searching the space of finite policies. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, 417-426.
Baxter, J., and Bartlett, P. L. 2000. Reinforcement learning in POMDP's via direct gradient ascent. In Proceedings of the 17th International Conference on Machine Learning, 41-48.
Hauskrecht, M. 2000. Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research 13:33-94.
Meuleau, N.; Peshkin, L.; and Kim, K. 2001. Exploration in gradient-based reinforcement learning. AI Memo 2001-003, Massachusetts Institute of Technology, April 2001.
Guestrin, C.; Koller, D.; and Parr, R. 2001. Max-norm projections for factored MDPs. In Proceedings of the 17th International Joint Conference on Artificial Intelligence, 673-680.
Schuurmans, D., and Patrascu, R. 2001. Direct value-approximation for factored MDPs. Advances in Neural Information Processing Systems 14.
Guestrin, C.; Koller, D.; and Parr, R. 2001. Multiagent planning with factored MDPs. Advances in Neural Information Processing Systems 14.
Poupart, P.; Boutilier, C.; Patrascu, R.; and Schuurmans, D. 2002. Piecewise linear value function approximation for factored MDPs. In Proceedings of the 18th National Conference on Artificial Intelligence.
Patrascu, R.; Poupart, P.; Schuurmans, D.; Boutilier, C.; and Guestrin, C. 2002. Greedy linear value-approximation for factored Markov decision processes. In Proceedings of the 18th National Conference on Artificial Intelligence, 285-291.
de Farias, D. P., and Van Roy, B. 2003. The linear programming approach to approximate dynamic programming. Operations Research 51(6):850-865.
Trick, M. A., and Zin, S. E. 1993. A linear programming approach to solving stochastic dynamic programs. Technical report, Carnegie Mellon University, August 1993.
Hauskrecht, M., and Kveton, B. 2003. Linear program approximations for factored continuous-state Markov decision processes. In Proceedings of the 17th Annual Conference on Neural Information Processing Systems.