Stefano's written questions

  • Tom Mitchell (Machine Learning)

    1. One perspective on machine learning is that all learning algorithms are search methods for exploring large spaces of potential hypotheses, in search of the hypothesis that best fits the given data.

      3.a
      Provide a survey of the main machine learning methods you know of, from this perspective. Describe each in terms of its search strategy and search space. Try to include at least one representative from each of the major search strategies used in the field.
      3.b
      Discuss the two or three most interesting differences among algorithms from this point of view. Among other things, consider practical implications of these differences on the robustness, efficiency, etc. of the learning methods for different problem domains.
      3.c
      Briefly describe at least one objection to this perspective on learning.

    2. Consider the problem of experimentally evaluating two learning algorithms: ID3 and Backpropagation. Imagine a bank gives you a set of 200 loan applications from 1995, each classified positive or negative according to whether the applicant should be granted the loan.

      4.a
      For each algorithm, the bank asks you to (1) apply it to learn the most accurate hypothesis you can, and (2) estimate as precisely as possible the accuracy of this hypothesis over future data. Note for Backpropagation this includes choosing an appropriate number of training iterations and hidden units. For ID3 it includes determining how to post-prune the decision tree. Describe the experimental approach you would take, including the details of any choices you would make to separate the available data into different subsets for different uses. Discuss tradeoffs that arise in making these choices, and defend the choices you make.
      4.b
      Now that you have finished, the bank informs you that in December they intend to rerun the learning algorithm you suggested, to take advantage of the new 1996 training data as well as the 1995 data. They are a little nervous, though, because they believe changes in the new tax laws might make the 1995 data somewhat outdated by then, How do ou suggest they use the combination of 1995 and 1996 data? Discuss the issues, and tradeoffs in approaches you might take.
      4.c
      Finally, the bank tells you that they wish to use their prior knowledge about who is likely to repay loans, to further improve teh accuracy of learned hypotheses. The prior knowledge they have comes in the form of approximate rules such as "a person who is employed is more likely to repay the loan than one who is unemployed." Discuss how you would design a learning algorithm to use such prior knowledge together with available training data. Discuss the tradeoffs and design issues that you find most interesting.

  • Greg Cooper (Reasoning under uncertainty)

    1. Consider an influence diagram I that has one decision node D, such that D has incoming arcs from chance nodes and D has outgoing arcs to chance nodes and a value node, which represents a value (e.g., dollars) for each state of the variables in I.

      Create a simple example instance of I, and call it E; you will be asked to use E in some parts of this problem. In all pars of this problem that use E, please show your work.

      Let S be the set of all chance nodes in I that are not descendants of decision node D.

      5.a
      Show an equation that represents the expected value of optimal decision making using I, under the assumption that lal the variables in S are observed by the decision maker before making decision D. Let EV(I) denote this expected value. Compute EV(E) for your example influence diagram E.
      5.b
      What is the computational time complexity of computing EV(I)? In particular, is it polynomial time or NP-hard? If polynomial time, describe an algorithm for computing it. If NP-hard, show the reduction you use and explain your proof.
      5.c
      Describe a stochastic simulation algorithm for computing an estimate of EV(I). Show pseudocode for the algorithm, comment on how it works, and explain why it will (or will not) converge in the limit of infinite sampling. Apply the algorithm to show several cycles of simulation in computing EV(E) for the example influence diagram E.
      5.d
      Can nontrivial error bounds be computed for a given estimate of EV(I), which is derived using the algorithm you specified for 5.c? If so, explain how, and show how the estimate of EV(E) and its error bounds change over the several cycles of simulation you generated in 5.c for the example.
      5.e
      Let C(I) denote the cost of storing I in a given computer in a given context. C(I) is given in the same expected value units as is EV(I). Develop a particular such cost function.
      5.f
      Let NEV(I) denote the net expected value of storing I and using it for decision making. Show a general formula for computing the NEV(I), which incorporates C(I). Compute NEV(E) for your example influence diagram E, using the particular cost function you specified in 5.e.

    2. Let X and Y be variables that represent two events. Suppose that the event represented by X causes the event represented by Y, but this result is not known to an investigator.

      6.a
      What is the minimum number of additional observational variables beyond X and Y that the investigator will need to measure to prove that X causes Y? Why?

      Let S denote the union of X, Y, and these additional variables.

      6.b
      What must be the causal relationships among the variables in S? Why?
      6.c
      Given observational data on the variables in S, what assumptions are required to draw the conclusion that X causes Y?
      6.d
      Which variables in S and assumptions in 6.c are not needed when using experimental data (e.g., randomized controlled trials) to infer that X causes Y? Why?