CSI5388: Topics in Machine Learning

Performance Evaluation for Classification


Instructor

Nathalie Japkowicz

Office: STE 5029
E-mail: nat@site.uottawa.ca
Telephone: 562-5800 ext. 6693 (Note: e-mail is more reliable)

Meeting Times and Locations

  • Mondays: 2:30pm-4pm, STE B0138
  • Wednesdays: 2:30pm-4pm, LMX 223

(Please note the different rooms!)

Office Hours and Locations

  • Times: Mondays and Wednesdays, 1pm-2pm
  • Location: STE 5029

Prerequisites

CSI5387, although the course can be taken with permission from the instructor. Permission will be granted if the book "Machine Learning" by Tom Mitchell (see below) has been read and well understood prior to taking the course.

Overview

Machine Learning is the area of Artificial Intelligence concerned with building computer programs that automatically improve with experience. An important problem in Machine Learning is how to evaluate learning algorithms. While the routine approach consists of averaging the error rate over 10 cross-validation folds and running a t-test, this method can be problematic, at least in certain cases. In this course, we will look in depth at the issue of machine learning evaluation in an attempt to discover more suitable evaluation approaches.
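As a point of reference, here is a minimal sketch of that routine procedure, written with scikit-learn and scipy; the dataset and the two classifiers are purely illustrative choices, not part of the course material:

    # A minimal sketch of the routine evaluation described above: estimate the
    # error rate of two classifiers over the same 10 cross-validation folds and
    # compare the per-fold errors with a paired t-test.
    from scipy import stats
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)          # illustrative dataset
    folds = KFold(n_splits=10, shuffle=True, random_state=0)

    # Per-fold accuracy of each classifier on the same folds; error rate = 1 - accuracy.
    err_tree = 1 - cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=folds)
    err_nb = 1 - cross_val_score(GaussianNB(), X, y, cv=folds)

    # Paired t-test over the per-fold error rates -- the very procedure whose
    # assumptions (e.g., independence of the folds) this course will question.
    t_stat, p_value = stats.ttest_rel(err_tree, err_nb)
    print(f"mean error: tree={err_tree.mean():.3f}  nb={err_nb.mean():.3f}  p={p_value:.3f}")

It is exactly the independence and variance assumptions behind this kind of test that several of the readings listed below call into question.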


The course is a seminar course consisting of a mixture of regular lectures and student presentations. The regular lectures will give broad introductions to some of the major areas of research currently under investigation in the subfield of Machine Learning evaluation. The student presentations will be based on recent research papers that describe new results in the areas discussed in class; each presentation will involve one or two papers that will need to be contrasted and put in the context of the class discussion.

 

Students will be evaluated as follows:

  • They will have to write short critical commentaries of the assigned research papers on six different weeks (Weeks 3, 4, 7, 8, 10, 11) [12%],
  • They will have to give three Research Paper Presentations (Weeks 5, 9, and 12) [18%],
  • They will have to complete three assignments [30%],
  • They will have to propose and carry out the research for their final project. Suggestions for potential projects are given below, but students are welcome to pick their own topic. Project proposals will be due in mid-semester. [40%]

Topics Covered

  • Week 1: Review of Machine Learning's main concepts

      Readings:

               Review of Tom Mitchell's and/or Witten & Frank's textbooks

  • Week 2: Current approaches for the evaluation of Machine Learning and their shortcomings

       Readings:

Drummond, 2006: "Machine Learning as an Experimental Science (Revisited)", AAAI-2006 Workshop on Evaluation Methods for Machine Learning I

Japkowicz, 2006: "Why Question Machine Learning Evaluation Methods?", AAAI-2006 Workshop on Evaluation Methods for Machine Learning I

Japkowicz and Drummond, 2008 (draft): "Warning: Statistical Benchmarking is Addictive. Kicking the Habit in Machine Learning"

David Hand, 2006: "Classifier Technology and the Illusion of Progress", Statistical Science, vol. 21, no. 1, pp. 1-15.

  • Week 3: Evaluation Metrics I: ROC Analysis / Cost Curves (a short illustrative code sketch follows this week's readings).

         Readings:

Provost, F., Fawcett, T., and Kohavi, R. (1998): The case against accuracy estimation for comparing induction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning, pp. 43-48.

Davis & Goadrich, 2006: “The Relationship between Precision-Recall and ROC Curves”, ICML-2006

Corinna Cortes and Mehryar Mohri: "AUC Optimization vs. Error Rate Minimization", NIPS 2004
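As a point of reference for the ROC material in this week's readings, here is a minimal sketch of computing an ROC curve and its AUC with scikit-learn; the dataset and classifier are illustrative assumptions rather than choices taken from the papers above:

    # A minimal sketch: score the test examples for the positive class, then
    # trace out the (false positive rate, true positive rate) pairs.
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score, roc_curve
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)           # illustrative dataset
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # ROC analysis needs a ranking of the test examples, not hard 0/1 predictions.
    clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
    scores = clf.predict_proba(X_test)[:, 1]

    fpr, tpr, thresholds = roc_curve(y_test, scores)      # one (FPR, TPR) point per threshold
    print("AUC =", roc_auc_score(y_test, scores))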

 

         Readings:

Rich Caruana and Alexandru Niculescu-Mizil: “Predicting Good Probabilities with Supervised Learning”, ICML-05

Luke Hope and Kevin Korb: "A Bayesian Metric for Evaluating Machine Learning Algorithms", the 2004 Australasian Conference

J. Huang and C.X. Ling: "Constructing New and Better Evaluation Measures for Machine Learning", The Twentieth International Joint Conference on Artificial Intelligence (IJCAI 2007).

         Readings:

               Clay Helberg: "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)"

                StatSoft



Chong Ho Yu: "Resampling Methods: Concepts, Applications, and Justification", Practical Assessment, Research and Evaluation, 8(19).

       Dietterich, 1998 : Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation 10, pp. 1895-1923.

       T. G. Dietterich and  E. B. Kong, 1995: Machine Learning Bias, Statistical Bias, and Statistical Variance of Decision Tree Algorithms  

       Evgeniou, T., Pontil, M. and Elisseeff, A., 2004: "Leave-one-out Error, Stability, and Generalization of Voting Combinations of Classifiers", Machine Learning, 65(1): 95-130

       Kohavi, R., 1995: A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection, Proceedings of the 1995 International Joint Conference on Artificial Intelligence.

       Y. Bengio and Y. Grandvalet, 2004: "No Unbiased Estimator of the Variance of K-Fold Cross-Validation", Journal of Machine Learning Research, 5, pp. 1089-1105.

 

       Readings:

     Ioannidis, 2005: “Why Most Published Research Findings are False”

       R. R. Bouckaert, 2003: "Choosing Between Two Learning Algorithms Based on Calibrated Tests", Proceedings of the Twentieth International Conference on Machine Learning

    

            J. Demšar, 2006: “Statistical Comparisons of Classifiers over Multiple Data Sets”, JMLR: 7, pp.1-30

 

               Gascuel, O. and Caraux, 1992: "Statistical Significance in Inductive Learning", Proceedings of the 1992 European Conference on Artificial Intelligence, pp. 435-439.

        

               Mukherjee et al., 2003: Permutation Tests for Classification

       Readings:

       Salzberg, S.L., 1999: "On Comparing Classifiers: A Critique of Current Research and Methods", Data Mining and Knowledge Discovery: 1, pp. 1-12.

 

               Bay, Kibler, Pazzani, and Smyth, 2000: "The UCI KDD Archive of Large Data Sets for Data Mining Research and Experimentation", SIGKDD Explorations, 2(2): 14.

          

 

      Readings:

 

              Marcus Hutter, 2007: "The Loss Rank Principle for Model Selection"

 

         Arlot et al., 2007: Re-sampling-based confidence regions and multiple-tests for a correlated random vector       

         Hesterberg, 2004: Unbiasing the Bootstrap – Bootknife Sampling versus Smoothing

         Mason and Newton, 1992: A Rank Statistics approach to the consistency of a general bootstrap

Readings

Required

A compilation of selected research papers from the recent literature.

References

Machine Learning

Statistics

Course Support:

Machine Learning Resources on the Web:

  

Class Notes and Assignments

Class notes and assignments are available here.