CSI5388: Topics in Machine Learning
Instructor Nathalie Japkowicz
Office: STE 5029
Telephone: 562-5800 ext. 6693 (Note: e-mail is more reliable)
Meeting Times and Locations
- Location: UCU 125
Office Hours and Locations
- Times: Mondays and Thursdays, 2:30-3:30
- Location: STE 5029
Machine Learning and Data Mining are the areas of Artificial Intelligence
concerned with the problem of building computer programs that automatically
improve with experience. This course will focus on advanced issues from these
fields. Issues such as Feature Selection, Class Imbalances, Cost-sensitivity,
One-class learning, Classifier Combination, Performance Evaluation and
Visualization (and other topics) will be discussed in depth. Students will be
expected to read and critique articles from the recent literature, make two
presentations, and propose and complete a research project.
Prerequisite: CSI5387, although the course can be taken with
permission from the instructor (permission will be granted if the book
"Machine Learning" by Tom Mitchell (see below) has been read and well
understood prior to taking the course).
The course will consist of a mixture of regular lectures and student
presentations. The regular lectures will cover broad introductions to some of
the major areas of research currently under investigation. The student
presentations will be of two kinds.
- Research Paper Presentations: based on recent research papers that
describe new results in the areas discussed in class. Each presentation
will involve two papers that must be contrasted and put in
the context of the class discussion.
- Theme Presentations: students will be required to
choose a specialised research theme and give an
introduction to this theme, together with glimpses into recent papers on it.
In a way, this will be like teaching a class on the chosen theme.
The themes that will be covered this term are:
- Genetic Programming
- Evaluating Unsupervised Learning
- Feature Selection for SVM
- Survey of Single Class Learning Methods, Advantages and Disadvantages
- Class Imbalances versus Cost-Sensitive Learning
- Recent Advances in Classifier Combination
Students will be evaluated as follows:
- They will have to write short critical commentaries on the assigned
research papers in four different weeks (students choose the weeks
themselves, but may not pick the week of their own Research Paper
Presentation) [20%].
- They will have to give one Research Paper Presentation (as described above).
- They will have to give one Theme Presentation (as described above) [15%].
- They will have to propose and carry out the research for their final
project. Suggestions for potential projects are given below, but students
are encouraged to pick their own topic. Project proposals will be due in
mid-semester [50%].
- Week 1: Review of Machine Learning's main concepts
- Week 2: Topics in the Evaluation of Machine Learning algorithms
(ROC Curves, Cost Curves) based on:
ROC Graphs: Notes and Practical Considerations for Data Mining Researchers, by Tom Fawcett, submitted to Knowledge Discovery and Data Mining, 2003. [No slides available, at least for the time being. Please read the Fawcett paper.]
- Week 3: Intro to Genetic Algorithms (Fitness Functions, Genetic Operators,
- Week 4: Intro to Unsupervised Learning
- Week 5: Combining Supervised and Unsupervised Learning, Intro
to Radial-Basis Functions.
[No available Notes (presentation will be made on the blackboard)]
- Week 6: Advice on how to formulate a project
proposal; presentation by a librarian on resources available for ML
- Week 7: Topics in Feature Selection (Wrapper, Filter and Hybrid Methods)
- Week 8: BREAK
- Week 9: Single Class Learning Methods (Autoassociators, single-class SVMs)
- Week 10: The Class Imbalance Problem
- Week 11: Support Vector Machines and Set Covering Machines
- Week 12: Combination of Classifiers (Bagging, Random Forests, Boosting,
Error-Correcting Codes, Mixtures-of-Experts)
- Week 13: Project Presentations
A compilation of selected research papers from the recent literature.
- Machine Learning, Tom Mitchell, McGraw Hill, 1997.
- Introduction to Machine Learning, Nils J. Nilsson (draft of a proposed
textbook available on the Web)
- Data Mining, Witten, I & Frank, E., Morgan-Kaufmann, 2000.
Machine Learning Resources on the Web: