CSI 5387: Machine Learning
Project Description 
Project Proposal (3-5 pages) Due: Last class before the mid-term break 
Final Project Report Due: Last day of classes 
Presentation: Last weeks of classes 
Demo (optional): By Appointment 
Introduction 
In this project, you are expected (1) to select a particular area of Machine
Learning that interests you, (2) to conduct a literature search on this area,
(3) to focus on a specific problem in the area you selected, and (4a) to design
and implement a novel learning scheme or (4b) to extend an existing scheme to
deal with the problem you have identified. Alternatively (4c), you can compare
the performance of different existing schemes on the specific problem you have
identified in (1), (2) and (3), or on a particular real-world data set. This
must not be one of the benchmark data sets, such as those in the UCI
repository; it must be a data set of interest to industry or research. 
It is important to start working on this project as soon as the semester
begins. I suggest that you start reading the textbook, some of its suggested
follow-up material, conference proceedings, journals, and papers available from
the Web, early enough to settle quickly on a subject of interest to you. I will
be available for discussions both before the project proposal is due and after
that, during the development of your research. 
To help you select a topic, here is a list of project suggestions, though you
are more than welcome to propose your own idea. 
Project Suggestions
- Design a scheme for combining learning methods that present different
     strengths and weaknesses. This scheme should benefit from the different
     learning methods' advantages but not suffer from their individual weaknesses. 
- Ensemble-based combination schemes often perform more
     accurately than a single "best classifier". Investigate the
     relationship between the accuracy of the individual combined classifiers
     and that of their combination. 
- Identify an area of Natural Language Processing that
     could be handled by a machine learning method (for example, the translation
     of certain prepositions from one language to another), propose a method for
     automatically constructing a training set for that problem from raw text
     and a lexicon, and apply one or several learning algorithms to that data set. 
- Implement a program for detecting domain-specific
     keywords in a collection of texts written for that domain. 
- If you have a data set of interest to you (for example,
     from a past or present job, or another academic project), evaluate the
     performance of standard learning techniques on that set, identify
     particular properties of your data set that may negatively affect the
     learning performance, and devise and implement a scheme for addressing
     these deficiencies. 
- Design a method for generating new features and
     selecting the most useful ones for a given learning task. 
- Use the Mixture-of-Experts Framework with different
     learning schemes. Is it a useful scheme for combining different classifiers?
- Design and implement a concept-learner (or extend an
     existing concept-learner) for dealing with class imbalance (the situation
     where a training set contains many more positive than
     negative examples, or the other way around). 
- Design and implement a concept-learner (or extend an
     existing concept-learner) for dealing with the case of small disjuncts and rare cases. 
- Compare the performance of a number of unsupervised
     classifiers used in supervised mode to the performance of supervised
     classifiers. 
- Compare the performance of combination methods such as
     bagging or boosting when used with different learning methods.
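
As a starting point for the suggestion on the relationship between the accuracy of individual classifiers and that of their combination, the following is a minimal sketch rather than a prescribed method. It computes the accuracy of a simple majority vote under strong simplifying assumptions that are not part of this handout: a two-class problem, an odd number of classifiers, and classifiers that are each correct with the same probability `p` independently of one another. The function name `majority_vote_accuracy` is hypothetical, and real classifiers are rarely independent, so a project would need to measure this relationship empirically.

```python
from math import comb


def majority_vote_accuracy(n_classifiers: int, p: float) -> float:
    """Probability that a majority of the classifiers votes for the correct class.

    Assumptions (illustrative only): two classes, an odd number of
    classifiers so that ties cannot occur, and each classifier correct
    with probability p independently of the others.
    """
    if n_classifiers < 1 or n_classifiers % 2 == 0:
        raise ValueError("use a positive, odd number of classifiers")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")

    # Smallest number of correct votes that still wins the majority.
    needed = n_classifiers // 2 + 1

    # Sum the binomial probabilities of every winning vote count.
    return sum(
        comb(n_classifiers, k) * p**k * (1 - p) ** (n_classifiers - k)
        for k in range(needed, n_classifiers + 1)
    )


if __name__ == "__main__":
    # Print the combined accuracy for a few individual accuracies and sizes.
    for p in (0.4, 0.5, 0.6, 0.7):
        row = ", ".join(
            f"n={n}: {majority_vote_accuracy(n, p):.3f}" for n in (1, 5, 21)
        )
        print(f"p={p}: {row}")
```

Under these assumptions, adding classifiers improves the vote only when each one is better than chance (`p` above 0.5) and degrades it when each one is worse than chance. How far actual ensembles depart from this idealized behaviour, for instance because their members make correlated errors, is the kind of question a project on this topic could investigate.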