CSI 5180: Statistical NLP
Project Proposal (3-5 pages) Due: November 6, 2001
Final Project Report Due: Last Day of Classes
Presentation: Weeks 13 and 14
Demo (optional): By Appointment
In this project, you are expected (1) to select a particular area of Statistical
NLP that interests you, (2) to conduct a literature search on this area,
(3) to focus on a specific problem in the area you selected, and (4a) to
design and implement a novel learning scheme or (4b) to extend an existing
scheme to deal with the problem you have identified. Alternatively, (4c) you
can compare the performance of different existing schemes on the specific
problem you have identified in steps (1), (2), and (3), and on different
It is important to start working on this project as soon as the semester
begins. I suggest that you begin reading the textbook and some of its suggested
follow-up material, as well as conference proceedings, journals, and papers
available on the Web, early enough that you can quickly settle on a subject
that interests you. I will be available for discussions both before the
project proposal is due and afterward, during the development of your research.
To help you select a topic, here is a list of project suggestions, though
you are more than welcome to propose your own idea:
- Compare the performance of several keyword extraction systems
  on several corpora. Describe the strengths and weaknesses of
  each of them.
- Compare the performance of various machine learning tools on
  different representations of the REUTERS text categorization
  data set (e.g., a bag-of-words representation, a keyword representation,
  a bag-of-words representation of summaries of the texts, etc.).
- Implement a program for detecting domain-specific keywords in
  a collection of texts written for that domain.
- Design a method for establishing the similarity between two
- Design a system that summarizes several documents into a single summary.
- Design a system that makes use of a bilingual corpus to perform
  word sense disambiguation.
- Design a system that improves on the translations produced by an
  existing system (for example, BabelFish) in some respect, such as word
  order, verb tense, choice of preposition, or word sense disambiguation.
- Design a system that detects proper nouns and/or geographical entity