CSI 5386: NLP
Project Description

Project Proposal (2-3 pages) Due: After the reading week
Final Project Report Due: At the end of the exam period
Project presentation: Last class
Demo (optional during presentation)

Introduction

In this project, you are expected (1) to select a particular area of NLP that interests you, (2) to conduct a literature search on this area, (3) to focus on a specific problem in the area you selected, and (4a) to design and implement a novel learning scheme or (4b) to extend an existing scheme to deal with the problem you have identified. Alternatively (4c), you can compare the performance of different existing schemes on the specific problem you have identified in (1), (2) and (3) and on different corpora.

It is important to start working on this project early. I suggest that you start reading the textbook, some of its suggested follow-up material, conference proceedings, journals, and papers available from the Web, early enough to settle quickly on a subject of interest to you. I will be available for discussions both before the project proposal is due and after that, during the development of your research.

To help you select a topic, here is a list of project suggestions, though you are more than welcome to propose your own idea.

Sources of datasets and project ideas:

· SemEval
· CLEF
· Kaggle (search for text data)
· TREC

Other project suggestions

Neural language models for different applications and comparison to other types of language models.
Extract information from medical texts (patient data or scientific articles).
Detect topics, events, opinions, or user profiles from social media texts. Apply deep learning techniques.
Implement a system for automatic classification and information extraction from medical articles. Apply deep learning techniques.
Implement a system for automatic classification of poems by themes or styles.
Compare the performance of several terminology extraction systems on several corpora. Describe the strengths and weaknesses of each of them.
Compare the performance of several tools for extracting multi-word expressions.
Compare the performance of various machine learning tools on different representations of the Reuters text categorization data set (e.g., document embeddings, word embeddings, bag-of-words representation, keyword representation, bag-of-words representation of summaries of the text, etc.).
Design a method for establishing the degree of similarity between two documents in different languages, possibly using word embeddings or neural topic models.
Design a system that makes use of a bilingual corpus or Wikipedia pages to perform word sense disambiguation (or compare WSD systems).
Design a system that improves an existing machine translation system in some respect (such as word order, verb tense, choice of preposition, word sense disambiguation, etc.).
Design a system that detects proper nouns and/or geographical entities in text (or other kinds of entities and relations between entities).
Compare the performance of several part-of-speech taggers on social media texts versus newspaper texts, especially taggers based on deep learning.
Compare the performance of several parsers or chunkers on social media texts versus newspaper texts, especially those based on deep learning.
Develop any of the above systems for languages other than English. French is of special interest.
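
Several of the suggestions above, like option (4c) in the Introduction, ask you to compare existing systems on more than one corpus. As a rough illustration only, the sketch below shows one way such a comparison could be organized in Python: every system is scored on every corpus with the same metric. The two classifiers, the keyword list, the labels, and the tiny corpora are all hypothetical placeholders invented for this example; a real project would substitute actual tools, real data sets, and probably additional metrics such as precision, recall, and F-measure.

```python
from typing import Callable, Dict, List, Tuple

Corpus = List[Tuple[str, str]]      # (text, gold label) pairs
Classifier = Callable[[str], str]   # maps a text to a predicted label


def keyword_classifier(text: str) -> str:
    """Toy system: predict 'medical' if a (hypothetical) keyword occurs."""
    keywords = {"patient", "clinical", "dose"}
    tokens = set(text.lower().split())
    return "medical" if tokens & keywords else "other"


def majority_classifier(text: str) -> str:
    """Toy baseline: always predict the same label."""
    return "other"


def accuracy(system: Classifier, corpus: Corpus) -> float:
    """Fraction of documents whose predicted label matches the gold label."""
    correct = sum(1 for text, gold in corpus if system(text) == gold)
    return correct / len(corpus)


def compare(
    systems: Dict[str, Classifier], corpora: Dict[str, Corpus]
) -> Dict[Tuple[str, str], float]:
    """Return the accuracy of every (system, corpus) pair."""
    return {
        (s_name, c_name): accuracy(system, corpus)
        for s_name, system in systems.items()
        for c_name, corpus in corpora.items()
    }


# Tiny made-up corpora standing in for, e.g., newspaper and social media text.
corpora = {
    "news": [
        ("the patient received a dose", "medical"),
        ("the team won the match", "other"),
    ],
    "social": [
        ("feeling great today", "other"),
        ("clinical trial starts soon", "medical"),
        ("new phone arrived", "other"),
    ],
}
systems = {"keyword": keyword_classifier, "majority": majority_classifier}

results = compare(systems, corpora)
for (s_name, c_name), acc in sorted(results.items()):
    print(f"{s_name:10s} {c_name:8s} accuracy={acc:.2f}")
```

Keeping the systems and corpora in dictionaries means a new tool or data set can be added without touching the evaluation code, which makes it easier to report strengths and weaknesses per corpus.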
