CSI5180: Topics in Artificial Intelligence
Natural Language Processing, A Statistical Approach
Winter 2012

Instructor: Diana Inkpen

Office: SITE 5015
E-mail: diana@site.uottawa.ca
Telephone: 562-5800 ext. 6711

Announcements

Assignment 2 is posted.
The due date for Assignment 1 has been extended by one week, because three students in the class have many co-op interviews this week. If you finish it by the original due date, you can submit it then and focus on choosing a paper for presentation and a topic for the project.
Assignment 1 is posted.

Meeting Times and Locations

Mon 16:00-19:00, in Simard 429

Office Hours: Fri 12:30-13:30 or by email appointment, in SITE 5015.

Overview

Natural Language Processing (NLP) is the subfield of Artificial Intelligence concerned with building computer systems such as natural language interfaces to databases or the World-Wide Web, automatic machine-translation systems, text analysis systems, speech understanding systems, or computer-aided instruction systems. Until recently, NLP was mainly approached by rule-based or symbolic methods. In the past few years, however, statistical methods have received a lot of attention because they seem to address many of the bottlenecks encountered by symbolic methods. This course will focus mainly on statistical approaches. In particular, we will concentrate on approaches such as n-gram models and Markov models. If time permits, we will consider applications such as information retrieval, text categorization, clustering, and statistical machine translation.

Prerequisites

Students should have reasonable exposure to Artificial Intelligence and some programming experience in a high-level language. If you are unsure whether you meet these requirements, please check with the instructor.

Evaluation

Students will be evaluated on:

Two written and programming assignments (40%: 20% for A1, 20% for A2)
One in-class presentation (15%)
Class participation (5%)
A final project (40%)

Required Textbook

Foundations of Statistical Natural Language Processing, by Chris Manning and Hinrich Schütze, MIT Press, 1999.

Timetable (late assignments will not be accepted)

Assignment 1, due Fri, Feb 10, 21:00; extended to Fri, Feb 17, 21:00.
Paper Presentation - See Schedule
Project outline (2-3 pages), due Mon, Feb 27, in class
Assignment 2, due Fri, March 23, 21:00.
Project Presentations, Mon, April 2, in class
Project Reports, due Fri, April 27, 21:00, by email

Assignments

The programming part should be done in Perl or Java. If you don't know Perl, it is very easy to learn enough Perl to do the assignments. Here is a Perl tutorial that we might discuss in class if time allows. Here is a very simple Perl script. Here are some more sample Perl scripts: t4.pl, t5.pl, t6.pl

Course Support:

Project Description
A list of NLP resources that you could use in your project
Project Report Guidelines
How to read a research paper

Useful Links:

Syllabus (subject to minor modifications)
(The lecture slides will be in PDF format; you can read them with Acrobat Reader.)

Week 1: Jan 9
Preliminaries
Introduction to Statistical NLP
Readings: Ch1
Links: Webster, LDOCE, WordNet, Slides, Tom Sawyer, Connexor parser and tagger demo, Stanford parser demo

Week 2: Jan 16
Linguistics Essentials
Mathematical Foundations I: Probability Theory
Readings: Ch2,3
Links: FrameNet, More slides on Probability Theory and Information Theory, Online demos, Penn Treebank tagset

Week 3: Jan 23
Mathematical Foundations II: Information Theory
Corpus-Based Work
Readings: Ch2,4

Week 4: Jan 30
Collocations
Readings: Ch5

Week 5: Feb 6
Statistical Inference: N-gram Models
Readings: Ch6, Links: Statistical Language Modeling Toolkit

Week 6: Feb 13
Word Sense Disambiguation
Readings: Ch7
Links: Senseval, WSD tutorial

Week 7: Feb 20
Reading week (no classes)

Week 8: Feb 27
Lexical Acquisition
Semantic Similarity
Readings: Ch8
Links: Corpus-based Similarity Demo, Dekang Lin's Demos, WordNet::Similarity

Week 9: Mar 5
Hidden Markov Models
Readings: Ch9 Extra slides on HMM

Week 10: Mar 12
Part-of-Speech Tagging
Readings: Ch10

Week 11: Mar 19
Text Categorization
Text Clustering
Readings: Ch16
Links: Weka

Week 12: Mar 26
Information Retrieval
Latent Semantic Indexing
Probabilistic Retrieval
Readings: Ch15
Links: TREC, Textbook errata p560-563, Extra slides

Week 13: Apr 2
Statistical Alignment & Machine Translation
Readings: Ch13 Slides by George Foster (NRC) Statistical MT tutorial
Possible extra topic: Question Answering Links to IBM's Watson Deep QA Answers Ottawa Citizen article
Student presentations for projects (April 2)
