CSI5180: Topics in Artificial Intelligence
Natural Language Processing, A Statistical Approach
Winter 2012


Instructor:  Diana Inkpen

Office: SITE 5015
E-mail: diana@site.uottawa.ca
Telephone: 562-5800 ext. 6711

Announcements

Meeting Times and Locations

Office Hours: Fridays 12:30-1:30pm in SITE 5015, or by appointment arranged by email.

Overview

Natural Language Processing (NLP) is the subfield of Artificial Intelligence concerned with building computer systems such as natural language interfaces to databases or the World-Wide Web, automatic machine-translation systems, text analysis systems, speech understanding systems, and computer-aided instruction systems. Until recently, NLP was approached mainly through rule-based or symbolic methods. In the past few years, however, statistical methods have received a lot of attention because they appear to address many of the bottlenecks encountered by symbolic methods. This course will focus mainly on statistical approaches; in particular, we will concentrate on techniques such as n-gram models and Markov models. If time permits, we will also consider applications such as information retrieval, text categorization, clustering, and statistical machine translation.

Pre-Requisites

Students should have reasonable exposure to Artificial Intelligence and some programming experience in a high-level language. If in doubt, please check with the instructor.

Evaluation

Students will be evaluated on:

Required Textbook

Foundations of Statistical Natural Language Processing, by Chris Manning and Hinrich Schütze, MIT Press, 1999.

Timetable (late assignments will not be accepted)

Assignments

The programming part should be done in Perl or Java. If you don't know Perl, it is very easy to learn enough of it to do the assignments. Here is a Perl tutorial that we might discuss in class if time allows. Here is a very simple Perl script. Here are some more sample Perl scripts: t4.pl, t5.pl, t6.pl

Course Support:

Useful Links:



Syllabus (subject to minor modifications)
(The lecture slides will be in PDF format; you can read them with Acrobat Reader.)


Week 1: Jan 9
Preliminaries
Introduction to Statistical NLP

Readings: Ch1     Links: Webster, LDOCE, WordNet, Slides Tom Sawyer, Connexor parser and tagger demo, Stanford parser demo
Week 2: Jan 16
Linguistics Essentials
Mathematical Foundations I: Probability Theory
Readings: Ch2,3     Links: FrameNet, More slides on Probability Theory and Information Theory, Online demos, Penn Treebank tagset
Week 3: Jan 23
Mathematical Foundations II: Information Theory
Corpus-Based Work
Readings: Ch2,4

Week 4: Jan 30
Collocations
Readings: Ch5
Week 5: Feb 6
Statistical Inference: N-gram Models
Readings: Ch6     Links: Statistical Language Modeling Toolkit
Week 6: Feb 13
Word Sense Disambiguation
Readings: Ch7     Links: Senseval, WSD tutorial

Week 7: Feb 20
Reading week (no classes)
Week 8: Feb 27
Lexical Acquisition
Semantic Similarity

Readings: Ch8     Links: Corpus-based Similarity Demo, Dekang Lin's Demos, WordNet::Similarity
Week 9: Mar 5
Hidden Markov Models
Readings: Ch9     Links: Extra slides on HMMs
Week 10: Mar 12
Part-of-Speech Tagging
Readings: Ch10

Week 11: Mar 19
Text Categorization
Text Clustering
Readings: Ch16     Links: Weka

Week 12: Mar 26
Information Retrieval
Latent Semantic Indexing
Probabilistic Retrieval
Readings: Ch15     Links: TREC, Textbook errata (pp. 560-563), Extra slides
Week 13: Apr 2
Statistical Alignment & Machine Translation
Readings: Ch13     Links: Slides by George Foster (NRC), Statistical MT tutorial
Possible extra topic: Question Answering     Links: IBM's Watson, DeepQA, Answers, Ottawa Citizen article
Student presentations for projects (April 2)