CSI5180: Topics in Artificial Intelligence

Natural Language Processing, A Statistical Approach


Nathalie Japkowicz

Office: STE 5029
E-mail: nat@site.uottawa.ca
Telephone: 562-5800 ext. 6693 (Note: e-mail is more reliable)

Meeting Times and Locations

Office Hours and Locations


Natural Language Processing (NLP) is the subfield of Artificial Intelligence concerned with building computer systems such as natural language interfaces to databases or the World-Wide Web, automatic machine-translation systems, text analysis systems, speech understanding systems, or computer-aided instruction systems.

Until recently, NLP was approached mainly through rule-based, or symbolic, methods. In the past few years, however, statistical methods have received considerable attention because they appear to address many of the bottlenecks encountered by symbolic methods.

This course will focus mainly on statistical approaches. In particular, we will concentrate on approaches such as n-gram models, Markov models, and probabilistic context-free grammars. If time permits, we will consider applications such as statistical alignment and machine translation, clustering, information retrieval, and text categorization.


Students should have reasonable exposure to Artificial Intelligence and some programming experience in a high-level language.


Students will be evaluated on:

Required Textbooks

Foundations of Statistical Natural Language Processing, by Chris Manning and Hinrich Schütze, MIT Press.



Course Support:

Useful Links:


(click on "Powerpoint File" to retrieve lecture slides)




Week 1: September 4

Powerpoint File
Introduction to Statistical NLP
Powerpoint File

Week 2: September 8-11

Linguistics Essentials
Powerpoint File
Mathematical Foundations I: Probability Theory
Powerpoint File

Chap. 1, 2 & 3

Week 3: September 15-18

Mathematical Foundations II: Information Theory
Powerpoint File
Corpus Based Work
Powerpoint File

Chap. 2 & 4

Week 4: September 22-25

Collocations
Powerpoint File

Chap. 5

Week 5: September 29 - October 2

Statistical Inference
Powerpoint File

Chap. 6

Week 6: October 6-9

Word Sense Disambiguation
Powerpoint File

Chap. 7 &

Week 7: October 13-16

Lexical Acquisition
Powerpoint File

Chap. 8

Week 8: October 20-23

Midterm Week (Review + Exam)

Chaps. 1, 2, 3, 4, 5, 6, 7 & 8

Week 9: October 27-30

Markov Models
Powerpoint File

Chap. 9

Week 10: November 3-6

Part-of-Speech Tagging

Chap. 10

Week 11: November 10-13

Probabilistic Context Free Grammars
Powerpoint File

Chap. 9 & 11

Week 12: November 17-20

Probabilistic Parsing

Chap. 12

Week 13: November 24-27

Statistical Alignment & Machine Translation
Powerpoint File

Chap. 13