CSI5386: Natural Language Processing
Winter 2018

Instructor:  Diana Inkpen

Office: SITE 5015
E-mail: diana@site.uottawa.ca
Telephone: 562-5800 ext. 6711



Natural Language Processing (NLP) is the subfield of Artificial Intelligence concerned with building computer systems such as natural language interfaces to databases or the World-Wide Web, automatic machine translation systems, text analysis systems, speech understanding systems, and computer-aided instruction systems. Until recently, NLP was approached mainly through rule-based, or symbolic, methods. In the past few years, however, statistical methods have received much attention because they appear to address many of the bottlenecks encountered by symbolic methods. This course will discuss both approaches, with more emphasis on the statistical ones. If time permits, we will also consider applications such as information retrieval, text categorization, clustering, and statistical machine translation.


Students should have reasonable exposure to Artificial Intelligence and some programming experience in a high-level language. If in doubt about these prerequisites, please check with the instructor.


Students will be evaluated on:

Recommended Textbooks

[M&S] Foundations of Statistical Natural Language Processing, by Chris Manning and Hinrich Schütze, MIT Press, 1999. A companion website for the textbook is available.
[J&M] Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics, by Daniel Jurafsky and James H. Martin, Prentice-Hall, 2nd edition, 2009.
[NLP4SM] Natural Language Processing for Social Media, by Atefeh Farzindar and Diana Inkpen, Morgan and Claypool Publishers, August 2015. Available free via the uOttawa library.

Timetable (late assignments will not be accepted)


The programming part should be done in Java, Python, or another programming language. You can use any existing tools or code, as long as you properly acknowledge what you used.

Course Support:

Useful Links:

Syllabus (subject to modifications)
(The lecture slides are in PDF format and can be read with Acrobat Reader.)

Week 1: Jan 8
Preliminaries. Introduction to NLP, Introduction to Statistical NLP, Linguistics Essentials
Readings: M&S Ch1, J&M Ch1
Links: Webster, LDOCE, WordNet, Slides, Tom Sawyer, Connexor parser and tagger demo, Stanford parser demo, FrameNet, Online demos

Week 2: Jan 15
Words. Morphology, Tokenization, FSA (finite-state automata)
Readings: J&M Ch2,3
Mathematical Foundations I: Probability Theory, Mathematical Foundations II: Information Theory
Readings: M&S Ch2,3; more slides on Probability Theory and Information Theory

Week 3: Jan 22
Words. Corpus-Based Work, Collocations
Readings: M&S Ch4 (corpus-based work), M&S Ch5 (collocations)

Week 4: Jan 29
Words. Part-of-Speech Tagging, More on POS Tagging
Readings: M&S Ch10, J&M Ch5
Links: Penn Treebank tagset
Hidden Markov Models
Readings: M&S Ch9, J&M Ch6; extra slides on HMM, more on HMM, Conditional Random Fields (CRF)

Week 5: Feb 5
Words. Statistical Inference: N-gram Models, Neural Language Models
Readings: M&S Ch6, J&M Ch4, Section 12.4 (Natural Language Processing) of the Deep Learning book
Links: Statistical Language Modeling Toolkit, RNN LM toolkit, word2vec tool, GloVe word embeddings

Week 6: Feb 12
Text Categorization, Text Clustering, Deep Learning for Natural Language Processing
Readings: M&S Ch16
Links: Weka

Week 7: Feb 19
Sentiment Analysis

Week 8: Feb 26
Reading week, no classes. Prepare your project outline.

Week 9: Mar 5
Syntax. Parsing, Probabilistic Parsing, Partial Parsing
Readings: J&M Ch12,13,14

Week 10: Mar 12
Semantics. Word Sense Disambiguation
Readings: M&S Ch7 (WSD)
Links: Senseval, WSD tutorial, BabelNet
Invited Lecturer: Parinaz Sobhani, Deep Learning for NLP

Week 11: Mar 19
Semantics. Deep Semantics, Lexical Acquisition, Semantic Similarity
Readings: M&S Ch8
Links: Corpus-based Similarity Demo, Dekang Lin's Demos, WordNet::Similarity

Week 12: Mar 26
Information Retrieval, Latent Semantic Indexing, Probabilistic Retrieval
Readings: M&S Ch15
Links: TREC, M&S textbook errata (pp. 560-563), extra slides
Possible extra topic: Question Answering (IBM's Watson, Slides, Deep QA Answers)

Week 13: Apr 2
Statistical Alignment & Machine Translation
Readings: M&S Ch13; slides by George Foster (NRC), Statistical MT tutorial

Project-based work.