CSI5386: Natural Language Processing
Fall 2017


Instructor:  Diana Inkpen

Office: SITE 5015
E-mail: diana@site.uottawa.ca
Telephone: 562-5800 ext. 6711

Announcements

Overview

Natural Language Processing (NLP) is the subfield of Artificial Intelligence concerned with building computer systems such as natural language interfaces to databases or the World-Wide Web, automatic machine-translation systems, text analysis systems, speech understanding systems, or computer-aided instruction systems. Until recently, NLP was mainly approached with rule-based or symbolic methods. In the past few years, however, statistical methods have received a lot of attention, as they seem to address many of the bottlenecks encountered by symbolic methods. This course will discuss both approaches, with more emphasis on the statistical ones. If time permits, we will consider applications such as information retrieval, text categorization, clustering, and statistical machine translation.

Pre-Requisites

Students should have reasonable exposure to Artificial Intelligence and some programming experience in a high-level language. Please check with the instructor.

Evaluation

Students will be evaluated on:

Recommended Textbooks

[M&S] Foundations of Statistical Natural Language Processing, by Chris Manning and Hinrich Schütze, MIT Press, 1999. A Companion Website for the TextBook
[J&M] Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics, by Daniel Jurafsky and James H. Martin, 2nd edition, Prentice-Hall, 2009.
[NLP4SM] Natural Language Processing for Social Media, by Atefeh Farzindar and Diana Inkpen, Morgan and Claypool Publishers, August 2015 (available free via the uOttawa library)

Timetable (no late assignments are considered)

Assignments

The programming part should be done in Java, Python, or another programming language. You can use any existing tools or code, as long as you properly acknowledge what you used.

Course Support:

Useful Links:



Syllabus (subject to modifications)
(The lecture slides will be in PDF format; you can read them with Acrobat Reader.)


Week 1: Sept 6
Preliminaries, Introduction to NLP, Introduction to Statistical NLP, Linguistics Essentials
Readings: M&S Ch1, J&M Ch1
Links: Webster, LDOCE, WordNet, Slides, Tom Sawyer, Connexor parser and tagger demo, Stanford parser demo, FrameNet, Online demos


Week 2: Sept 13
Words. Morphology, Tokenization, FSA
Readings: J&M Ch2,3
Mathematical Foundations I: Probability Theory; Mathematical Foundations II: Information Theory
Readings: M&S Ch2,3. More slides on Probability Theory and Information Theory


Week 3: Sept 20
Words. Corpus-Based Work, Collocations
Readings: M&S Ch4 (corpus-based work), M&S Ch5 (collocations)


Week 4: Sept 27
Words. Part-of-Speech Tagging, More on POS Tagging
Readings: M&S Ch10, J&M Ch5
Links: PennTreeBank tagset
Hidden Markov Models
Readings: M&S Ch9, J&M Ch6. Extra slides on HMM, More on HMM, Conditional Random Fields (CRF)


Week 5: Oct 4
Words. Statistical Inference: N-gram Models, Neural Language Models
Readings: M&S Ch6, J&M Ch4, Section 12.4 (Natural Language Processing) of the Deep Learning book
Links: Statistical Language Modeling Toolkit, RNN LM toolkit, word2vec tool, GloVe word embeddings


Week 6: Oct 11
Text Categorization, Text Clustering, Deep Learning for Natural Language Processing
Readings: M&S Ch16
Links: Weka


Week 7: Oct 18
Sentiment Analysis


Week 8: Oct 25
Reading week, no classes. Prepare your project outline.


Week 9: Nov 1
Syntax. Parsing, Probabilistic Parsing, Partial Parsing
Readings: J&M Ch12,13,14


Week 10: Nov 8
Semantics. Word Sense Disambiguation
Readings: M&S Ch7 (WSD)
Links: Senseval, WSD tutorial, BabelNet
Invited Lecturer: Parinaz Sobhani, Deep Learning for NLP


Week 11: Nov 15
Semantics. Deep Semantics, Lexical Acquisition, Semantic Similarity
Readings: M&S Ch8
Links: Corpus-based Similarity Demo, Dekang Lin's Demos, WordNet::Similarity


Week 12: Nov 22
Information Retrieval, Latent Semantic Indexing, Probabilistic Retrieval
Readings: M&S Ch15
Links: TREC, M&S Textbook errata p560-563, Extra slides
Possible extra topic: Question Answering IBM's Watson Slides Deep QA Answers


Week 13: Nov 29
Statistical Alignment & Machine Translation
Readings: M&S Ch13. Slides by George Foster (NRC), Statistical MT tutorial. Project-based work.