CSI5386: Natural Language Processing
Winter 2025


Instructor:  Diana Inkpen
E-mail: diana.inkpen@uottawa.ca

Lectures: Mondays 2:30-5:30pm in VNR 1095

Office Hours: Fridays 10am-11am in SITE 5015, or by email appointment.

Announcements

Overview

Natural Language Processing (NLP) is the subfield of Artificial Intelligence concerned with building computer systems such as natural language interfaces to databases or the World Wide Web, automatic machine translation systems, text analysis systems, speech understanding systems, and computer-aided instruction systems. Until recently, NLP was approached mainly with rule-based or symbolic methods. In the past few years, however, statistical methods have received considerable attention because they address many of the bottlenecks encountered by symbolic methods. This course will discuss both approaches, with more emphasis on the statistical ones. If time permits, we will consider applications such as information retrieval, text categorization, clustering, and statistical machine translation.

Pre-Requisites

Students should have reasonable exposure to Artificial Intelligence and some programming experience in a high-level language. If in doubt, please check with the instructor.

Evaluation

Students will be evaluated on:

Recommended Textbooks

[J&M] Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky and James H. Martin. 3rd edition draft, 2020, Prentice-Hall. Link to draft
Draft of a new textbook on Natural Language Processing, by Jacob Eisenstein, published by MIT Press.

[M&S] Foundations of Statistical Natural Language Processing, by Chris Manning and Hinrich Schütze, MIT Press, 1999. Companion website for the textbook
[NLP4SM] Natural Language Processing for Social Media, by Atefeh Farzindar and Diana Inkpen, Morgan and Claypool Publishers, second edition, Dec 2017, available here (free via uOttawa library)

Timetable

Assignments

The programming part should be done in Java, Python, or another programming language. You can use any existing tools or code, as long as you properly acknowledge what you used.

Course Support:

Useful Links:



Syllabus (subject to modifications)
(The lecture slides will be in PDF format; you can read them with Acrobat Reader.)


Week 1:
Preliminaries Introduction to NLP Introduction to Statistical NLP Linguistics Essentials
Readings: M&S Ch1, J&M Ch1
Links:    Webster LDOCE WordNet Slides Tom Sawyer Stanford parser FrameNet SpaCy NLTK
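For a quick first taste of the toolkits linked above, here is a minimal sketch using NLTK: tokenize a sentence, then look up WordNet senses. It assumes NLTK is installed; the required data packages are fetched in the snippet (package names vary slightly across NLTK versions).

import nltk
# Fetch data packages; names vary by NLTK version, missing ones are skipped.
for pkg in ("punkt", "punkt_tab", "wordnet"):
    nltk.download(pkg, quiet=True)
from nltk.tokenize import word_tokenize
from nltk.corpus import wordnet

tokens = word_tokenize("Tom Sawyer whitewashed the fence.")
print(tokens)  # ['Tom', 'Sawyer', 'whitewashed', 'the', 'fence', '.']

# WordNet groups words into synsets, each with a gloss.
for synset in wordnet.synsets("fence")[:3]:
    print(synset.name(), "-", synset.definition())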


Week 2:
Words. Morphology Tokenization FSA
Readings: J&M Ch2
Mathematical Foundations I: Probability Theory Mathematical Foundations II: Information Theory
Readings: M&S Ch2,3 More slides on Probability Theory and Information Theory     
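To make the FSA material concrete, here is a toy deterministic automaton in Python for the classic "sheep language" /baa+!/ from J&M Ch2. This is a hand-built sketch; real morphological analyzers compile large lexicons into such machines.

# A tiny deterministic FSA accepting the sheep language /baa+!/ (J&M Ch2).
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # self-loop: any number of further a's
    (3, "!"): 4,
}
ACCEPTING = {4}

def accepts(s):
    state = 0
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING

print(accepts("baaaa!"))  # True
print(accepts("ba!"))     # False (needs at least two a's)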


Week 3:
Words. Corpus-Based Work Collocations  Word Embeddings

Readings: M&S Ch4 (corpus-based work), M&S Ch5 (collocations), J&M Ch6 (word embeddings)
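Word embeddings represent each word as a dense vector, and relatedness is typically measured with cosine similarity. Below is a minimal sketch with hand-made 4-dimensional toy vectors and NumPy (an assumption; real word2vec or GloVe embeddings have hundreds of dimensions and are learned from large corpora).

import numpy as np

# Toy embeddings, hand-made for illustration only.
embeddings = {
    "king":  np.array([0.8, 0.1, 0.7, 0.2]),
    "queen": np.array([0.7, 0.2, 0.8, 0.1]),
    "apple": np.array([0.1, 0.9, 0.0, 0.6]),
}

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embeddings["king"], embeddings["queen"]))  # high: related words
print(cosine(embeddings["king"], embeddings["apple"]))  # lower: unrelated words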


Week 4:
Words. Statistical Inference: N-gram Models Neural Language Models
Readings: M&S Ch6, J&M Ch3,7, Section 12.4 (Natural Language Processing) of the Deep Learning book
Links:   Statistical Language Modeling Toolkit RNN LM toolkit word2vec tool GloVe word embeddings ELMo BERT
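An n-gram model estimates P(w | history) from corpus counts, and smoothing reserves probability mass for unseen events. Here is a minimal bigram sketch with add-one (Laplace) smoothing on a toy corpus, in the spirit of M&S Ch6 / J&M Ch3 (sentence boundaries are handled naively).

from collections import Counter

corpus = ["<s> the cat sat </s>", "<s> the dog sat </s>"]
tokens = [w for sent in corpus for w in sent.split()]
bigrams = list(zip(tokens, tokens[1:]))  # note: naively crosses sentence boundaries

unigram_counts = Counter(tokens)
bigram_counts = Counter(bigrams)
V = len(unigram_counts)  # vocabulary size

def p(w, w_prev):
    # P(w | w_prev) with add-one smoothing
    return (bigram_counts[(w_prev, w)] + 1) / (unigram_counts[w_prev] + V)

print(p("cat", "the"))  # seen bigram: relatively high probability
print(p("dog", "cat"))  # unseen bigram: small but nonzero thanks to smoothing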


Week 5:

Words. Part-of-Speech Tagging More on POS Tagging
Readings: M&S Ch10, J&M Ch8
Links: PennTreeBank tagset
Readings: M&S Ch9, J&M Appendix A

Other: Hidden Markov Models Extra slides on HMM More on HMM Conditional Random Fields (CRF)
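To see Penn Treebank tags in practice before we study how HMM and CRF taggers compute them, here is a short sketch using NLTK's off-the-shelf tagger (assumes NLTK; data package names vary slightly across versions, so several are tried).

import nltk
# Fetch tokenizer and tagger data; missing package names are skipped.
for pkg in ("punkt", "punkt_tab",
            "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(pkg, quiet=True)

tokens = nltk.word_tokenize("Time flies like an arrow.")
print(nltk.pos_tag(tokens))
# e.g. [('Time', 'NN'), ('flies', 'VBZ'), ('like', 'IN'), ...]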


Week 6:
Text Categorization    Text Clustering   Introduction to Deep Learning Transformers & BERT  Transformers LLMs Masked LLMs

Other: SVM 
Readings: J&M Ch 9,10,11   M&S Ch16 

Links Weka Scikit-learn TensorFlow PyTorch Keras
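As a concrete starting point, here is a minimal text categorization sketch with scikit-learn (linked above): tf-idf features plus multinomial Naive Bayes on a toy dataset. This is an illustration only, not a serious experimental setup.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["great movie, loved it", "terrible plot, boring",
               "wonderful acting", "awful and dull"]
train_labels = ["pos", "neg", "pos", "neg"]

# Pipeline: bag-of-words tf-idf features -> Naive Bayes classifier.
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)
print(clf.predict(["boring movie", "loved the acting"]))  # likely ['neg' 'pos']

Swapping MultinomialNB for sklearn.svm.LinearSVC gives the SVM classifier mentioned under "Other" above.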


Week 7:

Reading week, no classes. Prepare your project outline.


Week 8:
Chatbots and Dialogue Systems
Other: Sentiment Analysis More on Deep Learning and Sentiment Analysis

Readings: J&M Ch22


Week 9:
Syntax. Parsing Probabilistic Parsing Partial Parsing
Readings: J&M Ch18,19
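To make the parsing chapters concrete, here is a toy context-free grammar parsed with NLTK's chart parser. The grammar is hand-built for illustration; probabilistic parsers score the alternative trees instead of just enumerating them.

import nltk

# A tiny hand-written CFG, for illustration only.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'cat'
V -> 'chased'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog chased the cat".split()):
    print(tree)
# (S (NP (Det the) (N dog)) (VP (V chased) (NP (Det the) (N cat))))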


Week 10:
Semantics. Word Sense Disambiguation
Readings: M&S Ch7 (WSD)
Links: Semeval    WSD tutorial    BabelNet
Deep Learning for NLP
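For a simple WSD baseline, NLTK implements a simplified Lesk algorithm, which picks the WordNet sense whose gloss overlaps most with the context words. A quick sketch follows; it is a weak baseline compared to the Semeval systems linked above and may pick an unexpected sense.

import nltk
for pkg in ("punkt", "punkt_tab", "wordnet"):
    nltk.download(pkg, quiet=True)
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

context = word_tokenize("I went to the bank to deposit my paycheck.")
sense = lesk(context, "bank")  # simplified Lesk: gloss-overlap heuristic
print(sense, "-", sense.definition() if sense else "no sense found")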


Week 11:
Semantics. Deep Semantics Lexical Acquisition Semantic Similarity
Readings: M&S Ch8
Links: Corpus-based Similarity Demo Dekang Lin's Demos WordNet::Similarity
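WordNet-based similarity measures (cf. the WordNet::Similarity link above) score concept pairs by their positions in the taxonomy. Here is a minimal sketch using NLTK's path similarity (assumes NLTK with the WordNet data).

import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

dog = wn.synset("dog.n.01")
cat = wn.synset("cat.n.01")
car = wn.synset("car.n.01")

# Path similarity: based on the path length between synsets in WordNet.
print(dog.path_similarity(cat))  # higher: dog and cat are taxonomically close
print(dog.path_similarity(car))  # lower: dog and car are far apart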


Week 12:
Information Retrieval Neural IR Latent Semantic Indexing Probabilistic Retrieval
Readings: M&S Ch15
Links: TREC M&S Textbook errata p560-563 Extra slides
Other: Question Answering IBM's Watson Slides Deep QA Answers
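The vector-space model in M&S Ch15 ranks documents by cosine similarity between tf-idf vectors. A minimal retrieval sketch using scikit-learn (linked under Week 6; any tf-idf implementation works):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["statistical machine translation of text",
        "information retrieval ranks documents by relevance",
        "neural models for question answering"]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)  # one tf-idf vector per document

query = vectorizer.transform(["retrieval of relevant documents"])
scores = cosine_similarity(query, doc_matrix)[0]
best = scores.argmax()
print(best, docs[best])  # the IR document should rank highest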


Week 13:
Machine Translation  Neural MT
Readings: J&M Ch13, M&S Ch13

Other: Slides by George Foster (NRC) Statistical MT tutorial