CSI5386: Natural Language Processing
Winter 2024


Instructor:  Diana Inkpen
E-mail: diana.inkpen@uottawa.ca

Lectures: See uozone

Office hours: Fri, 1:30pm-2:30pm in SITE 5015

Announcements

Overview

Natural Language Processing (NLP) is the subfield of Artificial Intelligence concerned with building computer systems that process human language, such as natural language interfaces to databases or the World Wide Web, automatic machine translation systems, text analysis systems, speech understanding systems, and computer-aided instruction systems. Until recently, NLP was approached mainly with rule-based or symbolic methods. In the past few years, however, statistical methods have received a lot of attention, as they address many of the bottlenecks encountered by the symbolic methods. This course will discuss both approaches, with more emphasis on the statistical ones. If time permits, we will consider applications such as information retrieval, text categorization, clustering, and statistical machine translation.

Pre-Requisites

Students should have reasonable exposure to Artificial Intelligence and some programming experience in a high-level language. Please check with the instructor.

Evaluation

Students will be evaluated on:

Recommended Textbooks

[J&M] Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky and James H. Martin. 2020, 3rd edition, Prentice-Hall. Link to draft
Draft of a new textbook on Natural Language Processing, by Jacob Eisenstein, forthcoming with MIT Press.

[M&S] Foundations of Statistical Natural Language Processing, by Chris Manning and Hinrich Schütze, MIT Press, 1999. Companion website for the textbook
[NLP4SM] Natural Language Processing for Social Media, by Atefeh Farzindar and Diana Inkpen, Morgan and Claypool Publishers, second edition, Dec 2017, available here (free via uOttawa library)

Timetable

Assignments

The programming part should be done in Java, Python, or another programming language. You may use any existing tools or code, as long as you properly acknowledge what you used.
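To give a sense of the scale and style of assignment code (this sketch is my own illustration, not course-provided material), here is a minimal regex tokenizer in Python, touching on the tokenization topic from Week 2:

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens, keeping punctuation separate.
    A minimal regex tokenizer for illustration; assignments may instead use
    existing tools such as NLTK or spaCy, with proper acknowledgement."""
    # \w+ matches runs of word characters; [^\w\s] matches single
    # punctuation marks, so "go." becomes ["go", "."]
    return re.findall(r"\w+|[^\w\s]", text.lower())

print(tokenize("Dr. Smith didn't go."))
# → ['dr', '.', 'smith', 'didn', "'", 't', 'go', '.']
```

Note how a naive pattern splits the clitic in "didn't" and the abbreviation "Dr."; handling such cases well is exactly what makes tokenization a non-trivial NLP task.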

Course Support:

Useful Links:

 


Syllabus (subject to modifications)
(The lecture slides will be in PDF format; you can read them with Acrobat Reader.)


Week 1:
Preliminaries, Introduction to NLP, Introduction to Statistical NLP, Linguistics Essentials
Readings: M&S Ch1, J&M Ch1
Links: Webster, LDOCE, WordNet, Slides, Tom Sawyer, Connexor parser and tagger demo, Stanford parser demo, FrameNet, Online demos, SpaCy, AllenNLP


Week 2:
Words. Morphology, Tokenization, FSA
Readings: J&M Ch2
Mathematical Foundations I: Probability Theory; Mathematical Foundations II: Information Theory
Readings: M&S Ch2,3. More slides on Probability Theory and Information Theory


Week 3:
Words. Corpus-Based Work, Collocations, Word Embeddings

Readings: M&S Ch4 (corpus-based work), M&S Ch5 (collocations), J&M Ch6 (word embeddings)


Week 4:
Words. Statistical Inference: N-gram Models, Neural Language Models
Readings: M&S Ch6, J&M Ch3,7, Section 12.4 (Natural Language Processing) of the Deep Learning book
Links: Statistical Language Modeling Toolkit, RNN LM toolkit, word2vec tool, GloVe word embeddings, ELMo, BERT
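The n-gram modeling topic above can be sketched in a few lines of Python (a minimal maximum-likelihood bigram model of my own; real toolkits like the ones linked add smoothing and backoff):

```python
from collections import Counter

def bigram_probs(corpus):
    """Estimate MLE bigram probabilities P(w2 | w1) from a list of
    tokenized sentences, adding <s>/</s> sentence-boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        # Count each token as a bigram history (everything but </s>),
        # and each adjacent pair as a bigram.
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks[:-1], toks[1:]))
    return {bg: c / unigrams[bg[0]] for bg, c in bigrams.items()}

probs = bigram_probs([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(probs[("<s>", "the")])  # 1.0: both sentences start with "the"
print(probs[("the", "cat")])  # 0.5: "the" is followed by "cat" once out of twice
```

Unseen bigrams get zero probability under MLE, which is the motivation for the smoothing techniques covered in the readings.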


Week 5:

Words. Part-of-Speech Tagging, More on POS Tagging
Readings: M&S Ch10, J&M Ch8
Links: Penn Treebank tagset
Hidden Markov Models
Readings: M&S Ch9, J&M Appendix A. Extra slides on HMM, More on HMM, Conditional Random Fields (CRF)


Week 6:
Text Categorization, Text Clustering, Introduction to Deep Learning, Transformers & BERT

SVM, More on Deep Learning for Natural Language Processing
Readings: M&S Ch16
Links: Weka, Scikit-learn, TensorFlow, PyTorch, Keras
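A core building block behind both the categorization and clustering topics above is comparing documents as bag-of-words vectors; a minimal cosine-similarity sketch (my own illustration, not from the linked toolkits):

```python
import math
from collections import Counter

def cosine(doc_a, doc_b):
    """Cosine similarity between two token lists under a bag-of-words
    model: dot product of term-count vectors over the product of norms."""
    a, b = Counter(doc_a), Counter(doc_b)
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

print(cosine("the cat sat".split(), "the cat ran".split()))  # ≈ 0.667
```

Libraries such as scikit-learn provide the same computation over sparse tf-idf matrices, which is what one would use at corpus scale.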


Week 7:

Reading week, no classes. Prepare your project outline.


Week 8: Chatbots and Dialogue Systems
Sentiment Analysis

Readings: J&M Ch24


Week 9:

LLMs and Prompt Learning
Syntax. Parsing, Probabilistic Parsing, Partial Parsing
Readings: J&M Ch12,13,14


Week 10:
Semantics. Word Sense Disambiguation
Readings: M&S Ch7 (WSD)
Links: Senseval, WSD tutorial, BabelNet
Deep Learning for NLP


Week 11:
Semantics. Deep Semantics, Lexical Acquisition, Semantic Similarity
Readings: M&S Ch8
Links: Corpus-based Similarity Demo, Dekang Lin's Demos, WordNet::Similarity


Week 12:
Information Retrieval, Neural IR, Latent Semantic Indexing, Probabilistic Retrieval
Readings: M&S Ch15
Links: TREC, M&S Textbook errata p560-563, Extra slides
Possible extra topic: Question Answering, IBM's Watson Slides, Deep QA Answers
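The retrieval topics above rest on term weighting; a minimal tf-idf sketch (one common variant, raw term count times log(N/df) with no smoothing; all names here are my own illustration):

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute tf-idf weights for each document in a small corpus.
    tf = raw term count in the document; idf = log(N / df), where df is
    the number of documents containing the term."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    return [{w: c * math.log(n / df[w]) for w, c in Counter(doc).items()}
            for doc in docs]

docs = [["the", "cat"], ["the", "dog"], ["the", "cat", "cat"]]
weights = tfidf(docs)
print(weights[2]["cat"])  # 2 * log(3/2): frequent here, rare elsewhere
print(weights[0]["the"])  # 0.0: "the" appears in every document
```

Terms that occur in every document get weight zero, which is how tf-idf downweights function words without a stop list.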


Week 13:
Machine Translation
Readings: J&M Ch11, M&S Ch13

Slides by George Foster (NRC), Statistical MT tutorial, Neural MT