Natural Language Processing (NLP) is the subfield of Artificial
Intelligence concerned with building computer systems such as natural language
interfaces to databases or the World-Wide Web, automatic machine-translation
systems, text analysis systems, speech understanding systems, or computer-aided
instruction systems. Until recently, NLP was mainly approached with rule-based or
symbolic methods. In the past few years, however, statistical methods have
received considerable attention because they appear to address many of the
bottlenecks encountered by symbolic methods. This course will discuss both approaches,
with more emphasis on the statistical ones. If time permits, we will consider
applications such as information retrieval, text categorization, clustering,
and statistical machine translation.
Students should have reasonable exposure to Artificial
Intelligence and some programming experience in a high-level language. If in
doubt, please check with the instructor.
Students will be evaluated on:
[J&M] Speech and Language Processing: An Introduction to
Natural Language Processing, Computational Linguistics, and Speech Recognition,
by Daniel Jurafsky and James H. Martin. 2020, 3rd edition, Prentice-Hall (draft available online).
Draft of a new textbook on Natural Language Processing, by
Jacob Eisenstein, forthcoming with MIT Press.
[M&S] Foundations of Statistical Natural Language Processing, by Chris
Manning and Hinrich Schütze, MIT Press, 1999. A companion website for the textbook is available.
[NLP4SM] Natural Language Processing for Social Media, by Atefeh
Farzindar and Diana Inkpen, Morgan and Claypool Publishers, second edition, Dec
2017, available free via the uOttawa library.
The programming part should be done in Java, Python, or another
programming language. You can use any existing tools or code, as long as you
properly acknowledge what you used.
Week 1:
Preliminaries. Introduction to NLP, Introduction to Statistical NLP, Linguistics Essentials
Readings: M&S Ch1, J&M Ch1
Links: Webster, LDOCE, WordNet, Slides, Tom Sawyer, Stanford parser, FrameNet, SpaCy, NLTK
Week 2:
Words. Morphology, Tokenization, FSA
Readings: J&M Ch2
Mathematical Foundations I: Probability Theory, Mathematical Foundations II: Information Theory
Readings: M&S Ch2,3
More slides on Probability Theory and Information Theory
Week 3:
Words. Corpus-Based Work, Collocations, Word Embeddings
Readings: M&S Ch4 (corpus-based work), M&S Ch5 (collocations), J&M Ch6 (word embeddings)
Week 4:
Words. Statistical Inference: N-gram Models, Neural Language Models
Readings: M&S Ch6, J&M Ch3,7, Section 12.4 (Natural Language Processing) of the Deep Learning book
Links: Statistical Language Modeling Toolkit, RNN LM toolkit, word2vec tool, GloVe word embeddings, ELMo, BERT
Week 5:
Words. Part-of-Speech Tagging, More on POS Tagging
Readings: M&S Ch10, J&M Ch8
Links: Penn Treebank tagset
Readings: M&S Ch9, J&M Appendix A
Other: Hidden Markov Models, Extra slides on HMM, More on HMM, Conditional Random Fields (CRF)
Week 6:
Text Categorization, Text Clustering, Introduction to Deep Learning, Transformers & BERT, Transformers, LLMs, Masked LLMs
Other: SVM
Readings: J&M Ch9,10,11, M&S Ch16
Links: Weka, Scikit-learn, TensorFlow, PyTorch, Keras
Week 7:
Reading week, no classes. Prepare your project outline.
Week 8:
Chatbots and Dialogue Systems
Other: Sentiment Analysis, More on Deep Learning and Sentiment Analysis
Readings: J&M Ch22
Week 9:
Syntax. Parsing, Probabilistic Parsing, Partial Parsing
Readings: J&M Ch18,19
Week 10:
Semantics. Word Sense Disambiguation
Readings: M&S Ch7 (WSD)
Links: SemEval, WSD tutorial, BabelNet
Deep Learning for NLP
Week 11:
Semantics. Deep Semantics, Lexical Acquisition, Semantic Similarity
Readings: M&S Ch8
Links: Corpus-based Similarity Demo, Dekang Lin's Demos, WordNet::Similarity
Week 12:
Information Retrieval, Neural IR, Latent Semantic Indexing, Probabilistic Retrieval
Readings: M&S Ch15
Links: TREC, M&S Textbook errata p560-563, Extra slides
Other: Question Answering, IBM's Watson, Slides, Deep QA, Answers
Week 13:
Machine Translation, Neural MT
Readings: J&M Ch13, M&S Ch13
Other: Slides by George Foster (NRC), Statistical MT tutorial