CSci4152 Natural Language Processing, A Statistical Approach


Nathalie Japkowicz, Room 214, CS Building, x3157

Meeting Times and Locations

Office Hours and Locations


Natural Language Processing (NLP) is the subfield of Artificial Intelligence concerned with building computer systems that process human language, such as natural language interfaces to databases or the World-Wide Web, automatic machine-translation systems, text analysis systems, speech understanding systems, and computer-aided instruction systems.

Until recently, NLP was approached mainly through rule-based or symbolic methods. In the past few years, however, statistical methods have received considerable attention because they appear to address many of the bottlenecks encountered by symbolic methods.

This course will focus mainly on statistical approaches. In particular, we will concentrate on n-gram models, Markov models, and probabilistic context-free grammars. If time permits, we will consider applications such as statistical alignment and machine translation, clustering, information retrieval, and text categorization.


Students should have reasonable exposure to Artificial Intelligence and some programming experience in a high-level language.


Students will be evaluated on:

Required Textbooks

Foundations of Statistical Natural Language Processing, by Chris Manning and Hinrich Schütze, MIT Press.