Abstract

Building a Lexical Knowledge-Base of Near-Synonym Differences
Diana Inkpen
Doctor of Philosophy
Department of Computer Science
University of Toronto
2004

Current natural language generation or machine translation systems cannot distinguish among near-synonyms: words that share the same core meaning but vary in their lexical nuances. This is due to a lack of knowledge about differences between near-synonyms in existing computational lexical resources. The goal of this thesis is to automatically acquire a lexical knowledge-base of near-synonym differences (LKB of NS) from multiple sources, and to show how it can be used in a practical natural language processing system.

I designed a method to automatically acquire knowledge from dictionaries of near-synonym discrimination written for human readers. An unsupervised decision-list algorithm learns patterns and words for classes of distinctions. The patterns are learned automatically, followed by a manual validation step; the subsequent extraction of distinctions between near-synonyms is entirely automatic. The main types of distinctions are stylistic (for example, "inebriated" is more formal than "drunk"), attitudinal (for example, "skinny" is more pejorative than "slim"), and denotational (for example, "blunder" implies "accident" and "ignorance", while "error" does not).

I enriched the initial LKB of NS with information extracted from other sources. First, information about the senses of the near-synonyms (WordNet senses) was added; the other near-synonyms in the same dictionary entry and the text of the entry provide a strong context for disambiguation. Second, knowledge about the collocational behaviour of the near-synonyms was acquired from free text. Collocations between a word and the near-synonyms in a dictionary entry were classified into preferred collocations, less-preferred collocations, and anti-collocations. Third, knowledge about distinctions between near-synonyms was acquired from machine-readable dictionaries (the General Inquirer and the Macquarie Dictionary). These distinctions were merged with the initial LKB of NS, and inconsistencies were resolved.

The generic LKB of NS needs to be customized in order to be used in a natural language processing system. The parts that need customization are the core denotations and the strings that describe peripheral concepts in the denotational distinctions. To show how the LKB of NS can be used in practice, I present Xenon, a natural language generation system that chooses the near-synonym that best matches a set of input preferences. I implemented Xenon by adding a near-synonym choice module and a near-synonym collocation module to an existing general-purpose surface realizer.
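The three-way classification of collocational behaviour mentioned above can be pictured as a thresholding decision over how often a near-synonym co-occurs with a candidate collocate in a corpus. The Python sketch below is illustrative only: the function name classify_collocations, the use of raw co-occurrence counts, and the numeric thresholds are assumptions made for exposition, not the association measures actually used in the thesis.

```python
def classify_collocations(cooccurrence_counts, strong_threshold=50, weak_threshold=5):
    """Toy three-way classification of (near-synonym, collocate) pairs.

    cooccurrence_counts: dict mapping (near_synonym, collocate) -> corpus count.
    The thresholds are arbitrary placeholders; a real system would use a
    statistical association measure rather than raw counts.
    """
    labels = {}
    for pair, count in cooccurrence_counts.items():
        if count >= strong_threshold:
            labels[pair] = "preferred collocation"
        elif count >= weak_threshold:
            labels[pair] = "less-preferred collocation"
        else:
            labels[pair] = "anti-collocation"
    return labels


# Hypothetical counts for two near-synonyms with the same collocate.
counts = {("error", "spelling"): 120, ("blunder", "spelling"): 1}
print(classify_collocations(counts))
# {('error', 'spelling'): 'preferred collocation',
#  ('blunder', 'spelling'): 'anti-collocation'}
```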