Projects for undergraduate students -- CSI 4900
Guidelines for writing your final report
Fall 2010
Project code: inkpen11
Title: Voice control for robots
Status: available
Description: In this project you will program a robot to execute commands
spoken by a user. You will install a voice recognition program and implement a natural
language understanding module that extracts which move
the robot is asked to perform. Then you will program the robot to execute the
move. The project can be done individually or in a group of two
students. The robots will be available in the Robotics Lab of Prof. Emil Petriu.
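As a rough illustration of the natural language understanding step, the sketch below maps an already-recognized utterance to a structured move. The vocabulary, the `parse_command` function, and the output fields are all hypothetical; the actual command set and robot interface are not specified in this description.

```python
import re

# Hypothetical vocabulary; the real command set depends on the robot used.
ACTIONS = {"move": "MOVE", "go": "MOVE", "walk": "MOVE",
           "turn": "TURN", "rotate": "TURN", "stop": "STOP"}
DIRECTIONS = {"forward", "backward", "left", "right"}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def parse_command(utterance):
    """Extract an action, a direction, and an amount from recognized text.

    Returns None when no known action word is found.
    """
    tokens = re.findall(r"[a-z]+|\d+", utterance.lower())
    action = direction = amount = None
    for tok in tokens:
        if action is None and tok in ACTIONS:
            action = ACTIONS[tok]
        elif direction is None and tok in DIRECTIONS:
            direction = tok
        elif amount is None and tok.isdigit():
            amount = int(tok)
        elif amount is None and tok in NUMBER_WORDS:
            amount = NUMBER_WORDS[tok]
    if action is None:
        return None
    return {"action": action, "direction": direction, "amount": amount}
```

A keyword-spotting approach like this ignores word order and negation, so it is only a starting point; the output of a speech recognizer would be passed to `parse_command` as a plain string.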
Project code: inkpen10
Title: Synonyms and semantic similarity processing for French texts
Status: available
Description: In this project you will implement tools for processing a corpus of
French texts and develop a program that can choose the best word in a context.
Fall 2008
Project code: inkpen9
Title: Video and text information retrieval
Status: available
Description: In this project you will build an information retrieval system that can
find video clips and dialog text that answer a given query. The project can be
done individually or in a group of two students.
Project code: inkpen8
Title: Grapheme-to-phoneme conversion tool for French
Status: available
Description: Converting words from written form into phonetic form
is useful in Text-to-Speech systems and in
language learning support tools. In this project a tool will be developed for
French words. The tool will learn pronunciation from data, using machine
learning approaches. Training data and starter Java code will be provided.
Winter 2007
Project code: inkpen7
Title: Information retrieval experiments
Status: taken
Description: In this project the performance of several information retrieval
systems will be compared, and several query expansion methods will be tried.
Fall 2006
Project code: inkpen6
Title: Tools for French text processing
Status: taken
Description: Many natural language processing tools exist for English texts.
In this project some tools will be
developed to work on a corpus of French texts. The corpus will be provided. The
tools include an automatic phonetic transcriber, an
automatic syllabifier, etc.
Project code: inkpen5
Title: Information extraction for financial information
Status: taken
Description: Financial information about companies is available on the Web, but the user
needs to know how to find and interpret it in order to decide which
companies to invest in. This project will provide a user with various financial
ratios and advice. The user inputs the company name through a GUI
implemented in Java. The program fetches
relevant webpages from Yahoo! Finance and other sites, and navigates through them
to find the desired pages. Then it automatically extracts the information from
the pages, calculates ratios, and displays results to the user.
Fall 2005
Project code: inkpen4
Title: Intelligent thesaurus using Roget synonyms
Status: taken
Description: A thesaurus assists a
writer with a list of words that are similar to a given word. The writer has to
choose one of the words. An intelligent thesaurus assists the user by
indicating the best choices. The project will focus on the automatic choice of
the best alternative in the context of writing. Roget's Thesaurus will be used as
a source of synonyms and similar words, to allow for wide coverage
of the English language. The implementation will be done in Java.
Winter 2004
Project code: inkpen3
Title: Intelligent thesaurus
Status: taken
Description: A thesaurus
assists a writer with a list of words that are similar to a given word. The
writer has to choose one of the words without
being offered explanations about the differences in nuances of meaning between
the possible choices. This project will develop an intelligent thesaurus that
offers, in addition to the list of similar words, explanations about the
differences between them. Moreover, it will be context-sensitive: it will order
the possible choices by their suitability to the writing context. A
knowledge base of differences between synonyms will be provided. It also
includes knowledge about the collocations of synonyms (what words they combine
well with and what words they do not). The implementation will be done in Java.
Fall 2003
Project code: inkpen2
Title: Language models for the texts of the Web
Status: taken
Description: A language model reflects the distribution of the words in a large
collection of texts. It computes probabilities of occurrence of
individual words (unigrams) and pairs of consecutive words (bigrams). There are
tools that compute language models for a given collection of texts. This
project will modify such a tool to work with word co-occurrence counts
collected from the Web. In this way, the probabilities of rare words will be
computed more accurately. The implementation will be done in C++, Java, or Perl
(to be determined).
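To make the unigram and bigram probabilities concrete, here is a minimal sketch that estimates them by maximum likelihood from a tiny in-memory collection. The function names and the toy sentences are illustrative assumptions, not part of the tool the project modifies; in the project, the counts would come from Web co-occurrence data instead of a local corpus.

```python
from collections import Counter


def train_counts(sentences):
    """Count unigrams, bigrams, and bigram contexts in a list of sentences."""
    unigrams, bigrams, contexts = Counter(), Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
        # A token is a bigram context only if another word follows it.
        contexts.update(tokens[:-1])
    return unigrams, bigrams, contexts


def unigram_prob(unigrams, word):
    """P(word) = count(word) / total number of tokens."""
    total = sum(unigrams.values())
    return unigrams[word] / total if total else 0.0


def bigram_prob(bigrams, contexts, first, second):
    """P(second | first) = count(first, second) / count(first as a context)."""
    if contexts[first] == 0:
        return 0.0
    return bigrams[(first, second)] / contexts[first]


# Toy collection, purely for illustration.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
unigrams, bigrams, contexts = train_counts(corpus)
```

With these unsmoothed estimates, any word or pair never seen in the collection gets probability zero, which is the weakness for rare words that larger Web-scale counts are meant to reduce.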
Project code: inkpen1
Title: Natural language interface for animation
Status: taken
Description: This
project implements a natural language interface that allows a human to
communicate with an animated character using natural language (English in this
case). The focus of the project is on translating from natural language into a
simplified script-like animation language. An example of input text is: “Walk
five steps to the right, jump three times, and then run back”. This text needs
to be translated into something like: “walk steps:5, direction:East, speed:slow;
jump; jump; jump; walk steps:5, direction:West, speed:fast”. Then the
character will execute this simple animation script, by moving around on the
screen in the required sequence. The implementation will be done in Java.
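The translation step can be sketched with a few hand-written rules. The code below is only an illustration under stated assumptions: the `translate` function, the word lists, the mapping of "right" to East and "left" to West, the default direction, and the exact punctuation of the output script are all hypothetical, chosen to reproduce the example sentence above.

```python
import re

# Hypothetical word lists; a real system would need a much larger vocabulary.
NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
           "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
DIRECTIONS = {"right": "East", "left": "West"}
OPPOSITE = {"East": "West", "West": "East"}
SPEEDS = {"walk": "slow", "run": "fast"}


def find_number(words, default=1):
    """Return the first number (digits or number word) in words."""
    for w in words:
        if w.isdigit():
            return int(w)
        if w in NUMBERS:
            return NUMBERS[w]
    return default


def translate(text):
    """Translate an English instruction into a script-like command string."""
    commands = []
    last_move = None  # (steps, direction) of the most recent walk or run
    for clause in re.split(r"[,;.]", text.lower()):
        words = re.findall(r"[a-z]+|\d+", clause)
        verb = next((w for w in words if w in SPEEDS or w == "jump"), None)
        if verb is None:
            continue
        if verb == "jump":
            commands.extend(["jump"] * find_number(words))
            continue
        if "back" in words and last_move is not None:
            # "back" retraces the previous move in the opposite direction.
            steps, direction = last_move[0], OPPOSITE[last_move[1]]
        else:
            steps = find_number(words)
            # Defaulting to East is an arbitrary assumption of this sketch.
            direction = next((DIRECTIONS[w] for w in words if w in DIRECTIONS), "East")
        last_move = (steps, direction)
        commands.append(f"walk steps:{steps}, direction:{direction}, speed:{SPEEDS[verb]}")
    return "; ".join(commands)
```

Calling `translate("Walk five steps to the right, jump three times, and then run back")` yields a script with one slow walk East, three jumps, and one fast walk West. Splitting on punctuation keeps the rules simple, but it would not handle clauses joined only by "and".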