Projects for undergraduate students -- CSI 4900

Guidelines for writing your final report

Fall 2010 

Project code: inkpen11
    Title: Voice control for robots
    Status: available

Description: In this project you will program a robot to execute commands spoken by a user. You will install a voice recognition program and implement a natural language understanding module that extracts the information about what move the robot is asked to perform. Then you will program the robot to execute the move. The project can be done individually or in a group of two students. The robots will be available in the Robotics Lab of Prof. Emil Petriu.

Project code: inkpen10
    Title: Synonyms and semantic similarity processing for French texts
    Status: available

Description: In this project you will implement tools for processing a corpus of French texts and develop a program that can choose the best word in a context.

Fall 2008 

Project code: inkpen9
    Title: Video and text information retrieval
    Status: available

Description: In this project you will build an information retrieval system that can find video clips and dialog text that answer a given query. The project can be done individually or in a group of two students.

Project code: inkpen8
    Title: Grapheme-to-phoneme conversion tool for French
    Status: available

Description: Transforming words from written form into phonetic form is useful in Text-to-Speech systems and in language learning support tools. In this project a tool will be developed for French words. The tool will learn pronunciation from data, using machine learning approaches. Training data and starter Java code will be provided.

Winter 2007 

Project code: inkpen7
    Title: Information retrieval experiments
    Status: taken

Description: In this project the performance of several information retrieval systems will be compared, and several query expansion methods will be tried.

Fall 2006 

Project code: inkpen6
    Title: Tools for French text processing
    Status: taken

Description: Many natural language processing tools exist for English texts. In this project some tools will be developed to work on a corpus of French texts. The corpus will be provided. The tools include an automatic phonetic transcriber, an automatic syllabifier, etc.

Project code: inkpen5
    Title: Information extraction for financial information
    Status: taken

Description: Financial information about companies is available on the Web, but users need to know how to find and interpret it in order to decide which companies to invest in. This project will provide a user with various financial ratios and advice. The user enters the company name through a GUI implemented in Java. The program fetches relevant webpages from Yahoo! Finance and other sites and navigates through them to find the desired pages. It then automatically extracts the information from the pages, calculates ratios, and displays the results to the user.

Fall 2005

Project code: inkpen4
    Title: Intelligent thesaurus using Roget synonyms
    Status: taken

Description: A thesaurus assists a writer with a list of words that are similar to a given word. The writer has to choose one of the words. An intelligent thesaurus assists the user by indicating the best choices. The project will focus on the automatic choice of the best alternative in the context of writing. Roget's Thesaurus will be used as a source of synonyms and similar words, in order to allow for wide coverage of the English language. The implementation will be done in Java.

Winter 2004

Project code: inkpen3
    Title: Intelligent thesaurus
    Status: taken 

Description: A thesaurus assists a writer with a list of words that are similar to a given word. The writer has to choose one of the words without being offered explanations about the differences in nuances of meaning between the possible choices. This project will develop an intelligent thesaurus that offers, in addition to the list of similar words, explanations about the differences between them. Moreover, it will be context-sensitive: it will order the possible choices by their suitability to the writing context. A knowledge base of differences between synonyms will be provided. It also includes knowledge about the collocations of synonyms (what words they combine well with and what words they do not). The implementation will be done in Java.

Fall 2003      

Project code: inkpen2
    Title: Language models for the texts of the Web 
    Status: taken

Description: A language model reflects the distribution of the words in a large collection of texts. It computes probabilities of occurrence of individual words (unigrams) and pairs of consecutive words (bigrams). There are tools that compute language models for a given collection of texts. This project will modify such a tool to work with word co-occurrence counts collected from the Web. In this way, the probabilities of rare words will be computed more accurately. The implementation will be done in C++, Java, or Perl (to be determined).
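
To make the unigram and bigram probabilities mentioned above concrete, here is a minimal sketch in Java. It assumes plain maximum-likelihood estimates over whitespace-separated tokens from local text; the class name BigramModel and its methods are hypothetical, and this is not the tool the project modifies, nor does it use Web counts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: maximum-likelihood unigram and bigram estimates.
class BigramModel {
    private final Map<String, Integer> unigramCounts = new HashMap<>();
    private final Map<String, Integer> bigramCounts = new HashMap<>();
    private int totalTokens = 0;

    // Count individual words and pairs of consecutive words in one text.
    public void addText(String text) {
        String trimmed = text.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            unigramCounts.merge(tokens[i], 1, Integer::sum);
            totalTokens++;
            if (i + 1 < tokens.length) {
                bigramCounts.merge(tokens[i] + " " + tokens[i + 1], 1, Integer::sum);
            }
        }
    }

    // P(word) = count(word) / total number of tokens seen.
    public double unigramProbability(String word) {
        if (totalTokens == 0) {
            return 0.0;
        }
        return unigramCounts.getOrDefault(word, 0) / (double) totalTokens;
    }

    // P(second | first) = count(first second) / count(first).
    // Simplification: count(first) also includes text-final occurrences.
    public double bigramProbability(String first, String second) {
        int firstCount = unigramCounts.getOrDefault(first, 0);
        if (firstCount == 0) {
            return 0.0;
        }
        return bigramCounts.getOrDefault(first + " " + second, 0) / (double) firstCount;
    }
}
```

Unseen words and pairs get probability zero in this sketch, which illustrates why rare words are hard to estimate from a small collection and why larger counts, such as those collected from the Web, can help.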

Project code: inkpen1
    Title: Natural language interface for animation
    Status: taken

Description: This project implements a natural language interface that allows a human to communicate with an animated character using natural language (English in this case). The focus of the project is on translating from natural language into a simplified script-like animation language. An example of input text is: “Walk five steps to the right, jump three times, and then run back”. This text needs to be translated into something like: “walk steps:5, direction:East, speed:slow; jump; jump; jump; walk steps:5, direction:West, speed:fast”. Then the character will execute this simple animation script by moving around on the screen in the required sequence. The implementation will be done in Java.
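
The translation step described above can be sketched with a few keyword rules. This is only an illustration under strong assumptions: the class name AnimationTranslator, the tiny vocabulary, and the exact punctuation of the output script are all hypothetical, clauses are assumed to be separated by commas, and "back" is assumed to mean repeating the previous move in the opposite direction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical rule-based translator from simple English commands to a script.
class AnimationTranslator {
    // Assumed vocabulary of number words; a real system would need more.
    private static final Map<String, Integer> NUMBERS =
            Map.of("one", 1, "two", 2, "three", 3, "four", 4, "five", 5);

    private static String opposite(String direction) {
        return "East".equals(direction) ? "West" : "East";
    }

    public static String translate(String sentence) {
        List<String> commands = new ArrayList<>();
        int lastSteps = 1;             // steps of the previous move
        String lastDirection = "East"; // direction of the previous move

        // Assumption: each comma-separated clause holds one command.
        for (String clause : sentence.toLowerCase().split(",")) {
            String verb = null;
            Integer count = null;
            String direction = null;
            boolean back = false;

            // Scan the clause for the keywords this sketch understands.
            for (String word : clause.split("[^a-z]+")) {
                if (word.equals("walk") || word.equals("run") || word.equals("jump")) {
                    verb = word;
                } else if (NUMBERS.containsKey(word)) {
                    count = NUMBERS.get(word);
                } else if (word.equals("right")) {
                    direction = "East";
                } else if (word.equals("left")) {
                    direction = "West";
                } else if (word.equals("back")) {
                    back = true;
                }
            }
            if (verb == null) {
                continue; // no known action in this clause
            }
            if (verb.equals("jump")) {
                int times = (count == null) ? 1 : count;
                for (int i = 0; i < times; i++) {
                    commands.add("jump");
                }
            } else {
                int steps = (count != null) ? count : (back ? lastSteps : 1);
                if (back) {
                    direction = opposite(lastDirection);
                } else if (direction == null) {
                    direction = lastDirection;
                }
                // "run" maps to a fast walk, "walk" to a slow one.
                String speed = verb.equals("run") ? "fast" : "slow";
                commands.add("walk steps:" + steps + ", direction:" + direction
                        + ", speed:" + speed);
                lastSteps = steps;
                lastDirection = direction;
            }
        }
        return String.join("; ", commands);
    }
}
```

Calling AnimationTranslator.translate on the example sentence yields one slow walk of five steps to the East, three jump commands, and one fast walk of five steps to the West. Keyword spotting like this breaks down quickly on freer phrasing, which is where the natural language understanding work of the project comes in.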