Projects for undergraduate students -- CSI 4900
Project code: inkpen18
Title: Detecting early signs of mental illness from ReachOut forum messages.
Description: In this project you will design a text classifier for ReachOut mental health forum posts. A small corpus of posts was labelled with a red/amber/green semaphore that indicates how urgently a post needs moderator attention. The classifier will be trained on this corpus and used to predict the label of unlabelled posts.
Project code: inkpen17
Title: Detecting early signs of mental illness from Twitter messages.
Description: In this project you will design a text classifier that predicts the level of risk that a social media user shows signs of mental illness, based on the user's tweets.
Project code: inkpen16
Title: Information retrieval using a formal semantic language
Description: In this project you will design a visualizer of semantic representations for words and phrases. The semantic representation language will allow more precise information retrieval.
Project code: inkpen15
Title: Web opinion mining for product reviews
Description: This project includes a web crawler that collects product reviews from the Internet, and a classifier that detects positive and negative opinions.
Project code: inkpen14
Title: Annotation tool for error correction
Description: In this project you will develop a tool that allows teachers to annotate errors made by language learners, customize the process, insert their own error tags, include feedback, etc. The tool should work for any language, but it will be used by teachers of English and French as second languages.
Project code: inkpen13
Title: Automatic processing of poetry
Description: In this project you will develop a tool that detects similar poems by identifying similar text fragments, similar themes, and similar structures. A graph will be produced automatically to represent the links between similar poems.
Project code: inkpen12
Title: Blog classification
Description: In this project you will apply automatic text classification algorithms to classify blogs by the opinions expressed in the texts (positive/negative) and by the types of emotions expressed (happy/surprised/sad/angry/scared/disgusted).
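A common baseline for this kind of classification is multinomial Naive Bayes with add-one (Laplace) smoothing. The sketch below is only an illustration under stated assumptions, not a required design: the class name `NaiveBayes`, the lowercase letters-and-apostrophes tokenizer, and the choice of Naive Bayes itself are assumptions. The same class accepts any label set, so it covers both the positive/negative task and the six emotion labels.

```java
import java.util.*;

/** Minimal multinomial Naive Bayes text classifier (illustrative sketch, hypothetical class). */
class NaiveBayes {
    private final Map<String, Integer> docCounts = new HashMap<>();               // documents per label
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // word counts per label
    private final Map<String, Integer> totalWords = new HashMap<>();              // tokens per label
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Assumed tokenizer: lowercase, keep runs of letters and apostrophes. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z']+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Adds one labelled training text. */
    void train(String text, String label) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : tokenize(text)) {
            counts.merge(w, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    /** Returns the label with the highest log prior + smoothed log likelihood. */
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        int v = vocabulary.size();
        for (String label : docCounts.keySet()) {
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String w : tokenize(text)) {
                // Add-one smoothing so unseen words do not zero out the score.
                score += Math.log((counts.getOrDefault(w, 0) + 1.0) / (total + v));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

For example, after training on a few invented sentences such as "great product, works well" (positive) and "awful quality, waste of money" (negative), `classify("excellent, works great")` favours the label whose training texts share those words. A real blog corpus would need a held-out test set to measure accuracy.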
Project code: inkpen11
Title: Voice control for robots
Description: In this project you will program a robot to execute commands spoken by a user. You will install a voice recognition program and implement a natural language understanding module that extracts what move the robot is asked to perform. Then you will program the robot to execute the move. The project can be done individually or in a group of two students. The robots will be available in the Robotics Lab of Prof. Emil Petriu.
Project code: inkpen10
Title: Synonyms and semantic similarity processing for French texts
Description: In this project you will implement tools for processing a corpus of French texts and develop a program that can choose the best word in a context.
Project code: inkpen9
Title: Video and text information retrieval
Description: In this project you will build an information retrieval system that can find video clips and dialog text that answer a given query. The project can be done individually or in a group of two students.
Project code: inkpen8
Title: Grapheme-to-phoneme conversion tool for French
Description: Transforming words from written form into phonetic form is useful in text-to-speech systems and in language learning support tools. In this project a tool will be developed for French words. The tool will learn pronunciation from data, using machine learning. Training data and starter Java code will be provided.
Project code: inkpen7
Title: Information retrieval experiments
Description: In this project the performance of several information retrieval systems will be compared, and several query expansion methods will be tried.
Project code: inkpen6
Title: Tools for French text processing
Description: Many natural language processing tools exist for English texts. In this project some tools will be developed to work on a corpus of French texts. The corpus will be provided. The tools include an automatic phonetic transcriber, an automatic syllabifier, etc.
Project code: inkpen5
Title: Information extraction for financial information
Description: Financial information about companies is available on the Web, but users need to know how to find and interpret it in order to decide which companies to invest in. This project will provide the user with various financial ratios and advice. The user enters the company name through a GUI implemented in Java. The program fetches relevant web pages from Yahoo! Finance and other sites, and navigates through them to find the desired pages. It then automatically extracts the information from the pages, calculates the ratios, and displays the results to the user.
Project code: inkpen4
Title: Intelligent thesaurus using Roget synonyms
Description: A thesaurus assists a writer with a list of words that are similar to a given word. The writer has to choose one of the words. An intelligent thesaurus assists the user by indicating the best choices. The project will focus on automatically choosing the best alternative in the context of writing. Roget's Thesaurus will be used as a source of synonyms and similar words, to provide wide coverage of the English language. The implementation will be done in Java.
Project code: inkpen3
Title: Intelligent thesaurus
Description: A thesaurus assists a writer with a list of words that are similar to a given word. The writer has to choose one of the words without being offered explanations about the differences in nuances of meaning between the possible choices. This project will develop an intelligent thesaurus that offers, in addition to the list of similar words, explanations about the differences between them. Moreover, it will be context-sensitive: it will order the possible choices by their suitability to the writing context. A knowledge base of differences between synonyms will be provided. It also includes knowledge about the collocations of synonyms (which words they combine well with and which they do not). The implementation will be done in Java.
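One simple way to order the choices by suitability to the context is to score each candidate synonym by how often it co-occurs with the context words. The sketch below assumes this approach and uses sentence-level co-occurrence in a reference corpus as a stand-in for the provided collocation knowledge base, whose actual format is not specified here; the class `SynonymRanker` and its methods are hypothetical. Candidates and context words are assumed to be lowercase.

```java
import java.util.*;

/** Sketch: order candidate synonyms by co-occurrence with context words (hypothetical class). */
class SynonymRanker {
    /** For each candidate, counts (sentence, context word) pairs where both appear in the sentence. */
    static Map<String, Integer> scores(List<String> candidates, List<String> context, List<String> corpus) {
        Map<String, Integer> scores = new LinkedHashMap<>();
        for (String cand : candidates) scores.put(cand, 0);
        for (String sentence : corpus) {
            Set<String> words = new HashSet<>(Arrays.asList(sentence.toLowerCase().split("\\W+")));
            for (String cand : candidates) {
                if (!words.contains(cand)) continue;
                for (String ctx : context) {
                    if (words.contains(ctx)) scores.merge(cand, 1, Integer::sum);
                }
            }
        }
        return scores;
    }

    /** Returns candidates sorted by descending score; List.sort is stable, so ties keep input order. */
    static List<String> rank(List<String> candidates, List<String> context, List<String> corpus) {
        Map<String, Integer> s = scores(candidates, context, corpus);
        List<String> ranked = new ArrayList<>(candidates);
        ranked.sort((a, b) -> Integer.compare(s.get(b), s.get(a)));
        return ranked;
    }
}
```

With a tiny corpus such as "She drinks strong tea every morning." and "The engine is powerful.", the context word `tea` ranks `strong` above `powerful`, while `engine` reverses the order: the classic collocation contrast the knowledge base is meant to capture.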
Project code: inkpen2
Title: Language models for the texts of the Web
Description: A language model reflects the distribution of words in a large collection of texts. It computes probabilities of occurrence of individual words (unigrams) and pairs of consecutive words (bigrams). There are tools that compute language models for a given collection of texts. This project will modify such a tool to work with word co-occurrence counts collected from the Web. In this way, the probabilities of rare words will be computed more accurately. The implementation will be done in C++, Java, or Perl (to be determined).
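The usual starting point is the maximum-likelihood estimate: P(w) = c(w) / N for unigrams, where N is the total number of tokens, and P(w2 | w1) = c(w1 w2) / c(w1) for bigrams. The minimal sketch below computes both from a local token list; the class `BigramModel` is hypothetical, and in the project the counts would instead come from the Web. It deliberately omits smoothing, which is exactly why rare or unseen words get poor (zero) estimates from a small corpus.

```java
import java.util.*;

/** Maximum-likelihood unigram and bigram estimates from a token sequence (illustrative sketch). */
class BigramModel {
    private final Map<String, Integer> unigrams = new HashMap<>();
    // Bigram key is "w1 w2"; assumes tokens contain no spaces.
    private final Map<String, Integer> bigrams = new HashMap<>();
    private int totalTokens = 0;

    BigramModel(List<String> tokens) {
        for (int i = 0; i < tokens.size(); i++) {
            unigrams.merge(tokens.get(i), 1, Integer::sum);
            totalTokens++;
            if (i + 1 < tokens.size()) {
                bigrams.merge(tokens.get(i) + " " + tokens.get(i + 1), 1, Integer::sum);
            }
        }
    }

    /** P(w) = c(w) / N */
    double unigramProb(String w) {
        return totalTokens == 0 ? 0.0 : unigrams.getOrDefault(w, 0) / (double) totalTokens;
    }

    /** P(w2 | w1) = c(w1 w2) / c(w1); returns 0 if w1 was never seen. */
    double bigramProb(String w1, String w2) {
        int c1 = unigrams.getOrDefault(w1, 0);
        if (c1 == 0) return 0.0;
        return bigrams.getOrDefault(w1 + " " + w2, 0) / (double) c1;
    }
}
```

For the tokens of "the cat sat on the mat", P(the) = 2/6 and P(cat | the) = 1/2, while any bigram not in the text gets probability 0.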
Project code: inkpen1
Title: Natural language interface for animation
Description: This project implements a natural language interface that allows a human to communicate with an animated character using natural language (English in this case). The focus of the project is on translating from natural language into a simplified script-like animation language. An example of input text is: “Walk five steps to the right, jump three times, and then run back”. This text needs to be translated into something like: “walk steps:5, direction:East, speed:slow; jump; jump; jump; walk steps:5, direction:West, speed:fast”. The character will then execute this simple animation script by moving around on the screen in the required sequence. The implementation will be done in Java.
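A rule-based translator is one possible first step before any statistical approach. The sketch below covers only the example sentence's vocabulary and encodes assumptions read off that example: the first word of each clause is the verb, "run" becomes a fast walk, "back" reverses the previous direction and reuses the previous step count, and the initial direction is East. The class `CommandTranslator` is hypothetical.

```java
import java.util.*;

/** Rule-based sketch: translate simple English commands into the animation script format. */
class CommandTranslator {
    private static final List<String> NUMBERS = Arrays.asList(
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten");

    static String translate(String text) {
        List<String> commands = new ArrayList<>();
        int lastSteps = 1;              // assumed default when no count has been given
        String lastDirection = "East";  // assumed initial facing
        // Split into clauses on commas and the connectives "and" / "then".
        for (String clause : text.toLowerCase().split(",|\\band\\b|\\bthen\\b")) {
            List<String> words = new ArrayList<>();
            for (String w : clause.trim().split("[^a-z0-9]+")) {
                if (!w.isEmpty()) words.add(w);
            }
            if (words.isEmpty()) continue;
            // A number word or digits gives the step or repetition count.
            Integer count = null;
            for (String w : words) {
                if (NUMBERS.contains(w)) count = NUMBERS.indexOf(w);
                else if (w.matches("\\d+")) count = Integer.parseInt(w);
            }
            String verb = words.get(0);
            if (verb.equals("jump")) {
                int times = count == null ? 1 : count;
                for (int i = 0; i < times; i++) commands.add("jump");
            } else if (verb.equals("walk") || verb.equals("run")) {
                String direction = lastDirection;
                if (words.contains("right")) direction = "East";
                else if (words.contains("left")) direction = "West";
                else if (words.contains("back")) direction = lastDirection.equals("East") ? "West" : "East";
                int steps = count == null ? lastSteps : count;
                String speed = verb.equals("run") ? "fast" : "slow";
                commands.add("walk steps:" + steps + ", direction:" + direction + ", speed:" + speed);
                lastSteps = steps;
                lastDirection = direction;
            }
        }
        return String.join("; ", commands);
    }
}
```

On the example input, `translate` returns the target script given in the description, with the direction spelled `East`. A real interface would need a broader grammar, for instance a parser that does not assume the verb comes first.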