My main focus is TEXT, a written medium that allows LANGUAGE to be used as a vehicle for expressing INFORMATION.   I am therefore very interested in studying text structure and text content, mostly with the goal of KNOWLEDGE EXTRACTION AND ACQUISITION.    The Internet is overwhelming us with texts.  I am looking for ways to pre-process texts automatically, so that interesting information can be extracted from them and presented to users in a more schematic way, which leads to my interest in KNOWLEDGE REPRESENTATION.    This means looking inside texts at their content, at the words and the semantic relations between those words.  Looking at words and their meanings leads me toward research in LEXICOGRAPHY and TERMINOLOGY.   These texts contain valuable information that is unavailable to the large part of the population who do not speak the language they are written in, which leads me toward research in MACHINE TRANSLATION.  Texts are also used for learning a new language, and I hope that research in CALL (Computer-Assisted Language Learning) can make this learning process more efficient.

My research has mostly been on the English language, but I am also very interested in working on French texts.


You can go to the Publication page to get references for some of these projects.


One principal project is the design and implementation of the software
        SeRT: Semantic Relations in Text
within the larger goal of knowledge extraction from texts

Started in summer 2000 and on-going.

SeRT (Semantic Relations in Text) is an integrated tool being developed for extracting information from texts to build a repository of domain knowledge in the form of a large semantic network. SeRT integrates multiple functions within a single environment to provide a highly interactive way of finding the building blocks (terms) of the semantic network as well as its links (relations). It offers several functions: a) semi-automatic extraction of terms from a document or corpus of documents, b) discovery of semantic relations between terms by searching on linguistic patterns, c) discovery of new linguistic patterns based on terms already extracted, and d) viewing of the resulting database. The constructed relational database can be seen as a large semantic network connecting the multiple terms in the domain. SeRT takes a highly interactive, semi-automatic approach, instead of a fully automatic one, to give the user control over the results presented by the tool.
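The idea behind function (b), finding relations between known terms by searching on linguistic patterns, can be illustrated with a minimal Python sketch. This is not SeRT's code: the two regular expressions, the relation labels, and the function name find_relations are all assumptions made for illustration, and terms are restricted to single words to keep the example short.

```python
import re

# Hypothetical pattern inventory: each regular expression is paired with a
# relation label. These are illustrative assumptions, not SeRT's pattern set.
PATTERNS = [
    (re.compile(r"\b(?P<a>[\w-]+) is a kind of (?P<b>[\w-]+)", re.IGNORECASE), "is-a"),
    (re.compile(r"\b(?P<a>[\w-]+) is part of (?:a |an |the )?(?P<b>[\w-]+)", re.IGNORECASE), "part-of"),
]


def find_relations(text, terms):
    """Return (term, relation, term) triples whose two ends are known terms."""
    known = {t.lower() for t in terms}
    triples = []
    for pattern, label in PATTERNS:
        for match in pattern.finditer(text):
            a = match.group("a").lower()
            b = match.group("b").lower()
            # Keep only matches that link two previously extracted terms.
            if a in known and b in known:
                triples.append((a, label, b))
    return triples


if __name__ == "__main__":
    sample = "The regulator is part of the tank assembly. A wetsuit is a kind of garment."
    for triple in find_relations(sample, ["regulator", "tank", "wetsuit", "garment"]):
        print(triple)
```

Each triple corresponds to one link in the semantic network. In an interactive setting, the user would review such candidate links before they are stored.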

Research associate on this project: Terry Copeck

Another large project
       ARC-Concept:  Acquisition, Representation and Clustering of Concepts
       Toward automatically building a Lexical Knowledge Base

Started in 1995, put aside for now, but waiting to be revived!

This project started during my Ph.D. and attempts to gather the information found in Machine Readable Dictionaries into a knowledge base, to provide background knowledge for processing natural language texts.  It is to some extent in the same spirit as WordNet, but created automatically through parsing and semantic analysis of dictionary definitions.
See the abstract of the thesis.

More theoretical work
        Semantic similarity

Started in 1999, on-going work.

More theoretical work on the problem of semantic similarity.   This is a fascinating topic that raises more questions than answers.  What does it mean for two words to be "similar"?  There has been much research on this topic, using different resources and leading to the development of different measures.
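One family of measures, given here only as an example and not as the measure studied in this work, compares the contexts in which two words occur in a corpus. The Python sketch below assumes whitespace tokenization and a fixed context window; the function names context_vector and cosine are chosen for illustration.

```python
import math
from collections import Counter


def context_vector(target, sentences, window=2):
    """Count the words found within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == target:
                lo, hi = max(0, i - window), i + window + 1
                # Skip position i itself so the target is not its own context.
                counts.update(
                    t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
                )
    return counts


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    corpus = ["the cat drinks milk", "the dog drinks water", "a car needs fuel"]
    cat = context_vector("cat", corpus)
    dog = context_vector("dog", corpus)
    car = context_vector("car", corpus)
    print(cosine(cat, dog), cosine(cat, car))
```

Under this measure, "cat" and "dog" come out as more similar than "cat" and "car" because they share more context words. A dictionary-based or taxonomy-based measure could rank the same pairs differently, which is part of what makes the question open.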


Development of an intelligent tutoring program to assist with reading French
- Funded by the "Initiative du développement à la recherche" program
- Team:  Lise Duquette, Institut des Langues Secondes, U. of Ottawa
                C. Barrière, S. Szpakowicz, S. Matwin, School of Information Technology and Engineering, U. of Ottawa
                A. Desrochers, École de Psychologie, U. of Ottawa
                D. Forget, Lettres Françaises, U. of Ottawa
                J. Liceras, Langues et littérature moderne, U. of Ottawa

Investigation of the top-level structure of texts through the analysis of signal words
- Funded by the GRIL start-up fund
- Collaboration with Lise Duquette, Institut des Langues Secondes, U. of Ottawa
- Research assistant for this work: Akakpo Agbago

Investigation of integration of semantic information in a French large-scale electronic dictionary
- In development
- Collaboration with Alain Desrochers, École de Psychologie, U. of Ottawa


Matthieu Hermet (DESS thesis, INALCO, Paris, July to December 2001)
Certainty in causality markers in texts

Exploration of the causality relation.   Exploration of markers in corpora.  Development of a representation schema or formalism to cover the examples found.   Incorporation of certainty into this representation.

Pascal Blais (Ph.D)
co-supervised with Robert Laganière
Intellectual Evolution of Artificial Beings

The goal is to define a model of an artificial brain that would allow an autonomous agent to evolve intellectually.  Based on such a model, an autonomous agent would be able to form new concepts and acquire new behaviors for which it was not initially programmed.

Souhail Zaki (Master's degree, started January 2000, now part-time)
Investigating semantic similarity in corpus

Technical report --- TR-2001-03  Méthode itérative pour le calcul de la similarité comportementale des mots

Jérôme Tétreault (Master's degree, started January 1999 - interrupted August 2000)
Automatic Machine Translation Evaluation

The evaluation of MT systems by human judges is a long and costly process.
We are investigating ways of doing partial evaluation automatically.
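
One simple form of partial automatic evaluation, shown here as a generic illustration and not as the method described in the technical report, is to measure how many word n-grams of a system translation also appear in a human reference translation. The Python sketch below assumes whitespace tokenization and a single reference; the names ngrams and clipped_precision are chosen for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams also found in the reference.

    Each n-gram is credited at most as many times as it occurs in the
    reference, so repeating a correct word does not inflate the score.
    """
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


if __name__ == "__main__":
    system_output = "the cat sat on the mat"
    human_reference = "the cat is on the mat"
    print(clipped_precision(system_output, human_reference, n=1))
    print(clipped_precision(system_output, human_reference, n=2))
```

Such a score covers only surface overlap with one reference, which is why it can serve as a partial evaluation at best and does not replace human judges.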

Technical report --- TR-2001-04


Winter 2001
        Student Translation Tracking System
        co-supervised with Lynne Bowker, School of Translation and Interpretation
        Students:   Baskaran Vijayaratnam & Emeka Ike
        CSI4900 (3 cr. project)

Winter 2000
        Visual Tools for Parsing Context-Free Grammars
        Students:  Pierre Dalcourt & Matthew Tyrer
        CSI4900 (3 cr. project)

        Word clustering based on word co-occurrences in corpus
        Student: Sophie Houle
        CSI4900 (3 cr. project)

Fall 99 - Winter 99
        Word clustering based on definition analysis in a Machine Readable Dictionary
        Student: Marc Chéné
        CEG4000 (6 cr. project)

        Knowledge base query tool
        Pascal Tellier
        CEG4000 (6 cr. project)

Fall 1999
         Integrating a visual interface to CG-LITE
         CSI4900 (3 cr. project)

Summer 98 + Fall 99
        CG-LITE : a java platform for conceptual graph manipulation
        Marc Perron
        Summer assistantship + CSI4900 (3 cr. project)

