Coreference resolution in Information Extraction
Abstract:
Information Extraction (IE) is a process that takes free text as input and produces a structured representation of predefined target information as output. This data can be displayed directly to users or stored in a database for later analysis. As more and more information exists only in natural language form, the need for information extraction systems grows dramatically.
Coreference resolution is one of the key components of an IE system. It merges partial data objects that describe the same entities, entity relationships, and events mentioned at different positions in the discourse. Most coreference resolution systems deal with resolving noun phrase or pronoun anaphors. Traditional approaches to anaphora resolution rely on a set of “anaphora resolution factors”, such as gender or number constraints.
This presentation will first introduce the general architecture for information extraction and outline several of its real-world applications. Special emphasis will be given to the coreference resolution component within the IE system. We will discuss the importance of coreference resolution and analyze the different approaches taken in the best-known works. Finally, we will talk about coreference resolution in the context of e-mail, and the algorithm used to solve this problem.
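
As an illustration of the gender and number constraints mentioned above, the following is a minimal sketch of a constraint-based antecedent filter. It is not the algorithm covered in the presentation. The `Mention` class, its feature labels, and the rule of picking the most recent agreeing candidate are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    """Hypothetical mention record; the feature labels are illustrative only."""
    text: str
    gender: str    # "masc", "fem", "neut", or "unknown"
    number: str    # "sing" or "plur"
    position: int  # token offset within the discourse


def agrees(anaphor: Mention, candidate: Mention) -> bool:
    """Return True if the candidate passes the gender and number constraints."""
    gender_ok = (
        "unknown" in (anaphor.gender, candidate.gender)
        or anaphor.gender == candidate.gender
    )
    number_ok = anaphor.number == candidate.number
    return gender_ok and number_ok


def resolve(anaphor: Mention, mentions: List[Mention]) -> Optional[Mention]:
    """Pick the closest preceding mention that agrees with the anaphor.

    Using recency to choose among agreeing candidates is an assumption of
    this sketch, not something stated in the abstract.
    """
    candidates = [
        m for m in mentions
        if m.position < anaphor.position and agrees(anaphor, m)
    ]
    return max(candidates, key=lambda m: m.position, default=None)


# "Mary met the engineers. She ..."
mary = Mention("Mary", "fem", "sing", 0)
engineers = Mention("the engineers", "unknown", "plur", 2)
she = Mention("She", "fem", "sing", 5)

antecedent = resolve(she, [mary, engineers, she])
print(antecedent.text if antecedent else None)
```

Here "the engineers" is ruled out by the number constraint, so "Mary" is the only remaining candidate for "She". A fuller system would combine such hard constraints with softer preferences.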