CSI5180: Topics in Artificial Intelligence: Natural Language Processing, A Statistical Approach


Assignment 2

Due: Fri, Mar 23, 21:00


Semantic Role Labeling [50 points]

A semantic role is the relationship that a syntactic constituent has with a predicate. Typical semantic arguments include Agent, Patient, and Instrument, as well as adjunctive arguments indicating aspects such as Locative, Temporal, Manner, and Cause. Recognizing and labeling semantic arguments is a key task for answering "Who", "When", "What", "Where", and "Why" questions in Information Extraction, Question Answering, Summarization, and, in general, in any NLP task that requires some form of semantic interpretation.

The following sentence, taken from the PropBank corpus, exemplifies the annotation of semantic roles:

[A0 He ] [AM-MOD would ] [AM-NEG n't ] [V accept ] [A1 anything of value ] from [A2 those he was writing about ] .

Here, the roles for the predicate accept (that is, the roleset of the predicate) are defined in the PropBank Frames scheme as:

V: verb
A0: acceptor
A1: thing accepted
A2: accepted-from
A3: attribute
AM-MOD: modal
AM-NEG: negation

The PropBank notation includes A0..A5, the arguments associated with a verb predicate, as defined in the PropBank Frames scheme, and AM-T, adjunctive arguments of various sorts, where T is the type of the adjunct (locative, temporal, manner, etc.).
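
For concreteness, here is a minimal Python sketch that parses this bracketed notation into (role, constituent) pairs. The regular expression assumes the spacing shown in the example sentence above; it is an illustration, not part of the required solution.

    import re

    # Matches "[ROLE constituent text ]" spans in a bracketed sentence.
    PATTERN = re.compile(r"\[(\S+) ([^\]]+)\]")

    def parse_annotation(line):
        """Return (role, text) tuples from a bracketed PropBank-style sentence."""
        return [(role, text.strip()) for role, text in PATTERN.findall(line)]

    sentence = ("[A0 He ] [AM-MOD would ] [AM-NEG n't ] [V accept ] "
                "[A1 anything of value ] from [A2 those he was writing about ] .")
    for role, text in parse_annotation(sentence):
        print(role, "->", text)
    # A0 -> He
    # AM-MOD -> would
    # AM-NEG -> n't
    # V -> accept
    # A1 -> anything of value
    # A2 -> those he was writing about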

The Shared Task of CoNLL-2005 concerned the recognition of semantic roles for English, based on PropBank predicate-argument structures. Given a sentence, the task consists of analyzing the propositions expressed by some target verbs of the sentence: for each target verb, all the constituents in the sentence that fill a semantic role of the verb must be recognized. This problem is called Semantic Role Labeling (SRL).

Your task is to produce automatic SRL annotations using one or more SRL tools/systems available on the Internet, adapting their input or output format as needed. Alternatively, you can implement your own SRL method.
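
As one example of an off-the-shelf option, the sketch below uses AllenNLP's pretrained SRL predictor (it requires the allennlp and allennlp-models packages). The model archive URL is an assumption; check the AllenNLP model listing for a current one before running. Any comparable SRL system is equally acceptable.

    from allennlp.predictors.predictor import Predictor

    # Assumed model archive URL -- verify against the AllenNLP model listing.
    MODEL_URL = ("https://storage.googleapis.com/allennlp-public-models/"
                 "structured-prediction-srl-bert.2020.12.15.tar.gz")

    predictor = Predictor.from_path(MODEL_URL)
    result = predictor.predict(sentence="He would n't accept anything of value .")

    # One entry per target verb, with one BIO tag per token, e.g.
    # ['B-ARG0', 'B-ARGM-MOD', 'B-ARGM-NEG', 'B-V', 'B-ARG1', ...].
    for verb in result["verbs"]:
        print(verb["verb"], verb["tags"])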


Please write a report describing the methods implemented in the systems that you tried (or in the system that you implemented). Also submit a file named Results containing your results for the test data, produced by your best-scoring method. Put the results of any system that you tried in the format described below.

In your report, give your results in terms of Precision, Recall, and F-measure for every SRL tool that you tried. Your goal is to achieve the best F-score on the test data; we will hold a mini-competition (with chocolate prizes) for the best F-score.
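
As a reminder, the three measures are computed over labeled arguments: Precision is the fraction of predicted arguments that are correct, Recall is the fraction of gold arguments that are found, and F-measure is their harmonic mean. A minimal sketch:

    def precision_recall_f1(correct, predicted, gold):
        """P, R, F1 from counts of correct, predicted, and gold arguments.

        An argument counts as correct only if both its label and its span
        exactly match the gold annotation.
        """
        p = correct / predicted if predicted else 0.0
        r = correct / gold if gold else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f1

    # Example: 80 correct out of 100 predicted, against 120 gold arguments.
    print(precision_recall_f1(80, 100, 120))  # approximately (0.80, 0.67, 0.73)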


The test data is available here. Use only the directory test.wsj. The input data is already tokenized, in the subdirectory words (one word per line, with an empty line between sentences). The format of the Results file is described in more detail here (the last column, props, is the one that matters for the evaluation). The expected solution is in the subdirectory props. Use the script srl-eval.pl to calculate P, R, and F-score for your Results file by comparing it to the expected solution. For more information about the data and its format, read the task webpage.
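
If your tool outputs per-token BIO tags, you will need to convert them into the bracketed column format used in the props files (cells such as "(A0*", "*", "*)", with "(V*)" marking the target verb). Below is a minimal conversion sketch, assuming BIO tags named after the PropBank roles; adapt the tag names to whatever your system emits.

    def bio_to_props_column(tags):
        """Convert BIO tags for one predicate into bracketed props cells."""
        cells = []
        for i, tag in enumerate(tags):
            cell = "(" + tag[2:] + "*" if tag.startswith("B-") else "*"
            # Close the span when the next tag does not continue it.
            if tag != "O" and (i + 1 == len(tags)
                               or not tags[i + 1].startswith("I-")):
                cell += ")"
            cells.append(cell)
        return cells

    tags = ["B-A0", "B-AM-MOD", "B-AM-NEG", "B-V", "B-A1", "I-A1", "I-A1", "O"]
    print(bio_to_props_column(tags))
    # ['(A0*)', '(AM-MOD*)', '(AM-NEG*)', '(V*)', '(A1*', '*', '*)', '*']

The evaluation script is typically invoked as perl srl-eval.pl <gold props> <your Results>; check the script's usage message for the exact argument order.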


Submission instructions:


1. Prepare a report (.pdf, .doc, or .txt). Describe your methods and any tools that you used, present your results, and analyze/discuss them.


2. Submit your report by email to diana@site.uottawa.ca. There is no need to submit your code. Please submit the Results file separately.