Noun-modifier relations data

Why this document?

I have received many requests in the past few years for my tagged set of noun-modifier pairs. This document explains the data, the tag set used, the format of my data file, how you can get the data if you want it, and the conditions attached.

The data

The file contains 600 tagged base noun phrases (modifier-noun pairs). These phrases were collected from Judith Levi's "The syntax and semantics of complex nominals" (1978) (manually), Nancy Larrick's "The junior science book of rain, hail, sleet and snow" (1961) (automatically), and SemCor, the version annotated with WordNet 1.6 senses (semi-automatically). Some examples were also constructed and added for relations that were infrequent in these texts. The examples that were not extracted from SemCor were manually annotated with WordNet 1.6 senses (which was the rage at the time). All the pairs were annotated with semantic relations from a list of 47 (only 30 of these relations have instances in this data set), which I describe briefly in the next section. There is also a second file which contains the same data, annotated with 5 coarser relation tags (which are really relation classes, so to speak: causal, temporal, spatial, participant, quality).

The set of semantic relations

While there is no consensus on a comprehensive list of semantic relations, the one we used contains 47 relatively generic relations (in the sense that they are not domain-specific), which are necessary and sufficient for the analysis of pairs extracted from semi-technical texts (Ken Barker showed this in his PhD thesis). You can see the list of relations with examples here.

If you are curious to know how this list was developed, here is the short story: Ken Barker developed three lists of relations for three separate syntactic levels (clause level, intra-clause (cases) and noun phrase), based on the literature on semantic relations at the time (around 1997). I then combined these three lists (by aligning, grouping and splitting) so that the same set of relations covers phenomena at all three syntactic levels. If you want the long story, check Ken's thesis and my thesis.

Somebody to whom I gave this set once asked me why certain relations were assigned to certain pairs when other relations also looked like possible options. First of all, it is our premise that one and only one relation should be assigned to a pair of units (words, clauses, etc.). There are ambiguities, that is true. However, when this set was annotated, each pair was discussed by two judges, and the relation that seemed more appropriate was assigned.


I was asked, for example, why concert hall is assigned the PURPOSE relation and why LOCATION is not a better choice. The reason is that you would call something a concert hall if it is a place designed for the purpose of holding concerts, even though other events may also take place there. A room, or hall, where concerts are occasionally held is not necessarily a concert hall.

The file format

The data is in Prolog format, as facts that give information for modifier-noun pairs (base NPs):

rel(nmr,HeadNoun,HeadInfo,Modifier,ModifierInfo,Relation).

The name of each variable is self-explanatory. Relation is one of the 47 relations in our list. For each word in the pair there are two bits of additional information (both HeadInfo and ModifierInfo contain them):

[PartOfSpeech,WordNet1.6_sense]
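If you do not work in Prolog, facts of this shape can also be read with a few lines of Python. What follows is only a minimal sketch, not part of the distributed data: it assumes that every fact sits on one line and that all terms are simple unquoted tokens (no commas, brackets or parentheses inside a term). The names NounModifierPair and parse_fact are made up for this illustration, and the sample fact is invented; in particular, the sense numbers and the lowercase spelling of the relation are placeholders, not values copied from the file.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional


class NounModifierPair(NamedTuple):
    """One rel(nmr, ...) fact; every field is kept as the raw string."""
    head: str
    head_pos: str
    head_sense: str
    modifier: str
    modifier_pos: str
    modifier_sense: str
    relation: str


# ASSUMPTION: terms are simple unquoted tokens (no commas, brackets, parentheses).
_TERM = r"\s*([^,\[\]()]+?)\s*"
# [PartOfSpeech,WordNet1.6_sense]
_INFO = r"\s*\[" + _TERM + "," + _TERM + r"\]\s*"
# rel(nmr,HeadNoun,HeadInfo,Modifier,ModifierInfo,Relation).
_FACT = re.compile(
    r"^\s*rel\(\s*nmr\s*," + _TERM + "," + _INFO + ","
    + _TERM + "," + _INFO + "," + _TERM + r"\)\s*\.\s*$"
)


def parse_fact(line: str) -> Optional[NounModifierPair]:
    """Return the parsed pair, or None if the line is not a rel(nmr, ...) fact."""
    match = _FACT.match(line)
    return NounModifierPair(*match.groups()) if match else None


# Invented sample line (hypothetical senses and relation spelling).
sample_lines = [
    "rel(nmr,hall,[n,1],concert,[n,1],purpose).",
    "% lines that are not facts are skipped",
]

pairs = [p for p in map(parse_fact, sample_lines) if p is not None]
relation_counts = Counter(p.relation for p in pairs)
```

With the real file you would replace sample_lines with the lines read from disk; relation_counts then gives the number of pairs per relation. If the actual facts contain quoted atoms or span several lines, the regular expression would need to be adapted, or the file loaded with a Prolog system instead.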


How you can get the data, and the conditions attached

You can get the data annotated with 47 relations here, or annotated with 5 general relations here.

Citing/References

Vivi Nastase, Jelber Sayyad-Shirabad, Marina Sokolova and Stan Szpakowicz, Learning Noun-Modifier Semantic Relations with Corpus-based and WordNet-based Features, AAAI 2006

Vivi Nastase and Stan Szpakowicz, Exploring Noun-Modifier Semantic Relations, IWCS 2003