
on the meaning of 'word sense'




   Ted Dunning writes:
   > the fact is that most humans have very great difficulty performing
   > sense disambiguation.  doesn't this seriously bring into question
   > whether the task is pertinent to language processing?

   You might do me and other computational linguists a service by 
   summarizing the psychological evidence you're referring to 
   (and providing a reference or two to get started).

i am talking about the task where a subject is given dictionary
definitions, some training text and a chance to discuss with others
which senses various words have.  the goal is to tag a number of
instances of a particular word with particular dictionary senses.

what i find particularly interesting is the question of whether two
such subjects will agree on the various sense-tags.  i find the
question of whether the two subjects can be coerced into agreeing much
less interesting.

the actual efforts which i am aware of include rebecca bruce's efforts
to get data for her dissertation research and a recent private comment
by bob amsler.  generally, subjects agree only (roughly) 70% of the
time.  it is possible to find words for which this task is easier, and
words for which this task is harder.  i get the impression that people
who work on automated sense disambiguation tend to select the words
they disambiguate so that they can get usable training data.
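
to make the agreement figure concrete, here is a minimal sketch of how
agreement between two taggers could be measured.  the tags below are
invented for illustration (they are not data from bruce or amsler), the
function names are my own, and cohen's kappa is an addition of mine: it
is a common chance-corrected measure, not something the efforts above
are known to have reported.

```python
from collections import Counter


def observed_agreement(tags_a, tags_b):
    """Fraction of instances on which the two taggers chose the same sense."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two non-empty tag lists of equal length")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


def cohen_kappa(tags_a, tags_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(tags_a)
    p_observed = observed_agreement(tags_a, tags_b)
    count_a = Counter(tags_a)
    count_b = Counter(tags_b)
    # chance agreement: both taggers pick sense t independently
    p_chance = sum(count_a[t] * count_b[t] for t in count_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both taggers used one and the same tag throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# hypothetical tags in homograph_sense form, invented for this example
tagger_a = ["1_10", "1_10", "1_01", "1_02", "1_10",
            "1_14", "1_10", "1_02", "1_10", "1_01"]
tagger_b = ["1_10", "1_02", "1_01", "1_02", "1_10",
            "1_14", "1_10", "1_01", "1_10", "1_10"]

raw = observed_agreement(tagger_a, tagger_b)
kappa = cohen_kappa(tagger_a, tagger_b)
print(f"raw agreement: {raw:.2f}, kappa: {kappa:.2f}")
```

raw agreement answers the question i find interesting (do two subjects
agree?); kappa additionally discounts the agreement you would get if
both subjects simply favored the same dominant sense.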

a real example might be helpful.  taking the word stock, LDOCE shows
the following 21 major senses split across three homographs:



stock
0100 a supply (of something) for use: a good stock of food
0200 goods for sale: Some of the stock is being taken without being
     paid for
0300 the thick part of a tree trunk
0400 (a) a piece of wood used as a support or handle, as for a gun or
     tool (b) the piece which goes across the top of an ANCHOR^1 (1)
     from side to side
0500 (a) a plant from which CUTTINGs are grown (b) a stem onto which
     another plant is GRAFTed
0600 a group of animals used for breeding
0700 farm animals usu. cattle; LIVESTOCK
0800 a family line, esp. of the stated character
0900 money lent to a government at a fixed rate of interest
1000 the money (CAPITAL) owned by a company, divided into SHAREs
1100 a type of garden flower with a sweet smell
1200 a liquid made from the juices of meat, bones, etc., used in
     cooking
1300 (in former times) a stiff cloth worn by men round the neck of a
     shirt -compare TIE
1400 in/out of stock: kept/not kept in the shop at the present moment
     and therefore able/not able to be bought: "Have you any blue
     shirts in stock?" "No, I'm afraid they're out of stock, but we
     shall be having some more in next month"
1500 out of stock: having none for sale: "Have you any blue shirts in
     stock?" "No, I'm afraid we're out of stock (of them) at the
     moment"
1600 take stock (of): to consider the state of things so as to take a
     decision (often in the phr. take stock of the situation)
     -compare STOCKTAKING; see also LAUGHINGSTOCK, LOCK^2 (8), stock
     and barrel

stock
0100 to keep supplies of: They stock all types of shoes
0200 to supply: a shop well stocked with goods
0300 to store: They've stocked their crops in the BARN -see also
     STOCK UP

stock
0100 commonly used, esp. without much meaning: a stock greeting such
     as "Good morning"
0200 kept in STOCK^1 (14), esp. because of a standard or average type:
     stock sizes

and here are some uses of stock taken from the wall street journal.
note that stock is a particularly easy word to pick senses for in the
wall street journal since you just have to guess homograph 1, sense 10
to get pretty good accuracy.  of course, that guess is strictly wrong,
since the definition isn't really an accurate description of the most
common uses.  many other words are much more difficult.
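
the "just guess one sense" strategy can be sketched as a
most-frequent-sense baseline.  this is only an illustration: the tag
lists are invented, not counted from the wall street journal, and the
function names are hypothetical.

```python
from collections import Counter


def most_frequent_sense(training_tags):
    """Return the sense tag that occurs most often in the training data."""
    if not training_tags:
        raise ValueError("need at least one training tag")
    return Counter(training_tags).most_common(1)[0][0]


def baseline_accuracy(training_tags, test_tags):
    """Accuracy of always guessing the most frequent training sense."""
    if not test_tags:
        raise ValueError("need at least one test tag")
    guess = most_frequent_sense(training_tags)
    return sum(tag == guess for tag in test_tags) / len(test_tags)


# invented tags: homograph 1, sense 10 dominates the training sample
train = ["1_10"] * 8 + ["1_01"] * 2
test = ["1_10", "1_10", "1_01", "1_10", "1_14"]

guess = most_frequent_sense(train)
accuracy = baseline_accuracy(train, test)
print(guess, accuracy)
```

the more skewed the sense distribution of a word in a given corpus, the
better this baseline does, which is one reason stock is easy here.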

in order to emphasize the differences, i have hand-picked the sentences
below.  they are not a representative sample.

<s> Stocks of manufacturers were up 0.6% and their inventory-sales ratio reached 1.57 months. </s>

is stock here stock_1_10 or is it referring to inventory?

<s> Wholesale stocks inched up 0.1% with the ratio of inventories to sales at 1.28. </s>

again, same question.

<s> Finally, with regard to stock market system reform, Mr. Melloan cavalierly asserts that I made the comment that it "would be better to do the wrong thing than nothing at all." </s>

this one is relatively easy, except that stock market doesn't quite
match the generative meaning.

<s> In composite trading on the American Stock Exchange, the company closed unchanged yesterday at $8.375 a share. </s>

isn't this part of a proper noun?  does it have the same sense?

<s> Besides causing a big stock buildup at the smelter, harming bilateral ties and cutting Jakarta's export earnings, the aluminum-export ban is damaging Indonesia's investment climate, businessmen from both countries say. </s>

clearly inventory.

<s> Vulcan Corp. moved to distribute most of the 520,180 shares of Eagle-Picher Industries Inc. it holds by declaring a special dividend to stockholders. </s>

is this an occurrence of stock?

<s> Bob's Ski Shop in Portland, Ore., says it can't keep enough kids' skis in stock. </s>

this should be easy.

<s> Equity investors complain that Harcourt shouldn't have sold stock at such a low price. </s>

but is this inventory, or common stock?
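
the stockholders example above is partly a question of how occurrences
are counted in the first place.  here is a minimal sketch of two
matching policies, using sentences quoted above.  the two patterns are
my own illustration, not the procedure used by any of the efforts
mentioned earlier.

```python
import re

# strict: "stock" or "stocks" as a whole word only
STRICT = re.compile(r"\bstocks?\b", re.IGNORECASE)
# loose: any substring match, so compounds like "stockholders" count too
LOOSE = re.compile(r"stock", re.IGNORECASE)


def find_occurrences(sentence, pattern):
    """Return the matched strings for one sentence under one policy."""
    return [m.group(0) for m in pattern.finditer(sentence)]


sentences = [
    "Stocks of manufacturers were up 0.6% and their inventory-sales "
    "ratio reached 1.57 months.",
    "Vulcan Corp. moved to distribute most of the 520,180 shares of "
    "Eagle-Picher Industries Inc. it holds by declaring a special "
    "dividend to stockholders.",
    "Bob's Ski Shop in Portland, Ore., says it can't keep enough "
    "kids' skis in stock.",
]

for s in sentences:
    print(find_occurrences(s, STRICT), find_occurrences(s, LOOSE))
```

under the strict policy the stockholders sentence contributes no
instance to tag; under the loose policy it does, and the taggers then
have to decide which sense, if any, applies inside the compound.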

if anybody wants, i can perform this same exercise on a more
difficult word.


