ILASH seminar 10 January 1996

Extracting the essence

Chris Paice, Computing Department, Lancaster University; Wednesday 10 January 1996, 12:00 pm; ILASH Suite, Room 206, West Court, 2 Mappin Street, Sheffield S1

An abstract can be defined as a concise statement of the central message of a 'formal' document such as a scientific paper. An abstract is said to be 'informative' if it can serve as a substitute for the complete paper, or 'indicative' if it enables the reader to decide whether the complete paper is likely to be worth reading.

A concise representation of the message of an informal document, such as a news report, is usually called a 'summary'.

The aim of automatic abstracting (and automatic summarisation) is to take the full 'source text' of a document and generate a brief and hopefully intelligible statement from it.

Research on automatic summarisation has mainly concentrated on 'understanding' the source text by instantiating frames. The resulting programs have tended to be slow and domain-specific.

Automatic abstracting makes use of the relatively formal and stereotyped nature of scientific papers. The usual method has involved estimating the 'importance' of each sentence in a text, using various structural and lexical clues. A more recent method developed at Lancaster uses contextual clues to identify the main concepts discussed in a paper, and then uses an output template to generate an abstract incorporating all the concepts found.
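As a rough illustration of the sentence extraction idea, the following Python sketch scores each sentence using one structural clue (position in the text) and two lexical clues (indicator phrases and words shared with the title), then returns the highest-scoring sentences in their original order. It is not the method presented in the talk: the phrase lists, the weights, and the function names are all hypothetical, chosen only to make the example concrete.

```python
import re

# Hypothetical clue lists and weights, chosen for illustration only.
BONUS_PHRASES = ("in this paper", "we propose", "results show", "we conclude")
STIGMA_PHRASES = ("for example", "as shown in")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lower-case words of four or more letters (a crude stop-word filter)."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def score_sentence(sentence, index, total, title_words):
    """Estimate 'importance' from structural and lexical clues."""
    lowered = sentence.lower()
    score = 0.0
    # Structural clue: the first and last sentences often carry the message.
    if index == 0 or index == total - 1:
        score += 1.0
    # Lexical clues: indicator phrases raise the score, stigma phrases lower it.
    score += sum(2.0 for phrase in BONUS_PHRASES if phrase in lowered)
    score -= sum(1.0 for phrase in STIGMA_PHRASES if phrase in lowered)
    # Lexical clue: words shared with the title.
    score += 0.5 * len(content_words(sentence) & title_words)
    return score


def extract(text, title, n=2):
    """Return the n highest-scoring sentences, kept in source order."""
    sentences = split_sentences(text)
    title_words = content_words(title)
    scored = [
        (score_sentence(s, i, len(sentences), title_words), i)
        for i, s in enumerate(sentences)
    ]
    # Highest score first; ties are broken by earlier position.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:n]
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]
```

Because the selected sentences are simply copied out of the source, an extract produced this way may read disjointedly, for instance when a chosen sentence refers back to one that was not selected.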

In my talk, I shall outline the sentence extraction approach, and explain the problems that it encounters. I shall then explain how the new method avoids some of these problems, and discuss possible future directions for research. If time permits, I shall also talk about the nasty question of how the quality of abstracts may be assessed.

Last modified: January 7 1996

Malcolm Crawford <m.crawford@dcs.shef.ac.uk>