ILASH seminar series

Extracting the essence

Chris Paice
Computing Department
Lancaster University
Wednesday 10 January 1996, 12:00 pm
Room 206, West Court, 2 Mappin Street, Sheffield S1

An abstract can be defined as a concise statement of the central message of a 'formal' document such as a scientific paper. An abstract is said to be 'informative' if it can serve as a substitute for the complete paper, or 'indicative' if it enables the reader to decide whether the complete paper is likely to be worth reading.

A concise representation of the message of an informal document, such as a news report, is usually called a 'summary'.

The aim of automatic abstracting (and automatic summarisation) is to take the full 'source text' of a document and generate a brief and hopefully intelligible statement from it.

Research on automatic summarisation has mainly concentrated on 'understanding' the source text, by instantiating frames. These programs have tended to be slow and domain-specific.

Automatic abstracting makes use of the relatively formal and stereotyped nature of scientific papers. The usual method has involved estimating the 'importance' of each sentence in a text, using various structural and lexical clues. A more recent method developed at Lancaster uses contextual clues to identify the main concepts discussed in a paper, and then uses an output template to generate an abstract incorporating all the concepts found.
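
As a rough illustration of the sentence extraction idea, the sketch below scores each sentence using one lexical clue (how frequent its content words are in the text), one structural clue (whether it opens or closes the text) and a small list of cue phrases, then returns the top-scoring sentences in their original order. The stopword list, the cue phrases, the weights and all function names are assumptions made up for this example; it is not the method used in any particular system.

```python
import re
from collections import Counter

# Hypothetical clue lists and weights, chosen for illustration only.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "are", "we",
             "that", "this", "it", "for", "on", "with", "as", "by", "be"}
CUE_PHRASES = ("in this paper", "we present", "we propose",
               "results show", "we conclude")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentences(sentences):
    """Return one 'importance' score per sentence."""
    words_per_sentence = [
        [w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOPWORDS]
        for s in sentences
    ]
    # Frequency of each content word across the whole text.
    freq = Counter(w for words in words_per_sentence for w in words)
    last = len(sentences) - 1
    scores = []
    for i, (sentence, words) in enumerate(zip(sentences, words_per_sentence)):
        # Lexical clue: mean text-wide frequency of the content words.
        lexical = sum(freq[w] for w in words) / len(words) if words else 0.0
        # Structural clue: bonus for the first and last sentence.
        structural = 1.0 if i in (0, last) else 0.0
        # Cue-phrase clue: bonus if any listed phrase occurs.
        cue = 2.0 if any(p in sentence.lower() for p in CUE_PHRASES) else 0.0
        scores.append(lexical + structural + cue)
    return scores


def extract(text, n=2):
    """Pick the n highest-scoring sentences, kept in source order."""
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]
```

Because the selected sentences are lifted out of their context, an extract built this way can read disjointedly, for example when a chosen sentence refers back to one that was not selected.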

In my talk, I shall outline the sentence extraction approach, and explain the problems that it encounters. I shall then explain how some of these problems can be avoided by the new method, and will discuss possible future directions for research. If time permits, I will also talk about the nasty question of how the quality of abstracts may be assessed.

Last modified: January 7 1996
Malcolm Crawford