Papers On Text Summarization / Keyphrase Extraction
compiled by Peter Turney,
February 17, 1997
Comments, corrections, additions welcome. Random order.
Online Computer Library Center
- Jean Godby
- highly related to my work
- but no concrete measure of performance
- seems to have changed research directions since 94
Jean Godby, "Two techniques for the identification of phrases
in full text", Annual Review of OCLC Research, 1994
University of Massachusetts (Amherst)
- Lehnert, Riloff, et al.
- Information Extraction / Text Extraction
- this work is largely done within the context of the
MUC conferences, where the task is to fill in
slots in a template
- MUC = Message Understanding Conference
- for example, corpus of text on news stories on terrorist
attacks; template contains slots for:
- terrorist organization
- number of victims
- type of attack (bomb, kidnap, ...)
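
The slot-filling task described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the class AttackTemplate, the function fill_template, the toy lexicon, and the list of known organizations are all hypothetical names invented here, and the keyword matching is far cruder than the dictionaries and parsers used by actual MUC systems.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttackTemplate:
    """Hypothetical template holding the three slots listed in the notes."""
    organization: Optional[str] = None
    num_victims: Optional[int] = None
    attack_type: Optional[str] = None


# Toy lexicon (an assumption for illustration): surface word -> slot value.
ATTACK_WORDS = {
    "bomb": "bomb", "bombing": "bomb", "bombed": "bomb",
    "kidnap": "kidnap", "kidnapping": "kidnap", "kidnapped": "kidnap",
}


def fill_template(text: str, known_orgs: List[str]) -> AttackTemplate:
    """Fill each slot with the first matching evidence found in the text."""
    template = AttackTemplate()
    lowered = text.lower()

    # Organization slot: first known organization name mentioned in the text.
    for org in known_orgs:
        if org.lower() in lowered:
            template.organization = org
            break

    # Victim-count slot: a number directly followed by a victim noun.
    match = re.search(r"(\d+)\s+(?:people|victims|civilians)", lowered)
    if match:
        template.num_victims = int(match.group(1))

    # Attack-type slot: first word that appears in the toy lexicon.
    for word in re.findall(r"[a-z]+", lowered):
        if word in ATTACK_WORDS:
            template.attack_type = ATTACK_WORDS[word]
            break

    return template


story = "A bombing claimed by the Example Front killed 12 people on Tuesday."
print(fill_template(story, ["Example Front"]))
```

Slots with no supporting evidence stay empty (None), which mirrors the idea that a template may be only partially filled for a given story.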
W. Lehnert, C. Cardie, D. Fisher, J. McCarthy, E. Riloff, and S.
Soderland, "Evaluating an Information Extraction System", Journal of
Integrated Computer-Aided Engineering, 1(6), 1994.
Ellen Riloff and Wendy Lehnert, "Information Extraction as a Basis
for High-Precision Text Classification", ACM Transactions on
Information Systems, July 1994, vol. 12, no. 3, pp. 296-333.
Ellen Riloff, "Dictionary Requirements for Text Classification: A
Comparison of Three Domains", in Working Notes of the AAAI Spring
Symposium on Representation and Acquisition of Lexical Knowledge:
Polysemy, Ambiguity, and Generativity, 1995, pp. 123-128.
Ellen Riloff, "Automatically Constructing a Dictionary for Information
Extraction Tasks", Proceedings of the Eleventh National Conference on
Artificial Intelligence, 1993, AAAI Press/MIT Press, pp. 811-816.
University of Arizona
- Chen et al.
- describes his own work on the application of
Utgoff's ID5R incremental decision tree induction
algorithm to the problem of selecting keyphrases
- relevance feedback from the user during search
for a document is used to refine the keyphrases
associated with each document in the collection
- also reports his own work on the application of
genetic algorithms to the problem of optimizing
the keyphrases associated with documents
- the "fitness" measure used by the GA is the
agreement among the keyphrases in a subset
of documents that the user has clustered together
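
The fitness idea above can be sketched as follows. The notes do not say how "agreement" is measured, so this sketch assumes mean pairwise Jaccard overlap between keyphrase sets; that choice, and the names jaccard and cluster_fitness, are assumptions made here for illustration and are not taken from Chen's paper.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two keyphrase sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_fitness(keyphrases: Dict[str, Set[str]], cluster: List[str]) -> float:
    """Mean pairwise keyphrase agreement among documents the user grouped together.

    A GA could use this score to prefer keyphrase assignments under which
    documents in the same user-made cluster share more keyphrases.
    """
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(keyphrases[a], keyphrases[b]) for a, b in pairs) / len(pairs)


candidate = {
    "d1": {"neural networks", "retrieval"},
    "d2": {"neural networks", "retrieval"},
    "d3": {"genetic algorithms"},
}
print(cluster_fitness(candidate, ["d1", "d2"]))        # identical sets agree fully
print(cluster_fitness(candidate, ["d1", "d2", "d3"]))  # d3 lowers the agreement
```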
Chen, Hsinchun. (1995). "Machine learning for information retrieval:
Neural networks, symbolic learning, and genetic algorithms", Journal of
the American Society for Information Science, 46(3): 194-216.
University of Ottawa
- Stan Matwin, Stan Szpakowicz, Doug Skuce, Judy Kavanagh, et al.
- extraction of knowledge from text
S. Delisle, K. Barker, J.-F. Delannoy, S. Matwin and S. Szpakowicz (1994)
"From Text to Horn Clauses: Combining Linguistic Analysis and Machine
Learning". R. Elio (ed.), Proc Tenth Canadian Conf on AI, CSCSI, Banff,
C. Feng, T. Copeck, S. Szpakowicz and S. Matwin (1994) "Semantic Clustering.
Acquisition of Partial Ontologies from Public Domain Lexical Sources". Proc
AAAI Knowledge Acquisition for Knowledge-Based Systems Workshop. Banff,
Kavanagh, J. (1995) The Text Analyzer: A Tool for Knowledge Acquisition
from Texts. Masters Thesis, Department of Computer Science,
University of Ottawa.
- Tomek Strzalkowski et al.
- involved in TREC
- interested in use of NLP to improve IR performance
- uses NLP to find phrases in text
Document indexing and retrieval using natural language processing.
Strzalkowski. RIAO '94 Proceedings.
- no journal articles cited
- summarization of text by extracting key sentences
- often does not work well
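
A minimal sketch of summarization by key-sentence extraction, assuming a simple word-frequency scoring scheme. The scoring rule, the tiny stopword list, and the function name extract_key_sentences are assumptions chosen for illustration; the notes above do not specify a method.

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "was", "it"}


def content_words(text: str) -> List[str]:
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_key_sentences(text: str, k: int = 1) -> List[str]:
    """Return the k highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        # Average document-wide frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    best = sorted(range(len(sentences)),
                  key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


text = ("Keyphrases summarize documents. "
        "Keyphrases help readers find documents. "
        "The weather was nice.")
print(extract_key_sentences(text, k=1))
```

Sentences are chosen independently of one another, so the extracted set can read as disjointed; that is one plausible reason the approach often does not work well.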
- Marti Hearst, Julian Kupiec, Jan Pedersen, Francine Chen
- Distilling Information from Documents
- NLP technology
- automatic document summarization, information
extraction, morphological analysis
A trainable document summarizer, by J. Kupiec, J. Pedersen, and F.
Chen, Proc. SIGIR '95, pp. 68-73, Seattle, 1995, ACM Press.
University of California (San Diego)
- Rik Belew
- various interests in genetic algorithms and information
Exporting phrases: A statistical analysis of topical language, by A. M.
Steier and R. K. Belew, Proc. 2nd Symp. on Document Analysis and
Information Retrieval, pp. 179-190, 1993
- Gerard Salton (deceased), Chris Buckley, Joel Fagan
- concerned with weighting of terms and phrases in order
to improve search
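
As an illustration of term weighting, here is a minimal sketch of one common scheme, raw term frequency multiplied by log(N / document frequency). This particular variant and the function name tfidf are assumptions for illustration; the reports cited below compare many weighting variants, and this sketch is not claimed to be the one they recommend.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term in each tokenized document by tf * log(N / df)."""
    n_docs = len(docs)

    # Document frequency: the number of documents containing each term.
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))

    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return weights


docs = [
    ["term", "weighting", "term"],
    ["term", "retrieval"],
]
for doc_weights in tfidf(docs):
    print(doc_weights)
```

A term that occurs in every document gets weight zero under this variant, however often it appears, while terms confined to a few documents are weighted up.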
Salton, G. and Buckley, C. (1987) "Term weighting approaches
in automatic text retrieval", Department of Computer Science,
Cornell University, Technical Report 87-881.
Fagan, J.L. (1987) "Experiments in automatic phrase indexing
for document retrieval: a comparison of syntactic and non-syntactic
methods", Ph.D. Thesis, Department of Computer Science,
Cornell University, Technical Report 87-868.
University of Brabant (The Netherlands)
- Hans Paijmans
- interesting comparison of term weighting by TFIDF with
Paijmans, J.J. (1994) "Relative weights of words in documents". In
L.G.M. Noordman and W.A.M. de Vroomen, editors, Conference proceedings
of STINFON, pp. 195-208.
- Mitchell Wyle
- Automatic Phrase Generation
- use of phrase weighting to improve search performance
- now works at IntelliMatch, "Internet's #1 Service for
Matching Job Seekers and Employers"
Wyle, M.F., and Frei, H.P. (1991) "Retrieval algorithm effectiveness
in a wide area network information filter", Proceedings of the 14th
ACM SIGIR Conference on R&D in Information Retrieval, ACM, Chicago IL,
University Carlos III (Spain)
- Alberto Munoz
- keyword generation by fuzzy neural nets
Munoz, A. (1996). "Creating term associations using a hierarchical
ART architecture". In C.v.d. Malsburg and W.v. Seelen, editors,
International Conference on Artificial Neural Networks, Lecture
Notes in Computer Science, pp. 171-177, Bochum, Germany, Spring
University of Nantes (France)
- Christian Jacquemin
- interested in use of NLP to improve IR performance
- uses NLP to find phrases in text
What is the tree we see through the window: A linguistic approach
to windowing and term variation. Jacquemin. Unpublished. 1995 draft.
University of Westminster (UK)
- John Sykes, Vassilis Konstantinou, et al.
- NLP and legal reasoning
J.T. Sykes, V. Konstantinou and P.L.R. Morse, "Extracting Explicit
and Implicit Knowledge from Natural Language Texts".
Konstantinou, V., Sykes, J., and Yannopoulos, G.N. (1993). "Can legal
knowledge be derived from legal texts?", Proceedings of the
Fourth International Conference on Artificial Intelligence and
Law. ACM Press.
- Kathy McKeown, Dragomir Radev, et al.
- NLP group
- news summarization
Generating summaries of multiple news articles, K. McKeown, D.R. Radev,
Proc. SIGIR '95, pp. 74-82, Seattle, 1995, ACM Press.
The Dagstuhl Seminar -
December 13 to 17, 1993, Dagstuhl, Germany
Karen Sparck Jones, Univ. of Cambridge (chairperson)
Brigitte Endres-Niggemeyer, Polytechnic of Hannover (organizer)
Jerry Hobbs, SRI International, Menlo Park
Elizabeth Liddy, Syracuse University
Cecile Paris, ISI Marina del Rey
"Summarizing Text for Intelligent Communication:
Building a research platform for theoretical and
practical progress in summarizing, as a key task in
natural language processing, artificial intelligence,
and related disciplines"
"The PLUM system has been shown in Spanish, German, Chinese, and Japanese.
A component of PLUM, called IdentiFinder, can extract names from text in
both English and Spanish. Another component, POST (Part Of Speech Tagger),
uses probabilistic techniques to assign likely parts of speech to words
in arbitrary text. Both POST and IdentiFinder have been successfully used
by BBN clients and research colleagues in other institutions, and represent
the leading edge of text processing technology."
An automatic hyperlinking and indexing program for HTML documents to be
presented on an internet or intranet site;
- Web Anchor:
An automatic organizer and hyperlinking program for downloaded web pages;
An automatic indexing program for word processing.
University of Western Ontario - Graduate School of Library and Information Science,
Timothy C. Craven.
"My principal current research is on the development and testing of
computerized tools to assist in the writing of abstracts. This
research is funded by NSERC."
Verity - "Summarization extracts a few key snippets from each document and
displays them in a summary allowing you to see at a glance whether
the document is of interest."
"JAN PEDERSEN JOINS VERITY TO LEAD ADVANCED TECHNOLOGY GROUP".
(Jan Pedersen worked at Xerox; patented clustering algorithms;
co-developer of part-of-speech tagger)
InXight - Xerox spin-off.
"InXight Software Inc announced it has licensed its LinguistX software to
Verity Inc. LinguistX is a suite of natural language software for
analyzing and retrieving text-based information. Verity will use
LinguistX in its SEARCH'97 advanced search and retrieval engine and
both companies will collaborate to further improve the technology."
"Summarization can add additional document analysis capabilities to your
application. The LinguistX Summarizer automatically examines the content
of a document in real-time to identify the document's key phrases and
extract sentences to form an indicative summary, either by highlighting
excerpts within a document or creating a bulleted list of the
document's key phrases."
Mark Kantrowitz, Research Scientist.
- main area of research: text summarization
- famous as founder of CMU AI Repository
Mitre - Mark T. Maybury.
"Automatically summarizing events from data or knowledge bases is a
desirable capability for a number of application areas including report
generation from databases (e.g., weather, financial, medical) and
simulations (e.g., military, manufacturing, economic)."