Papers On Text Summarization / Keyphrase Extraction

compiled by Peter Turney, February 17, 1997
Comments, corrections, additions welcome. Random order.

Online Computer Library Center
- Jean Godby - highly related to my work - but no concrete measure of performance - seems to have changed research directions since 1994
- http://www.oclc.org/oclc/
- Jean Godby, "Two techniques for the identification of phrases in full text", Annual Review of OCLC Research, 1994
University of Massachusetts (Amherst) - Lehnert, Riloff, et al. - Information Extraction / Text Extraction
- this work is largely done within the context of the MUC conferences (MUC = Message Understanding Conference), where the task is to fill in slots in a template
- for example, for a corpus of news stories on terrorist attacks, the template contains slots for: terrorist organization, number of victims, type of attack (bomb, kidnap, ...), ...
- http://www.seas.gwu.edu/student/chulee/bib.html#info-ext
- http://ciir.cs.umass.edu/info/people/staff/lehnert.html
- http://www.cs.utah.edu/csinfo/handbook/node71.html
- W. Lehnert, C. Cardie, D. Fisher, J. McCarthy, E. Riloff, and S. Soderland, "Evaluating an Information Extraction System", Journal of Integrated Computer-Aided Engineering, 1(6), 1994.
- Ellen Riloff and Wendy Lehnert, "Information Extraction as a Basis for High-Precision Text Classification", ACM Transactions on Information Systems, July 1994, vol. 12, no. 3, pp. 296-333.
- Ellen Riloff, "Dictionary Requirements for Text Classification: A Comparison of Three Domains", in Working Notes of the AAAI Spring Symposium on Representation and Acquisition of Lexical Knowledge: Polysemy, Ambiguity, and Generativity, 1995, pp. 123-128.
- Ellen Riloff, "Automatically Constructing a Dictionary for Information Extraction Tasks", Proceedings of the Eleventh National Conference on Artificial Intelligence, 1993, AAAI Press/MIT Press, pp. 811-816.
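To make the MUC-style template-filling task described for this group concrete, here is a minimal sketch in Python. The slot names follow the example in the notes (terrorist organization, number of victims, type of attack). Everything else is an assumption made for illustration: the class `AttackTemplate`, the function `fill_template`, and the hand-written patterns are hypothetical and are not the UMass systems or any MUC reference implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttackTemplate:
    """Hypothetical template with the three slots named in the notes."""
    organization: Optional[str] = None
    victims: Optional[int] = None
    attack_type: Optional[str] = None


# Toy, hand-written patterns (assumptions, not taken from any real system).
ATTACK_WORDS = {
    "bomb": "bomb", "bombing": "bomb", "bombed": "bomb",
    "kidnap": "kidnap", "kidnapped": "kidnap", "kidnapping": "kidnap",
}
VICTIMS_RE = re.compile(r"(\d+)\s+(?:people|civilians|victims)")
ORG_RE = re.compile(r"(?:members of|claimed by)\s+(?:the\s+)?((?:[A-Z][\w-]*\s?)+)")


def fill_template(text: str) -> AttackTemplate:
    """Fill each slot from the first matching pattern; unmatched slots stay None."""
    template = AttackTemplate()
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in ATTACK_WORDS:
            template.attack_type = ATTACK_WORDS[word]
            break
    match = VICTIMS_RE.search(text)
    if match:
        template.victims = int(match.group(1))
    match = ORG_RE.search(text)
    if match:
        template.organization = match.group(1).strip()
    return template


story = "A bombing claimed by the Example Front killed 12 people in the capital."
print(fill_template(story))
```

The invented news sentence is only there to exercise the slots. Real information extraction systems depend on much richer dictionaries and linguistic analysis than these few patterns.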
University of Arizona - Chen et al.
- the paper below describes Chen's own work on applying Utgoff's ID5R incremental decision tree induction algorithm to the problem of selecting keyphrases; relevance feedback from the user during a search for a document is used to refine the keyphrases associated with each document in the collection
- it also reports his work on applying genetic algorithms to the problem of optimizing the keyphrases associated with documents; the "fitness" measure used by the GA is the agreement among the keyphrases in a subset of documents that the user has clustered together
- Chen, Hsinchun. (1995). "Machine learning for information retrieval: Neural networks, symbolic learning, and genetic algorithms", Journal of the American Society for Information Science, 46(3): 194-216.
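The fitness measure in the note above (agreement among the keyphrases of documents that a user has clustered together) can be sketched as follows. The notes do not say how agreement is computed, so this sketch assumes mean pairwise Jaccard overlap as a stand-in; the function names `jaccard` and `cluster_agreement` are hypothetical and this is not Chen's actual formulation.

```python
from itertools import combinations
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two keyphrase sets; two empty sets count as full agreement."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_agreement(keyphrases: List[Set[str]]) -> float:
    """Assumed fitness: mean pairwise Jaccard overlap across one user cluster.

    `keyphrases` holds one set of candidate keyphrases per document.
    """
    pairs = list(combinations(keyphrases, 2))
    if not pairs:
        return 1.0  # a single document trivially agrees with itself
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


cluster = [
    {"neural networks", "retrieval"},
    {"retrieval", "genetic algorithms"},
    {"retrieval"},
]
print(cluster_agreement(cluster))
```

A genetic algorithm would evaluate a score of this kind for each candidate assignment of keyphrases and keep the assignments that score higher.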
University of Ottawa - Stan Matwin, Stan Szpakowicz, Doug Skuce, Judy Kavanagh, et al. - extraction of knowledge from text
- S. Delisle, K. Barker, J.-F. Delannoy, S. Matwin and S. Szpakowicz (1994) "From Text to Horn Clauses: Combining Linguistic Analysis and Machine Learning". R. Elio (ed.), Proc Tenth Canadian Conf on AI, CSCSI, Banff, 9-16.
- C. Feng, T. Copeck, S. Szpakowicz and S. Matwin (1994) "Semantic Clustering. Acquisition of Partial Ontologies from Public Domain Lexical Sources". Proc AAAI Knowledge Acquisition for Knowledge-Based Systems Workshop. Banff, 1-1--1-16.
- Kavanagh, J. (1995) The Text Analyzer: A Tool for Knowledge Acquisition from Texts. Masters Thesis, Department of Computer Science, University of Ottawa.
General Electric - Tomek Strzalkowski et al. - involved in TREC - interested in use of NLP to improve IR performance - uses NLP to find phrases in text
- http://hobart.cs.umass.edu/~allan/irtopics-past.html
- Document indexing and retrieval using natural language processing. Strzalkowski. RIAO '94 Proceedings.
British Telecom - NetSumm - no journal articles cited - summarization of text by extracting key sentences - often does not work well
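As a rough illustration of summarization by extracting key sentences, here is a minimal frequency-based sketch. It is a generic heuristic chosen for illustration and is not NetSumm's method, which these notes do not describe; the stopword list and the names `split_sentences` and `key_sentences` are assumptions.

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stopword list (an assumption, far from complete).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that", "on", "for"}


def content_words(text: str) -> List[str]:
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def split_sentences(text: str) -> List[str]:
    """Naive split on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def key_sentences(text: str, k: int = 1) -> List[str]:
    """Return the k sentences whose words are most frequent in the whole text."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]  # keep original order


text = ("Keyphrases describe documents. "
        "Keyphrases help readers find documents quickly. "
        "The weather was nice.")
print(key_sentences(text, k=1))
```

Averaging the word frequencies keeps long sentences from winning on length alone. It is one of several reasonable scoring choices.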
Xerox - Marti Hearst, Julian Kupiec, Jan Pedersen, Francine Chen - Distilling Information from Documents - NLP technology - automatic document summarization, information extraction, morphological analysis
- http://www.xerox.com/XSoft/lexdemo/
- A trainable document summarizer, by J. Kupiec, J. Pedersen, and F. Chen, Proc. SIGIR '95, pp. 68-73, Seattle, 1995, ACM Press.
University of California (San Diego) - Rik Belew - various interests in genetic algorithms and information retrieval
- Exporting phrases: A statistical analysis of topical language, by A. M. Steier and R. K. Belew, Proc. 2nd Symp. on Document Analysis and Information Retrieval, pp. 179-190, 1993
Cornell University - Gerard Salton (deceased), Chris Buckley, Joel Fagan - concerned with weighting of terms and phrases in order to improve search
- Salton, G. and Buckley, C. (1987) "Term weighting approaches in automatic text retrieval", Department of Computer Science, Cornell University, Technical Report 87-881.
- Fagan, J.L. (1987) "Experiments in automatic phrase indexing for document retrieval: a comparison of syntactic and non-syntactic methods", Ph.D. Thesis, Department of Computer Science, Cornell University, Technical Report 87-868.
University of Brabant (The Netherlands) - Hans Paijmans - interesting comparison of term weighting by TFIDF with alternative methods
- Paijmans, J.J. (1994) "Relative weights of words in documents". In L.G.M. Noordman and W.A.M. de Vroomen, editors, Conference proceedings of STINFON, pp. 195-208.
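For readers unfamiliar with the TFIDF weighting mentioned above, here is a minimal sketch of one common variant: raw term frequency multiplied by log(N / document frequency). The exact formulas compared in the cited papers may differ, and the function name `tfidf` is an assumption for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term in each tokenized document by tf * log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        term_freq = Counter(doc)
        weights.append({term: count * math.log(n_docs / doc_freq[term])
                        for term, count in term_freq.items()})
    return weights


docs = [["text", "summary", "text"], ["text", "retrieval"]]
for weighted in tfidf(docs):
    print(weighted)
```

A term that occurs in every document, such as "text" here, receives weight zero under this variant, which is the intended effect of the inverse document frequency factor.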
LaSalle University - Mitchell Wyle - Automatic Phrase Generation - use of phrase weighting to improve search performance
- http://vhdl.org/~wyle/ - now works at IntelliMatch, "Internet's #1 Service for Matching Job Seekers and Employers"
- Wyle, M.F., and Frei, H.P. (1991) "Retrieval algorithm effectiveness in a wide area network information filter", Proceedings of the 14th ACM SIGIR Conference on R&D in Information Retrieval, ACM, Chicago IL, pp. 114-122.
University Carlos III (Spain) - Alberto Munoz - keyword generation by fuzzy neural nets
- Munoz, A. (1996). "Creating term associations using a hierarchical ART architecture". In C.v.d. Malsburg and W.v. Seelen, editors, International Conference on Artificial Neural Networks, Lecture Notes in Computer Science, pp. 171-177, Bochum, Germany, Springer Verlag.
University of Nantes (France) - Christian Jacquemin - interested in use of NLP to improve IR performance - uses NLP to find phrases in text
- http://hobart.cs.umass.edu/~allan/irtopics-past.html
- What is the tree we see through the window: A linguistic approach to windowing and term variation. Jacquemin. Unpublished. 1995 draft.
University of Westminster (UK) - John Sykes, Vassilis Konstantinou, et al. - NLP and legal reasoning
- J.T. Sykes, V. Konstantinou and P.L.R. Morse, "Extracting Explicit and Implicit Knowledge from Natural Language Texts".
- Konstantinou, V., Sykes, J., and Yannopoulos, G.N. (1993). "Can legal knowledge be derived from legal texts?", Proceedings of the Fourth International Conference on Artificial Intelligence and Law. ACM Press.
Columbia University - Kathy McKeown, Dragomir Radev, et al. - NLP group - news summarization
- Generating summaries of multiple news articles, K. McKeown, D.R. Radev, Proc. SIGIR '95, pp. 74-82, Seattle, 1995, ACM Press.
The Dagstuhl Seminar - December 13 to 17, 1993, Dagstuhl, Germany - Organizers: Karen Sparck Jones, Univ. of Cambridge (chairperson); Brigitte Endres-Niggemeyer, Polytechnic of Hannover (organizer); Jerry Hobbs, SRI International, Menlo Park; Elizabeth Liddy, Syracuse University; Cecile Paris, ISI, Marina del Rey
"Summarizing Text for Intelligent Communication: Building a research platform for theoretical and practical progress in summarizing, as a key task in natural language processing, artificial intelligence, and related disciplines"
BBN - "The PLUM system has been shown in Spanish, German, Chinese, and Japanese. A component of PLUM, called IdentiFinder, can extract names from text in both English and Spanish. Another component, POST (Part Of Speech Tagger), uses probabilistic techniques to assign likely parts of speech to words in arbitrary text. Both POST and IdentiFinder have been successfully used by BBN clients and research colleagues in other institutions, and represent the leading edge of text processing technology."
Iconovex
- AnchorPage: an automatic hyperlinking and indexing program for HTML documents to be presented on an internet or intranet site
- Web Anchor: an automatic organizer and hyperlinking program for downloaded web pages
- Indexicon: an automatic indexing program for word processing
University of Western Ontario - Graduate School of Library and Information Science, Timothy C. Craven. "My principal current research is on the development and testing of computerized tools to assist in the writing of abstracts. This research is funded by NSERC."
Verity - "Summarization extracts a few key snippets from each document and displays them in a summary allowing you to see at a glance whether the document is of interest." "JAN PEDERSEN JOINS VERITY TO LEAD ADVANCED TECHNOLOGY GROUP". (Jan Pedersen worked at Xerox; patented clustering algorithms; co-developer of part-of-speech tagger)
InXight - Xerox spin-off. "InXight Software Inc announced it has licensed its LinguistX software to Verity Inc. LinguistX is a suite of natural language software for analyzing and retrieving text-based information. Verity will use LinguistX in its SEARCH'97 advanced search and retrieval engine and both companies will collaborate to further improve the technology."
"Summarization can add additional document analysis capabilities to your application. The LinguistX Summarizer automatically examines the content of a document in real-time to identify the document's key phrases and extract sentences to form an indicative summary, either by highlighting excerpts within a document or creating a bulleted list of the document's key phrases."
- http://www.inxight.com/products/linguistx/overview.shtml
CMU - Mark Kantrowitz, Research Scientist. - main area of research: text summarization - famous as founder of CMU AI Repository
Mitre - Mark T. Maybury. "Automatically summarizing events from data or knowledge bases is a desirable capability for a number of application areas including report generation from databases (e.g., weather, financial, medical) and simulations (e.g., military, manufacturing, economic)."