Title: Document’s Logical Structure Extraction
Abstract
We would like to create a knowledge base (KB) for the UML Superstructure
specification. Our motivation is that such specifications are dense,
repetitive and difficult to use. They are written primarily in
semi-structured text, yet that structure must be maintained by hand
whenever they are edited. End users cannot use them efficiently because
of the overall complexity of the document. Our immediate objective is to
extract the document's logical structure. This structure, which includes
the headings, expresses many of the document's key concepts. Extracting
it in a tagged form provides a sound foundation for the subsequent KB
creation steps.