ConText: Questions & Answers

For more information contact 415.506.4514 or infotext@us.oracle.com
_________________________________________________________________

Version 1.1
July 1994

Table of Contents:

I. INTRODUCTION

1.1 What is Oracle ConText?
1.2 Why use Oracle ConText?
1.3 Is Oracle ConText a document management system, proofreader, or text retrieval application?
1.4 How do I use Oracle ConText?
1.5 What are some applications of the Oracle ConText technology?
1.6 How does Oracle ConText differ from other language processing technology?
1.7 Does Oracle ConText have any competitors?
1.8 What types of writing can Oracle ConText process?

II. LANGUAGE ANALYSIS

2.1 How does Oracle ConText analyze language?
2.2 How does Oracle ConText know what is important to me, the reader?
2.3 Where does Oracle ConText get the vocabulary necessary for processing language?
2.4 Will Oracle ConText recognize the vocabulary used in my industry?
2.5 Can I add words to the Oracle ConText lexicon?
2.6 What happens when Oracle ConText cannot recognize a word or phrase?
2.7 Does Oracle ConText recognize British English?

III. LINGUISTIC FEATURES

3.1 What information about text does Oracle ConText generate?
3.2 What type of theme information does Oracle ConText produce?
3.3 Can I affect the theme information generated by Oracle ConText?
3.4 What grammatical analysis does Oracle ConText provide?
3.5 What statistical analysis does Oracle ConText provide?
3.6 What indexing information does Oracle ConText generate?
3.7 Does Oracle ConText provide parts of speech or parse trees?

IV. PROCESSING TEXT

4.1 Does Oracle ConText rewrite documents?
4.2 What type of text input does Oracle ConText require?
4.3 Can Oracle ConText analyze text containing grammatical errors?
4.4 How does text get "into" and "out of" Oracle ConText?
4.5 How is the output from Oracle ConText presented?
4.6 What is the size of the output generated by Oracle ConText?
4.7 Can Oracle ConText output be stored?
4.8 Does a document have to be reprocessed each time I want a different type of output?

V. INTEGRATING WITH ORACLE CONTEXT

5.1 What components are required to integrate Oracle ConText?
5.2 Are development resources required to integrate Oracle ConText with an application?
5.3 Do I need to know the C programming language to create components for Oracle ConText?
5.4 Is Oracle ConText built on the Oracle7 Server?
5.5 Does Oracle ConText support a client/server architecture?
5.6 Does Oracle ConText provide any example components?

VI. DEVELOPMENT PLANS

6.1 Is Oracle ConText being integrated with Oracle products?
6.2 What are the plans for the Oracle ConText API?
6.3 Will Oracle ConText be integrated with any non-Oracle products?
6.4 What type of National Language Support (NLS) does Oracle ConText provide?

VII. GENERAL QUESTIONS

7.1 How long has Oracle ConText been in development?
7.2 On which platforms is Oracle ConText available?
7.3 What are the system requirements for Oracle ConText?
7.4 How fast does Oracle ConText perform?
7.5 Can Oracle ConText analyze documents of any length?
7.6 What is included with Oracle ConText?
7.7 How much does Oracle ConText cost?


I. INTRODUCTION

1.1 What is Oracle ConText?

Oracle ConText is a natural language processing technology that identifies themes and content in English text. Because it "understands" the text it processes, it can extract all the vital information contained in a text block as well as determine the meaning of the text.

1.2 Why use Oracle ConText?

Like most business professionals today, you probably don't have enough time to read all the documentation, reports, and trade journals that provide the information you need to do your job well. Oracle ConText offers the solution to this information challenge by identifying themes and content in text to create powerful new management and navigation methods for electronically-stored text.

For example, Oracle ConText can "read" your documents, and systematically and intelligently condense them into concise document summaries and outlines. ConText creates summaries and outlines using theme and meaning rather than simple word frequency or other static methods.

Oracle ConText can also accelerate the process of looking for information across multiple documents. For example, Oracle ConText's MasterIndex system creates index entries by extracting not only key words, but also every piece of information in the text of a document, as well as the relationships between the pieces of information. These index entries can then be gathered from multiple documents and consolidated into global indexes.

1.3 Is Oracle ConText a document management system, proofreader, or text retrieval application?

No. Oracle ConText is an application-independent component technology that neither stores nor manages online text, nor corrects grammar and spelling in text. The power of Oracle ConText's language processing would be underutilized on simple text retrieval tasks such as pattern matching or keyword searches. Rather, Oracle ConText determines the meaning and themes contained in the text of your documents.

1.4 How do I use Oracle ConText?

Oracle ConText's collection of advanced linguistic functions can be integrated, through a C language application programmer's interface (API), with any system dealing with text. Using these functions, you can add an intelligent layer of language processing to the tasks performed by text retrieval, document management, proofreading, and other document automation tools.

Please note that Oracle ConText does not provide components for integrating the ConText functions with another system; you must supply the necessary components. And because experienced linguists and C programmers are needed to build these components, you must be an approved partner before beginning development with Oracle ConText.

However, you can use Oracle TextServer3, Oracle's first fully-integrated text management solution, to access many of the ConText functions without any knowledge of linguistics or C programming.

Formerly SQL*TextRetrieval, Oracle TextServer3 provides powerful tools from Oracle's Cooperative Development Environment (CDE) and Cooperative Server Technology (CST) for building client-server applications that can store text in a database and retrieve the information using predefined synonym families, sound-alike words, and structured fields.

Oracle ConText provides Oracle TextServer3 with a number of ready-to-use, enhanced text retrieval and viewing capabilities, including:

- filtering of words that Oracle TextServer3 tracks in a document so that the most thematically prominent words and phrases are used for more precise and quicker retrieval of documents.

- document querying by theme, rather than simply by key words.

- speed-reading and summarizing of the documents in a database (thematic output is stored in the database along with each document).

For more information about TextServer3, available in the Fall of 1994, please consult Oracle TextServer3 documentation.

1.5 What are some applications of the Oracle ConText technology?

Some practical applications of Oracle ConText include:

- Automatic summaries for online documents and mail messages. Fixed-width subject fields and fixed-length file naming restrictions often prevent the subject line of a message or the file name of a document from accurately reflecting the content of the text. Oracle ConText can quickly analyze the text of a document or message to produce more meaningful and intelligent access methods for the text. It can also summarize the contents of a document or message for quicker reading or review. Summarized information can be critical for accessing documents or messages over costly modem or wireless connections.

- Automatic evaluation and forwarding of electronic mail messages.
ConText can read a message, determine the main themes, and pass that information to an electronic mail system for automatic forwarding to the appropriate recipients.

- Automatic hypertext linking in online information. Oracle ConText can determine which sections and words in a document are thematically related and identify the exact position of these words. Then, an online document design application can automatically add the hypertext links.

- Intelligent information extraction. Because it understands the text it processes, Oracle ConText can enable information-gathering agencies, such as online news services or government agencies, to build advanced applications for tracking and extracting specific information and trends.

1.6 How does Oracle ConText differ from other language processing technology?

Systems that attempt to process text typically rely more on word recognition and repetition than any true understanding of the text. Oracle ConText represents a new paradigm in language analysis.

Oracle ConText focuses on grammatical content and theme to determine the actual meaning of the text it processes. It recognizes that the position and role of a word, more than the repeated occurrence of the word, influences how the word contributes to the meaning of the surrounding text. In effect, it determines meaning in text by answering such questions as:

- "What grammatical elements are present in the text?"

- "What grammatical and thematic relationships exist between the individual elements?"

- "Does an element contribute to the main idea of the sentence, or does it provide supporting detail for the main ideas?"

- "Within the context of the surrounding text, how do the elements contribute to the development of theme?"

1.7 Does Oracle ConText have any competitors?

For the most part, no. There is no other technology commercially available today that matches Oracle ConText's ability to process text and understand the themes and concepts contained in the text.
And integrated with document summarization, indexing, viewing, retrieval, or navigation tools, Oracle ConText can create "language-intelligent" applications with abilities beyond most standard applications.

For example, a standard text retrieval system usually relies on a "brute force" statistical approach, tracking or "indexing" every word in the text, then counting the occurrences of each word or phrase to determine the "key words" for the text. You can then specify these key words when searching for and retrieving text.

Oracle TextServer3 provides a powerful text management system for quickly and easily accessing text stored in a database. It utilizes the same methodology as a standard text retrieval system; however, it does not rely solely on word repetition for querying and retrieving text. Oracle ConText, with its content- and theme-based language analysis, provides TextServer3 with enhanced retrieval features, such as query-by-theme and text reduction for intelligent text tracking, as well as advanced text viewing and summarizing capabilities.

1.8 What types of writing can Oracle ConText process?

Oracle ConText is capable of analyzing hundreds of writing styles and types, ranging from highly structured, complex writing to more informal, simple writing. It is extremely well suited for business, instructional, and technical communication.

Some examples of the types of documents that Oracle ConText can analyze include:

- newspaper articles
- legal documents
- patents and patent applications
- technical and scientific journals
- multiple-topic documents, such as encyclopedias and newspapers
- electronic-mail messages

Oracle ConText is not as well-suited for processing transcriptions of unstructured spoken word, such as colloquial dialogue or casual conversation. This type of written communication often contains incomplete or rambling sentences that do not provide a clear, linear development of theme.
In addition, ConText does not work well with non-natural languages such as computer programming languages. However, a technical manual containing examples of a computer programming language can be successfully analyzed if the examples are first removed.


II. LANGUAGE ANALYSIS

2.1 How does Oracle ConText analyze language?

Oracle ConText uses a linguistic routine that simulates the complex human process that takes place when you read text. Because this process is so complex, Oracle ConText does not rely on a single linguistic approach to arrive at its understanding of the text. Instead, it uses what can best be described as a "working" approach, combining principles and rules from a variety of diverse linguistic theories to produce the best overall results.

Beginning with the smallest grammatical unit, individual words or word phrases, ConText identifies the grammatical function of each word in a sentence, taking into account the word's placement in the sentence and its relationships, or bindings, to the surrounding words. It then determines the thematic function, if any, of the word in the sentence. These grammatical and thematic assessments provide the basis for ConText's analysis.

As it encounters successively larger text blocks (sentences, paragraphs, or the whole document), ConText systematically expands its analysis to add the new information to its knowledge base. Using this method, ConText can identify informational content as it is introduced and can track the development of themes across sentences and paragraphs.

2.2 How does Oracle ConText know what is important to me, the reader?

When analyzing themes in text, it is often misleading to try to determine "importance" as it relates to the reader, because importance relies on knowledge of the reader's intent. Oracle ConText does not presume to know what is important to the individual reader.
Instead, it weighs the thematic prominence of a piece of text as it relates to the understanding of the text as a whole. A piece of text is important only within the boundaries of the surrounding text and only when it provides insight into the meaning of the text.

In addition, programmable settings in the API allow you to customize Oracle ConText. Through these settings, you can ensure that the specific information that you are interested in extracting from your text is always assigned the proper thematic prominence.

2.3 Where does Oracle ConText get the vocabulary necessary for processing language?

Oracle ConText gets its "knowledge" of the English language from the Oracle ConText lexicon -- an extensive, dictionary-like collection of more than 600,000 words and phrases, with up to 1,000 units of linguistic knowledge, called bindings, for each word.

2.4 Will Oracle ConText recognize the vocabulary used in my industry?

For the most part, yes. The ConText lexicon includes many of the terms and phrases used in more than 1,000 industries and fields of study. While the coverage in a particular area may not be extensive, the lexicon provides broad coverage of such diverse subjects as pharmaceutical manufacturing, aviation, finance, ornithology, and hair care.

The lexicon also provides extensive coverage of geographical areas, government agencies, company names (with types of business), and product names (with types of product). And the lexicon is continually updated and enhanced to reflect the latest trends and developments in every subject or area.

2.5 Can I add words to the Oracle ConText lexicon?

Currently, no. However, future releases of Oracle ConText will include a tool for creating user dictionaries that can be used in conjunction with the embedded lexicon. In a user dictionary, you will be able to define the specific words and phrases that you want ConText to recognize.
You will also be able to use a user dictionary to customize the behavior, thematic value, index properties, and conceptual family assigned to existing words and phrases in the lexicon.

2.6 What happens when Oracle ConText cannot recognize a word or phrase?

Oracle ConText does not delete or ignore a word or word phrase that is not included in the system lexicon. Instead, Oracle ConText assigns greater thematic prominence to the word as a safeguard against the word being mishandled. As a result of the word's increased thematic prominence, the word may appear as one of the themes that Oracle ConText extracts from the surrounding text.

For many applications, the function of the word is more important than its precise meaning. Most domain-specific words are either simple nouns or regular verbs whose function is easily recognized by ConText.

2.7 Does Oracle ConText recognize British English?

Yes. Most grammar and spelling variations between British English and American English are not substantial enough to affect Oracle ConText's parsing. However, Oracle ConText has been designed to properly recognize and account for those variations that might have an effect. For example, the lexicon currently recognizes most British spelling versions, such as labour and honour, and processes them identically to the American spellings.

Furthermore, two of the primary grammatical references used in the development and testing of ConText, The Grammar for Contemporary English and The Oxford English Dictionary, were written by British authors.


III. LINGUISTIC FEATURES

3.1 What information about text does Oracle ConText generate?

Oracle ConText produces four main types of output:

- theme information
- grammatical analysis
- statistical analysis
- indexing/content information

A detailed description of each type of output, along with possible uses, is provided in the following questions.

3.2 What type of theme information does Oracle ConText produce?

Oracle ConText extracts two types of thematic information from the text it processes:

- Theme Grading. The 16 theme gradings identify the function and importance of each word within the context of the containing sentence. You can use this information to reduce sentences to their main thematic elements for creating document outlines, summaries, and specialized views of the original text.

If you combine Oracle ConText with a full-text retrieval system, such as Oracle TextServer3, the system can make use of theme grading information to improve the precision of its searches and the accuracy of its relevance ranking.

- Theme Profiles. A theme profile identifies the strongest themes contained in each sentence in a paragraph, each paragraph in a document, or in the document as a whole.

Oracle ConText also generalizes or abstracts the themes that it identifies to create concept categories. For example, Oracle ConText abstracts the word font to the concept printing and assigns a value to the theme printing. Oracle ConText increases the value assigned to the printing theme if the document contains other words, such as typeface, that belong to the printing concept category.

The result of this process is a list of 16 theme/concept words which an application can use to classify or rank themes according to programmable criteria. You can use these classifications to automate document routing, build document synopses, and intelligently search on and retrieve documents, as well as in many other document automation applications.

3.3 Can I affect the theme information generated by Oracle ConText?

Yes. Using over 40 programmable settings provided with the ConText API, you can create custom theme profiles for text processed through Oracle ConText.
The settings allow you to specify that words with certain grammatical or thematic characteristics (including theme grading) should be thematically highlighted or suppressed, as well as specify the degree of thematic prominence assigned to these words.

And in future releases, you will be able to create user dictionaries for modifying the attributes of an individual word or adding your own terms to ConText's knowledge base.

3.4 What grammatical analysis does Oracle ConText provide?

Oracle ConText returns a comprehensive assessment of the grammatical content, writing style, and general readability level of the sentences, paragraphs, and documents it processes. In addition, it can identify grammatical errors in sentences, providing up to 30 error messages (from a dictionary of over 300 messages) per sentence.

You can use this information to build a full grammar checker capable of evaluating the content and meaning of sentences and identifying poorly-written or potentially ambiguous text, as well as identifying grammatical errors. Most standard grammar checkers are limited by a rigid set of grammatical rules that focus on local groups of words rather than full sentences, which often results in the grammar checker missing the "point" of the text.

You can also use this grammatical output to rank documents according to their level of readability. For example, after you use theme profiles to identify a set of documents with the same or similar thematic content, you can compare ConText's grammatical assessment for each document to select the most clearly written and easily understood document.

3.5 What statistical analysis does Oracle ConText provide?

Oracle ConText generates up to 16 different theme statistics for each sentence, paragraph, or document it processes. These statistics provide a numeric measurement of the overall thematic/grammatical content and structure of a text block.

For example, one statistic determines the amount of "filler" in text by calculating the ratio of theme words to non-theme, or function, words in the text. Other statistics measure such characteristics as theme concept, strength, and ambiguity. Yet another statistic measures the percentage of sentences in a text block that have grammatical errors or ambiguities, thus providing a quantitative assessment of the grammatical composition of the text.

Theme statistics can be used to identify specific problems when dealing with text that is unedited, grammatically or stylistically poor, ambiguous, or contains other such problems. They can also be used to rank documents according to their theme characteristics or grammatical composition.

3.6 What indexing information does Oracle ConText generate?

Oracle ConText's MasterIndex identifies every important piece of information in a document, including concepts, definitions, actions and actors, and keywords, and extracts the information for structured storage or presentation. In effect, the information produced by MasterIndex represents a normalized, structured listing of the contents of a text block.

Oracle ConText's indexing capabilities should not be confused with the indexing functions found in a standard text retrieval system. The index generated by a standard text retrieval system is usually a simple listing of every word in the text, whereas the indexing information generated by MasterIndex lists all the thematically relevant and information-bearing words in the text and describes the relationships between the words.

MasterIndex output can be used to:

- automatically create a back-of-book style of index for a single document or global indexes for multiple documents.

- populate databases with structured content information.

- enable intelligent information-extraction agents to track specific information and trends.
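
The first use above, consolidating index entries from several documents into a global index, can be pictured with a minimal C sketch. Everything in it is an assumption made for illustration: the index_entry structure, the build_global_index function, and the sample terms are hypothetical and are not part of the ConText API, and the sketch ignores the relationships between words that MasterIndex also records.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical index entry: a term plus the document it was found in. */
struct index_entry {
    const char *term;   /* indexed word or phrase */
    int doc_id;         /* identifier of the source document */
};

/* Order entries by term, then by document identifier. */
static int compare_entries(const void *a, const void *b)
{
    const struct index_entry *x = a;
    const struct index_entry *y = b;
    int c = strcmp(x->term, y->term);
    if (c != 0)
        return c;
    return (x->doc_id > y->doc_id) - (x->doc_id < y->doc_id);
}

/* Sort the entries in place so that all occurrences of a term, from
 * every document, sit next to each other. Returns the number of
 * distinct terms in the consolidated index. */
size_t build_global_index(struct index_entry *entries, size_t n)
{
    size_t i, distinct = 0;

    if (n == 0)
        return 0;
    assert(entries != NULL);

    qsort(entries, n, sizeof entries[0], compare_entries);
    for (i = 0; i < n; i++) {
        if (i == 0 || strcmp(entries[i].term, entries[i - 1].term) != 0)
            distinct++;
    }
    return distinct;
}
```

A host program could fill an array of such entries as each document is processed, call build_global_index once at the end, and then print or store the sorted array as the global index.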

In upcoming releases, Oracle Book, Oracle's online multimedia viewing tool, will use MasterIndex to automatically create hyperlinked, back-of-book indexes for Oracle Book documents.

3.7 Does Oracle ConText provide parts of speech or parse trees?

To some degree, yes. MasterIndex provides a "thematic parse" of the information in sentences, including the Actor, Action, Object, etc. This is similar to a full parse, but certain adjectives, adverbs, or other weak sentence elements that do not materially add to a sentence's theme are not included. However, ConText's advanced analysis of semantic relationships gives more information than a simple part of speech model.


IV. PROCESSING TEXT

4.1 Does Oracle ConText rewrite documents?

No. Oracle ConText does not alter any of the text it processes. Instead, it produces its output as an array of theme, grammar, statistic, and index information that is separate from the original text. You can apply this output to the original text, either directly or through a user interface, to present a different version or view of the text, but the original text remains unchanged.

Oracle ConText may include words in its output, in the form of nominals and concepts, which do not appear in the original text. A nominal is the noun form for a word. If the word is a noun, the nominal is simply the pluralized form of the word. For example, swim nominalizes to swimming, while swimmer nominalizes to swimmers.

Concept words provide a higher-level categorization or "generalization" for the words with which they are associated. For example, ConText abstracts the word font to the concept printing. If ConText determines that the text containing the word font significantly develops the topic of printing, it may return printing as one of the themes for the text.

4.2 What type of text input does Oracle ConText require?

Oracle ConText requires English text in ASCII format.
Documents in other formats must be filtered into ASCII before being processed through Oracle ConText. Of course, such a filter could be built into the components used to create a system that integrates with Oracle ConText. For instance, Oracle TextServer3 automatically handles all the filtering requirements for Oracle ConText.

Also, because Oracle ConText analyzes text in blocks, each word (or word phrase), sentence, and paragraph must be clearly identified. Each word or word phrase must be set off from other words by spaces, each sentence must start with a capitalized character or number and end with a valid punctuation mark, and all paragraph boundaries must be clearly marked (typically by one or more hard returns).

Finally, the text should consist of complete sentences and paragraphs, presented as a single text flow. The text may require some filtering to provide a smooth text flow and to remove non-text objects such as graphics, tables, text formatting and SGML tags, captions, footnotes, and electronic mail addresses.

4.3 Can Oracle ConText analyze text containing grammatical errors?

Yes. Not all text is structured in complete, grammatically correct sentences. Oracle ConText compensates for grammatical errors by changing its clause-oriented analysis and reduction style to a word- or phrase-oriented mode. Since analysis begins with the single word or word phrase (the smallest grammatical unit processed by Oracle ConText), local judgements are often unaffected by errors elsewhere in a sentence.

In addition, Oracle ConText recognizes over 10,000 of the most common misspellings of words. When it encounters one of these misspelled words, it assigns the linguistic bindings for the correct spelling to ensure that the misspelled word is analyzed correctly for usage and function. It also returns a grammatical error message showing the correct spelling.
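
The misspelling handling just described amounts to a lookup from a known misspelling to its correct form. The C sketch below shows one way to picture that lookup; it is an illustration under stated assumptions, not ConText's implementation. The table contents, the respelling structure, and the lookup_respelling function are hypothetical, and the actual misspelling list is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry pairing a misspelling with its correction. */
struct respelling {
    const char *misspelled;
    const char *correct;
};

/* Three made-up sample entries standing in for a much larger list. */
static const struct respelling RESPELLINGS[] = {
    { "recieve",  "receive"  },
    { "seperate", "separate" },
    { "teh",      "the"      },
};

/* Return the correct spelling for a known misspelling, or NULL when
 * the word is not in the table, in which case the caller analyzes the
 * word exactly as written. */
const char *lookup_respelling(const char *word)
{
    size_t i;

    assert(word != NULL);
    for (i = 0; i < sizeof RESPELLINGS / sizeof RESPELLINGS[0]; i++) {
        if (strcmp(word, RESPELLINGS[i].misspelled) == 0)
            return RESPELLINGS[i].correct;
    }
    return NULL;
}
```

With a result in hand, an analyzer could look up linguistic information under the corrected spelling while reporting the correction to the user. A linear scan is enough for a three-entry sketch; a list of thousands of entries would more likely be sorted and searched with bsearch or stored in a hash table.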

However, to ensure high quality output, application developers building an Oracle ConText system may want to combine a proofreading tool, such as Oracle CoAuthor, with the ConText components to correct spelling and usage errors before the text is processed.

4.4 How does text get "into" and "out of" Oracle ConText?

Oracle ConText is a component technology which does not include any modules for managing the input or output of text; it simply processes text input and generates results. You create the host program that provides the engine for passing text to Oracle ConText and gathering the results.

A host program must call the ConText API, provide values for the required settings, and pass text, one paragraph at a time, to Oracle ConText. You may also use the program to provide an interface for specifying the source of the text (usually a file) and instructions for processing the text.

After ConText completes its analysis, the host program must gather and structure the results, then direct the structured output to an application or other output device, such as a file or monitor.

The type and extent of output that the host program gathers, as well as the format (e.g. binary or ASCII) that the host program uses to present the output, should be dictated by the application or other device that receives the output.

4.5 How is the output from Oracle ConText presented?

Each time a text block is processed, Oracle ConText returns the full range of theme, grammar, statistic, and index information extracted from the text. Oracle ConText does not manipulate this output in any fashion; it simply returns the output through an array of C language structures stored in memory.

The host program that passes the text to ConText determines the method of presentation for the output information. You can build a host program that presents the information as markup for use in an application.
Or, you could architect the host program to interpret the output information and produce a view, such as a summary, name list, or index, of the content of the original text.

For example, the theme grading information for a document identifies the theme gradings assigned to each word. The host program could use this application-independent markup information in a document viewing application, such as a speed-reader, to highlight words in a document according to their assigned theme grading.

Or, the host program could use the theme grading information to present a reading summary of the document. The summary, containing only those words that were assigned specific theme gradings, could then be stored in an ASCII flat file.

4.6 What is the size of the output generated by Oracle ConText?

Because Oracle ConText performs a full parse each time it processes a text block, the size of the output can be enormous. It is usually unnecessary, however, to retain the full array of information that Oracle ConText produces.

The type and extent of output saved from the results of ConText's analysis should be dictated by the needs of the application. In effect, the application, or the host program that provides the output for the application, keeps only the information it needs and discards the rest.

For example, a simple application that uses theme profiles to sort and route documents would require the host program to retain only the 16 words or phrases that make up the document's theme profile. An application that provides a back-of-book index for a multiple-topic document, such as an encyclopedia, might require all of the indexing output from ConText, but not any of the grammatical or statistical output.

4.7 Can Oracle ConText output be stored?

Yes. Once the host program extracts and structures the required information from ConText's vast output, the information can be stored in a variety of media including files, structured fields, and database tables.
For example:

- summaries and abstracts can be stored as file attachments to the original documents.

- the themes of a document can be stored in a structured field outside the document to serve as keywords for queries.

- MasterIndex information, which includes definitions, transactions, and concepts contained in a document, can be stored in database tables or other structured schemes.

4.8 Does a document have to be reprocessed each time I want a different type of output?

No, provided the host program is architected to retain the necessary information for the application that uses the output. Each time text is processed, Oracle ConText produces its full array of output and the host program that passes the text to Oracle ConText controls the type of information and level of detail returned in the output.

Because Oracle ConText performs a full parse each time it is run, you should process a document as few times as possible and use the host program to store the level of output required for the applications you build. The more detailed and varied the output is that you store with each ConText parse, the fewer times the document needs to be processed.


V. INTEGRATING WITH ORACLE CONTEXT

5.1 What components are required to integrate Oracle ConText?

A typical Oracle ConText implementation makes use of the following components. The components can be combined to create a stand-alone system or they can be integrated with other applications to create a complete text management system.

- Oracle ConText. A stand-alone set of functions released as an object library with a C language application program interface (API). Included in this component are the lexicon and parsing rules that ConText uses to process text. In order to process text, a host program, which calls the ConText functions through the API, must be built.

- Input. ASCII text that you want to process through Oracle ConText. The text is usually contained in a flat file.

- Host program.
A C program that calls the ConText API, provides values for the required settings, and passes text, one paragraph at a time, to the API. It also gathers and formats the output generated by ConText.
- Output. The theme, grammatical, statistical, and indexing information generated by ConText and presented through an array of C structures. The host program gathers the output, formats it, and directs it to an output device, such as a user interface or flat file. The output can be formatted as binary code for interpretation by a user interface or as ASCII text for reading purposes.
- User interface. An application that accesses the ConText output along with the original input text, and provides an interface for viewing and/or manipulating the input text.

5.2 Are development resources required to integrate Oracle ConText with an application?

Yes. Oracle ConText consists only of the API and the underlying functions. You must build the components required for creating a stand-alone Oracle ConText text application or for integrating Oracle ConText with other applications.

Because of the considerable effort and expertise that are required to build these components, Oracle must approve all potential Oracle ConText users as development partners. To qualify as an approved partner, you must be able to devote the time and resources required to plan and build the necessary integration components. Oracle Consulting Services, with its experienced Oracle ConText consultants, is available to help plan and build any system that integrates with Oracle ConText.

In addition, Oracle TextServer3 provides access to a number of Oracle ConText's advanced linguistic functions, such as text summarization/reduction and theme extraction, without the need for linguistic resources or approved partner status. As TextServer3 indexes documents and stores them in the database, it automatically processes the text from the documents through ConText and stores the output in the database.
You can then access this information through easy-to-use tools such as Oracle Forms and industry-standard SQL.

5.3 Do I need to know the C programming language to create components for Oracle ConText?

Currently, yes. The host program must be written in C, then compiled for the ConText API. However, PL/SQL covers will be added to the API in future releases.

5.4 Is Oracle ConText built on the Oracle7 Server?

No. Currently, Oracle ConText does not require the Oracle7 Server. However, future releases of Oracle ConText will make use of the Oracle7 Server. At that time, application developers will be able to access Oracle ConText output through stored procedures and a number of other methods.

In addition, Oracle TextServer3 with Oracle ConText provides full integration with the Oracle7 Server.

5.5 Does Oracle ConText support a client/server architecture?

Yes. In a typical configuration, the ConText functions, API, and host program would be located on the server, while the application(s) that interpret the ConText output would reside on a client machine. The client and server could be connected directly, via a remote procedure call (RPC), or indirectly, via a database or other techniques.

5.6 Does Oracle ConText provide any example components?

Yes. Oracle ConText 1.1 includes a number of working programs and applications for evaluation and demonstration purposes, including:

- SpeedRead output processor. This sample host program processes text through ConText, gathers theme grading, profile, and statistics information for the text, and stores the results in a binary output file used by the SpeedRead text viewer.
- SpeedRead text viewer. This sample application interprets the output file from the SpeedRead output processor to provide 5 customizable levels of reduction for speed-reading and summarization of input text. It also displays the theme profile and statistics generated for the text.
The viewer is available as part of Oracle ConText for SunOS and also as a stand-alone client application for Microsoft Windows.
- ASCII output processor. This sample host program processes text through ConText and returns the output, as ASCII text, to a standard output device, such as a monitor screen or flat file. The program provides access to the full range of thematic, grammatical, indexing, and statistical output generated by Oracle ConText. Program parameters let you control the type of output and level of detail returned by the program.
- Document Digest builder. This sample host program processes text through ConText and gathers theme profile information for the entire text block and each paragraph in the text. It then creates a digest of the text by selecting the paragraphs that best represent the overall themes in the text and returning these paragraphs as ASCII output. Program parameters let you control the number of paragraphs selected and the method by which the theme profiles are matched to determine the most representative paragraphs.

These relatively simple, stand-alone components are intended mostly for demonstrating Oracle ConText's language processing abilities; however, they can also serve as models for building the more complex components required for a full-text management system. In fact, the source code for the sample host programs is provided with Oracle ConText 1.1 to help illustrate the structure of typical ConText components.

VI. DEVELOPMENT PLANS

6.1 Is Oracle ConText being integrated with Oracle products?

Yes. Oracle ConText is being integrated with a number of Oracle products, some of which will be available as soon as the Fall of 1994:

- Oracle TextServer3. Oracle TextServer3 with ConText provides a robust, easy-to-use development platform for creating Oracle ConText-enabled text management systems.
Oracle ConText provides text filtering to reduce the number of words tracked, or "indexed", by TextServer3 during the initial storage of a document. Text filtering allows documents to be retrieved more quickly and with greater precision. Also, ConText's thematic analysis allows documents to be queried by theme. Once a document is retrieved, Oracle ConText provides summarization and highlighting of the text in the document for speed-reading and quick review. In addition, indexing will be available in future releases to create new browsing/navigation methods within and between retrieved documents.
- Oracle Book. Oracle ConText provides hyperlinked, back-of-book indexes, complete with See and See also entries, for documents in Oracle Book, Oracle's online multimedia viewing tool. It also automatically generates a synopsis for an Oracle Book document and allows the user to navigate from any point in the synopsis to the corresponding point in the document. In addition, Oracle ConText provides users of Oracle Book Designer, the tool for creating Oracle Book documents, with the ability to specify words that should always/never be included in the Oracle ConText index.
- Oracle Office. With Oracle ConText added to Oracle Office, Oracle's office scheduling and electronic-mail system, mail messages can be summarized to varying levels of detail, and sorted, queried, and routed using the themes that ConText identifies for the message. Oracle Office with ConText can also minimize the costs incurred with modems or other expensive connections by summarizing messages before you read them.

6.2 What are the plans for the Oracle ConText API?

While Oracle TextServer3 provides an easy-to-use, integrated solution for processing and managing text, Oracle will continue to provide direct access to ConText's advanced linguistic functions through the C language API.
Future releases of Oracle ConText will provide a suite of tools to facilitate application development and integration, including:

- Oracle Cooperative Development Environment (CDE). Application developers will be able to use a wide range of CDE tools, including Oracle Forms, SQL*Plus, PL/SQL, and Oracle Glue, to develop applications for accessing ConText functions and manipulating the output.
- Microsoft Visual Basic for Windows. Application developers will be able to use many of Microsoft Visual Basic's powerful development tools, including Windows API functions and VBX controls, to build Windows applications that integrate with Oracle ConText.

6.3 Will Oracle ConText be integrated with any non-Oracle products?

Eventually, yes. Oracle is currently studying options for integration with third-party systems; however, no integration has been planned yet. Oracle TextServer3, with its access to the Oracle7 database and ready-to-use text input, filtering, output storage, and retrieval functions, is available now for application developers and resellers who wish to integrate Oracle ConText with their text applications.

Also, if you are an approved Oracle ConText development partner, the ConText functions and API are available for integrating Oracle ConText with any system that deals with text. And the enhancements being developed for future releases of Oracle ConText will make integration easier and more flexible.

However, if you plan to integrate Oracle ConText with other systems, using either Oracle TextServer3 or the ConText API, you may wish to enlist Oracle Consulting Services to help design the integration and build the necessary components.

6.4 What type of National Language Support (NLS) does Oracle ConText provide?

Oracle ConText currently supports only English text.
However, Oracle is developing plans for ConText-based modules capable of analyzing the grammatical structure and syntax of a number of non-English languages, including several major European and Asian languages. While these "language-intelligent" modules will not have the full range of Oracle ConText's advanced linguistic functions or the complete coverage of the system lexicon, they will provide new language-processing capabilities for the Oracle7 Server and other Oracle products that support multiple languages.

VII. GENERAL QUESTIONS

7.1 How long has Oracle ConText been in development?

Originally developed as a tool for processing online documentation, ConText represents over 140 person-years of development, spanning a 20-year period. Employing many of the original architects and developers, Oracle has been actively developing Oracle ConText for the past 2 years.

7.2 On which platforms is Oracle ConText available?

Oracle ConText is available in controlled release for the Sun UNIX platform. The release includes a sample text viewer, SpeedRead, for personal computers (PC) running Microsoft Windows.

In addition, Oracle ConText is currently being ported to the Sequent UNIX platform; the port will be available by Fall 1994. Plans for future ports include most major UNIX and PC platforms, including OS/2.

7.3 What are the system requirements for Oracle ConText?

Implementing Oracle ConText requires the following:

- 6 megabytes of memory (suggested minimum)
- 35 megabytes of disk space

Oracle ConText does not have any Oracle product dependencies. Please note that while it can be implemented stand-alone, Oracle ConText should be implemented as a component of Oracle TextServer3. Consult Oracle TextServer3 documentation for configuration requirements and details.

7.4 How fast does Oracle ConText perform?

On a Sun SPARCstation 10, Oracle ConText can process approximately 4 kilobytes of text per second, which translates loosely to about 450 words per second.
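
As a rough illustration of these figures, the following C sketch estimates processing time from document size. It is only a back-of-the-envelope calculation: it assumes 1 kilobyte = 1,024 bytes and treats the quoted SPARCstation 10 rates as constants. The macro and function names are hypothetical and are not part of the ConText API.

```c
#include <assert.h>

/* Rates quoted above for a Sun SPARCstation 10.
   Assumption: "4 kilobytes" is taken as 4 * 1024 bytes. */
#define BYTES_PER_SECOND 4096.0
#define WORDS_PER_SECOND 450.0

/* Estimated seconds to process a document of the given size in bytes. */
double estimate_seconds_from_bytes(double bytes)
{
    assert(bytes >= 0.0);
    return bytes / BYTES_PER_SECOND;
}

/* Estimated seconds to process a document of the given word count. */
double estimate_seconds_from_words(double words)
{
    assert(words >= 0.0);
    return words / WORDS_PER_SECOND;
}

/* Average bytes per word (including spaces and punctuation)
   implied by the two quoted rates. */
double implied_bytes_per_word(void)
{
    return BYTES_PER_SECOND / WORDS_PER_SECOND;
}
```

For instance, estimate_seconds_from_bytes(1048576.0) gives 256.0, so a 1-megabyte document would take a little over four minutes at the quoted rate. The two rates together imply roughly 9 bytes per word, which is why the conversion is described as loose.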
To increase throughput, you can run a host program for Oracle ConText on multiple processors connected to multiple text streams.

Note that requesting only a fraction of the output does not improve performance, since Oracle ConText performs a full parse each time it processes a text block. Oracle ConText extracts all the thematic and grammatical information from the text, and it is the host program that determines the amount of output to retain.

7.5 Can Oracle ConText analyze documents of any length?

Yes. A longer document will take longer to process than a shorter one, but the length of a document has no effect on the ability of Oracle ConText to analyze the text in the document.

In addition, because Oracle ConText performs a full parse each time it processes a text block, the processing time required for a document does not increase exponentially as the document increases in length. In general, processing time scales in direct proportion to the length of the document.

7.6 What is included with Oracle ConText?

When you purchase Oracle ConText, you receive the following components:

- API and underlying functionality for Oracle ConText
- sample programs that work with the API:
  + SpeedRead output processor
  + ASCII output processor
  + document digest builder
- source code for sample programs
- SpeedRead sample text viewer

7.7 How much does Oracle ConText cost?

Please contact your account manager for Oracle ConText pricing information.

_________________________________________________________________

Copyright 1995 Oracle Corporation, 500 Oracle Parkway, Redwood Shores, California 94065. All rights reserved.