
This chapter provides an overview of the Oracle ConText Option.
The following topics are covered in this chapter:
What Users Need and Want
Businessmen and women need information to make business decisions, design and develop new products, conquer new markets, and do thousands of other tasks to run successful businesses. But before they can use the data, they have to find documents that contain relevant information, read through them, and determine which documents are most applicable to their needs.
What users want is a tool that simplifies these four steps:
- find documents that contain the information they need
- sift through them to find the information; eliminate documents that are not germane
- retrieve the information in a concise and useful form
- keep track of the information and update it when it changes
The ConText Option Solution
Most of today's business data is not stored as structured data; it is stored as non-structured text in thousands of formats: letters, memos, manuals, reports, news articles, electronic mail, notes, messages, etc.
For many businesses, this huge volume of text is a vast, valuable and unmanageable information resource. Relevant documents are usually difficult to locate, hard to retrieve, and often impossible to digest. Oracle solves the text management problem with ConText Option.
ConText Option is built on the power and scalability of Oracle Universal Server. It uses advanced text analysis and retrieval technology to give users the exact information they need when they need it. With ConText Option, Oracle Universal Server is a complete solution for managing any data resource (relational, text, spatial, image, video, or audio) in any application, at any scale.
ConText Option manages unstructured text as quickly and as easily as structured data. It is an online text management system that uses SQL or PL/SQL to search through large volumes of text stored in either structured databases or system files.
Using ConText Option, developers can quickly and efficiently build mission-critical applications that provide hundreds or even thousands of concurrent users with fast, efficient access to text-based information. And, because text is now a supported datatype in the Oracle Universal Server, new applications and extensions to existing Oracle applications are quick and easy to build with standard tools.
Advantages of Oracle ConText Option
The advantages of ConText Option include:
- powerful text handling capabilities
- extensible framework for languages and formats
- database-quality architecture for managing text
- standards-based development environment
Powerful Text Handling Capabilities
Using ConText Option's advanced indexing, retrieval, reduction, and classification features, users can pinpoint and access the textual information they need quickly and easily, even in large volumes of text data.
Extensible Framework for Languages and Formats
ConText Option's extensible framework easily integrates new languages, formats, specialized search engines and text processing services. This adaptability to new requirements preserves an enterprise's investment in its text storage and retrieval applications and provides a healthy environment for long-term application development and business growth.
ConText Option currently recognizes, indexes, and retrieves text for most of the NLS-compliant, single-byte languages (7-bit and 8-bit character sets). All of these languages can be processed by the basic lexer provided with ConText Option.
ConText Option also supports query expansion, in the form of stemming, Soundex, and fuzzy matching, for English and the following Western European languages: French, Spanish, Italian, German, and Dutch.
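To illustrate what Soundex-style query expansion means, the following minimal Python sketch implements the standard American Soundex algorithm and uses it to expand a query term to similar-sounding words. It is an illustration only: the source does not describe how ConText Option implements Soundex, and the function names `soundex` and `expand_query` are hypothetical.

```python
def soundex(word: str) -> str:
    """Return the American Soundex code: first letter plus three digits."""
    # Consonant groups that share a Soundex digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def expand_query(term, vocabulary):
    """Hypothetical helper: every vocabulary word that sounds like `term`."""
    target = soundex(term)
    return sorted(w for w in vocabulary if soundex(w) == target)


vocabulary = ["Smith", "Smyth", "Smythe", "Schmidt", "Jones"]
matches = expand_query("Smith", vocabulary)
```

Here a query for "Smith" would also match "Smyth", "Smythe", and "Schmidt", because all four words share one Soundex code, while "Jones" is excluded.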
For multi-byte languages, ConText Option provides the following lexers: Japanese, Korean (BETA), and Chinese (BETA). The Japanese lexer recognizes three of the Japanese writing systems: Kanji, Hiragana, and Katakana.
Database-quality Architecture for Managing Text
Because ConText Option is fully integrated with Oracle's Release 7.3 Universal Database, users can manage text with the same reliability, scalability, security, integrity, fault tolerance, and administrative ease they expect from an enterprise-caliber relational database system.
Standards-based Development Environment
ConText Option takes full advantage of Oracle's standard interfaces and third-party tools such as PowerBuilder, SQL*Windows, OLE Automation tools, and Visual Basic. Once ConText Option is installed on one or more servers, client tools like SQL*Plus, Oracle Forms, and Pro*C can access and manipulate text just as easily and efficiently as structured data.
While standalone text-retrieval products often burden developers with separate development environments, ConText Option treats text and relational data as peers and uses standard SQL to locate and retrieve relevant text information.
Text and Linguistic Features
ConText Option features that facilitate text management and retrieval include:
- text retrieval using SQL and PL/SQL in the Oracle7 server
- seamless handling of text documents differing in size, language, format, content and style
- identification and retrieval of relevant text using both boolean logic and statistical evaluation methods
- parallel processing for high performance access to large volumes of information stored in vast collections of text documents
- in-depth linguistic analysis of English-language text for querying documents by theme and automatic consolidation of large volumes of text into easily readable, accurate, and relevant summaries
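The combination of boolean logic and statistical evaluation mentioned in the list above can be pictured with a small Python sketch. It is a conceptual model under stated assumptions, not ConText Option's actual query engine or scoring method: the functions `build_index`, `boolean_and`, `boolean_or`, and `rank` are hypothetical, and the score (term occurrences divided by document length) is a stand-in chosen for simplicity.

```python
import re
from collections import Counter


def build_index(docs):
    """Map each document id to a Counter of its lowercase word frequencies."""
    return {doc_id: Counter(re.findall(r"[a-z0-9]+", text.lower()))
            for doc_id, text in docs.items()}


def boolean_and(index, terms):
    """Documents that contain every term."""
    return {d for d, counts in index.items() if all(t in counts for t in terms)}


def boolean_or(index, terms):
    """Documents that contain at least one term."""
    return {d for d, counts in index.items() if any(t in counts for t in terms)}


def rank(index, doc_ids, terms):
    """Order documents by an assumed score: term occurrences / document length."""
    def score(doc_id):
        counts = index[doc_id]
        total = sum(counts.values())
        return sum(counts[t] for t in terms) / total if total else 0.0
    # Highest score first; ties broken by document id for a stable order.
    return sorted(doc_ids, key=lambda d: (-score(d), d))


docs = {
    1: "Oracle stores text and relational data.",
    2: "Text retrieval finds text in text collections.",
    3: "Relational data only.",
}
index = build_index(docs)
both = boolean_and(index, ["text", "relational"])
either = boolean_or(index, ["text", "relational"])
ranked = rank(index, either, ["text"])
```

The boolean step decides which documents qualify at all; the statistical step then orders the qualifying documents so that the most relevant ones appear first.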
Linguistic Analysis
ConText Option provides a sophisticated natural language parser that can analyze English-language text and return detailed thematic information about the text. This theme information can be used in two distinct and powerful ways to manipulate text:
- querying documents based on themes
- viewing documents by their themes and thematic content
Note: The Linguistic Services are only available for English-language documents.
Theme Queries
Theme queries provide a powerful alternative or extension to text queries. In a text query, the occurrence of a word in a document is sufficient for the document to be returned in the results of the query. However, this type of query may generate more hits than the user wants.
Theme queries let the user search for documents based on the main ideas or concepts in the documents. In a theme query, only those documents in which a particular topic was sufficiently developed to be classified as a document-level theme are returned.
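The difference between the two query types can be sketched in Python. This is a toy model, not ConText Option's implementation: the `Document` class, the precomputed `themes` weights, and the `min_weight` threshold are all hypothetical stand-ins for the output of the natural language parser.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: int
    text: str
    # Hypothetical: theme name -> weight in [0, 1], assumed to be
    # produced beforehand by some linguistic analysis step.
    themes: dict = field(default_factory=dict)


def text_query(docs, word):
    """A text query: one occurrence of the word is enough for a hit."""
    word = word.lower()
    return [d.doc_id for d in docs
            if word in re.findall(r"[a-z]+", d.text.lower())]


def theme_query(docs, theme, min_weight=0.5):
    """A theme query: hit only if the topic is developed strongly enough."""
    return [d.doc_id for d in docs if d.themes.get(theme, 0.0) >= min_weight]


docs = [
    Document(1, "Banking regulation shapes how every bank lends and reports.",
             {"banking": 0.9}),
    Document(2, "We walked along the river bank before lunch.",
             {"recreation": 0.7}),
    Document(3, "The memo mentions a bank once, then discusses shipping schedules.",
             {"shipping": 0.8, "banking": 0.1}),
]

text_hits = text_query(docs, "bank")
theme_hits = theme_query(docs, "banking")
```

In this example the text query for "bank" returns all three documents, while the theme query for "banking" returns only the first, the one document in which banking is a developed topic rather than a passing mention.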
Theme Viewing
Themes and thematic content (Gists) can be generated on a per-document basis through the Linguistic Services. This information can then be used to view documents by their themes, as well as by their thematically relevant paragraphs.
The application developer uses the Linguistic Services to create various levels of shorter abstracts that the user can use to quickly review the essential content of documents and determine their relevance.
Who Are the Players?
The individuals involved in developing, supporting, maintaining and using ConText Option facilities are:
- end user
- application developer
- database administrator (DBA)
- ConText Option administrator (if other than the DBA)
End User
An end user is the individual or organization that uses an application to locate, retrieve, and read text. The end user defines the data or information requirements that must be satisfied by the application. The end user also defines the document environment from which text will be selected.
Application Developer
The application developer designs the application, defines the environment required to support the application, works with the System Administrator to create the environment, and writes the programs and procedures that satisfy user requirements.
Database Administrator
The database administrator maintains the Oracle system facilities, the databases, and the system environment that supports a ConText application.
ConText Option Administrator
The ConText Option administrator maintains the ConText Option environment that supports text applications, for example, the policies and preferences that define text columns and text indexes.
Creating the Text Processing Environment
The collection of text to be managed must be stored in an environment that is accessible to Oracle and ConText Option, either as columns in an Oracle database or as pointers to system files outside the database.
Documents must be properly loaded into the database (or identified by external pointers) and indexed before text/theme queries can be executed.
In addition, linguistic output must be generated for each document before the linguistic information can be viewed for the documents.
To index a document or generate linguistic output for the document, the column storing the document must be defined as a text column. ConText Option recognizes a text column in a table if the column has one or more policies attached to it.
A table can contain more than one text column, but each text column requires a separate policy.
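The relationship between tables, text columns, and policies described above can be modeled with a short Python sketch. It only mirrors the rules stated in this section (a column is a text column if at least one policy is attached to it, and each policy covers exactly one column). The `Policy` and `PolicyRegistry` names, the table and column names, and the methods are hypothetical and are not part of the ConText Option API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy record: attached to exactly one column."""
    name: str
    table: str
    column: str


class PolicyRegistry:
    """Toy stand-in for the place where policies are registered."""

    def __init__(self):
        self._policies = {}

    def register(self, policy):
        if policy.name in self._policies:
            raise ValueError(f"policy {policy.name!r} already exists")
        self._policies[policy.name] = policy

    def is_text_column(self, table, column):
        # A column counts as a text column if one or more policies point at it.
        return any(p.table == table and p.column == column
                   for p in self._policies.values())

    def text_columns(self, table):
        return sorted({p.column for p in self._policies.values()
                       if p.table == table})


registry = PolicyRegistry()
# Two text columns in one table need two separate policies.
registry.register(Policy("doc_body_pol", "documents", "body"))
registry.register(Policy("doc_abstract_pol", "documents", "abstract"))
# A single column may carry more than one policy.
registry.register(Policy("doc_body_theme_pol", "documents", "body"))
```

With these registrations, `registry.text_columns("documents")` lists both `abstract` and `body`, while a column such as `title`, which has no policy, is not treated as a text column.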
Figure 1. Overview of Text, Theme, and Linguistic Setup
The process of loading documents, defining text columns, and creating ConText indexes for the columns is documented in the Oracle ConText Option Administrator's Guide.
In particular, the Oracle ConText Option Administrator's Guide explains how to:
- specify document attributes
- register preferences and policies
- create a theme index (English-language text only)
The process for generating linguistic output through the Linguistic Services is documented in Chapter 7, "Using the Linguistic Services."