Oracle Context Option Application Developer's Guide

CHAPTER 4. Theme Queries


This chapter describes how to perform theme queries. It covers creating a theme index, document signatures, using theme queries, and using operators with theme queries.

Creating a Theme Index

To execute a theme query on a document, you must first create a theme index. To do so, specify THEME_LEXER as the lexer preference when you create the policy for the text column. For example:

execute ctx_ddl.create_policy('THEME_POLICY', -
'table1.text', lexer_pref => 'CTXSYS.THEME_LEXER');

Note: ConText Option supports theme indexing and theme queries for English-language documents only.

For more information about creating theme indexes, see the Oracle ConText Option Administrator's Guide.

Document Signatures

When you create a theme index for a document, ConText Option creates a document signature that contains at most 16 themes. Each theme in the document signature has a theme vector associated with it that defines the theme as part of a hierarchy.

For example, if two themes in a document are computer software and telephones, ConText Option might generate the corresponding theme vectors with the following theme tokens and weights:

    Theme Vector 1                          Weight
    science and technology                  40
      computer industry                     40
        computer software                   40

    Theme Vector 2                          Weight
    science and technology                  30
      communications                        30
        telecommunications industry         30
          telephones                        30
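The structure described above can be pictured with a small Python model. This is only an illustrative sketch, not how ConText Option stores its index: the class names ThemeVector and DocumentSignature and the constant MAX_THEMES are hypothetical, and each theme vector is assumed to be a path of tokens running from the broadest category to the theme itself.

```python
from dataclasses import dataclass, field

MAX_THEMES = 16  # the text states that a signature contains at most 16 themes


@dataclass(frozen=True)
class ThemeVector:
    """One theme, stored as its path from broadest to most specific token."""
    tokens: tuple
    weight: int

    @property
    def theme(self):
        # Assumption: the theme is the most specific (last) token in the path.
        return self.tokens[-1]


@dataclass
class DocumentSignature:
    """Hypothetical container for the theme vectors of one document."""
    vectors: list = field(default_factory=list)

    def add(self, vector):
        if len(self.vectors) >= MAX_THEMES:
            raise ValueError("a document signature holds at most 16 themes")
        self.vectors.append(vector)


signature = DocumentSignature()
signature.add(ThemeVector(
    ("science and technology", "computer industry", "computer software"), 40))
signature.add(ThemeVector(
    ("science and technology", "communications",
     "telecommunications industry", "telephones"), 30))

themes = [vector.theme for vector in signature.vectors]
print(themes)  # ['computer software', 'telephones']
```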

Theme Token Names

When ConText interprets a document to create the theme index, it derives theme token names from the standard names and categories in the knowledge hierarchy. A theme token in the index represents a concept that might appear in the document exactly as the token, as an alternate form of the word, or as a semantically related concept. For example, the canonical form Oracle Corporation might represent both Oracle and Oracle Corp in the document.

For more information about the Knowledge Catalog, see "Linguistic Core" in "Linguistic Concepts (Chapter 6)."

Theme Weight

The theme weight is a measure of the strength of a theme relative to the other themes in a document. Weights are associated with theme vectors, and thus theme tokens within the same theme vector have the same weight.

For example, the tokens telephones and communications in Theme Vector 2 have the same weight of 30. When you issue a theme query, ConText uses theme weights to score hits.
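To make the relationship between vectors and token weights concrete, the following Python sketch flattens the two example theme vectors into a token-to-weight lookup. The function name token_weights is hypothetical. Because the text does not say how ConText treats a token that appears in more than one vector, the sketch simply records every weight for such a token and does not combine them.

```python
from collections import defaultdict

# The two example theme vectors from this chapter: (token path, weight).
vectors = [
    (("science and technology", "computer industry", "computer software"), 40),
    (("science and technology", "communications",
      "telecommunications industry", "telephones"), 30),
]


def token_weights(theme_vectors):
    """Map each theme token to the weights of the vectors it appears in."""
    weights = defaultdict(list)
    for tokens, weight in theme_vectors:
        for token in tokens:
            # Every token inherits the weight of its own theme vector.
            weights[token].append(weight)
    return dict(weights)


weights = token_weights(vectors)
print(weights["telephones"])      # [30]
print(weights["communications"])  # [30]
```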

Using Theme Queries

For theme queries, you specify a query string, which can be a sentence or a phrase. ConText interprets the query and creates a normalized form that it can match against document signatures. ConText then returns a list of the documents that satisfy the query, along with a score that indicates how relevant each document is to the query.

Two-Step Query

To execute a theme query with the CTX_QUERY.CONTAINS procedure, you must specify a policy that has a theme lexer associated with it.

For example, you specify a theme query on computer software as follows:

execute ctx_query.contains('THEME_POL', 'computer software', 'CTX_TEMP');

In the above example, ConText generates theme vectors for the query computer software, which ConText attempts to match with document signatures in the theme index.

When a match is found, ConText uses the weight of the matched theme to compute a score that reflects how relevant the match is to the query; the higher the score, the more relevant the hit. ConText returns the matched document as part of the hitlist.

For example, if you issue a theme query with a token of computer software, ConText Option might return a match on a document that has a theme vector as follows:

    science and technology                  40
      computer industry                     40
        computer software                   40

Likewise, if you issued a query for the token science and technology, ConText Option would return the above document. However, a query on a broad term such as science and technology would likely return a larger, less focused hitlist.
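The matching behavior described above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the ConText scoring algorithm: THEME_INDEX and theme_hits are hypothetical names, document 2 is an invented document built from the telephones vector shown earlier in this chapter, and the score is assumed to be the weight of the matched vector.

```python
# Hypothetical in-memory stand-in for a theme index:
# document id -> list of (token path, weight) theme vectors.
THEME_INDEX = {
    1: [(("science and technology", "computer industry",
          "computer software"), 40)],
    2: [(("science and technology", "communications",
          "telecommunications industry", "telephones"), 30)],
}


def theme_hits(index, query_token):
    """Return (doc_id, score) pairs, highest score first.

    Assumption: a document matches when the query token equals any token
    in one of its theme vectors, and the score is that vector's weight
    (the highest weight if several vectors match).
    """
    hits = []
    for doc_id, doc_vectors in index.items():
        matched = [weight for tokens, weight in doc_vectors
                   if query_token in tokens]
        if matched:
            hits.append((doc_id, max(matched)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


specific = theme_hits(THEME_INDEX, "computer software")
broad = theme_hits(THEME_INDEX, "science and technology")
print(specific)  # [(1, 40)]
print(broad)     # [(1, 40), (2, 30)]
```

In this model the broad token matches both documents, which mirrors the larger hitlist that a query on science and technology would produce.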

One-Step Query

You can execute theme queries using the one-step method in SQL*Plus. ConText matches document signatures, scores hits, and returns documents in the same way as in a two-step query.

For example, to execute a theme query on computer software:

SELECT * FROM TEXTAB
WHERE CONTAINS (text, 'computer software') > 0;

Multiple Policies

For a text column that has more than one policy associated with it, you must specify which policy to use in the CONTAINS clause. You might create two policies for a column when you want to perform both theme and text queries on the column.

For example, if the column text had a regular text policy and a theme policy THEMEPOL associated with it, you would do a theme query as follows:

SELECT ID, SCORE(0) FROM TEXTAB
WHERE CONTAINS (text, 'computer software', 0, 'THEMEPOL') > 0;

When you need to specify a policy in the CONTAINS function, as in this example, you must also specify a placeholder for the LABEL parameter, in this case 0.

For more information about using the policy hint parameter in the CONTAINS function, see "CONTAINS" in "SQL Functions (Chapter 10)".

Case-sensitivity

Unlike regular text queries, theme queries are case-sensitive. For example, doing a query on the common noun turkey, which describes a type of bird, will not produce a hit on the proper noun Turkey, which describes a country.
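The difference can be sketched in Python as follows. This is a toy comparison under stated assumptions: the document names, the THEME_TOKENS mapping, and the matching_docs function are all hypothetical, and the case_sensitive=False branch merely stands in for the case-insensitive behavior of regular text queries.

```python
# Hypothetical theme tokens for two documents.
THEME_TOKENS = {
    "doc_about_birds": {"turkey"},
    "doc_about_countries": {"Turkey"},
}


def matching_docs(query, case_sensitive=True):
    """Return the documents whose theme tokens include the query token."""
    if case_sensitive:
        # Theme query behavior: the token must match exactly, including case.
        return sorted(doc for doc, tokens in THEME_TOKENS.items()
                      if query in tokens)
    # Case-insensitive comparison, shown only for contrast.
    folded = query.lower()
    return sorted(doc for doc, tokens in THEME_TOKENS.items()
                  if folded in {token.lower() for token in tokens})


print(matching_docs("turkey"))
# ['doc_about_birds']
print(matching_docs("turkey", case_sensitive=False))
# ['doc_about_birds', 'doc_about_countries']
```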

Ambiguous Queries

An ambiguous word or phrase is one that is vague or contains very little information. If your query contains an ambiguous term, ConText returns an error. Examples of ambiguous query terms are the word images and the phrase good times.

Using Operators with Theme Queries

In theme queries, the following operators have the same semantics as with regular text queries:

Examples

Some valid query strings using operators are as follows:

contains(text, 'telephones & {computer industry}') > 0
contains(text, 'telephones*3 & {computer software}*.5 > 50') > 0

Thesaurus Operators

In a theme query, the thesaurus operators (SYNONYM, BROADER TERM, NARROWER TERM, and so on) work the same way as in a regular text query, provided that a thesaurus has been created or loaded.

Grouping Characters

In theme query expressions, the grouping characters (), {}, [] have the same semantics as with a regular text query.

Wildcard Characters

In theme query expressions, the wildcard characters (%, _) work the same way as in regular text queries.

Note: There is a risk of ambiguity when you use wildcard characters. For example, a theme query on %court% might return documents that have a theme of 'court of law' or 'tennis court'.
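The ambiguity in the note can be reproduced with a small Python sketch. It assumes the usual SQL pattern semantics, in which % matches any sequence of characters (including none) and _ matches exactly one character. The function names wildcard_to_regex and expand and the token list are hypothetical, and the sketch does not reflect how ConText expands wildcards internally.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a %/_ wildcard pattern into a compiled regular expression."""
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")   # any sequence of characters, possibly empty
        elif char == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts))


def expand(pattern, theme_tokens):
    """Return the theme tokens that the whole pattern matches."""
    regex = wildcard_to_regex(pattern)
    return [token for token in theme_tokens if regex.fullmatch(token)]


tokens = ["court of law", "tennis court", "telephones"]
matches = expand("%court%", tokens)
print(matches)  # ['court of law', 'tennis court']
```

Both unrelated themes match the pattern, so a more specific query string is usually the safer choice.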

Unsupported Operators

ConText does not support the following query expression operators with theme queries:




Copyright © 1996 Oracle Corporation.
All Rights Reserved.