
CHAPTER 4. Theme Queries
This chapter describes how to perform theme queries. The following topics are covered:
Creating a Theme Index
To execute a theme query on a document, you must first create a theme index. To do so, specify THEME_LEXER as the lexer preference when you create the policy for the text column. For example:
execute ctx_ddl.create_policy('THEME_POLICY', 'table1.text', lexer_pref => 'CTXSYS.THEME_LEXER');
Note: ConText Option supports theme indexing and queries for English language documents only.
For more information about creating theme indexes, see Oracle ConText Option Administrator's Guide.
Document Signatures
When you create a theme index for a document, ConText Option creates a document signature that contains at most 16 themes. Each theme in the document signature has a theme vector associated with it that defines the theme as part of a hierarchy.
For example, if two themes in a document are computer software and telephones, ConText Option might generate the corresponding theme vectors with the following theme tokens and weights:
Theme Vector 1                  Weight
science and technology          40
computer industry               40
computer software               40

Theme Vector 2                  Weight
science and technology          30
communications                  30
telecommunications industry     30
telephones                      30
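The structure of a document signature can be pictured with a small Python model. This is only an illustrative sketch, not part of the ConText API: the class name ThemeVector and the constant MAX_THEMES are hypothetical, and the data simply restates the two example vectors above.

```python
from dataclasses import dataclass

# Hypothetical constant: a signature holds at most 16 themes.
MAX_THEMES = 16


@dataclass(frozen=True)
class ThemeVector:
    """One theme: its path through the hierarchy plus a single weight."""
    tokens: tuple  # theme tokens, broadest first, most specific last
    weight: int    # shared by every token in this vector


# Model of the example document signature shown above.
signature = [
    ThemeVector(
        ("science and technology", "computer industry", "computer software"),
        40,
    ),
    ThemeVector(
        ("science and technology", "communications",
         "telecommunications industry", "telephones"),
        30,
    ),
]

# A signature never exceeds the theme limit.
assert len(signature) <= MAX_THEMES

for vector in signature:
    print(" > ".join(vector.tokens), "| weight", vector.weight)
```

In this model the last token of each vector is the theme itself, and the preceding tokens place it in the hierarchy.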
Theme Token Names
When ConText interprets a document to create the theme index, theme token names are derived from the standard names and categories in the knowledge hierarchy. Theme tokens in the index represent concepts in the document that might appear exactly like the token, as alternate forms of the word, or as a semantically related concept. For example, the canonical form Oracle Corporation might represent Oracle and Oracle Corp in the document.
For more information about the Knowledge Catalog, see "Linguistic Core" in "Linguistic Concepts (Chapter 6)."
Theme Weight
The theme weight is a measure of the strength of a theme relative to the other themes in a document. Weights are associated with theme vectors, and thus theme tokens within the same theme vector have the same weight.
For example, the tokens telephones and communications in Theme Vector 2 have the same weight of 30. When you issue a theme query, ConText uses theme weights to score hits.
Using Theme Queries
For theme queries, you specify a query string, which can be a sentence or a phrase. ConText interprets your query, creating a normalized form of the query that it can match against document signatures. ConText returns a list of documents that satisfy the query, based on certain rules, along with a score of how relevant each document is to the query.
Two-Step Query
To execute a theme query with the CTX_QUERY.CONTAINS procedure, you must specify a policy that has a theme lexer associated with it.
For example, you specify a theme query on computer software as follows:
execute ctx_query.contains('THEME_POL', 'computer software', 'CTX_TEMP');
In the above example, ConText generates theme vectors for the query computer software, which ConText attempts to match with document signatures in the theme index.
When a match is found, ConText uses the weight of the matched theme to compute a score that reflects how relevant the match is to the query; the higher the score, the more relevant the hit. ConText returns the matched document as part of the hitlist.
For example, if you issue a theme query with a token of computer software, ConText Option might return a match on a document that has a theme vector as follows:
science and technology          40
computer industry               40
computer software               40
Likewise, if you issue a query for the token science and technology, ConText Option returns the above document; however, a query on a broad term like science and technology is likely to return a larger and less focused hitlist.
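The matching behavior described above can be sketched in Python. This is a conceptual model only, not the ConText implementation: the document identifiers doc1 and doc2 are hypothetical, and the sketch assumes the score is simply the weight of the matched theme vector, whereas the text states only that ConText uses the weight to compute the score.

```python
# Conceptual model only; this is not the ConText API.
# Each theme vector is (tokens from broadest to most specific, weight).
SIGNATURES = {
    "doc1": [
        (("science and technology", "computer industry",
          "computer software"), 40),
    ],
    "doc2": [
        (("science and technology", "communications",
          "telecommunications industry", "telephones"), 30),
    ],
}


def theme_query(token):
    """Return (doc_id, score) pairs for documents whose signature
    contains the token, highest score first.

    Assumption: the score equals the weight of the best matching vector.
    """
    hits = []
    for doc_id, vectors in SIGNATURES.items():
        weights = [weight for tokens, weight in vectors if token in tokens]
        if weights:
            hits.append((doc_id, max(weights)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# A specific token matches only the document about that theme.
print(theme_query("computer software"))       # [('doc1', 40)]

# A broad token matches every document under that branch of the hierarchy.
print(theme_query("science and technology"))  # [('doc1', 40), ('doc2', 30)]
```

The second call shows why broad terms produce larger hitlists: the token appears near the top of many theme vectors.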
One-Step Query
You can execute theme queries using the one-step method in SQL*Plus. The way in which ConText matches theme signatures, scores hits, and returns documents is the same as in a two-step query.
For example, to execute a theme query on computer software:
SELECT * FROM TEXTAB
WHERE CONTAINS (text, 'computer software') > 0;
Multiple Policies
For a text column that has more than one policy associated with it, you must specify which policy to use in the CONTAINS clause. You might create two policies for a column when you want to perform both theme and text queries on the column.
For example, if the column text had a regular text policy and a theme policy THEMEPOL associated with it, you would do a theme query as follows:
SELECT ID, SCORE(0) FROM TEXTAB
WHERE CONTAINS (text, 'computer software', 0, 'THEMEPOL') > 0;
When you need to specify a policy in the CONTAINS function, as in this example, you must also specify a placeholder, in this case 0, for the LABEL parameter.
For more information about using the policy hint parameter in the CONTAINS function, see "CONTAINS" in "SQL Functions (Chapter 10)".
Case-Sensitivity
Unlike regular text queries, theme queries are case-sensitive. For example, doing a query on the common noun turkey, which describes a type of bird, will not produce a hit on the proper noun Turkey, which describes a country.
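The difference can be illustrated with a short Python sketch. It is a simplified model rather than ConText behavior in detail: the document names are hypothetical, and the case-insensitive comparison stands in for a regular text query only as implied by the contrast drawn above.

```python
# Hypothetical documents and the single theme token each one carries.
THEME_TOKENS = {
    "doc_birds": ["turkey"],      # common noun: the bird
    "doc_countries": ["Turkey"],  # proper noun: the country
}


def theme_hits(query):
    """Case-sensitive comparison, as in a theme query."""
    return [doc for doc, tokens in THEME_TOKENS.items() if query in tokens]


def text_hits(query):
    """Case-insensitive comparison, modelling a regular text query."""
    wanted = query.lower()
    return [
        doc for doc, tokens in THEME_TOKENS.items()
        if wanted in (token.lower() for token in tokens)
    ]


print(theme_hits("turkey"))  # ['doc_birds']
print(theme_hits("Turkey"))  # ['doc_countries']
print(text_hits("turkey"))   # ['doc_birds', 'doc_countries']
```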
Ambiguous Queries
An ambiguous word or phrase is one that is vague or carries very little information. If your query contains an ambiguous term, ConText returns an error. Examples of ambiguous query terms are the word images and the phrase good times.
Using Operators with Theme Queries
In theme queries, the following operators have the same semantics as with regular text queries:
Examples
Some valid query strings using operators are as follows:
contains(text, 'telephones & {computer industry}') > 0
contains(text, 'telephones*3 & {computer software}*.5 > 50') > 0
Thesaurus Operators
In a theme query, the thesaurus operators (SYNONYM, BROADER TERM, NARROWER TERM, and so on) work the same way as in a regular text query, provided a thesaurus has been created or loaded.
Grouping Characters
In theme query expressions, the grouping characters (), {}, [] have the same semantics as with a regular text query.
Wildcard Characters
In theme query expressions, the wildcard characters (%, _) work the same way as in regular text queries.
Note: There is a risk of ambiguity when using the wildcard character. For example, doing a theme query on %court% might return documents that have a theme of 'court of law' or 'tennis court'.
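The ambiguity can be reproduced with a small Python sketch that applies SQL-style wildcard rules (% for any run of characters, _ for exactly one character). The helper like_to_regex and the theme list are hypothetical and model only the pattern matching, not how ConText evaluates wildcards internally.

```python
import re


def like_to_regex(pattern):
    """Translate a pattern using % and _ into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # any run of characters, including none
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


# Hypothetical themes taken from three unrelated documents.
themes = ["court of law", "tennis court", "computer software"]

pattern = like_to_regex("%court%")
matches = [theme for theme in themes if pattern.fullmatch(theme)]

# Both unrelated senses of "court" satisfy the same pattern.
print(matches)  # ['court of law', 'tennis court']
```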
Unsupported Operators
ConText does not support the following query expression operators with theme queries: