Oracle7 Server Tuning Go to Product Documentation Library
Library
Go to books for this product
Product
Go to Contents for this book
Contents
Go to Index
Index



Go to previous file in sequence Go to next file in sequence

Transforming ORs into Compound Queries


Optimizer Concepts


This appendix discusses various optimization issues, to supplement "The Optimizer" chapter in Oracle7 Server Concepts. Topics include the following:

Evaluating Expressions and Conditions


The optimizer fully evaluates expressions whenever possible and translates certain syntactic constructs into equivalent constructs. This is done either because Oracle can more quickly evaluate the resulting expression than the original expression or because the original expression is merely a syntactic equivalent of the resulting expression. Since there are different SQL constructs that can operate identically (for example: = ANY (subquery) and IN (subquery)), Oracle maps these to a single construct.

Constants Any computation of constants is performed only once when the statement is optimized rather than each time the statement is executed. Consider these conditions that test for monthly salaries greater than 2000:

sal > 24000/12 

sal > 2000 

sal*12 > 24000 

If a SQL statement contains the first condition, the optimizer simplifies it into the second condition.

Note that the optimizer does not simplify expressions across comparison operators. The optimizer does not simplify the third expression into the second. For this reason, application developers should write conditions that compare columns with constants whenever possible, rather than conditions with expressions involving columns.

LIKE The optimizer simplifies conditions that use the LIKE comparison operator to compare an expression with no wildcard characters into an equivalent condition that uses an equality operator instead. For example, the optimizer simplifies the first condition below into the second:

ename LIKE 'SMITH' 

ename = 'SMITH' 

The optimizer can simplify these expressions only when the comparison involves variable-length datatypes. For example, if ENAME was of type CHAR(10), the optimizer cannot transform the LIKE operation into an equality operation due to the comparison semantics of fixed-length datatypes.

IN The optimizer expands a condition that uses the IN comparison operator to an equivalent condition that uses equality comparison operators and OR logical operators. For example, the optimizer expands the first condition below into the second:

ename IN ('SMITH', 'KING', 'JONES') 

ename = 'SMITH' OR ename = 'KING' OR ename = 'JONES' 

ANY or SOME The optimizer expands a condition that uses the ANY or SOME comparison operator followed by a parenthesized list of values into an equivalent condition that uses equality comparison operators and OR logical operators. For example, the optimizer expands the first condition below into the second:

sal > ANY (:first_sal, :second_sal) 

sal > :first_sal OR sal > :second_sal 

The optimizer transforms a condition that uses the ANY or SOME operator followed by a subquery into a condition containing the EXISTS operator and a correlated subquery. For example, the optimizer transforms the first condition below into the second:

x > ANY (SELECT sal 

		FROM emp 		

		WHERE job = 'ANALYST') 

EXISTS (SELECT sal 

		FROM emp 

		WHERE job = 'ANALYST' 		  

		  AND x > sal) 

ALL The optimizer expands a condition that uses the ALL comparison operator followed by a parenthesized list of values into an equivalent condition that uses equality comparison operators and AND logical operators. For example, the optimizer expands the first condition below into the second:

sal > ALL (:first_sal, :second_sal) 

sal > :first_sal AND sal > :second_sal 

The optimizer transforms a condition that uses the ALL comparison operator followed by a subquery into an equivalent condition that uses the ANY comparison operator and a complementary comparison operator. For example, the optimizer transforms the first condition below into the second:

x > ALL (SELECT sal 

		FROM emp 		
WHERE deptno = 10)

NOT (x <= ANY (SELECT sal 

			FROM emp 			
WHERE deptno = 10) )

The optimizer then transforms the second query into the following query using the rule for transforming conditions with the ANY comparison operator followed by a correlated subquery:

NOT EXISTS (SELECT sal 

			FROM emp 

			WHERE deptno = 10
AND x <= sal)

BETWEEN The optimizer always replaces a condition that uses the BETWEEN comparison operator with an equivalent condition that uses the >= and <= comparison operators. For example, the optimizer replaces the first condition below with the second:

sal BETWEEN 2000 AND 3000 

sal >= 2000 AND sal <= 3000 

NOT The optimizer simplifies a condition to eliminate the NOT logical operator. The simplification involves removing the NOT logical operator and replacing a comparison operator with its opposite comparison operator. For example, the optimizer simplifies the first condition below into the second one:

NOT deptno = (SELECT deptno FROM emp WHERE ename = 'TAYLOR') 

deptno <> (SELECT deptno FROM emp WHERE ename = 'TAYLOR') 

Often a condition containing the NOT logical operator can be written many different ways. The optimizer attempts to transform such a condition so that the subconditions negated by NOTs are as simple as possible, even if the resulting condition contains more NOTs. For example, the optimizer simplifies the first condition below into the second and then into the third.

NOT (sal < 1000 OR comm IS NULL) 

NOT sal < 1000 AND comm IS NOT NULL 

sal >= 1000 AND comm IS NOT NULL 

Transitivity If two conditions in the WHERE clause involve a common column, the optimizer can sometimes infer a third condition using the transitivity principle. The optimizer can then use the inferred condition to optimize the statement. The inferred condition could potentially make available an index access path that was not made available by the original conditions.

Imagine a WHERE clause containing two conditions of these forms:

WHERE column1 comp_oper constant   

  AND column1 = column2 

In this case, the optimizer infers the condition

	column2 comp_oper constant 

where:
comp_oper

Is any of the comparison operators =, !=, ^=, <, <>, <, >, <=, or >=.

constant

Is any constant expression involving operators, SQL functions, literals, bind variables, and correlation variables.

Note: Transitivity is used only by the cost-based approach.

Example: Consider this query in which the WHERE clause containing two conditions that each use the EMP.DEPTNO column:

SELECT * 

	FROM emp, dept 

	WHERE emp.deptno = 20 	  

	  AND emp.deptno = dept.deptno;

Using transitivity, the optimizer infers this condition:

dept.deptno = 20 

If there is an index on the DEPT.DEPTNO column, this condition makes available access paths using that index.

Note: The optimizer only infers conditions that relate columns to constant expressions, rather than columns to other columns. Imagine a WHERE clause containing two conditions of these forms:

WHERE column1 comp_oper column3   

  AND column1 = column2 

In this case, the optimizer does not infer this condition:

column2 comp_oper column3 

Transforming ORs into Compound Queries


Since SQL is such a flexible query language, there often are many statements you could formulate to achieve a given goal. Sometimes the optimizer transforms one such statement into another that achieves the same goal if the second statement can be executed more efficiently. This section discusses transforming queries containing ORs into compound statements containing UNION ALL.

If a query contains a WHERE clause with multiple conditions combined with OR operators, the optimizer transforms it into an equivalent compound query that uses the UNION ALL set operator if this will make it execute more efficiently:

For information on access paths and how indexes make them available, see "Choosing Access Paths" on page A-23 and the sections that follow it. For statements that use the cost-based approach, the optimizer may also use statistics to determine whether to make the transformation by estimating and then comparing the costs of executing the original statement versus the resulting statement.

Example: Consider this query with a WHERE clause that contains two conditions combined with an OR operator:

SELECT * 

	FROM emp 

	WHERE job = 'CLERK'

	  OR deptno = 10;

If there are indexes on both the JOB and DEPTNO columns, the optimizer may transform the query above into the equivalent query below:

SELECT * 

	FROM emp 

	WHERE job = 'CLERK' 

UNION ALL 

SELECT * 

	FROM emp 

	WHERE deptno = 10

	  AND job <> 'CLERK';

If you are using the rule-based approach, the optimizer makes this transformation because each component query of the resulting compound query can be executed using an index. The rule-based approach assumes that executing the compound query using two index scans is faster than executing the original query using a full table scan.

If you are using the cost-based approach, the optimizer compares the cost of executing the original query using a full table scan with that of executing the resulting query when deciding whether to make the transformation. The execution plan for the transformed statement might look like this:

Execution Plan for a Compound Query
To execute the transformed query, Oracle performs these steps:

If either of the JOB or DEPTNO columns is not indexed, the optimizer does not even consider the transformation because the resulting compound query would require a full table scan to execute one of its component queries. Executing the compound query with a full table scan in addition to an index scan could not possibly be faster than executing the original query with a full table scan.

Example: Consider this query and assume there is an index on the ENAME column:

SELECT * 

	FROM emp 

	WHERE ename = 'SMITH'

	  OR sal > comm;

Transforming the query above would result in the compound query below:

SELECT * 

	FROM emp 

	WHERE ename = 'SMITH' 

UNION ALL 

SELECT * 

	FROM emp 

	WHERE sal > comm;

Since the condition in the WHERE clause of the second component query (SAL > COMM) does not make an index available, the compound query requires a full table scan. For this reason, the optimizer does not consider making the transformation and chooses a full table scan to execute the original statement.

Transforming Complex Statements into Join Statements


The optimizer transforms a complex statement into a join statement whenever such a transformation results in a join statement that is guaranteed to return exactly the same rows as the complex statement. This transformation allows Oracle to execute the statement by taking advantage of join optimization techniques described in the section "Optimizing Join Statements" on page A-37.

Consider this complex statement that selects all rows from the ACCOUNTS table whose owners appear in the CUSTOMERS table:

SELECT * 

	FROM accounts 

	WHERE custno IN 

		(SELECT custno FROM customers);

If the CUSTNO column of the CUSTOMERS table is a primary key or has a UNIQUE constraint, the optimizer can transform the complex query into this join statement that is guaranteed to return the same data:

SELECT accounts.* 

	FROM accounts, customers

	WHERE accounts.custno = customers.custno;

The execution plan for this statement might look like the following figure:

Execution Plan for a Nested Loops Join
To execute this statement, Oracle performs a nested loops join operation. For information on nested loops joins, see the section "Join Operations" on page A-37.

If the optimizer cannot transform a complex statement into a join statement, the optimizer chooses execution plans for the parent statement and the subquery as though they were separate statements. Oracle then executes the subquery and uses the rows it returns to execute the parent query.

Consider this complex statement that returns all rows from the ACCOUNTS table that have balances that are greater than the average account balance:

SELECT * 

	FROM accounts 

	WHERE accounts.balance > 

		(SELECT AVG(balance) FROM accounts);

There is no join statement that can perform the function of this statement, so the optimizer does not transform the statement. Note that complex queries whose subqueries contain group functions such as AVG cannot be transformed into join statements.

Optimizing Statements that Access Views


To optimize a statement that accesses a view, the optimizer chooses one of these alternatives:


Transforming Statements that Access Views

To transform a statement that accesses a view into an equivalent statement that accesses the view's base tables, the optimizer can use one of these techniques:

The optimizer then optimizes the resulting statement.

Merging the View's Query into the Accessing Statement To merge the view's query into the accessing statement, the optimizer replaces the name of the view with the name of its base table in the accessing statement and adds the condition of the view's query's WHERE clause to the accessing statement's WHERE clause.

Example: Consider this view of all employees who work in department 10:

CREATE VIEW emp_10 
AS SELECT empno,ename, job, mgr, hiredate, sal, comm, deptno
FROM emp
WHERE deptno = 10;

Consider this query that accesses the view. The query selects the IDs greater than 7800 of employees who work in department 10:

SELECT empno 
FROM emp_10
WHERE empno > 7800;

The optimizer transforms the query into the following query that accesses the view's base table:

SELECT empno 
FROM emp
WHERE deptno = 10
AND empno > 7800;

If there are indexes on the DEPTNO or EMPNO columns, the resulting WHERE clause makes them available.

Merging the Accessing Statement into the View's Query The optimizer cannot always merge the view's query into the accessing statement. Such a transformation is not possible if the view's query contains

To optimize statements that access such views, the optimizer can merge the statement into the view's query.

Example: Consider the TWO_EMP_TABLES view, which is the union of two employee tables. The view is defined with a compound query that uses the UNION set operator:

CREATE VIEW two_emp_tables 

	(empno, ename, job, mgr, hiredate, sal, comm, deptno) AS 

	SELECT empno, ename, job, mgr, hiredate, sal, comm,
deptno FROM emp1 UNION

	SELECT empno, ename, job, mgr, hiredate, sal, comm,
deptno FROM emp2;

Consider this query that accesses the view. The query selects the IDs and names of all employees in either table who work in department 20:

SELECT empno, ename 

	FROM two_emp_tables

	WHERE deptno = 20;

Since the view is defined as a compound query, the optimizer cannot merge the view query into the accessing query. Instead, the optimizer transforms the query by adding its WHERE clause condition into the compound query. The resulting statement looks like this:

SELECT empno, ename FROM emp1 WHERE deptno = 20 

	UNION 

SELECT empno, ename FROM emp2 WHERE deptno = 20;

If there is an index on the DEPTNO column, the resulting WHERE clauses make it available. The following figure shows the execution plan of the resulting statement.

Accessing a View Defined with the UNION Set Operator
To execute this statement, Oracle performs these steps:

Example: Consider the view EMP_GROUP_BY_DEPTNO, which contains the department number, average salary, minimum salary, and maximum salary of all departments that have employees:

CREATE VIEW emp_group_by_deptno 

	AS SELECT deptno, 

	       AVG(sal) avg_sal, 

		MIN(sal) min_sal, 

	       MAX(sal) max_sal 

		  FROM emp

		  GROUP BY deptno;

Consider this query, which selects the average, minimum, and maximum salaries of department 10 from the EMP_GROUP_BY_DEPTNO view:

SELECT * 

	FROM emp_group_by_deptno

	WHERE deptno = 10;

The optimizer transforms the statement by adding its WHERE clause condition into the view's query. The resulting statement looks like this:

SELECT deptno, 

	AVG(sal) avg_sal, 

	MIN(sal) min_sal, 

	MAX(sal) max_sal, 

  FROM emp 

  WHERE deptno = 10

  GROUP BY deptno;

If there is an index on the DEPTNO column, the resulting WHERE clause makes it available.

The following figure shows the execution plan for the resulting statement. The execution plan uses an index on the DEPTNO column.

Accessing a View Defined with a GROUP BY Clause
To execute this statement, Oracle performs these operations:

Example: Consider this query, which accesses the EMP_GROUP_BY_DEPTNO view defined in the previous example. This query derives the averages for the average department salary, the minimum department salary, and the maximum department salary from the employee table:

SELECT AVG(avg_sal), AVG(min_sal), AVG(max_sal) 	

	FROM emp_group_by_deptno;

The optimizer transforms this statement by applying the AVG group function to the select list of the view's query:

SELECT AVG(AVG(sal)), AVG(MIN(sal)), AVG(MAX(sal)) 

	FROM emp

	GROUP BY deptno;

The following figure shows the execution plan of the resulting statement:

Applying Group Functions to a View Defined with a GROUP BY Clause
To execute this statement, Oracle performs these operations:


Optimizing Statements that Cannot Be Transformed

The optimizer cannot transform all statements that access views into equivalent statements that access base table(s). To execute a statement that cannot be transformed, Oracle issues the view's query, collects the resulting set of rows, and then accesses this set of rows with the original statement as though it were a table.

Example: Consider the EMP_GROUP_BY_DEPTNO view defined in the previous section:

CREATE VIEW emp_group_by_deptno 

	AS SELECT deptno, 

	      AVG(sal) avg_sal,
MIN(sal) min_sal,

	       MAX(sal) max_sal 

		  FROM emp 
GROUP BY deptno;

Consider this query, which accesses the view. The query joins the average, minimum, and maximum salaries from each department represented in this view and to the name and location of the department in the DEPT table:

SELECT emp_group_by_deptno.deptno, avg_sal, min_sal,
max_sal, dname, loc

	FROM emp_group_by_deptno, dept 
WHERE emp_group_by_deptno.deptno = dept.deptno;

Since there is no equivalent statement that accesses only base tables, the optimizer cannot transform this statement. Instead, the optimizer chooses an execution plan that issues the view's query and then uses the resulting set of rows as it would the rows resulting from a table access.

The following figure shows the execution plan for this statement.

Joining a View Defined with a Group BY Clause to a Table
To execute this statement, Oracle performs these operations:

For more information on how Oracle performs a nested loops join operation, see the section "Join Operations" on page A-37.

Choosing an Optimization Approach and Goal


The optimizer's behavior when choosing an optimization approach and goal for a SQL statement is affected by these factors:


OPTIMIZER_MODE Initialization Parameter

The OPTIMIZER_MODE initialization parameter establishes the default behavior for choosing an optimization approach for the instance. This parameter can have these values:
CHOOSE

This value causes the optimizer to choose between the rule-based approach and the cost-based approach based on whether the statistics used for the cost-based approach are present. If the data dictionary contains statistics for at least one of the accessed tables, the optimizer uses the cost-based approach and optimizes with a goal of best throughput. If data dictionary contains no statistics for any of the accessed tables, the optimizer uses the rule-based approach. This is the default value for the parameter.

RULE

This value causes the optimizer to choose the rule-based approach for all SQL statements issued to the instance regardless of the presence of statistics.

ALL_ROWS

This value causes the optimizer to use the cost-based approach for all SQL statements in the session regardless of the presence of statistics and to optimize with a goal of best throughput (minimum resource usage to complete the entire statement).

FIRST_ROWS

This value causes the optimizer to use the cost-based approach for all SQL statements in the session regardless of the presence of statistics and to optimize with a goal of best response time (minimum resource usage to return the first row of the result set).

Note that PL/SQL ignores the OPTIMIZER_MODE=FIRST_ROWS initialization parameter setting.

If the optimizer uses the cost-based approach for a SQL statement, and some tables accessed by the statement have no statistics, the optimizer uses internal information such as the number of data blocks allocated to these tables to estimate other statistics for these tables.


OPTIMIZER_GOAL Parameter of the ALTER SESSION Command

The OPTIMIZER_GOAL parameter of the ALTER SESSION command can override the optimization approach and goal established by the OPTIMIZER_MODE initialization parameter for an individual session. This parameter can have these values:
CHOOSE

This value causes the optimizer to choose between the rule-based approach and the cost-based approach based on whether the statistics used for the cost-based approach are present. If the data dictionary contains statistics for at least one of the accessed tables, the optimizer uses the cost-based approach and optimizes with a goal of best throughput. If the data dictionary contains no statistics for any of the accessed tables, the optimizer uses the rule-based approach.

ALL_ROWS

This value causes the optimizer to use the cost-based approach for all SQL statements in the session regardless of the presence of statistics and to optimize with a goal of best throughput (minimum resource usage to complete the entire statement).

FIRST_ROWS

This value causes the optimizer to use the cost-based approach for all SQL statements in the session regardless of the presence of statistics and to optimize with a goal of best response time (minimum resource usage to return the first row of the result set).

RULE

This value causes the optimizer to choose the rule-based approach for all SQL statements issued to the instance regardless of the presence of statistics.

The value of this parameter affects the optimization of SQL statements issued by stored procedures and functions called during the session, but it does not affect the optimization of recursive SQL statements that Oracle issues during the session. The optimization approach for recursive SQL statements is only affected by the value of the OPTIMIZER_MODE initialization parameter.

Use hints to determine the access path for SQL statements submitted from within PL/SQL. The ALTER SESSION OPTIMIZER_GOAL statement does not affect SQL that is run from within PL/SQL.


The FIRST_ROWS, ALL_ROWS, and RULE Hints

The FIRST_ROWS, ALL_ROWS, CHOOSE, and RULE hints can override the affects of the OPTIMIZER_MODE initialization parameter and the OPTIMIZER_GOAL parameter of the ALTER SESSION command for an individual SQL statement. For information on these hints, see Chapter 7, "Optimization Modes and Hints".

Choosing Access Paths


One of the most important choices the optimizer makes when formulating an execution plan is how to retrieve the data from the database. For any row in any table accessed by a SQL statement, there may be many access paths by which that row can be located and retrieved. The optimizer chooses one of them. For more information about access paths, see "The Optimizer" chapter in Oracle7 Server Concepts.

The optimizer can only choose to use a particular access path for a table if the statement contains a WHERE clause condition or other construct that makes that access path available. Each of the following sections describes an access path and discusses

Path 1: Single Row by ROWID This access path is only available if the statement's WHERE clause identifies the selected rows by ROWID or with the CURRENT OF CURSOR embedded SQL syntax supported by the Oracle Precompilers. To execute the statement, Oracle accesses the table by ROWID.

Example: This access path is available in the following statement:

SELECT * FROM emp WHERE ROWID = '00000DC5.0000.0001';

The EXPLAIN PLAN output for this statement might look like this:

OPERATION		OPTIONS	 	OBJECT_NAME 

----------------------------------------------------- 

SELECT STATEMENT   
TABLE ACCESS BY ROWID EMP

Path 2: Single Row by Cluster Join This access path is available for statements that join tables stored in the same cluster if both of these conditions are true:

These conditions must be combined with AND operators. To execute the statement, Oracle performs a nested loops operation. For information on the nested loops operation, see the section "Join Operations" on page A-37.

Example: This access path is available for the following statement in which the EMP and DEPT tables are clustered on the DEPTNO column and the EMPNO column is the primary key of the EMP table:

SELECT * 

	FROM emp, dept 

	WHERE emp.deptno = dept.deptno

	  AND emp.empno = 7900;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION 		OPTIONS		 OBJECT_NAME 

----------------------------------------------------- 

SELECT STATEMENT 

	NESTED LOOPS 

	  TABLE ACCESS		BY ROWID	EMP 

		INDEX			UNIQUE SCAN	PK_EMP    

	  TABLE ACCESS		CLUSTER		DEPT 

PK_EMP is the name of an index that enforces the primary key.

Path 3: Single Row by Hash Cluster Key with Unique or Primary Key This access path is available if both of these conditions are true:

To execute the statement, Oracle applies the cluster's hash function to the hash cluster key value specified in the statement to obtain a hash value. Oracle then uses the hash value to perform a hash scan on the table.

Example: This access path is available in the following statement, in which the ORDERS and LINE_ITEMS tables are stored in a hash cluster, and the ORDERNO column is both the cluster key and the primary key of the ORDERS table:

SELECT * 

	FROM orders
WHERE orderno = 65118968;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION	 		OPTIONS 		OBJECT_NAME 
-----------------------------------------------------
SELECT STATEMENT
TABLE ACCESS HASH ORDERS

Path 4: Single Row by Unique or Primary Key This access path is available if the statement's WHERE clause uses all columns of a unique or primary key in equality conditions. For composite keys, the equality conditions must be combined with AND operators. To execute the statement, Oracle performs a unique scan on the index on the unique or primary key to retrieve a single ROWID and then accesses the table by that ROWID.

Example: This access path is available in the following statement in which the EMPNO column is the primary key of the EMP table:

SELECT * 
FROM emp
WHERE empno = 7900;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION	 	OPTIONS		 OBJECT_NAME 
-----------------------------------------------------
SELECT STATEMENT
TABLE ACCESS BY ROWID EMP
INDEX UNIQUE SCAN PK_EMP

PK_EMP is the name of the index that enforces the primary key.

Path 5: Clustered Join This access path is available for statements that join tables stored in the same cluster if the statement's WHERE clause contains conditions that equate each column of the cluster key in one table with the corresponding column in the other table. For a composite cluster key, the equality conditions must be combined with AND operators. To execute the statement, Oracle performs a nested loops operation. For information on nested loops operations, see the section "Join Operations" on page A-37.

Example: This access path is available in the following statement in which the EMP and DEPT tables are clustered on the DEPTNO column:

SELECT * 
FROM emp, dept
WHERE emp.deptno = dept.deptno;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION 			OPTIONS	 	OBJECT_NAME 
-----------------------------------------------------
SELECT STATEMENT
NESTED LOOPS
TABLE ACCESS FULL DEPT
TABLE ACCESS CLUSTER EMP

Path 6: Hash Cluster Key This access path is available if the statement's WHERE clause uses all the columns of a hash cluster key in equality conditions. For a composite cluster key, the equality conditions must be combined with AND operators. To execute the statement, Oracle applies the cluster's hash function to the hash cluster key value specified in the statement to obtain a hash value. Oracle then uses this hash value to perform a hash scan on the table.

Example: This access path is available for the following statement in which the ORDERS and LINE_ITEMS tables are stored in a hash cluster and the ORDERNO column is the cluster key:

SELECT * 
FROM line_items
WHERE orderno = 65118968;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION 			OPTIONS		 OBJECT_NAME 
-----------------------------------------------------
SELECT STATEMENT
TABLE ACCESS HASH LINE_ITEMS

Path 7: Indexed Cluster Key This access path is available if the statement's WHERE clause uses all the columns of an indexed cluster key in equality conditions. For a composite cluster key, the equality conditions must be combined with AND operators. To execute the statement, Oracle performs a unique scan on the cluster index to retrieve the ROWID of one row with the specified cluster key value. Oracle then uses that ROWID to access the table with a cluster scan. Since all rows with the same cluster key value are stored together, the cluster scan requires only a single ROWID to find them all.

Example: This access path is available in the following statement in which the EMP table is stored in an indexed cluster and the DEPTNO column is the cluster key:

SELECT *  FROM emp 
WHERE deptno = 10;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION		 OPTIONS			 OBJECT_NAME 

----------------------------------------------------- 

SELECT STATEMENT 

  TABLE ACCESS	CLUSTER			EMP     

	INDEX 		UNIQUE SCAN		 PERS_INDEX 

PERS_INDEX is the name of the cluster index.

Path 8: Composite Index This access path is available if the statement's WHERE clause uses all columns of a composite index in equality conditions combined with AND operators. To execute the statement, Oracle performs a range scan on the index to retrieve the ROWIDs of the selected rows and then accesses the table by those ROWIDs.

Example: This access path is available in the following statement in which there is a composite index on the JOB and DEPTNO columns:

SELECT * 

	FROM emp 

	WHERE job = 'CLERK'

	  AND deptno = 30;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION		 OPTIONS		 OBJECT_NAME 

----------------------------------------------------- 

SELECT STATEMENT 

  TABLE ACCESS	BY ROWID	EMP     

	INDEX		 RANGE SCAN	 JOB_DEPTNO_INDEX 

JOB_DEPTNO_INDEX is the name of the composite index on the JOB and DEPTNO columns.

Path 9: Single-Column Indexes This access path is available if the statement's WHERE clause uses the columns of one or more single-column indexes in equality conditions. For multiple single-column indexes, the conditions must be combined with AND operators.

If the WHERE clause uses the column of only one index, Oracle executes the statement by performing a range scan on the index to retrieve the ROWIDs of the selected rows and then accessing the table by these ROWIDs.

Example: This access path is available in the following statement in which there is an index on the JOB column of the EMP table:

SELECT * 

	FROM emp 

	WHERE job = 'ANALYST';

The EXPLAIN PLAN output for this statement might look like this:

OPERATION 		OPTION 		OBJECT_NAME 

----------------------------------------------------- 

SELECT STATEMENT 

  TABLE ACCESS	BY ROWID	EMP
INDEX RANGE SCAN JOB_INDEX

JOB_INDEX is the index on EMP.JOB.

If the WHERE clauses uses columns of many single-column indexes, Oracle executes the statement by performing a range scan on each index to retrieve the ROWIDs of the rows that satisfy each condition. Oracle then merges the sets of ROWIDs to obtain a set of ROWIDs of rows that satisfy all conditions. Oracle then accesses the table using these ROWIDs.

Oracle can merge up to five indexes. If the WHERE clause uses columns of more than five single-column indexes, Oracle merges five of them, accesses the table by ROWID, and then tests the resulting rows to determine whether they satisfy the remaining conditions before returning them.

Example: This access path is available in the following statement in which there are indexes on both the JOB and DEPTNO columns of the EMP table:

SELECT * 

	FROM emp 

	WHERE job = 'ANALYST'
AND deptno = 20;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION	 	OPTIONS 			OBJECT_NAME 

----------------------------------------------------- 

SELECT STATEMENT 

  TABLE ACCESS	BY ROWID		EMP 

	AND-EQUAL       

	  INDEX		 RANGE SCAN	 	JOB_INDEX 

	  INDEX		 RANGE SCAN		 DEPTNO_INDEX 

The AND-EQUAL operation merges the ROWIDs obtained by the scans of the JOB_INDEX and the DEPTNO_INDEX, resulting in a set of ROWIDs of rows that satisfy the query.

Path 10: Bounded Range Search on Indexed Columns This access path is available if the statement's WHERE clause contains a condition that uses either the column of a single-column index or one or more columns that make up a leading portion of a composite index:

column = expr 

column >[=] expr AND column <[=] expr 

column BETWEEN expr AND expr 

column LIKE 'c%' 

Each of these conditions specifies a bounded range of indexed values that are accessed by the statement. The range is said to be bounded because the conditions specify both its least value and its greatest value. To execute such a statement, Oracle performs a range scan on the index and then accesses the table by ROWID.

This access path is not available if expr references the indexed column.

Example: This access path is available in this statement in which there is an index on the SAL column of the EMP table:

SELECT * 

	FROM emp 

	WHERE sal BETWEEN 2000 AND 3000;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION 		OPTIONS	 OBJECT_NAME 

----------------------------------------------------- 

SELECT STATEMENT 

  TABLE ACCESS	BY ROWID	EMP   
INDEX RANGE SCAN SAL_INDEX

SAL_INDEX is the name of the index on EMP.SAL.

Example: This access path is also available in the following statement in which there is an index on the ENAME column of the EMP table:

SELECT * 

	FROM emp 	

	WHERE ename LIKE 'S%';

Path 11: Unbounded Range Search on Indexed Columns This access path is available if the statement's WHERE clause contains one of these conditions that use either the column of a single-column index or one or more columns of a leading portion of a composite index:

WHERE column >[=] expr 

WHERE column <[=] expr 

Each of these conditions specify an unbounded range of index values accessed by the statement. The range is said to be unbounded because the condition specifies either its least value or its greatest value, but not both. To execute such a statement, Oracle performs a range scan on the index and then accesses the table by ROWID.

Example: This access path is available in the following statement in which there is an index on the SAL column of the EMP table:

SELECT * 

	FROM emp 
WHERE sal > 2000;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION 		OPTIONS 		OBJECT_NAME 

----------------------------------------------------- 

SELECT STATEMENT 

  TABLE ACCESS	BY ROWID	EMP     
INDEX RANGE SCAN SAL_INDEX

Example: This access path is available in the following statement in which there is a composite index on the ORDER and LINE columns of the LINE_ITEMS table:

SELECT * 

	FROM line_items

	WHERE order > 65118968;

The access path is available because the WHERE clause uses the ORDER column, a leading portion of the index.

Example: This access path is not available in the following statement in which there is an index on the ORDER and LINE columns:

SELECT * 

	FROM line_items

	WHERE line < 4;

The access path is not available because the WHERE clause only uses the LINE column, which is not a leading portion of the index.

Path 12: Sort-Merge Join This access path is available for statements that join tables that are not stored together in a cluster if the statement's WHERE clause uses columns from each table in equality conditions. To execute such a statement, Oracle uses a sort-merge operation. Oracle can also use a nested loops operation to execute a join statement. For information on these operations and on when the optimizer chooses one over the other, see the section "Optimizing Join Statements" on page A-37.

Example: This access path is available for the following statement in which the EMP and DEPT tables are not stored in the same cluster:

SELECT * 

	FROM emp, dept

	WHERE emp.deptno = dept.deptno;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION 			OPTIONS 		OBJECT_NAME 

----------------------------------------------------- 

SELECT STATEMENT 

  MERGE JOIN 

	SORT			JOIN 

	  TABLE ACCESS		FULL		EMP 

	SORT			JOIN       
TABLE ACCESS FULL DEPT

Path 13: MAX or MIN of Indexed Column This access path is available for a SELECT statement for which all of these conditions are true:

To execute the query, Oracle performs a range scan of the index to find the maximum or minimum indexed value. Since only this value is selected, Oracle need not access the table after scanning the index.

Example: This access path is available for the following statement in which there is an index on the SAL column of the EMP table:

SELECT MAX(sal) FROM emp;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION 			OPTIONS		 OBJECT_NAME 

----------------------------------------------------- 

SELECT STATEMENT 

  AGGREGATE			GROUP BY     

	INDEX			RANGE SCAN	SAL_INDEX 

Path 14: ORDER BY on Indexed Column This access path is available for a SELECT statement for which all of these conditions are true:

To execute the query, Oracle performs a range scan of the index to retrieve the ROWIDs of the selected rows in sorted order. Oracle then accesses the table by these ROWIDs.

Example: This access path is available for the following statement in which there is a primary key on the EMPNO column of the EMP table:

SELECT * 

	FROM emp

	ORDER BY empno;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION 		OPTIONS 		OBJECT_NAME 

----------------------------------------------------- 

SELECT STATEMENT 

  TABLE ACCESS	BY ROWID	EMP     

	INDEX		RANGE SCAN	PK_EMP 

PK_EMP is the name of the index that enforces the primary key. The primary key ensures that the column does not contain nulls.

Path 15: Full Table Scan This access path is available for any SQL statement, regardless of its WHERE clause conditions.

This statement uses a full table scan to access the EMP table:

SELECT *
FROM emp;

The EXPLAIN PLAN output for this statement might look like this:

OPERATION 		OPTIONS		 OBJECT_NAME 

----------------------------------------------------- 

SELECT STATEMENT   
TABLE ACCESS FULL EMP

Note that these conditions do not make index access paths available:

where column1 and column2 are in the same table.

regardless of whether column is indexed.

where expr is an expression that operates on a column with an operator or function, regardless of whether the column is indexed.

Any SQL statement that contains only these constructs and no others that make index access paths available must use full table scans.

Choosing Access Paths with the Cost-Based Approach


This section provides examples that show how the optimizer chooses among available access paths when using the cost-based approach. For more information, see "The Optimizer" chapter in Oracle7 Server Concepts.

Example: Consider this query, which uses a bind variable rather than a literal value for the boundary value in the WHERE clause condition:

SELECT * 

	FROM emp 

	WHERE empno < :e1;

The optimizer does not know the value of the bind variable E1. Indeed, the value of E1 may be different for each execution of the query. For this reason, the optimizer cannot use the means described in the previous example to determine selectivity of this query. In this case, the optimizer heuristically guesses a small value for the selectivity of the column (because it is indexed). The optimizer makes this assumption whenever a bind variable is used as a boundary value in a condition with one of the operators <, >, <=, or >=.

The optimizer's treatment of bind variables can cause it to choose different execution plans for SQL statements that differ only in the use of bind variables rather than constants. In one case in which this difference may be especially apparent, the optimizer may choose different execution plans for an embedded SQL statement with a bind variable in an Oracle Precompiler program and the same SQL statement with a constant in SQL*Plus.

Example: Consider this query, which uses two bind variables as boundary values in the condition with the BETWEEN operator:

SELECT * 

	FROM emp 

	WHERE empno BETWEEN :low_e AND :high_e;

The optimizer decomposes the BETWEEN condition into these two conditions:

empno >= :low_e 

empno <= :high_e 

The optimizer heuristically estimates a small selectiviy for indexed columns in order to favor the use of the index.

Example: Consider this query, which uses the BETWEEN operator to select all employees with employee ID numbers between 7500 and 7800:

SELECT * 

	FROM emp 

	WHERE empno BETWEEN 7500 AND 7800;

To determine the selectivity of this query, the optimizer decomposes the WHERE clause condition into these two conditions:

empno >= 7500 

empno <= 7800 

The optimizer estimates the individual selectivity of each condition using the means described in a previous example. The optimizer then uses these selectivities (S1 and S2) and the absolute value function (ABS) in this formula to estimate the selectivity (S) of the BETWEEN condition:

S = ABS( S1 + S2 - 1 )

Optimizing Join Statements


This section covers:


Optimizer Choices for Join Statements

To choose an execution plan for a join statement, the optimizer must make three related choices
access paths

As for simple statements, the optimizer must choose an access path to retrieve data from each table in the join statement.

join operations

To join each pair of row sources, Oracle must perform one of these operations:
nested loops
sort-merge
cluster
hash join

join order

To execute a statement that joins more than two tables, Oracle joins two of the tables, and then joins the resulting row source to the next table. This process is continued until all tables are joined into the result.


Join Operations

To join two row sources, Oracle must perform one of these operations:

Nested Loops Join To perform a nested loops join, Oracle follows these steps:

  1. The optimizer chooses one of the tables as the outer table, or the driving table. The other table is called the inner table.

  2. For each row in the outer table, Oracle finds all rows in the inner table that satisfy the join condition.

  3. Oracle combines the data in each pair of rows that satisfy the join condition and returns the resulting rows.

The following figure shows the execution plan for this statement using a nested loops join:

SELECT * 

	FROM emp, dept 

	WHERE emp.deptno = dept.deptno;

Nested Loops Join
To execute this statement, Oracle performs these steps:

Sort-Merge Join To perform a sort-merge join, Oracle follows these steps:

  1. Oracle sorts each row source to be joined if they have not been sorted already by a previous operation. The rows are sorted on the values of the columns used in the join condition.

  2. Oracle merges the two sources so that each pair of rows, one from each source, that contain matching values for the columns used in the join condition are combined and returned as the resulting row source.

Oracle can only perform a sort-merge join for an equijoin.

The following figure shows the execution plan for this statement using a sort-merge join:

SELECT * 

	FROM emp, dept 
WHERE emp.deptno = dept.deptno;

Sort-Merge Join
To execute this statement, Oracle performs these steps:

Cluster Join Oracle can perform a cluster join only for an equijoin that equates the cluster key columns of two tables in the same cluster. In a cluster, rows from both tables with the same cluster key values are stored in the same blocks, so Oracle only accesses those blocks. For information on clusters, including how to decide which tables to cluster for best performance, see Chapter 8, "Data Access Methods".

The following figure shows the execution plan for this statement in which the EMP and DEPT tables are stored together in the same cluster:

SELECT * 

	FROM emp, dept 
WHERE emp.deptno = dept.deptno;

Cluster Join
To execute this statement, Oracle performs these steps:

A cluster join is nothing more than a nested loops join involving two tables that are stored together in a cluster. Since each row from the DEPT table is stored in the same data blocks as the matching rows in the EMP table, Oracle can access matching rows most efficiently.

Hash Join To perform a hash join, Oracle follows these steps:

  1. Oracle performs a full table scan on each of the tables and splits each into as many partitions as possible based on the available memory.

  2. Oracle builds a hash table from one of the partitions (if possible, Oracle will select a partition that fits into available memory). Oracle then uses the corresponding partition in the other table to probe the hash table. All partitions pairs that do not fit into memory are placed onto disk.

  3. For each pair of partitions (one from each table), Oracle uses the smaller one to build a hash table and the larger one to probe the hash table.

Oracle can only perform a hash join for an equijoin.

The following figure shows the execution plan for this statement using a hash join:

SELECT * 

	FROM emp, dept 
WHERE emp.deptno = dept.deptno;

Hash Join
To execute this statement, Oracle performs these steps:

To enable the selection of the hash join algorithm you must use one of the following approaches: issue the statement ALTER SESSION SET HASH_JOIN_ENABLED = TRUE; or set the HASH_JOIN_ENABLED initialization parameter; or else employ the USE_HASH hint. See Oracle7 Server SQL Reference for more information on the syntax of the ALTER SESSION command.

The initialization parameter HASH_AREA_SIZE controls the memory to be used for hash join operations and the initialization parameter HASH_MULTIBLOCK_IO_COUNT controls the number of blocks a hash join operation should read and write concurrently. See Oracle7 Server Reference for more information about these initialization parameters.


Choosing Execution Plans for Join Statements

This section describes how the optimizer chooses an execution plan for a join statement:

Note these considerations that apply to the cost-based and rule-based approaches:

Choosing Execution Plans for Joins with the Rule-Based Approach With the rule-based approach, the optimizer follows these steps to choose an execution plan for a statement that joins R tables:

  1. The optimizer generates a set of R join orders, each with a different table as the first table. The optimizer generates each potential join order using this algorithm:

    1. a. To fill each position in the join order, the optimizer chooses the table with the most highly ranked available access path according to the ranks for access paths in "Choosing Access Paths" on page A-23. The optimizer repeats this step to fill each subsequent position in the join order.

    1. b. For each table in the join order, the optimizer also chooses the operation with which to join the table to the previous table or row source in the order. The optimizer does this by "ranking" the sort-merge operation as access path 12 and applying these rules:

  1. The optimizer then chooses among the resulting set of execution plans. The goal of the optimizer's choice is to maximize the number of nested loops join operations in which the inner table is accessed using an index scan. Since a nested loops join involves accessing the inner table many times, an index on the inner table can greatly improve the performance of a nested loops join. Usually, the optimizer does not consider the order in which tables appear in the FROM clause when choosing an execution plan.

    The optimizer makes this choice by applying the following rules in order:

    2. a. The optimizer chooses the execution plan with the fewest nested-loops operations in which the inner table is accessed with a full table scan.

    2. b. If there is a tie, the optimizer chooses the execution plan with the fewest sort-merge operations.

    2. c. If there is still a tie, the optimizer chooses the execution plan for which the first table in the join order has the most highly ranked access path:

Choosing Execution Plans for Joins with the Cost-Based Approach With the cost-based approach, the optimizer generates a set of execution plans based on the possible join orders, join operations, and available access paths. The optimizer then estimates the cost of each plan and chooses the one with the lowest cost. The optimizer estimates costs in these ways:

With the cost-based approach, the optimizer's choice of join orders can be overridden with the ORDERED hint. If the ORDERED hint specifies a join order that violates the rule for outer join, the optimizer ignores the hint and chooses the order. You can also override the optimizer's choice of join operations with hints. For information on using hints, see Chapter 7, "Optimization Modes and Hints".

Optimizing Star Queries



The Star Query Environment

One type of data warehouse design centers around what is known as a "star" schema. These schemas are characterized by the existence of one or more very large "fact" tables and a number of much smaller "dimension" tables. A typical fact table contains "keys" and "measures." For example, a simple fact table might contain the measure Sales, and keys Time, Product, and Market. In this case there would be corresponding dimension tables for Time, Product, and Market. The Product dimension table, for example, would typically contain information about each product number which appears in the fact table.

A "star join" refers to a primary key to foreign key join of the dimension tables to a fact table. The fact table normally has a concatenated index on the key columns to facilitate this type of join.

The Oracle cost-based optimizer recognizes such star queries and generates efficient execution plans for them.


Tuning Star Queries

This section discusses star query tuning, with reference to the following example:

SELECT SUM(dollars) FROM facts, time, product, market
WHERE market.stat = 'New York' AND
product.brand = 'MyBrand' AND
time.year = 1995 AND time.month = 'March' AND
/* Joins*/
time.key = facts.tkey AND
product.pkey = facts.pkey AND
market.mkey = facts.mkey;

A star query is thus a join between a very large table and a number of much smaller "lookup" tables. Each lookup table is joined to the large table using a primary key to foreign key join, but the small tables are not joined to each other.


The Cost-Based Optimizer

Star queries are not recognized by the rule based optimizer. To execute star queries efficiently, you must use the cost based optimizer. Begin by using the ANALYZE command to gather statistics for each of the tables.


Indexing

In the example above, you would construct a concatenated index on the columns tkey, pkey, and mkey. The order of the columns in the index is critical to performance. the columns in the index should take advantage of any ordering of the data. If rows are added to the large table in time order, then tkey should be the first key in the index. When the data is a static extract from another database, it is worthwhile to sort the data on the key columns before loading it.

If all queries specify predicates on each of the small tables, a single concatenated index suffices. If queries that omit leading columns of the concatenated index are frequent, additional indexes may be useful. In this example, if there are frequent queries that omit the time table, an index on pkey and mkey can be added.


Extended Star Schemas

Each of the small tables can be replaced by a join of several smaller tables. For example, the product table could be normalized into brand and manufacturer tables. Normalization of all of the small tables can cause performance problems. One problem is caused by the increased number of permutations that the optimizer must consider. The other problem is the result of multiple executions of the small table joins. Both problems can be solved by using denormalized views. For example,

CREATE VIEW prodview AS SELECT /*+ NO_MERGE */ *

FROM brands, mfgrs WHERE brands.mfkey = mfgrs.mfkey;

This hint will both reduce the optimizer's search space, and cause caching of the result of the view.


Hints

Usually, if you analyze the tables the optimizer will choose an efficient star plan. You can also use hints to improve the plan. The most precise method is to order the tables in the FROM clause in the order of the keys in the index, with the large table last. Then use the following hints:

/*+ ORDERED USE_NL(facts) INDEX(facts fact_concat) */

A more general method is to use the STAR hint /*+ STAR */.

Optimizing Compound Queries


To choose the execution plan for a compound query, the optimizer chooses an execution plan for each of its component queries and then combines the resulting row sources with the union, intersection, or minus operation, depending on the set operator used in the compound query.

The following figure shows the execution plan for this statement, which uses the UNION ALL operator to select all occurrences of all parts in either the ORDERS1 table or the ORDERS2 table:

SELECT part FROM orders1 

UNION ALL 

SELECT part FROM orders2;

Compound Query with UNION ALL Set Operator
To execute this statement, Oracle performs these steps:

The following figure shows the execution plan for this statement, which uses the UNION operator to select all parts that appear in either the ORDERS1 table or the ORDERS2 table:

SELECT part FROM orders1 

UNION 

SELECT part FROM orders2;

Compound Query with UNION Set Operator
This execution plan is identical to the one for the UNION ALL operator shown in Figure A-11, except that in this case Oracle uses the SORT operation to eliminate the duplicates returned by the UNION ALL operation.

Figure A-13 shows the execution plan for this statement, which uses the INTERSECT operator to select only those parts that appear in both the ORDERS1 and ORDERS2 tables:

SELECT part FROM orders1 

INTERSECT 

SELECT part FROM orders2;

Compound Query with INTERSECT Set Operator
To execute this statement, Oracle performs these steps:

Using Histograms



Height-Balanced Histograms

Oracle's histograms are height balanced (as opposed to width balanced). Width-balanced histograms divide the data into a fixed number of equal-width ranges and then count the number of values falling into each range. Height-balanced histograms place the same number of values into each range so that the endpoints of the range are determined by how many values are in that range.

For example, suppose that the values in a single column of a 1000-row table range between 1 and 100, and suppose that you want a 10-bucket histogram (ranges in a histogram are often referred to as "buckets"). In a width-balanced histogram, the buckets would be of equal width (1-10, 11-20, 21,-30, etc.) and each bucket would count the number of rows that fall into that range. In a height-balanced histogram, each bucket has the same height (in this case 100 rows), and then the endpoints for the buckets would be determined by the density of the distinct values in the column.


Advantages of Height-Balanced Histograms

The advantage of the height-balanced approach is clear when the data in the above example is highly skewed. Supposed that 800 rows of the example table had the value 5, and the remaining 200 rows are evenly distributed between 1 and 100. The width-balanced histogram would have 820 rows in the bucket labeled "1-10", and approximately 20 rows in the other buckets. The height-balanced histogram would have one bucket labeled "1-5", seven buckets labeled "5-5", one bucket labeled "5-50", and one bucket labeled "50-100".

If you wanted to know how many rows in the table contained the value "5", it is apparent from the height-balanced histogram that approximately 80% of the rows contain this value. However, the width-balanced histogram does not provide a mechanism for differentiating between the value "5" and the value "6". You would compute only 8% of the rows contain the value "5" in a width-balanced histogram. Thus, height-balanced histograms are more appropriate for determining the selectivity of column values.



Go to previous file in sequence Go to next file in sequence
Prev Next
Oracle
Copyright © 1996 Oracle Corporation.
All Rights Reserved.
Go to Product Documentation Library
Library
Go to books for this product
Product
Go to Contents for this book
Contents
Go to Index
Index