Oracle7 Server Tuning

Library

Product

Contents

Index

Parallel Query Concepts

Parallel Query Concepts
Parallel Query Processing
Parallelizing SQL Statements
Setting the Degree of Parallelism
Managing the Query Servers

Parallel Query Concepts

This appendix describes the Oracle Server parallel query feature, covering the topics:

The information in this chapter applies only to the Oracle Server with the parallel query feature.

Note: Parallel query is not the same as the Parallel Server option of the Oracle Server. The Parallel Server option is not required to use this feature. However, some aspects of the parallel query feature apply only to an Oracle Parallel Server.

Parallel Query Processing

Without the parallel query feature, the processing of a SQL statement is always performed by a single server process. With the parallel query feature, multiple processes can work together simultaneously to process a single SQL statement. This capability is called parallel query processing. By dividing the work necessary to process a statement among multiple server processes, the Oracle Server can process the statement more quickly than if only a single server process processed it.

The parallel query feature can dramatically improve performance for data-intensive operations associated with decision support applications or very large database environments. Symmetric multiprocessing (SMP), clustered, or massively parallel systems gain the largest performance benefits from the parallel query feature because query processing can be effectively split up among many CPUs on a single system.

It is important to note that the query is parallelized dynamically at execution time. Thus, if the distribution or location of the data changes, Oracle automatically adapts to optimize the parallelization for each execution of a SQL statement.

The parallel query feature helps systems scale in performance when adding hardware resources. If your system's CPUs and disk controllers are already heavily loaded, you need to alleviate the system's load before using the parallel query feature to improve performance. Chapter 18, "Parallel Query Tuning" describes how your system can achieve the best performance with the parallel query feature.

The Oracle Server can use parallel query processing for any of these statements:

SELECT statements
subqueries in UPDATE, INSERT, and DELETE statements
CREATE TABLE ... AS SELECT statements
CREATE INDEX

Parallel Query Process Architecture

Without the parallel query feature, a server process performs all necessary processing for the execution of a SQL statement. For example, to perform a full table scan (for example, SELECT * FROM EMP), one process performs the entire operation. The following figure illustrates a server process performing a full table scan:

Full Table Scan without the Parallel Query feature

The parallel query feature allows certain operations (for example, full table scans or sorts) to be performed in parallel by multiple query server processes. One process, known as the query coordinator, dispatches the execution of a statement to several query servers and coordinates the results from all of the servers to send the results back to the user.

The following figure illustrates several query server processes simultaneously performing a partial scan of the EMP table. The results are then sent back to the query coordinator, which assembles the pieces into the desired full table scan.

Multiple Query Servers Performing a Full Table Scan in Parallel

The query coordinator process is very similar to the server processes in previous releases of the Oracle Server. The difference is that the query coordinator can break down execution functions into parallel pieces and then integrate the partial results produced by the query servers. Query servers get assigned to each operation in a SQL statement (for example, a table scan or a sort operation), and the number of query servers assigned to a single operation is the degree of parallelism for a query.

The query coordinator calls upon the query servers during the execution of the SQL statement (not during the parsing of the statement). Therefore, when using the parallel query feature with the multi-threaded server, the server processing the EXECUTE call of a user's statement becomes the query coordinator for the statement.

CREATE TABLE ... AS SELECT in Parallel

Decision support applications often require large amounts of data to be summarized or "rolled up" into smaller tables for use with ad hoc, decision support queries. Rollup often must occur regularly (such as nightly or weekly) during a short period of system inactivity. Because the summary table is derived from other tables' data, the recoverability from media failure for the smaller table may or may not be important and can be turned off. The parallel query feature allows you to parallelize the operation of creating a table as a subquery from another table or set of tables.

The following figure illustrates creating a table from a subquery in parallel.

Creating a Summary Table in Parallel

If you disable recoverability during parallel table creation, you should take a backup of the tablespace containing the table once the table is created to avoid loss of the table due to media failure. For more information about recoverability of tables created in parallel, see the Oracle7 Server Administrator's Guide.

Clustered tables cannot be created and populated in parallel.

For a discussion of the syntax of the CREATE TABLE command, see the Oracle7 Server SQL Reference.

When creating a table in parallel, each of the query server processes uses the values in the STORAGE clause. Therefore, a table created with an INITIAL of 5M and a PARALLEL DEGREE of 12 consumes at least 60M of storage during table creation because each process starts with an extent of 5M. When the query coordinator process combines the extents, some of the extents may be trimmed, and the resulting table may be smaller than the requested 60M.

For more information on how extents are allocated when using the parallel query feature, see Oracle7 Server Concepts.

Parallelizing SQL Statements

When a statement is parsed, the optimizer determines the execution plan of a statement. Optimization is discussed in Chapter 7, "Optimization Modes and Hints". After the optimizer determines the execution plan of a statement, the query coordinator process determines the parallelization method of the statement. Parallelization is the process by which the query coordinator determines which operations can be performed in parallel and then enlists query server processes to execute the statement. This section tells you how the query coordinator process decides to parallelize a statement and how the user can specify how many query server processes can be assigned to each operation in an execution plan (that is, the degree of parallelism).

To decide how to parallelize a statement, the query coordinator process must decide whether to enlist query server processes and, if so, how many query server processes to enlist. When making these decisions, the query coordinator uses information specified in hints of a query, the table's definition, and initialization parameters. The precedence for selecting the degree of parallelism is described later in this section. It is important to note that the optimizer attempts to parallelize a query only if it contains at least one full table scan operation.

Each query undergoes an optimization and parallelization process when it is parsed. Therefore, when the data changes, if a more optimal execution plan or parallelization plan becomes available, Oracle can automatically adapt to the new situation.

In the case of creating a table in parallel, the subquery in the CREATE TABLE statement is parallelized and the actual population of the table is parallelized, as well as any enforcement of NOT NULL or CHECK constraints.

Parallelizing Operations

Before enlisting query server processes, the query coordinator process examines the operations in the execution plan to determine whether the individual operations can be parallelized. The Oracle Server can parallelize these operations:

sorts
joins
table scans
table population
index creation

Partitioning Rows to Each Query Server

The query coordinator process also examines the partitioning requirements of each operation. An operation's partitioning requirement is the way in which the rows operated on by the operation must be divided, or partitioned, among the query server processes. The partitioning can be any of the following:

range
hash
round robin
random

After determining the partitioning requirement for each operation in the execution plan, the query coordinator determines the order in which the operations must be performed. With this information, the query coordinator determines the data flow of the statement. Figure C-4 illustrates the data flow of the following query:

SELECT dname, MAX(sal), AVG(sal)
	FROM emp, dept
		WHERE emp.deptno = dept.deptno
		GROUP BY dname;

Data Flow Diagram for a Join of the EMP and DEPT Tables

Operations that require the output of other operations are known as parent operations. In Figure C-4 the GROUP BY SORT operation is the parent of the MERGE JOIN operation because GROUP BY SORT requires the MERGE JOIN output.

Parallelism Between Operations

Parent operations can begin processing rows as soon as the child operations have produced rows for the parent operation to consume. In the previous example, while the query servers are producing rows in the FULL SCAN DEPT operation, another set of query servers can begin to perform the MERGE JOIN operation to consume the rows. When the FULL SCAN DEPT operation is complete, the FULL SCAN EMP operation can begin to produce rows.

Inter-Operator Parallelism and Dynamic Partitioning

Each of the two operations performed concurrently is given its own set of query server processes. Therefore, both query operations and the data flow tree itself have degrees of parallelism. The degree of parallelism of an individual operation is called intra-operation parallelism and the degree of parallelism between operations in a data flow tree is called inter-operation parallelism.

Due to the producer/consumer nature of the Oracle Server's query operations, only two operations in a given tree need to be performed simultaneously to minimize execution time.

To illustrate intra-operation parallelism and inter-operator parallelism, consider the following statement:

SELECT * FROM emp ORDER BY ename;

The execution plan consists of a full scan of the EMP table followed by a sorting of the retrieved rows based on the value of the ENAME column. For the sake of this example, assume the ENAME column is not indexed. Also assume that the degree of parallelism for the query is set to four, which means that four query servers can be active for any given operation. Figure C-5 illustrates the parallel execution of our example query.

As you can see from Figure C-5, there are actually eight query servers involved in the query even though the degree of parallelism is four. This is because a parent and child operator can be performed at the same time. Also note that all of the query servers involved in the scan operation send rows to the appropriate query server performing the sort operation. If a row scanned by a query server contains a value for the ENAME column between A and G, that row gets sent to the first ORDER BY query server. When the scan operation is complete, the sorting query servers can return the sorted results to the query coordinator, which in turn returns the complete query results to the user.

Note: When a set of query servers completes its operation, it moves on to operations higher in the data flow. For example, in the previous diagram, if there was another ORDER BY operation after the ORDER BY, the query servers performing the table scan perform the second ORDER BY operation after completing the table scan.