CREATE TABLE fact_1 (dim_1 NUMBER, dim_2 DATE, ...
				meas_1 NUMBER, meas_2 NUMBER, ... )TABLESPACE TSfacts PARALLEL;
			CREATE TABLE fact_2 (dim_1 NUMBER, dim_2 DATE,
				meas_1 NUMBER, meas_2 NUMBER, ... ) TABLESPACE TSfacts PARALLEL;
			. . .
			CREATE TABLE fact_12 (dim_1 NUMBER, dim_2 DATE, ... 
				meas_1 NUMBER, meas_2 NUMBER, ... ) 
			 TABLESPACE TSfacts PARALLEL;
			CREATE OR REPLACE VIEW facts AS  SELECT * FROM fact_1 UNION ALL
	
				SELECT * FROM fact_2 UNION ALL ...  
				SELECT * FROM fact_12;

Load the table in parallel.
We will use 10 processes loading into 30 disks. To accomplish this, the DBA must split the input into 120 files beforehand. The 10 processes will load the first partition in parallel on the first 10 disks, then the second partition in parallel on the second 10 disks, and so on through the twelfth partition. We run the following commands concurrently as background processes:
```
			SQLLDR DATA=fact_1.file_1 DIRECT=TRUE PARALLEL=TRUE 
				FILE=/dev/D1 
			...
			SQLLDR DATA=fact_1.file_10 DIRECT=TRUE PARALLEL=TRUE 
				FILE=/dev/D10 
			WAIT;
			...
			SQLLDR DATA=fact_12.file_1 DIRECT=TRUE PARALLEL=TRUE 
				FILE=/dev/D21 
			SQLLDR DATA=fact_12.file_10 DIRECT=TRUE PARALLEL=TRUE 
				FILE=/dev/D30 
```
For Oracle Parallel Server, divide the loader session evenly among the nodes. The data file being read should always reside on the same node as the loader session. NFS mount of the data file on a remote node is not an optimal approach.
Note: Although this example shows parallel load used with partition views, the two features can be used independent of one another.

Define partitioning criteria.
The final step in creating a partition view is to add check constraints to the partitions so that the optimizer can skip partitions that are not needed to answer specific queries. For example, if a query scans rows in which dim_2 is between January 1 and February 13, you only need to scan fact_1 and fact_2. The constraints also prevent erroneous inserts and updates.
We run the following commands concurrently in the background.
```
			ALTER TABLE fact_1 ADD CONSTRAINT month_1 
				CHECK (dim_2 BETWEEN `01-01-1995' AND `01-31-1995') 
			...
			ALTER TABLE fact_12 ADD CONSTRAINT month_12 
				CHECK (dim_2 BETWEEN `12-01-1995' AND `12-31-1995') 
```

Setting Up Temporary Tablespaces for Parallel Sort and Hash Join

For optimal space management performance you can use the dedicated temporary tablespaces available in release 7.3. As with the TSfacts tablespace, we first add a single datafile and later add 29 more in parallel.

CREATE TABLESPACE TStemp TEMPORARY DATAFILE '/dev/D31' 
	SIZE 4096MB REUSE
DEFAULT STORAGE (INITIAL 10MB NEXT 10MB PCTINCREASE 0);

Size of Temporary Extents

Temporary extents should normally be all the same size (to avoid fragmentation), and smaller than permanent extents. As a general rule, temporary extents should be in the range of 1MB to 10MB.

Temporary extents should be smaller than permanent extents because there are more demands for temporary space, and parallel processes or other queries running concurrently must share the temporary tablespace. Once you allocate an extent it is yours for the duration of your operation. If you allocate a large extent but only need to use a small amount of space, the unused space in the extent is tied up.

At the same time, temporary extents should be large enough that processes do not have to spend all their time waiting for space. Although temporary tablespaces use less overhead than permanent tablespaces when allocating and freeing a new extent, obtaining a new temporary extent is not completely free of overhead.

Operating System Striping of Temporary Tablespaces

Operating system striping is an alternative technique you can use with temporary tablespaces. Media recovery, however, offers subtle challenges for large temporary tablespaces. It does not make sense to mirror, use RAID, or back up a temporary tablespace. If you lose a disk in an OS striped temporary space, you will probably have to drop and recreate the tablespace. This could take several hours for our 120 GB example. With Oracle striping, simply remove the bad disk from the tablespace. For example, if /dev/D50 fails, enter:

ALTER DATABASE DATAFILE `/dev/D50' RESIZE 1K;
ALTER DATABASE DATAFILE `/dev/D50' OFFLINE;

Because the dictionary sees the size as 1K, which is less than the extent size, the bad file will never be accessed. Eventually, you may wish to recreate the tablespace because Oracle allows a maximum of 1022 files in the database.

Be sure to make your temporary tablespace available for use:

ALTER USER scott TEMPORARY TABLESPACE TStemp;

Creating Indexes in Parallel

After loading data and setting up for sorts, the next step is to create indexes. The considerations for creating the index tablespace are similar to those for other tablespaces. Because indexes are accessed much more randomly than tables and temporary space, OS striping with a small stripe width is often the best choice. Because of the performance advantage of the UNRECOVERABLE option, consider mirroring or recreation in the event of media failure.

Create at least as many files in the tablespace as the degree of parallelism used to create indexes in the tablespace. This will reduce fragmentation.

Because we used partition views, we must create indexes on each partition. Our options are summarized in the following table:

Each Index, One at a Time

All Indexes Concurrently

Serial

Slow

Use scripting language. I/O interference is possible: processes could interfere with each other, repeatedly scanning the table.

Parallel

Recommended: Run each statement, one at a time, parallelized. Spreads data over files.

May overload system capacity

If OS striping is used, we can choose to create indexes one at a time using parallel index creation for each one. Creating all indexes concurrently in parallel would probably overload the capacity of the machine. If Oracle striping is used, we should use parallel index creation for each index so that each index is spread over many disks for high I/O bandwidth (unless partial availability after media failure is the primary goal). To do this we enter:

CREATE TABLESPACE TSidx DATAFILE 'dev/D51' SIZE 4096M
	DEFAULT STORAGE (INITIAL 40MB NEXT 40MB PCTINCREASE 0);
CREATE INDEX I1 ON fact_1(dim_1, dim_2, dim_3) 
	TABLESPACE TSidx PARALLEL UNRECOVERABLE;
CREATE INDEX I2 ON fact_2(dim_1, dim_2, dim_3) 
	TABLESPACE TSidx PARALLEL UNRECOVERABLE;
...
CREATE INDEX I12 ON fact_12(dim_1, dim_2, dim_3) 
	TABLESPACE TSidx PARALLEL UNRECOVERABLE;

The PARALLEL clause directs use of the default parallelism (10) to scan facts and to sort and build the index. Twenty query server processes will be used in total.

The UNRECOVERABLE clause specifies that no redo log records are to be written when building the index. Although this speeds up index creation significantly, a media recovery strategy that relies on backups and archived log files will require the DBA to re-issue the CREATE INDEX commands if a disk in Tsidx fails after indexes are created but before they are backed up.

Step 3: Analyzing Data

After the data is loaded and indexed, analyze it. The ANALYZE command does not execute in parallel against a single object. However, many different objects (such as all partitions of a partition view) can be analyzed in parallel.

Note: Cost-based optimization is always used with parallel query and with partition views. You must therefore perform ANALYZE at the partition level with partitioned tables and with parallel query

Queries with many joins are quite sensitive to the accuracy of the statistics. Use the COMPUTE option of the ANALYZE command if possible (it may take quite some time and a large amount of temporary space). If you must use the ESTIMATE option, sample as large a percentage as possible. Use histograms for data which is not uniformly distributed. Note that a great deal of data falls into this classification.

When you analyze a table, the indexes that are defined on that table are also analyzed. To analyze all partitions of facts (including indexes) in parallel, run the following commands concurrently as background processes:

ANALYZE TABLE fact_1 COMPUTE STATISTICS
ANALYZE TABLE fact_2 COMPUTE STATISTICS
...
ANALYZE TABLE fact_12 COMPUTE STATISTICS

It is worthwhile computing or estimating with a larger sample size the indexed columns and indexes themselves, rather than the measure data. The measure data is not used as much: most of the predicates and critical optimizer information comes from the dimensions. A DBA or application designer should know which columns are the most frequently used in predicates.

For example, you might analyze the data in two passes. In the first pass you could obtain some statistics by analyzing one percent of the data. Run the following commands concurrently as background processes:

ANALYZE TABLE fact_1 ESTIMATE STATISTICS SAMPLE 1 PERCENT 
ANALYZE TABLE fact_2 ESTIMATE STATISTICS SAMPLE 1 PERCENT 
...
ANALYZE TABLE fact_12 ESTIMATE STATISTICS SAMPLE 1 PERCENT

In a second pass, you could refine statistics for the indexed columns and the index (but not the non-indexed columns):

ANALYZE TABLE fact_1 COMPUTE STATISTICS FOR ALL INDEXED COLUMNS SIZE 1 
ANALYZE TABLE fact_1 COMPUTE STATISTICS FOR ALL INDEXES 
ANALYZE TABLE fact_2 COMPUTE STATISTICS FOR ALL INDEXED COLUMNS SIZE 1
ANALYZE TABLE fact_2 COMPUTE STATISTICS FOR ALL INDEXES

...
ANALYZE TABLE fact_12 COMPUTE STATISTICS FOR ALL INDEXED COLUMNS SIZE 1 
ANALYZE TABLE fact_12 COMPUTE STATISTICS FOR ALL INDEXES

The result will be a faster plan because you have targeted more important information. You are spending more resources to get good statistics on high value columns (indexes and join columns), and getting baseline statistics for the rest of the data.

Understanding Parallel Query Performance Issues

The Formula for Memory, Users, and Query Servers

Key to parallel query tuning is an understanding of the relationship between memory requirements, number of users (processes) a system can support, and maximum number of query servers. The goal is to obtain the dramatic performance enhancement made possible by parallelizing certain operations, and by using hash joins rather than sort merge joins. This performance goal must often be balanced with the need to support multiple users.

In considering the maximum number of processes a system can support, it is useful to divide the processes into three classes, based on their memory requirements. The maximum number of processes which can fit in memory can then be analyzed as follows:

+ (# low memory processes * low memory required)
+ (# medium memory processes * medium memory required)
+ (# high memory processes * high memory required)
_________________________________
total memory required

Formula for Memory/Users/Server Relationship

In general, if max_processes is much bigger than the number of users, you can consider running parallel queries. If max_processes is considerably less than the number of users, you must consider other alternatives, such as those described in the following section.

Low Memory Processes: 100K to 1MB

These processes include table scans; index lookups; index nested loop joins; single row aggregates (such as sum or average with no GROUP BYs, or very few groups); sorts that return only a few rows; and direct loading.
This class of DSS process is similar to OLTP processes in the amount of memory required. Process memory could be as low as a few hundred kilobytes of fixed overhead. You could potentially support thousands of users performing this kind of operation. You can take this requirement even lower by using the multi-threaded server, and support even more users.

Medium Memory Processes: 1MB to 10MB

This class of process includes large sorts; sort merge join; GROUP BYs or ORDER BYs returning a large number of rows; and index creation.
These processes require the fixed overhead needed by a low memory process, plus one or more sort areas, depending on the query. For example, a typical sort merge join would sort both its inputs--resulting in two sort areas. Group by or order by with many groups or rows also requires sort areas.
Look at the EXPLAIN PLAN output for the query to identify the number and type of joins, and the number and type of sorts. Optimizer statistics in the plan show the size of the operations. When planning joins, remember that you do have a number of choices.

High Memory Processes: 10MB to 100MB

High memory processes include one or more hash joins; or a combination of one or more hash joins with large sorts.
These processes require the fixed overhead needed by a low memory process, plus hash area. The hash area size required might range from 8MB to 32MB, and you might need two of them. If you are performing 2 or more serial hash joins, each process uses 2 hash areas. In a parallel query, since each query server process does at most 1 hash join at a time, you would need 1 hash area size per server.
In summary, the amount of hash join memory for a query equals parallel degree multiplied by hash area size, multiplied by the minimum of either 2, or the number of hash joins in the query.

See Also: "Optimizing Join Statements" on page A-37 for a comparison of hash joins and sort merge joins.

How to Balance the Formula

You can use the following techniques to balance the memory/user/server formula:

Oversubscribe, with Attention to Paging
Reduce the Number of Processes
Decrease DSS Memory per Process
Decrease Parallelism for Multiple Users
Examples of Balancing Memory, Users, and Query Servers

Oversubscribe, with Attention to Paging

You can permit the potential workload to exceed the limits recommended in the formula. Total memory required, minus the SGA size, can be multiplied by a factor of 1.2, to allow for 20% oversubscription. Thus, if you have 1G of memory, you might be able to support 1.2G of demand: the other 20% could be handled by the paging system.

Your system may be able to perform acceptably even if oversubscribed by 60%, if on average not all of the processes are performing hash joins concurrently. Users might then try to use more than the available memory, so you must monitor paging activity in such a situation. If paging goes up dramatically, consider another alternative.

On average, no more than 5% of the time should be spent simply waiting in the operating system on page faults. To spend more wait time than this indicates an I/O bound condition of the paging subsystem. Use your operating system monitor to check wait time: The sum of time waiting and time running equals 100%. If you are running close to 100% CPU, then you are not waiting. If you are waiting, it should not be on account of paging.

If wait time for paging devices exceeds 5%, it is a strong indication that you must reduce memory requirements. This could mean reducing the memory required for each class of process, or reducing the number of processes in memory-intensive classes. It could mean adding memory. Or it could indicate an I/O bottleneck in the paging subsystem that you could resolve by striping.

Note: You must verify that a particular degree of oversubscription will be viable on your system by monitoring the paging rate and making sure you are not spending more than a very small percent of the time waiting for the paging subsystem.

Reduce the Number of Memory-intensive Processes

Adjusting the Degree of Parallelism. Not only can you adjust the number of queries that run in parallel, but you can also adjust the degree of parallelism with which queries run. To do this you would issue an ALTER TABLE statement with a PARALLEL clause, or use a hint. See the Oracle7 Server SQL Reference for more information.

You can limit the parallel pool by reducing the value of PARALLEL_MAX_SERVERS. This places a system-level limit on the total amount of parallelism, and is easy to administer. More processes will then be forced to run in serial mode.

Scheduling Parallel Jobs. Rather than reducing parallelism for all operations, you may be able to schedule large parallel batch jobs to run with full parallelism one at a time, rather than concurrently. Queries at the head of the queue would have a fast response time, those at the end of the queue would have a slow response time. Queueing jobs is thus another way to reduce the number of processes but not reduce parallelism; its disadvantage, however, is a certain amount of administrative overhead.

Decrease DSS Memory per Process

Note: The following discussion focuses upon the relationship of HASH_AREA_SIZE to memory, but all the same considerations apply to SORT_AREA_SIZE. The lower bound of SORT_AREA_SIZE, however, is not as critical as the 8 MB recommended minimum HASH_AREA_SIZE.

If every query performs a hash join and a sort, the high memory requirement limits the number of processes you can have. To allow more users to run concurrently you may need to reduce the DSS process memory.

Moving Processes from High to Medium Memory Requirements. You can move a process from the high-memory class to moderate-memory by changing from hash join to merge join. You can use initialization parameters to limit available memory and thus force the optimizer to stay within certain bounds.

You can disable the hash join capability and explicitly enable it for important hash joins you want to run in batch. On a per-instance basis you could set HASH_JOIN_ENABLED to false, and set it to true only on a per-session basis. Conversely, you could set HASH_JOIN_ENABLED to true on a per-instance basis, and make it false for particular sessions.

Alternatively, you can reduce HASH_AREA_SIZE to well below the recommended minimum (for example, to 1-2MB). Then you can let the optimizer choose sort merge join more often (as opposed to telling the optimizer never to use hash joins). In this way, hash join can still be used for small tables: the optimizer has a memory budget within which it can make decisions about which join method to use.

Remember that the recommended parameter values provide the best response time. If you severely limit these values you may see a significant affect on response time.

Moving Processes from High or Medium Memory Requirements to Low Memory Requirements. If you need to support thousands of users, you must create access paths such that queries do not touch much data. Decrease the demand for index joins by creating indexes and/or summary tables. Decrease the demand for GROUP BY sorting by creating summary tables and encouraging users and applications to reference summaries rather than detailed data. Decrease the demand for ORDER BY sorts by creating indexes on frequently sorted columns.

Decrease Parallelism for Multiple Users

In general there is a trade-off between parallelism for fast single user response time, and efficient use of resources for multiple users. For example, a system with 2G of memory and a HASH_AREA_SIZE of 32MB can support about 60 query servers. A 10 CPU machine can support up to 3 concurrent parallel queries (2 * 10 * 3 = 60). In order to support 12 concurrent parallel queries, the DBA could override the default parallelism (reduce it); decrease HASH_AREA_SIZE; buy more memory, or use some combination of these three strategies. For example, the DBA could ALTER TABLE t PARALLEL (DEGREE 5) for all parallel tables t, set HASH_AREA_SIZE to 16M, and increase PARALLEL_MAX_SERVERS to 120. By reducing the memory of each query server by a factor of 2, and reducing the parallelism of a single query by a factor 2, the system can accommodate 2*2 = 4 times more concurrent queries.

The penalty for taking such an approach is that when a single query happens to be running, the system will use just half the CPU resource of the 10 CPU machine. The other half will be idle until another query is started.

To determine whether your system is being fully utilized, you can use one of the graphical system monitors which are available on most operating systems. These monitors often give you a better idea of CPU utilization and system performance than monitoring the execution time of a query. Consult your operating system documentation to determine whether your system supports graphical system monitors.

Examples of Balancing Memory, Users, and Query Servers

The examples in this section show how to evaluate the relationship between memory, users, and query servers, and balance the formula given in Figure 18-8. They show concretely how you might adjust your system workload so as to accommodate the necessary number of processes and users.

Example 1

Assume your system has 1G of memory, and that your users perform ad hoc joins with 3 or more tables. If you need 300MB for the SGA, that leaves 700MB to accommodate processes. If you allow a generous hash area size (32MB) for best performance, then you can support one of the following:

1 parallel query (32MB * 10 * 2 = 640MB)
1 serial query (32MB * 2 = 64MB)
for a total of 704MB. (Note that the memory is not significantly oversubscribed.)
11 serial queries

Remember that every parallel, hash, or sort merge join query takes a number of query servers equal to twice the degree of parallelism, and often each individual process of a parallel query uses a lot of memory. Thus you can support many more users by having them run serially, or by having them run with less parallelism.

To server more users, you can drastically reduce hash area size to 2MB. You may then find that the optimizer switches some queries to sort merge join. This configuration can support 17 parallel queries, or 170 serial queries, but response times may be significantly higher than if you were using hash joins.

Notice the trade-off above: by reducing memory per process by a factor of 16, you can increase the number of concurrent users by a factor of 16. Thus the amount of physical memory on the machine imposes another limit on total number of parallel queries you can run involving hash joins and sorts.

Example 2

In a mixed workload example, consider the following user population:

several hundred users doing simple lookups of individual customer accounts, making reports on already joined, partially summarized data
interactive users who pull data into their spreadsheets
a DBA who runs nightly batch jobs, and occasional batch jobs during the day. These might be parallel queries that do hash joins which use a lot of memory.

In this situation, you would have to make some choices. You could not allow everyone to run hash joins, even though they outperform sort merge joins--because you do not have the memory to support this level of workload.

You might take 20 query servers, and set HASH_AREA_SIZE to a midrange value, perhaps 20MB, for a single powerful batch job in the high memory class. Twenty servers multiplied by 20MB equals 400MB of memory. (This might be a big GROUP BY with join to produce a summary of data.)

You might plan for 10 analysts running sequential queries that use complex hash joins accessing a large amount of data. (You would not allow them to do parallel queries because of memory requirements.) Ten such sequential processes at 40MB apiece equals 400MB of memory.

Finally, to support hundreds of users doing low memory processes at about 0.5MB apiece, you might reserve 200MB.

You might consider it safe to oversubscribe at 50% because of the infrequent batch jobs during the day. This would give you enough virtual memory for the workload described above (700MB * 1.5 = 1.05GB).

Example 3

Suppose there are 200 query servers and 100 users doing heavy DSS involving hash joins. You decide to leave such tasks as index retrievals and small sorts out of the picture, concentrating on the high memory processes. You might have 300 processes, of which 200 must come from the parallel pool and 100 are single threaded. One quarter of the total 2G of memory might be used by the SGA, leaving 1.5 G of memory to handle all the processes. You could apply the formula considering only the high memory requirements, including a factor of 20% oversubscription:

Formula for Memory/User/Server Relationship: DSS Process Memory

Here, 5 MB = 1.8G/300. Less than 5MB of hash area would be available for each process, whereas 8 MB is the recommended minimum. If you must have 300 processes, you may need to force them to use other join methods in order to change them from the highly memory intensive class to the moderately memory intensive class. Then they may fit within your system's constraints.

Example 4

Consider a system with 2 gigabytes of memory and 10 users who want to run intensive DSS parallel queries concurrently and still have good performance. If you choose parallelism of degree 10, then the 10 users will require 200 processes. (Processes running big joins need twice the number of query servers as the degree of parallelism, so you would set PARALLEL_MAX_SERVERS to 10 * 10 * 2.) In this example each process would get 1.8G/200, or about 9MB of hash area--which should be adequate.

With only 5 users doing large hash joins, each process would get over 16 MB of hash area, which would be fine. But if you want 32 MB available for lots of hash joins, the system could only support 2 or 3 users. By contrast, if users are just computing aggregates the system needs adequate sort area size--and can have many more users.

Example 5

If such a system needs to support 1000 users who must all run big queries you must evaluate the situation carefully. Here, the per user memory budget is only 1.8MB (that is, 1.8G divided by 1,000). Since this figure is at the low end of the medium weight query class, you must rule out parallel query operations, which use even more resources. You must also rule out big hash joins. Each sequential process could require up to 2 hash areas plus sort area, so you would have to set HASH_AREA_SIZE to the same value as SORT_AREA_SIZE, which would be 600K (1.8MB/3). Such a small hash area size is likely to be ineffective, so you may opt to disable hash joins altogether.

Given the organization's resources and business needs, is it reasonable for you to upgrade your system's memory? If memory upgrade is not an option, then you must change your expectations. To adjust the balance you might:

Accept the fact that the system will actually support a limited number of users doing big hash joins.
Expect to support the 1000 users doing index lookups and joins that do not require large amounts of memory. Sort merge joins require less memory, but throughput will go down because they are not as efficient as hash joins.
Give the users access to summary tables, rather than to the whole database.
Classify users into different groups, and give some groups more memory than others. Instead of all users doing sorts with a small sort area, you could have a few users doing high-memory hash joins, and most users doing low-memory index joins and using summary tables.

Parallel Query Space Management Issues

This section describes space management issues that come into play when using the parallel query.

ST Enqueue
Sorts and Temporary Data
Permanent Data in Tables, Indexes, Clusters

These issues become particularly important for parallel query running on a parallel server, where tuning becomes more critical the more nodes involved.

ST Enqueue

Every space management transaction in the database is controlled by a single ST enqueue. A high transaction rate (more than 2 or 3 per minute) on the ST enqueue may result in poor scalability on OPS systems with many nodes, or a timeout waiting for space management resources.

Sorts and Temporary Data

Try to minimize the number of space management transactions, in particular:

the number of sort space management transactions
the creation and removal of objects
transactions caused by fragmentation in a tablespace. Do this by setting INITIAL and NEXT to the same value, and PCTINCREASE to zero.

Permanent Data in Tables, Indexes, Clusters

Try to minimize the number of sort space management transactions.

Use dedicated temporary tablespaces to optimize space management for sorts. This is particularly beneficial on a parallel server. You can monitor this using V$SORT_SEGMENT.

Set initial and next extent size to a value in the range of 1MB to 10MB. Processes may use temporary space at a rate of up to 1MB per second. Do not accept the default value of 40K for next extent size, because this will result in many requests for space per second.

If you are unable to allocate extents for various reasons, you can recoalesce the space by using the ALTER TABLESPACE COALESCE SPACE command. This should be done on a regular basis for temporary tablespaces in particular.

Optimizing Parallel Query on Oracle Parallel Server

Lock Allocation

This section provides parallel query tuning guidelines for optimal lock management on the parallel server:

To optimize the parallel query on a parallel server, you need to correctly set GC_FILES_TO_LOCKS. On a parallel server a certain number of parallel cache management (PCM) locks are assigned to each data file. DBA locking in its default behavior assigns one lock to each block. During a full table scan a PCM lock must then be acquired for each block read into the scan. To speed up full table scans, you have three possibilities:

For data files containing truly read-only data, set the tablespace to read only. Then there will be no PCM locking at all.
Alternatively, for data which is mostly read-only, assign very few hashed PCM locks (for example, 2 shared locks) to each data file. Then these will be the only locks you have to acquire when you read the data.
If you want DBA or fine grain locking, group together the blocks which are controlled by each lock, using the ! option. This has advantages over default DBA locking because with the default, you would need to acquire a million locks in order to read 1 million blocks. When you group the blocks you reduce the number of locks allocated by the grouping factor. Thus a grouping of !10 would mean that you would only have to acquire one tenth as many PCM locks as with the default. Performance improves due to the dramatically reduced amount of lock allocation. As a rule of thumb, performance with a grouping of !10 might be comparable to the speed of hashed locking.

The following guidelines impact memory usage, and thus indirectly affect performance:

Never allocate PCM locks for data files of temporary tablespaces.
Never allocate PCM locks for datafiles which only contain rollback segments. These are protected by GC_ROLLBACK_LOCKS and GC_ROLLBACK_SEGMENTS.
Allocate specific PCM locks for the SYSTEM tablespace. This is a good practice which ensures that data dictionary activity such as space management never interferes with the data tablespaces at a cache management level (error 1575).
For example, on a read-only database with a data warehousing application's query-only workload, you might create 500 PCM locks on the SYSTEM tablespace in file 1, then create 50 more locks to be shared for all the data in the other files. Space management work will then never interfere with the rest of the database.

Allocation of Processes and Instances

Oracle computes a target degree of parallelism by examining the maximum of the degree for each table and other factors, before runtime. At runtime, a parallel query will be executed sequentially if insufficient query servers are available. PARALLEL_MIN_PERCENT sets the minimum percentage of the target number of query servers which must be available, if the query is to run in parallel. When PARALLEL_MIN_PERCENT is set to n, an error message will be sent if n percent query server processes are not available. If no parallel query processes are available, a parallel query will be executed sequentially.

The parallel query feature assigns each instance a unique number, which is determined by the INSTANCE_NUMBER initialization parameter. The instance number regulates the order of instance startup.

Load Balancing for Multiple Concurrent Parallel Queries

Load balancing is an effort to distribute the parallel query processes to achieve even CPU and memory utilization, and to minimize remote I/O and communication between nodes.

When multiple concurrent queries are running on a single node, load balancing is done by the operating system. For example, if there are 10 CPUs and 5 query servers, the operating system distributes the 5 processes among the CPUs. If a second user is added, the operating system still distributes the workload.

For a parallel server, however, no single operating system performs the load balancing: instead, the parallel query feature performs this function.

ALTER TABLE tablename PARALLEL (INSTANCES 1)

If a query requests more than one instance, allocation priorities involve table caching and disk affinity.

ALTER TABLE tablename PARALLEL (INSTANCES 2)

Thus, if there are 5 query servers, it is advantageous for them to run on as many nodes as possible.

Disk Affinity

Some Oracle Parallel Server platforms use disk affinity: processes are allocated on instances that are closest to the requested data. Without disk affinity, Oracle tries to balance the allocation evenly across instances. Thus, with 10 nodes and 2 users, the parallel query feature will run query 1 on the first 5 nodes and query 2 on the second 5 nodes. The two will not overlap.

With disk affinity, Oracle tries to allocate query servers for parallel table scans on the instances which own the data. Disk affinity exploits a "shared nothing" architecture by minimizing data shipping and internode communication. It can significantly increase parallel query throughput and response time.

Disk affinity is used for parallel table scans and parallel temporary tablespace allocation, but is not used for parallel table creation or parallel index creation. Temporary tablespaces internally try to use storage that is local to an instance. It guarantees optimal space management extent allocation. Optimization is the calculation of disk affinity to achieve best performance. Operating system striping disables disk affinity.

In the following example of disk affinity, table T is distributed across 3 nodes, and a full table scan on table T is being performed.

Disk Affinity Example

If a query requires 2 instances, then two instances from the set 1, 2, and 3 are used.
If a query requires 3 instances, then instances 1, 2, and 3 are used.
If a query requires 4 instances, then all four instances are used.
If there are two concurrent queries against table T, each requiring 3 instances (and enough processes are available on the instances for both queries), then both queries will use instances 1, 2, and 3. Instance 4 will not be used. In contrast, without disk affinity instance 4 would be used.

Detecting Parallel Query Performance Problems

This section summarizes common tools and techniques you can use to obtain performance feedback on parallel queries.

Diagnosing Problems

Use the following decision tree to diagnose parallel query performance problems. Some key issues are the following:

Quantify your performance expectations, to determine if there is a problem.
Determine whether a problem pertains to optimization (such as an inefficient plan, which may require re-analyzing tables or adding hints) or to execution (such as simple operations like scanning, loading, grouping, or indexing running much slower than published guidelines).
Determine whether it is a problem when running in parallel (such as load imbalance or resource bottleneck) or whether the problem is also present when running serially.

Parallel Query Performance Checklist

Is There Regression?

Does the parallel query's actual performance deviate from what you expected? If performance is as you expected, can you justify the notion that there is a performance problem? Perhaps you have a desired outcome in mind, to which you are comparing the current outcome. Perhaps you have a justifiable performance expectation which the system is not achieving, You might have achieved this level of performance or particular execution plan in the past, but now, with a similar environment and operation, this is not being met.

If performance is not as you expected, can you quantify the deviation? For decision support queries, the execution plan is key. For critical DSS queries, save the EXPLAIN PLAN results. Then, as you analyze the data, reanalyze, upgrade Oracle, and load in new data over the course of time, you can compare any new execution plan with the old plan. You can take this approach either proactively or reactively.

Alternatively, you may find that you get a plan that works better if you use hints. You may want to understand why hints were necessary, and figure out how to get the optimizer to generate the desired plan without the hints. Try increasing the statistical sample size: better statistics may give you a better plan. If you had to use a PARALLEL hint, look to see whether you had OPTIMIZER_PERCENT_PARALLEL set to 100%.

Is There a Plan Change?

If there has been a change in the execution plan, determine whether the plan is (or should be) parallel or serial.

Is There a Parallel Plan?

If the execution plan is (or should be) parallel:

Study the EXPLAIN PLAN output. Did you analyze all the tables? Perhaps you need to use hints in a few cases. Verify that the hint gives better results.
If you want a parallel plan, but the optimizer has not given you one, try increasing OPTIMIZER_PERCENT_PARALLEL to 100 and see if this improves performance.

If the execution plan is (or should be) serial, consider the following strategies:

Use an index. Sometimes adding the right index can greatly improve performance. Consider adding an extra column to the index: perhaps your query could obtain all its data from the index, and not require a table scan. Perhaps you need to use hints in a few cases. Verify that the hint gives better results.
Compute statistics, especially if there are many joins. Or estimate with a sample of 20% or more.
Note: Using different sample sizes can cause the plan to change. Generally, the higher the sample size, the better the plan.
Use histograms for non-uniform distributions
Check initialization parameters to be sure the values are reasonable.
Replace bind variables with literals.
Note whether execution is I/O or CPU bound, then check the OPTIMIZER COST MODEL.
Convert subqueries to joins.
Use CREATE TABLE AS SELECT to break a complex query into smaller pieces. With a large query referencing five or six tables, it may be difficult to determine which part of the query is taking the most time. You can isolate the troublesome parts of the query by breaking it into steps and analyzing each step.

Is There Parallel Execution?

If the cause of regression cannot be traced to problems in the plan, then the problem must be an execution issue.

For DSS queries, both serial and parallel, consider memory. Check the paging rate and make sure the system is using memory as effectively as possible. Check buffer, sort, and hash area sizing.

Is There Skew?

If parallel execution is occurring, is there unevenness in workload distribution? For example, if there are 10 CPUs and a single user, you can see whether the workload is evenly distributed across CPUs. This may vary over time, with periods that are more or less I/O intensive, but in general each CPU should have roughly the same amount of activity.

The statistics in V$PQ_TQSTAT show rows produced and consumed per query server process. This is a good indication of skew, and does not require single user operation.

Operating system statistics show you the per-processor CPU utilization and per-disk I/O activity. Concurrently running tasks make it harder to see what is going on, however. It would be useful to run in single user mode and check operating system monitors which show system level CPU and I/O activity.

When workload distribution is unbalanced, a common culprit is the presence of skew in the data. For a hash join, this may be the case if the number of distinct values is less than the degree of parallelism. When joining two tables on a column with only 4 distinct values, you will not get scaling on more than 4. If you have 10 CPUs, 4 of them will be saturated but 6 will be idle. To avoid this problem, change the query: use temporary tables to change the join order such that all queries have more values in the join column than the number of CPUs.

If I/O problems occur you may need to reorganize your data, spreading it over more devices. If parallel execution problems occur, check to be sure you have followed the recommendation to spread data over at least as many devices as CPUs.

If there is no skew in workload distribution, check for the following conditions:

Is there device contention?
Is the system I/O bound, with too little parallelism? If so, consider increasing parallelism up to the number of devices.
Is the system CPU bound, with too much parallelism? Check the operating system CPU monitor to see whether a lot of time is being spent in system calls. The resource may be overcommitted, and too much parallelism may cause processes to compete with themselves.
Are there more concurrent users than the system can support?

Executing Parallel SQL Statements

After analyzing your tables and indexes you should be able to run queries and see speedup that scales linearly with the degree of parallelism used. The following operations should scale:

table scans
nested loop join
sort merge join
hash join
"not in"
group by
select distinct
union and union all
aggregation
PL/SQL functions called from SQL
order by
create table as select

Start with simple parallel queries. Evaluate total I/O throughput with SELECT COUNT(*) FROM facts. Evaluate total CPU power by adding a complex WHERE clause. I/O imbalance may suggest a better physical database layout. After you understand how simple scans work, add aggregation, joins, and other operations that reflect individual aspects of the overall workload. Look for bottlenecks.

Query performance is not the only thing you must monitor. You should also monitor parallel load and parallel index creation, and look for good utilization of I/O and CPU resources.

Using EXPLAIN PLAN to See How a Query is Parallelized

Use an EXPLAIN PLAN statement to view the sequential and parallel query plan. The OTHER_TAG column of the plan table summarizes how each plan step is parallelized, and describes the text of the query that is used by query servers for each operation. Optimizer cost information for selected plan steps is given in the cost, bytes, and cardinality columns. Table 20-1 on page 20-5 summarizes the meaning of the OTHER_TAG column.

EXPLAIN PLAN thus provides detailed information as to where specific operations are being performed. You can then change the execution plan for better performance. For example, if many steps are serial (where OTHER_TAG is blank, serial to parallel, or parallel to serial), then the query controller could be a bottleneck.

Consider the following example SQL statement, which summarizes sales data by region:

EXPLAIN PLAN SET STATEMENT_ID = `Jan_Summary' FOR
SELECT dim_1 SUM(meas1) FROM facts WHERE dim_2 < `02-01-1995'
GROUP BY dim_1

The following SQL script extracts a compact hierarchical plan from the output of EXPLAIN PLAN:

SELECT
SUBSTR( lpad(' ',2*(level-1)) ||
decode(id, 0, statement_id, operation)  ||
' ' || options    || ' ' || object_name ||
' ('|| cardinality|| ',' || bytes       ||
' ,'|| cost       || ')' || other_tag ,
1, 79) "step (card,bytes,cost) par"
from plan_table
start with id = 0
connect by prior id = parent_id
and prior nvl(statement_id,' ') =
nvl(statement_id,' ');

Following is the query plan for "Jan_Summary":

Jan_Summary   (32921, 5695333, 7309)
	SORT GROUP BY  (,,) PARALLEL_TO_SERIAL
	VIEW  facts (834569, 134008732, 6360) PARALLEL_TO_PARALLEL
	      UNION-ALL PARTITION  (,,) PARALLEL_COMBINED_WITH_PARENT
	        TABLE ACCESS FULL fact_1 (815044,119570335, 6000) PARALLEL_COMBINED_WITH_PARENT
	        FILTER   (,,) PARALLEL_COMBINED_WITH_PARENT
	          TABLE ACCESS FULL fact_2 (,,) PARALLEL_COMBINED_WITH_PARENT
	        . . .
	        FILTER   (,,) PARALLEL_COMBINED_WITH_PARENT
	          TABLE ACCESS FULL fact_12 (,,) PARALLEL_COMBINED_WITH_PARENT

As shown by the PARALLEL_TO_PARALLEL keyword, a parallel partition view shows up as a subtree of operations that are all executed as a unit in each parallel query server. In the preceding example, the union-all partition and full scans for all partitions execute as a unit in the same set of parallel processes.

Each parallel query server potentially scans a portion of each partition, rather than each server scanning all of one partition. This provides better load balancing. Some of the full scans are skipped, as shown by the presence of filter nodes.

The following figure illustrates how, with the PARALLEL_TO_PARALLEL keyword, data from the partition view is redistributed to a second set of parallel query servers for parallel grouping. Query server set 1 executes the partition view and its subtree of operations from the preceding example. The rows coming out are repartitioned through the table queue to query server set 2, which executes the group by operation. Because the GROUP BY operation indicates PARALLEL_TO_SERIAL, another table queue collects its results and sends it to the query coordinator, and then to the user.

Data Redistribution among Parallel Query Servers

As a rule, if the PARALLEL_TO_PARALLEL keyword exists, there will be two sets of query servers. This means that for grouping, sort merge, or hash joins, twice the number of parallel query servers will be assigned to the query. This requires redistribution of data or rows from set 1 to set 2. If there is no PARALLEL_TO__TO_PARALLEL keyword, then the query will get just one set of servers. Such serial processes include aggregations, such as COUNT * FROM facts or SELECT facts WHERE DATE = '7/1/94'.

For non-distributed queries, the OBJECT_NODE column gives the name of the table queue. If the PARALLEL_TO_PARALLEL keyword exists, then the EXPLAIN PLAN of the parent operation should have SQL that references the child table queue in its FROM clause. In this way it describes the order in which the output from operations is consumed.

Dynamic Performance Tables

Dynamic performance tables are views of internal Oracle7 data structures and statistics that you can query periodically to monitor progress of a long-running operation. When used in conjunction with data dictionary views, these tables provide a wealth of information. The challenge is visualizing the data and then acting upon it.

V$FILESTAT

This view sums read and write requests, number of blocks, and service times for every datafile in every tablespace. It can help you diagnose I/O problems and workload distribution problems.

The file numbers listed in V$FILESTAT can be joined to those in the DBA_DATA_FILES view to group I/O by tablespace or to find the file name for a given file number. By doing ratio analysis you can find what percentage of the total tablespace activity for each file in the tablespace If you make a practice of putting just one large, heavily accessed object in a tablespace, you can use this technique to identify objects that have a poor physical layout.

You can further diagnose disk space allocation problems using the DBA_EXTENTS view. Ensure that space is allocated evenly from all files in the tablespace. Monitoring V$FILESTAT during a long running operation and correlating I/O activity to the explain plan output is a good way to follow progress.

V$PARAMETER

This view lists the name, current value, and default value of all system parameters. In addition, the view indicates whether the parameter may be modified online with an ALTER SYSTEM or ALTER SESSION command.

V$PQ_SESSTAT

This view is valid only when queried from a session that is executing parallel SQL statements. Thus it cannot be used to monitor a long running operation. It gives summary statistics about the parallel statements executed in the session, including total number of messages exchanged between server processes and the actual degree of parallelism used.

V$PQ_SLAVE

This view tallies the current and total CPU time and number of messages sent and received per query server process. It can be monitored during a long running operation. Verify that there is little variance among processes in CPU usage and number of messages processed. A variance may indicate a load balancing problem. Attempt to correlate the variance to a variance in the base data distribution. Extreme imbalance could indicate that the number of distinct values in a join column is much less than the degree of parallelism. See "Parallel CREATE TABLE AS SELECT" on page C-4 for a possible workaround.

V$PQ_SYSSTAT

The V$PQ_SYSSTAT view aggregates session statistics from all query server processes. It sums the total query server message traffic, and gives the status of the pool of query servers.

This view can help you determine the appropriate number of query server processes for an instance. The statistics that are particularly useful are "Servers Busy", "Servers Idle", "Servers Started", and "Servers Shutdown".

Periodically examine V$PQ_SYSSTAT to determine if the query servers for the instance are actually busy. To determine whether the instance's query servers are active, issue the following query:

SELECT * FROM V$PQ_SYSSTAT 
	WHERE statistic = "Servers Busy";

STATISTIC             VALUE
--------------------- -----------
Servers Busy          70

V$PQ_TQSTAT

This view provides a detailed report of message traffic at the level of the table queue. It is valid only when queried from a session that is executing parallel SQL statements. A table queue is the pipeline between query server groups or between the query coordinator and a query server group or between a query server group and the coordinator. These table queues are represented in the query plan by the tags PARALLEL_TO_PARALLEL, SERIAL_TO_PARALLEL, or PARALLEL_TO_SERIAL, respectively.

The view contains a row for each query server process that reads or writes each table queue. A table queue connecting 10 consumers to 10 producers will have 20 rows in the view. Sum the bytes column and group by TQ_ID for the total number of bytes sent through each table queue. Compare this with the optimizer estimates; large variations may indicate a need to analyze the data using a larger sample.

Compute the variance of bytes grouped by TQ_ID. Large variances indicate workload imbalance. Drill down on large variances to determine if the producers start out with unequal distributions of data, or whether the distribution itself is skewed. The latter may indicate a low number of distinct values.

For many of the dynamic performance tables, the system parameter TIMED_STATISTICS must be set to TRUE in order to get the most useful information. You can use ALTER SYSTEM to turn TIMED_STATISTICS on and off dynamically.

Operating System Statistics

There is considerable overlap between information available in Oracle and information available though operating system utilities (such as sar and vmstat, on UNIX-based systems). Operating systems provide performance statistics on I/O, communication, CPU, memory and paging, scheduling, and synchronization primitives. The Oracle V$SESSTAT view provides the major categories of OS statistics as well.

Typically it is harder to map OS information about I/O devices and semaphore operations back to database objects and operations; on the other hand the OS may have better visualization tools and more efficient means of collecting the data.

OS information about CPU and memory usage is very important for assessing performance. Probably the most important statistic is CPU usage. The goal of low level performance tuning is to become CPU bound on all CPUs. Once this is achieved, you can pop up a level and work at the SQL level to find an alternate plan that is perhaps more I/O intensive but uses less CPU.

OS memory and paging information is valuable for fine tuning the many system parameters that control how memory is divided among memory-hungry warehouse subsystems like parallel communication, sort, and hash join.

Solving Parallel Query Performance Problems

Overriding the Default Degree of Parallelism

The default degree of parallelism is appropriate for reducing response time while guaranteeing use of CPU and I/O resources for any arbitrary query. If an operation is I/O bound, you should consider increasing the default degree of parallelism. If it is memory bound, or there are several concurrent parallel queries, consider decreasing the default degree.

Note: The default degree of parallelism is used for tables that have PARALLEL attributed to them in the data dictionary, or via the PARALLEL hint. If a table does not have parallelism attributed to it, or has NOPARALLEL (the default) attributed to it, then that table is never scanned in parallel--regardless of the default degree of parallelism that would be indicated by the number of CPUs, instances, and devices storing that table.

To override the default degree of parallelism:

Determine the maximum number of query servers your system can support.
Divide the query servers among the estimated number of concurrent queries.

Note the following general guidelines:

You can adjust the degree of parallelism either by using ALTER TABLE or by using hints.
To increase the number of concurrent queries, reduce the degree of parallelism.
For I/O bound queries, first spread the data over more disks than there are CPUs. Then, increase parallelism in stages. Stop when the query becomes CPU bound.
For example, assume a parallel indexed nested loop join is I/O bound performing the index lookups, with #CPUs=10 and #disks=36. Default degree of parallelism is 10, and this is I/O bound. You could first try parallel degree 12. If still I/O bound, you could try parallel degree 24; if still I/O bound, you could try 36.

Rewriting the SQL

The most important issue for parallel query execution is ensuring that all parts of the query plan that process a substantial amount of data execute in parallel. Use EXPLAIN PLAN and see that all plan steps have an OTHER_TAG of PARALLEL_TO_PARALLEL, PARALLEL_TO_SERIAL, PARALLEL_COMBINED_WITH_PARENT, or PARALLEL_COMBINED_WITH_CHILD. Any other keyword (or null) indicates serial execution, and a possible bottleneck.

By making the following changes you can increase the optimizer's ability to generate parallel plans:

Convert subqueries, especially correlated subqueries, into joins. Oracle can parallelize joins more efficiently than subqueries.
Use a PL/SQL function in the WHERE clause of the main query, instead of a correlated subquery

Rewrite queries with distinct aggregates as nested queries. For example, rewrite

			SELECT COUNT(DISTINCT C) FROM T

			SELECT COUNT(*)FROM (SELECT DISTINCT C FROM T)

Creating and Populating Tables in Parallel

Oracle cannot return results to a user process in parallel. If a query returns a large number of rows, execution of the query may indeed be faster; however, the user process can only receive the rows serially. To optimize parallel query performance with queries that retrieve large result sets, use PARALLEL CREATE TABLE AS SELECT to store the result set in the database. At a later time, users can view the result set in serial.

To avoid I/O bottlenecks, specify a tablespace with at least as many devices as CPUs. To avoid fragmentation in allocating space, the number of files in a tablespace should be a multiple of the number of CPUs.

Creating Temporary Tables in Parallel

When combined with the UNRECOVERABLE option in Oracle7, the parallel version of the CREATE TABLE AS SELECT statement provides a very efficient temporary table facility. For example,

CREATE TABLE summary PARALLEL UNRECOVERABLE  
	AS SELECT dim_1, dim_2 ..., SUM (meas_1) FROM facts
GROUP BY dim_1, dim_2;

You can take advantage of intermediate tables using the following techniques:

Common subqueries can be computed once and referenced many times. This may be much more efficient than referencing a complex view many times.
Complex queries can be decomposed into simpler steps in order to provide application-level checkpoint/restart. For example, a complex multi-table join on a database 1 Terabyte in size could run for dozens of hours. A crash during this query would mean starting over from the beginning. Using CREATE TABLE AS SELECT, the query can be rewritten as a sequence of simpler queries that run for a few hours each. After a system failure, the query can be restarted from the last completed step.
Materialize a Cartesian product. This may allow queries against star schemas to execute in parallel. It may also increase scalability of parallel hash joins by increasing the number of distinct values in the join column.
Consider a huge table of retail sales data that is joined to region and to department lookup tables. There are 5 regions and 25 departments. If the huge table is joined to regions using parallel hash partitioning, the maximum speedup is 5. Similarly, if the huge table is joined to departments, the maximum speedup is 25. But if a temporary table containing the Cartesian product of regions and departments is joined with the huge table, the maximum speedup is 125.
Index range scans can be used to create a table, and that table can be scanned in parallel. This may allow queries that have no large table scans to execute in parallel.
Parallel deletes can be implemented efficiently by creating a new table that omits the unwanted rows from the original table. The original table can then be dropped.
Summary tables for multidimensional drill-down analysis can be created efficiently. For example, a summary table might store the sum of revenue grouped by month, brand, region, and salesperson.
Tables can be reorganized (chained rows eliminated, free space compressed, and so on) by copying the old table to a new table much faster than export/import and easier than reloading.
Note: Be sure to use the ANALYZE command on newly created tables. Also consider creating indexes.

Creating Indexes in Parallel

Multiple processes can work together simultaneously to create an index. By dividing the work necessary to create an index among multiple server processes, the Oracle Server can create the index more quickly than if a single server process created the index sequentially.

Parallel index creation works in much the same way as a table scan with an ORDER BY clause. The table is randomly sampled and a set of index keys is found that equally divides the index into the same number of pieces as the degree of parallelism. A first set of query processes scans the table, extracts key,rowid pairs, and sends each pair to a process in a second set of query processes based on key. Each process in the second set sorts the keys and builds an index in the usual fashion. After all index pieces are built, the coordinator simply concatenates the pieces (which are ordered) to form the final index. .

You can optionally specify that no redo logging should occur during index creation. This can significantly improve performance, but temporarily renders the index unrecoverable. Recoverability is restored after the new index is backed up. If your application can tolerate this window where recovery of the index requires it to be recreated, then you should consider using the UNRECOVERABLE option.

By default, the Oracle Server uses the table definition's PARALLEL clause value to determine the number of server processes to use when creating an index. You can override the default number of processes by using the PARALLEL clause in the CREATE INDEX command.

Attention: When creating an index in parallel, the STORAGE clause refers to the storage of each of the subindexes created by the query server processes. Therefore, an index created with an INITIAL of 5MB and a PARALLEL DEGREE of 12 consumes at least 60MB of storage during index creation because each process starts with an extent of 5MB. When the query coordinator process combines the sorted subindexes, some of the extents may be trimmed, and the resulting index may be smaller than the requested 60MB.

When you add or enable a UNIQUE key or PRIMARY KEY constraint on a table, you cannot automatically create the required index in parallel. Instead, manually create an index on the desired columns using the CREATE INDEX command and an appropriate PARALLEL clause and then add or enable the constraint. Oracle then uses the existing index when enabling or adding the constraint.

Refer to the Oracle7 Server SQL Reference for the complete syntax of the CREATE INDEX command.

Using Direct Disk I/O for Sorts with Parallel Query

Setting SORT_DIRECT_WRITES to AUTO causes each process performing large sorts to write its own sort runs to disk. When SORT_DIRECT_WRITES is FALSE, the DBWR process writes sort runs to disk via the buffer cache. Using AUTO provides better scalability for large parallel sorts because it removes contention for shared buffers.

Using Hints with the Cost Based Optimization Approach

Cost-based optimization is a highly sophisticated approach to finding the best execution plan for SQL statements. Cost-based optimization is automatically used with parallel query.

Attention: You must use ANALYZE to gather current statistics for cost-based optimization. In particular, tables used in parallel should always be analyzed.

Use discretion in employing hints. With the parallel aware optimizer cost-based optimization generates such good plans that hints should rarely be necessary. If used at all, hints should come as a final step in tuning, and only when they demonstrate a necessary and significant performance advantage. In such cases, begin with the execution plan recommended by cost-based optimization, and only go on to test the effect of hints after you have quantified your performance expectations.

Remember that hints are powerful; if you use them and the underlying data changes you may need to change the hints. Otherwise, the effectiveness of your execution plans may deteriorate.

Always use cost-based optimization unless you have an existing application that has been hand-tuned for rule-based optimization. If you must use rule-based optimization, rewriting a SQL statement can give orders of magnitude improvements.

See Also: "OPTIMIZER_PERCENT_PARALLEL" on page 18-5. This parameter controls parallel awareness.

Prev Next

Library

Product

Contents

Index

	Each Index, One at a Time	All Indexes Concurrently
Serial	Slow	Use scripting language. I/O interference is possible: processes could interfere with each other, repeatedly scanning the table.
Parallel	Recommended: Run each statement, one at a time, parallelized. Spreads data over files.	May overload system capacity