Oracle7 Server Tuning

Library

Product

Contents

Index

Tuning I/O

Tuning I/O
What Are I/O Problems?
How to Detect I/O Problems
How to Solve I/O Problems
Reducing Disk Contention by Distributing I/O
Striping Disks
Avoiding Dynamic Space Management
Tuning Sorts
Tuning Checkpoints
Tuning LGWR and DBWR I/O

Tuning I/O

This chapter explains how to avoid I/O bottlenecks that could prevent Oracle from performing at its maximum potential. It covers the following topics:

What Are I/O Problems?

The performance of many software applications is inherently limited by disk I/O. Often, CPU activity must be suspended while I/O activity completes. Such an application is said to be "I/O bound". Oracle is designed so that performance need not be limited by I/O.

Tuning I/O can help performance if a disk containing database files is operating at its capacity. However, tuning I/O cannot help performance in "CPU bound" cases, or cases in which your computer's CPUs are operating at their capacity.

It is important to tune I/O after following the recommendations presented in Chapter 13, "Tuning Memory Allocation". That chapter explains how to allocate memory so as to reduce I/O to a minimum. After reaching this minimum, follow the instructions in the present chapter to perform the necessary I/O as efficiently as possible.

Tuning I/O: Top Down and Bottom Up

When you are designing a new system, you should analyze I/O needs from the top down, determining what resources you will require in order to achieve the desired performance.

For an existing system, you should approach I/O tuning from the bottom up:

Determine the number of disks on the system.
Determine the number of disks that are being used by Oracle.
Determine the type of I/Os that your system performs.
Ascertain whether the I/Os are going to the file system or to raw devices.
Determine how to spread objects over multiple disks, using either manual striping or striping software.
Calculate the level of performance you can expect.

How to Analyze I/O Requirements

This section explains how to determine your system's I/O requirements.

Figure the total throughput your application will require.
Begin by figuring out the number of reads and writes involved in each transaction, and distinguish the objects against which each operation is performed.
In an OLTP application, for example, each transaction might involve:
One transaction in this example thus requires 2 reads and 1 write, all to different objects.
Define the I/O performance target for this application by specifying the number of transactions per second (or "tps") which the system must support.
In this example, the designer might specify that 100 tps would constitute an acceptable level of performance. To achieve this, the system must be able to perform 300 I/Os per second:
Figure the number of disks needed to achieve this performance.
To do this, ascertain the number of I/Os that each disk can perform per second. This will depend on three factors:
In general, disk speed tends to have the following characteristics:

Disk Speed:

File System

Raw Devices

Reads per second

fast

slow

Writes per second

slow

fast

Lay out in a table like the following the relative speed per operation of your disks:

Disk Speed:

File System

Raw Devices

Reads per second

Writes per second

The disks in this example have the following characteristics:

Disk Speed:

File System

Raw Devices

Reads per second

50

45

Writes per second

20

50

Figure the number of disks needed to achieve your I/O performance target. Use a table like this:

Object	If Stored on File System			If Stored on Raw Devices
Object	R/W Needed per Sec.	Disk R/W Capabil. per Sec.	Disks Needed	R/W Needed per Sec.	Disk R/W Capabil. per Sec.	Disks Needed
A
B
C
Disks Req'd

The following table shows the values from this example.

Object

If Stored on File System

If Stored on Raw Devices

R/W
Needed per Sec.

Disk R/W Capabil. per Sec.

Disks Needed

R/W
Needed per Sec.

Disk R/W Capabil. per Sec.

Disks Needed

A

100 reads

50 reads

2 disks

100 reads

45 reads

2 disks

B

100 reads

50 reads

2 disks

100 reads

45 reads

2 disks

C

100 writes

20 writes

5 disks

100 writes

50 writes

2 disks

Disks Req'd

9 disks

6 disks

How to Plan File Storage

This section explains how to determine whether your application will run best on the disks you have available, if you store the data on raw devices, block devices, or directly on the file system.

Design Approach

Use the following approach to design file storage:

Identify the operations required by your application.
Test the performance of your system's disks for the different operations required by your application.
Finally, evaluate what kind of disk layout will give you the best performance for the operations which predominate in your application.

Identifying the Required Read/Write Operations

Evaluate your application to determine how often it requires each type of I/O operation. The following table shows the types of read and write operations performed by each of the background processes, by foreground processes, and by parallel query slaves.

Operation	Process
Operation	LGWR	DBWR	ARCH	SMON	PMON	CKPT	Fore- ground	PQ Slave
Sequential Read			X	X		X	X	X
Sequential Write	X		X			X
Random Read				X			X
Random Write		X

In this discussion, a sample application might involve 50% random reads, 25% sequential reads, and 25% random writes.

Testing Performance of Your Disks

This section illustrates relative performance of read/write operations by a particular test system. On raw devices, reads and writes are done on the character level; on block devices, these operations are done on the block level. (Note also that many concurrent processes may generate overhead due to head and arm movement of the disks.)

Attention: The figures provided in this example do not constitute a rule of thumb. They were generated by a given UNIX-based test system using particular disks. These figures will differ significantly for different platforms and different disks! To make accurate judgments, you must test your own system using an approach like the one demonstrated in this section. Alternatively, contact your system vendor for information on relative disk performance for the different operations.

The following table and graph show speed of sequential read in milliseconds per I/O, for each of the three disk layout options on a test system.

Block Size

Speed of Sequential Read on

Raw Device

Block Device

UNIX File System

512 bytes

1.4

0.6

0.4

1K

1.4

0.6

0.3

2K

1.5

1.1

0.6

4K

1.6

1.8

1.0

8K

2.7

3.0

1.5

16K

5.1

5.3

3.7

32K

10.1

10.3

8.1

64K

20.0

20.3

18.0

128K

40.4

41.3

36.1

256K

80.7

80.3

61.3

Doing research like this helps you pick the right stripe size. In this example, if the data is striped over 16 disks at a stripe size of 16K, you can read in chunks of 256K, and the average time to read would be 5 to 10 milliseconds. By contrast, if all the data were on one disk, read time would be 80 milliseconds. Thus the test results show that on this particular set of disks, things look quite different from what might be expected: it is sometimes beneficial to have a smaller stripe size, depending on the size of the I/O.

Block Size and Speed of Sequential Read (Sample Data)

The following table and graph show speed of sequential write in milliseconds per I/O, for each of the three disk layout options on the test system.

Block Size	Speed of Sequential Write on
Block Size	Raw Device	Block Device	UNIX File System
512 bytes	11.2	11.8	17.9
1K	11.7	11.9	18.3
2K	11.6	13.0	19.0
4K	12.3	13.8	19.8
8K	13.5	13.8	21.8
16K	16.0	27.8	35.3
32K	19.3	55.6	62.2
64K	31.5	111.1	115.1
128K	62.5	224.5	221.8
256K	115.6	446.1	429.0

Block Size and Speed of Sequential Write (Sample Data)

The following table and graph show speed of random read in milliseconds per I/O, for each of the three disk layout options on the test system.

Block Size	Speed of Random Read on
Block Size	Raw Device	Block Device	UNIX File System
512 bytes	12.3	13.8	15.5
1K	12.0	14.3	14.1
2K	13.4	13.7	15.0
4K	13.9	14.1	15.3
8K	15.4	86.9	14.4
16K	19.1	86.1	39.7
32K	25.7	88.8	39.9
64K	38.1	106.4	40.2
128K	64.3	128.2	62.2
256K	115.7	176.1	91.2

Block Size and Speed of Random Read (Sample Data)

The following table and graph show speed of random write in milliseconds per I/O, for each of the three disk layout options on the test system.

Block Size	Speed of Random Write on
Block Size	Raw Device	Block Device	UNIX File System
512 bytes	12.3	25.2	40.7
1K	12.0	24.5	41.4
2K	12.6	25.6	41.6
4K	13.8	25.0	41.4
8K	14.8	15.5	32.8
16K	17.7	30.7	45.6
32K	24.8	59.8	71.6
64K	38.0	118.7	123.8
128K	74.4	235.9	230.3
256K	137.4	471.0	441.5

Block Size and Speed of Random Write (Sample Data)

Evaluate Disk Layout Options

Knowing the types of operation which predominate in your application, and the speed with which your particular system can process the corresponding I/Os, you can choose the disk layout which will maximize performance.

For example, with the sample application and test system described previously, the UNIX file system would be a good choice. With random reads predominating (50% of all I/O operations), 8K would be a good block size. Raw devices and UNIX file system provide comparable performance of random reads at this block size. Furthermore, the UNIX file system in this example processes sequential reads almost twice as fast as raw devices, given an 8K block size.

Attention: Figures shown in the preceding example will differ significantly on different platforms, and with different disks! To plan effectively you must test I/O performance on your own system!

How to Choose Data Block Size

Table data in the database is stored in data blocks. This section describes how to allocate space within data blocks for best performance. With single block I/O (random read), for best performance you want to get all the desired data from a single block in one read. How you store the data determines whether or not this performance objective will be achieved. It depends on two factors: storage of the rows, and block size.

The operating system I/O size should be equal to or greater than the database block size. Sequential read performance will improve if operating system I/O size is twice or three times the database block size (as in the example in "Testing Performance of Your Disks"). This assumes that the operating system can buffer the I/O so that the next block will be read from that particular buffer.

The following figure illustrates the suitability of various block sizes to online transaction processing or decision support applications.

Block Size and Application Type

Block Size Advantages and Disadvantages

This section describes advantages and disadvantages of different block sizes.

Block Size	Advantages	Disadvantages
Small (2K-4K)	Reduces block contention Good for small rows, or lots of random access	Has relatively large overhead You may end up storing only a small number of rows, depending on the size of the row.
Medium (8K)	If rows are of medium size, you can bring a number of rows into the buffer cache with a single I/O. With 2K or 4K block size, you may only bring in a single row.	Space in the buffer cache will be wasted if you are doing random access to small rows and have a large block size. For example, with an 8K block size and 50B row size, you would be wasting 7,950B in the buffer cache when doing random access.
Large (16K-32K)	There is relatively less overhead, thus more room to store useful data. Good for sequential access, or very large rows.	Large block size is not good for index root block caching.

How to Evaluate Device Bandwidth

The number of I/Os a disk can do depends on whether the operations involve reading or writing to objects stored on raw devices or on the file system. This affects the number of disks you must use to achieve the desired level of performance.

How to Detect I/O Problems

If you suspect a problem with I/O usage, you must evaluate two areas:

Oracle compiles file I/O statistics that reflect disk access to database files. Note, however, that these statistics only report the I/O utilization of Oracle sessions--yet every process running on your system affects the available I/O resources. Tuning non-Oracle factors can thus result in better Oracle performance.

Checking System I/O Utilization

Use operating system monitoring tools to see what processes are running on the system as a whole, and to monitor disk access to all files. Remember that disks holding datafiles and redo log files may also hold files that are not related to Oracle. Try to reduce any heavy access to disks that contain database files. Access to non-Oracle files can only be monitored through operating system facilities rather than through the V$FILESTAT table.

Tools such as sar -d on many UNIX systems enable you to examine the iostat I/O statistics for your entire system. (Some UNIX-based platforms have an iostat command.) On NT systems, use Performance Monitor.

Attention: For information on other platforms, please check your operating system documentation.

Checking Oracle I/O Utilization

This section identifies the views and processes which provide Oracle I/O statistics, and shows how to check statistics using V$FILESTAT.

Which Dynamic Performance Tables Contain I/O Statistics

The following table shows which dynamic performance tables to check for I/O statistics relating to Oracle database files, log files, archive files, and control files.

File Type

Where to Find Statistics

Database Files

V$FILESTAT

Log Files

V$SYSSTAT, V$SYSTEM_EVENT, V$SESSION_EVENT

Archive Files

V$SYSTEM_EVENT, V$SESSION_EVENT

Control Files

V$SYSTEM_EVENT, V$SESSION_EVENT

Where to Find Statistics about Oracle Files

Which Processes Reflect Oracle File I/O

The following table lists processes whose statistics reflect I/O throughput for the different Oracle file types.

File	Process
File	LGWR	DBWR	ARCH	SMON	PMON	CKPT	Fore- ground	PQ Slave
Database Files		X		X	X	X	X	X
Log Files	X
Archive Files			X
Control Files	X	X	X	X	X	X	X	X

V$SYSTEM_EVENT, for example, shows the total number of I/Os and average duration, by type of I/O. In this way you can determine which types of I/O are too slow.

If there are Oracle-related I/O problems, tune them. But if your process is not consuming the available I/O resources, then some other process is. Go back to the system to identify the process which is using up so much I/O, and determine why. See if you can tune this process.

Note: Different types of I/O in Oracle require different tuning approaches. Tuning I/O for DSS applications is different from tuning I/O for OLTP applications. DSS applications characteristically do large sequential reads, whereas OLTP is characterized by random reads and writes.

How to Check Oracle Datafile I/O with V$FILESTAT

Examine disk access to database files through the dynamic performance table V$FILESTAT. This view shows the following for database I/O (but not for log file I/O):

number of physical reads and writes
number of blocks read and written
total I/O time for reads and writes

By default, this table is only available to the user SYS and to users granted SELECT ANY TABLE system privilege, such as SYSTEM. The following column values reflect the number of disk accesses for each datafile:

PHYRDS	The value of this column is the number of reads from each database file.
PHYWRTS	The value of this column is the number of writes to each database file.

Use the following query to monitor these values over some period of time while your application is running:

SELECT name, phyrds, phywrts
   FROM v$datafile df, v$filestat fs
   WHERE df.file# = fs.file#;

This query also retrieves the name of each datafile from the dynamic performance table V$DATAFILE. Sample output might look like this:

NAME                                             PHYRDS    PHYWRTS
-------------------------------------------- ---------- ----------
/oracle/ora70/dbs/ora_system.dbf                   7679       2735
/oracle/ora70/dbs/ora_temp.dbf                       32        546

The PHYRDS and PHYWRTS columns of V$FILESTAT can also be obtained through SNMP.

The total I/O for a single disk is the sum of PHYRDS and PHYWRTS for all the database files managed by the Oracle instance on that disk. Determine this value for each of your disks. Also determine the rate at which I/O occurs for each disk by dividing the total I/O by the interval of time over which the statistics were collected.

How to Solve I/O Problems

The rest of the sections in this chapter provide strategies for solving I/O problems. These include:

Reducing Disk Contention by Distributing I/O
Striping Disks
Avoiding Dynamic Space Management
Tuning Sorts
Tuning Checkpoints
Tuning LGWR and DBWR I/O

Reducing Disk Contention by Distributing I/O

This section describes how to reduce disk contention.

What Is Disk Contention?

Disk contention occurs when multiple processes try to access the same disk simultaneously. Most disks have limits on both the number of accesses and the amount of data they can transfer per second. When these limits are reached, processes may have to wait to access the disk.

In general, consider the statistics in the V$FILESTAT table and your operating system facilities. Consult your hardware documentation to determine the limits on the capacity of your disks. Any disks operating at or near full capacity are potential sites for disk contention. For example, 40 or more I/Os per second is excessive for most disks on VMS or UNIX operating systems.

To reduce the activity on an overloaded disk, move one or more of its heavily accessed files to a less active disk. Apply this principle to each of your disks until they all have roughly the same amount of I/O. This is referred to as distributing I/O.

Separating Datafiles and Redo Log Files

Oracle processes constantly access datafiles and redo log files. If these files are on common disks, there is potential for disk contention. Place each datafile on a separate disk. Multiple processes can then access different files concurrently without disk contention.

Place each set of redo log files on a separate disk with no other activity. Redo log files are written by the Log Writer process (LGWR) when a transaction is committed. Information in a redo log file is written sequentially. This sequential writing can take place much faster if there is no concurrent activity on the same disk.

Dedicating a separate disk to redo log files usually ensures that LGWR runs smoothly with no further tuning attention. Performance bottlenecks related to LGWR are rare. For information on tuning LGWR, see the section "Detecting Contention for Redo Log Buffer Latches" on page 16-13.

Note: Mirroring redo log files, or maintaining multiple copies of each redo log file on separate disks, does not slow LGWR considerably. LGWR writes to each disk in parallel and waits until each part of the parallel write is complete. Since the time required for your operating system to perform a single-disk write may vary, increasing the number of copies increases the likelihood that one of the single-disk writes in the parallel write will take longer than average. A parallel write will not take longer than the longest possible single-disk write. There may also be some overhead associated with parallel writes on your operating system.

Dedicating separate disks and mirroring redo log files are important safety precautions. Dedicating separate disks to datafiles and redo log files ensures that both the datafiles and the redo log files cannot be lost in a single disk failure. Mirroring redo log files ensures that a redo log file cannot be lost in a single disk failure.

Striping Table Data

Striping, or spreading a large table's data across separate datafiles on separate disks, can also help to reduce contention. This strategy is fully discussed in the section "Striping Disks" on page 14-23.

Separating Tables and Indexes

It is not necessary to separate a frequently used table from its index. During the course of a transaction, the index is read first, and then the table is read. Because these I/Os occur sequentially, the table and index can be stored on the same disk without contention.

Reducing Disk I/O Unrelated to Oracle

If possible, eliminate I/O unrelated to Oracle on disks that contain database files. This measure is especially helpful in optimizing access to redo log files. Not only does this reduce disk contention, it also allows you to monitor all activity on such disks through the dynamic performance table V$FILESTAT.

Striping Disks

This section describes:

What Is Striping?

"Striping" is the practice of dividing a large table's data into small portions and storing these portions in separate datafiles on separate disks. This permits multiple processes to access different portions of the table concurrently without disk contention. Striping is particularly helpful in optimizing random access to tables with many rows. Striping can either be done manually (described below), or through operating system striping utilities.

I/O Balancing and Striping

Benchmark tuners in the past tried hard to ensure that the I/O load was evenly balanced across the available devices. At present, operating systems are providing the ability to stripe a heavily used container file across many physical devices. However, such techniques are only productive where the load re-distribution eliminates or reduces some form of queue.

If I/O queues exist or are suspected, then load distribution across the available devices is a natural tuning step. Where larger numbers of physical drives are available, consider dedicating two drives to carrying redo logs (two because redo logs should always be mirrored either by the operating system or using Oracle redo log group features). Since redo logs are written serially, any drive dedicated to redo log activity will normally require very little head movement. This will significantly speed up log writing.

If archiving, it is beneficial to use extra disks so that LGWR and ARCH do not compete for the same read/write head. This is achieved by placing logs on alternating drives.

Note that mirroring can also be a cause of I/O bottlenecks. The process of writing to each mirror is normally done in parallel, and does not cause a bottleneck. However, if each mirror is striped differently, then the I/O is not completed until the slowest member is finished. To avoid I/O problems, striping should be done on the same number of disks as the data itself.

For example, if you have 160K of data striped over 8 disks, but the data is mirrored onto only one disk, then regardless of how quickly the data is processed on the 8 disks, the I/O is not completed until 160K has been written onto the mirror disk. It might thus take 20.48 milliseconds to write the database, but 137 milliseconds to write the mirror.

How to Stripe Disks Manually

To stripe disks manually, you need to relate the object's storage requirements to its I/O requirements.

Begin by evaluating the object's disk storage requirements. You need to know
- the size of the object
- the size of the disk
For example, if the object requires 5G in Oracle storage space, you would need one 5G disk or two 4G disks to accommodate it. On the other hand, if the system is configured with 1G or 2G disks, the object may require 5 or 3 disks, respectively.
Compare to this the application's I/O requirements, as described in "How to Analyze I/O Requirements" on page 14-3. You must take the larger of the two values.
For example, if the storage requirement is 5 disks (1G each), and the I/O requirement is 2 disks, then your application requires the higher value: 5 disks.

Create a tablespace with the CREATE TABLESPACE command. Specify the datafiles in the DATAFILE clause. Each of the files should be on a different disk.

			CREATE TABLESPACE stripedtabspace
			   DATAFILE 'file_on_disk_1' SIZE 1GB,
			      'file_on_disk_2' SIZE 1GB,
			      'file_on_disk_3' SIZE 1GB,
			      'file_on_disk_4' SIZE 1GB,
			      'file_on_disk_5' SIZE 1GB;

Then create the table with the CREATE TABLE command. Specify the newly created tablespace in the TABLESPACE clause.
Also specify the size of the table extents in the STORAGE clause. Store each extent in a separate datafile. The table extents should be slightly smaller than the datafiles in the tablespace to allow for overhead. For example, when preparing for datafiles of 1G (1024MB), you can set the table extents to be 1023MB:
```
			CREATE TABLE stripedtab (
			   col_1  NUMBER(2),
			   col_2  VARCHAR2(10) )
			   TABLESPACE stripedtabspace
			   STORAGE ( INITIAL 1023MB  NEXT 1023MB
			      MINEXTENTS 5  PCTINCREASE 0 );
```

These steps result in the creation of table STRIPEDTAB. STRIPEDTAB has 5 initial extents, each of size 1023MB. Each extent takes up one of the datafiles named in the DATAFILE clause of the CREATE TABLESPACE statement. These files are all on separate disks. These 5 extents are all allocated immediately, since MINEXTENTS is 5.

How to Stripe Disks with Operating System Software

Alternatively, you can use operating system striping software, such as a logical volume manager (LVM), to stripe disks.

With striping software, the biggest concern is choosing the right stripe size. This depends on Oracle block size and type of disk access.

Disk Access

Minimum Stripe Size

Random reads and writes

The minimum stripe size is twice the Oracle block size.

Sequential reads

The minimum stripe size is twice the value of DB_FILE_MULTIBLOCK_READ_COUNT.

In striping, uniform access to the data is assumed. If the stripe size is too big you can end up with a hot spot on one disk or on a small number of disks. You can avoid this problem by making the stripe size smaller, thus spreading the data over more disks.

Consider an example in which 100 rows of fixed size are evenly distributed over 5 disks, with each disk containing 20 sequential rows. If access is needed only to rows 35 through 55, then only 2 disks must handle all the I/O. At this rate, the system cannot achieve the desired level of performance.

You can correct this problem by spreading the rows of interest across more disks. In the current example, if there were two rows per block, then rows 35 and 36 would be on the same disk, and row 37 would be on a different disk. The data of interest would be spread over all the disks and I/O throughput would be much improved.

How to Do Hardware Striping with RAID

Redundant arrays of inexpensive disks (RAID) can offer significant advantages in their failure resilience features. They also permit striping to be quite easily achieved, but do not appear to provide any significant performance advantage. In fact, they may impose a higher cost in I/O overhead.

In some instances, performance can be improved by not using the full features of RAID technology. In other cases, RAID technology's resilience to single component failure may justify its cost in terms of performance.

Avoiding Dynamic Space Management

When an object such as a table or rollback segment is created, space is allocated in the database for the data. This space is called a segment. If subsequent database operations cause the data to grow and exceed the space allocated, Oracle extends the segment. Dynamic extension can reduce performance. This section discusses

Detecting Dynamic Extension

Dynamic extension causes Oracle to execute SQL statements in addition to those SQL statements issued by user processes. These SQL statements are called recursive calls because Oracle issues these statements itself. Recursive calls are also generated by these activities:

misses on the data dictionary cache
firing of database triggers
execution of Data Definition Language statements
execution of SQL statements within stored procedures, functions, packages, and anonymous PL/SQL blocks
enforcement of referential integrity constraints

Examine the recursive calls statistic through the dynamic performance table V$SYSSTAT. By default, this table is only available to the user SYS and to users granted the SELECT ANY TABLE system privilege, such as SYSTEM. Monitor this statistic over some period of time while your application is running, with this query:

SELECT name, value
   FROM v$sysstat
   WHERE name = 'recursive calls';

The output of this query might look like this:

NAME                                                         VALUE
------------------------------------------------------- ----------
recursive calls                                             626681

If Oracle continues to make excessive recursive calls while your application is running, determine whether these recursive calls are due to one of the activities that generate recursive calls other than dynamic extension. If you determine that these recursive calls are caused by dynamic extension, you should try to reduce this extension by allocating larger extents.

Allocating Extents

Follow these steps to avoid dynamic extension:

Determine the maximum size of your object. For formulas to predict how much space to allow for a table, see the Oracle7 Server Administrator's Guide.
Choose storage parameter values so that Oracle allocates extents large enough to accommodate all of your data when you create the object.

Larger extents tend to benefit performance for these reasons:

Since blocks in a single extent are contiguous, one large extent is more contiguous than multiple small extents. Oracle can read one large extent from disk with fewer multi-block reads than would be required to read many small extents.
Segments with larger extents are less likely to be extended.

However, since large extents require more contiguous blocks, Oracle may have difficulty finding enough contiguous space to store them. To determine whether to allocate few large extents or many small extents, consider the benefits and drawbacks of each in light of your plans for the growth and use of your tables.

Automatically resizable datafiles can also cause a problem with dynamic extension. Avoid using the automatic extension. Instead, manually allocate more space to a datafile during times when the system is relatively inactive.

Unlimited Extents

Even though an object may have unlimited extents, this does not mean that having a large number of small extents is acceptable. For optimal performance you may decide to reduce the number of extents.

Extent maps list all the extents for a certain segment. The number of extents per block depends on block size and platform. Although an extent is a data structure inside Oracle, the size of this data structure depends on the operating system. Accordingly, this affects the number of extents which can be stored in a single block. Typically, this value is as follows:

Block Size

Max. Number of Extents

2

121

4

255

8

504

16

1032

32

2070

For best performance, you should be able to read the extent map with a single I/O. Performance will degrade if multiple I/Os are necessary for a full table scan to get the extent map.

Furthermore, a large number of extents can degrade data dictionary performance. Performance would suffer, for example, if you had 8,000 extents, and had to bring them all into the dictionary cache.

Avoiding Dynamic Space Management in Rollback Segments

The size of rollback segments can also affect performance. Rollback segment size is determined by the rollback segment's storage parameter values. Your rollback segments must be large enough to hold the rollback entries for your transactions. As with other objects, you should avoid dynamic space management in rollback segments.

Use the SET TRANSACTION command to assign transactions to rollback segments of the appropriate size based on the recommendations in the following sections. If you do not explicitly assign a rollback segment to a transaction, Oracle randomly chooses a rollback segment for it.

Warning: If you are running multiple concurrent copies of the same application, be careful not to assign the transactions for all copies to the same rollback segment. This leads to contention for that rollback segment.

Also monitor the shrinking, or dynamic deallocation, of rollback segments based on the OPTIMAL storage parameter. For information on choosing values for this parameter and monitoring rollback segment shrinking, and adjusting OPTIMAL accordingly, see the Oracle7 Server Administrator's Guide.

Example: This statement assigns the current transaction to the rollback segment OLTP_13:

SET TRANSACTION USE ROLLBACK SEGMENT oltp_13

Reducing Migrated and Chained Rows

If an UPDATE statement increases the amount of data in a row so that the row no longer fits in its data block, Oracle tries to find another block with enough free space to hold the entire row. If such a block is available, Oracle moves the entire row to the new block. This is called migrating a row. If the row is too large to fit into any available block, Oracle splits the row into multiple pieces and stores each piece in a separate block. This is called chaining a row. Rows can also be chained when they are inserted.

Dynamic space management, especially migration and chaining, is detrimental to performance:

UPDATE statements that cause migration and chaining perform poorly.
Queries that select migrated or chained rows must perform more I/O.

You can identify migrated and chained rows in a table or cluster by using the ANALYZE command with the LIST CHAINED ROWS option. This command collects information about each migrated or chained row and places this information into a specified output table. The definition of a sample output table named CHAINED_ROWS appears in a SQL script available on your distribution medium. The common name of this script is UTLCHAIN.SQL, although its exact name and location may vary depending on your operating system. Your output table must have the same column names, datatypes, and sizes as the CHAINED_ROWS table.

To reduce migrated and chained rows in an existing table, follow these steps:

Use the ANALYZE command to collect information about migrated and chained rows. For example:
```
			ANALYZE TABLE order_hist LIST CHAINED ROWS;
```

Query the output table:

			SELECT *
			   FROM chained_rows
			   WHERE table_name = 'ORDER_HIST';
			
			OWNER_NAME  TABLE_NAME  CLUST... HEAD_ROWID          TIMESTAMP
			----------  ----------  -----... ------------------  ---------
			SCOTT       ORDER_HIST       ... 0000186A.0003.0001  04-AUG-92
			SCOTT       ORDER_HIST       ... 0000186A.0002.0001  04-AUG-92
			SCOTT       ORDER_HIST       ... 0000186A.0001.0001  04-AUG-92

If the output table shows that you have many migrated or chained rows, follow these steps to eliminate migrated rows:

Create an intermediate table with the same columns as the existing table to hold the migrated and chained rows:

					CREATE TABLE int_order_hist
					   AS SELECT *
					      FROM order_hist
					      WHERE ROWID IN
					         (SELECT head_rowid
					            FROM chained_rows
					            WHERE table_name = 'ORDER_HIST');

Delete the migrated and chained rows from the existing table:

					DELETE FROM order_hist
					   WHERE ROWID IN
					      (SELECT head_rowid
					         FROM chained_rows
					         WHERE table_name = 'ORDER_HIST');

Insert the rows of the intermediate table into the existing table:

					INSERT INTO order_hist
					   SELECT *
					      FROM int_order_hist;

Drop the intermediate table:
```
					DROP TABLE int_order_history;
```

Delete the information collected in step 1 from the output table:

			DELETE FROM chained_rows
		   WHERE table_name = 'ORDER_HIST';

Use the ANALYZE command again and query the output table.
Any rows that appear in the output table are chained rows. You can eliminate chained rows by increasing your data block size. It may not be possible to avoid chaining in all situations. If your table has a LONG column or long CHAR or VARCHAR2 columns, chaining is often unavoidable.

Retrieval of migrated rows is resource intensive, thus all tables subject to UPDATE should have their distributed free space set to allow enough space within the block for the likely update.

Set PCTFREE to avoid migrated rows. If you leave more free space available in the block, the row will have room to grow. You can detect migrated rows by checking the "table fetch continued row" statistic in V$SYSSTAT. You can also reorganize (recreate) tables and indexes with high deletion rate.

Note: PCTUSED is not the opposite of PCTFREE; it concerns space management.

Modifying the SQL.BSQ File

The SQL.BSQ file is run when you issue the CREATE DATABASE statement. This file contains the actual table definitions that make up the Oracle Server. The views that you use as a DBA are based on these tables. Oracle Corporation recommends that users strictly limit their modifications to SQL.BSQ.

If necessary, you can increase the value of the following storage parameters: INITIAL, NEXT, MINEXTENTS, MAXEXTENTS, PCTINCREASE, FREELISTS, FREELIST GROUPS, and OPTIMAL.
With the exception of PCTINCREASE, you should not decrease the setting of a storage parameter to a value below the default. (If the value of MAXEXTENTS is large, you can make PCTINCREASE small--or even zero.)
No other changes to SQL.BSQ are supported. In particular, you should not add, drop, or rename a column.

Note: Internal data dictionary tables may be added, deleted, or changed from release to release. For this reason, user modifications will not be carried forward when the dictionary is migrated to later releases.

Tuning Sorts

There is a trade-off between performance and memory usage. For best performance, most sorts should occur in memory. Sorts to disk are the ones which impact performance. If the sort area size is too big, too much memory may be utilized; if the sort area size is too small, there may be lots of sorts to disk and correspondingly worse performance. This section describes:

Sorting to Memory: Allocating Sort Area

The default sort area size is adequate to hold all the data for most sorts. However, if your application often performs large sorts on data that does not fit into the sort area, then you may want to increase the sort area size. Large sorts can be caused by any SQL statement that performs a sort which operates on many rows.

Recognizing Large Sorts

Oracle collects statistics that reflect sort activity and stores them in the dynamic performance table V$SYSSTAT. By default, this table is only available to the user SYS and to users granted the SELECT ANY TABLE system privilege. These statistics reflect sort behavior:

sorts(memory)	The number of sorts small enough to be performed entirely in sort areas without I/O to temporary segments on disk.
sorts(disk)	The number of sorts too large to be performed entirely in the sort area requiring I/O to temporary segments on disk.

Use the following query to monitor these statistics over a period of time while your application is running:

SELECT name, value
   FROM v$sysstat
   WHERE name IN ('sorts(memory)', 'sorts(disk)');

The output of this query might look like this:

NAME                                                         VALUE
------------------------------------------------------- ----------
sorts(memory)                                                  965
sorts(disk)                                                      8

The information in V$SYSSTAT can also be obtained through SNMP.

Increasing SORT_AREA_SIZE to Avoid Sorting to Disk

SORT_AREA_SIZE is a dynamically modifiable initialization parameter which specifies the maximum amount of Program Global Area (PGA) memory to use for a sort. There is a trade-off between SORT_AREA_SIZE and PGA.

If a significant number of sorts require disk I/O to temporary segments, then your application's performance may benefit from increasing the size of the sort area. In this case, increase the value of the initialization parameter SORT_AREA_SIZE. The maximum value of this parameter varies depending on your operating system. You need to determine how large a SORT_AREA_SIZE makes sense. If it is big enough, most sorts should not go to disk (unless, for example, you are sorting a 10 Gigabyte table). If the sort does not go to disk, you have the option of writing or not writing to the buffer cache.

"SORT_AREA_SIZE" on page 18-9

Performance Benefits of Large Sort Areas

Increasing the sort area increases the size of each run and decreases the total number of runs. Reducing the total number of runs may reduce the number of merges Oracle must perform to obtain the final sorted result.

Performance Trade-offs for Large Sort Areas

Increasing the size of the sort area causes each Oracle process that sorts to allocate more memory. This increase reduces the amount of memory available for private SQL and PL/SQL areas. It can also affect operating system memory allocation and may induce paging and swapping. Before increasing the size of the sort area, be sure enough free memory is available on your operating system to accommodate a larger sort area.

If you increase the size of your sort area, you may consider decreasing the retained size of the sort area, or the size to which Oracle reduces the sort area if its data is not expected to be referenced soon. To decrease the retained size of the sort area, decrease the value of the initialization parameter SORT_AREA_RETAINED_SIZE. A smaller retained sort area reduces memory usage but causes additional I/O to write and read data to and from temporary segments on disk.

If You Do Sort to Disk

If you do sort to disk, make sure that PCTINCREASE is set to zero for the tablespace used for sorting. Also, INITIAL and NEXTEXTENT should be the same size. This reduces fragmentation of the tablespaces used for sorting.

Optimizing Sort Performance with Temporary Tablespaces

You can optimize sort performance of sorts by specifying a tablespace as TEMPORARY upon creation (or subsequently altering that tablespace) and performing the sort in that tablespace. Normally, a sort may require many space allocation calls to allocate and deallocate temporary segments. If a tablespace is specified as TEMPORARY, one sort segment in that tablespace is cached and used for each instance requesting a sort operation. This scheme bypasses the normal space allocation mechanism and can greatly improve performance of medium-sized sorts that cannot be done completely in memory.

To specify a tablespace as temporary, use the TEMPORARY keyword of the CREATE TABLE or ALTER TABLE commands. TEMPORARY cannot be used with tablespaces that contain permanent objects (such as tables or rollback segments).

The temporary tablespace should be striped over many disks, preferably with some operating system striping tool. For example, if the temporary tablespace is only striped over 2 disks with a maximum of 50 I/Os per second each, then you can only do 100 I/Os per second. This restriction could become a problem, making sort operations take a very long time.

You could speed up sorts fivefold if you were to stripe the temporary tablespace over 10 disks. This would enable 500 I/Os per second.

Change the SORT_READ_FAC parameter, which is a ratio that describes the amount of time necessary to read a single database block divided by the block transfer rate. The value is operating system specific; the default value is typically 5, but the parameter should usually be set to 16 or 32. This allows the system to read more blocks per pass from a temporary table. For temporary tables, SORT_READ_FAC plays a role similar to the parameter DB_FILE_MULTIBLOCK_READ_COUNT.

Using NOSORT to Create Indexes without Sorting

One cause of sorting is the creation of indexes. Creating an index for a table involves sorting all the rows in the table based on the values of the indexed column or columns. Oracle also allows you to create indexes without sorting. If the rows in the table are loaded in ascending order, you can create the index faster without sorting.

The NOSORT Option

To create an index without sorting, load the rows into the table in ascending order of the indexed column values. Your operating system may provide a sorting utility to sort the rows before you load them.

When you create the index, use the NOSORT option on the CREATE INDEX command. For example, this CREATE INDEX statement creates the index EMP_INDEX on the ENAME column of the EMP table without sorting the rows in the EMP table:

CREATE INDEX emp_index
   ON emp(ename)
   NOSORT;

Note: Specifying NOSORT in a CREATE INDEX statement negates the use of PARALLEL INDEX CREATE, even if PARALLEL (DEGREE n) is specified.

When to Use the NOSORT Option

Presorting your data and loading it in order may not always be the fastest way to load a table. If you have a multiple-CPU computer, you may be able to load data faster using multiple processors in parallel, each processor loading a different portion of the data. To take advantage of parallel processing, load the data without sorting it first. Then create the index without the NOSORT option.

On the other hand, if you have a single-CPU computer, you should sort your data before loading, if possible. Then create the index with the NOSORT option.

GROUP BY NOSORT

Sorting can be avoided when performing a GROUP BY operation when you know that the input data is already ordered so that all rows in each group are clumped together. This may be the case, for example, if the rows are being retrieved from an index that matches the grouped columns, or if a sort-merge join produces the rows in the right order. ORDER BY sorts can be avoided in the same circumstances. When no sort takes place, the EXPLAIN PLAN output indicates GROUP BY NOSORT.

You must set the V733_PLANS_ENABLED initialization parameter to TRUE for GROUP BY NOSORT to be available.

Optimizing Large Sorts with SORT_DIRECT_WRITES

If memory and temporary space are abundant on your system, and you perform many large sorts to disk, you can set the initialization parameter SORT_DIRECT_WRITES to increase sort performance.

Behavior of SORT_DIRECT_WRITES

When this parameter is set to TRUE, each sort will allocate several large buffers in memory for direct disk I/O. You can set the initialization parameters SORT_WRITE_BUFFERS and SORT_WRITE_BUFFER_SIZE to control the number and size of these buffers. The sort will write an entire buffer for each I/O operation. The Oracle process performing the sort writes the sort data directly to the disk, bypassing the buffer cache.

The default value of SORT_DIRECT_WRITES is AUTO. When the parameter is unspecified or set to AUTO, Oracle automatically allocates direct write buffers if the SORT_AREA_SIZE is at least ten times the minimum direct write buffer configuration.

The memory for the direct write buffers is subtracted from the sort area, so the total amount of memory used for each sort is still SORT_AREA_SIZE. Setting SORT_WRITE_BUFFERS and SORT_WRITE_BUFFER_SIZE has no effect when SORT_DIRECT_WRITES is AUTO.

Performance Trade-offs of Direct Disk I/O for Sorts

Setting SORT_DIRECT_WRITES to TRUE causes each Oracle process that sorts to allocate memory in addition to that already allocated for the sort area. The additional memory allocated is calculated as follows:

SORT_WRITE_BUFFERS * SORT_WRITE_BUFFER_SIZE

The minimum direct write configuration on most platforms is two 32K buffers (2 * 32K), so direct write is generally allocated only if the sort area is 640K or greater. With a sort area smaller than this, direct write will not be performed.

Ensure that your operating system has enough free memory available to accommodate this increase. Also, sorts that use direct writes will tend to consume more temporary segment space on disk.

One way to avoid increasing memory usage is to decrease the sort area by the amount of memory allocated for direct writes. Note that reducing the sort area may increase the number of sorts to disk, which will decrease overall performance. A good rule of thumb is that the total memory allocated for direct write buffers should be less than one-tenth of the memory allocated for the sort area. If the minimum configuration of the direct write buffers is greater than one-tenth of your sort area, then you should not trade sort area for direct write buffers.

Tuning Checkpoints

A checkpoint is an operation that Oracle performs automatically. Checkpoints can momentarily reduce performance. This section explains

How Checkpoints Affect Performance

Checkpoints affect

recovery time performance
runtime performance

Frequent checkpoints can reduce recovery time in the event of an instance failure. If checkpoints are relatively frequent, then relatively few changes to the database are made between checkpoints. In this case, relatively few changes must be rolled forward for recovery.

However, a checkpoint can momentarily reduce runtime performance for these reasons:

Checkpoints cause DBWR to perform I/O.
If CKPT is not enabled, checkpoints cause LGWR to update data files and may momentarily prevent LGWR from writing redo entries.

The overhead associated with a checkpoint is usually small and only affects performance while Oracle performs the checkpoint.

Choosing Checkpoint Frequency

You should choose a checkpoint frequency based on your performance concerns. If you are more concerned with efficient runtime performance than recovery time, choose a lower checkpoint frequency. If you are more concerned with fast recovery time than runtime performance, choose a higher checkpoint frequency.

Because checkpoints on log switches are necessary for redo log maintenance, you cannot eliminate checkpoints entirely. However, you can reduce checkpoint frequency to a minimum by setting these parameters:

Set the value of the initialization parameter LOG_CHECKPOINT_INTERVAL larger than the size of your largest redo log file.
Set the value of the initialization parameter LOG_CHECKPOINT_TIMEOUT to 0. This value eliminates time-based checkpoints.

Such settings eliminate all checkpoints except those that occur on log switches.

You can further reduce checkpoints by reducing the frequency of log switches. To reduce log switches, increase the size of your redo log files so that the files do not fill as quickly.

Reducing the Performance Impact of a Checkpoint

You may notice a momentary drop in performance as a checkpoint completes. This drop may be due to an accumulation of redo entries in the redo log buffer. LGWR may be too busy updating data file headers to write these entries to a redo log file. In this case, you can reduce the performance impact of a checkpoint by enabling the Checkpoint process (CKPT).

CKPT updates datafile headers when a checkpoint occurs, leaving LGWR free to write redo entries. To enable CKPT, set the value of the initialization parameter CHECKPOINT_PROCESS to TRUE. To disable CKPT, set this value to FALSE. CKPT is disabled by default. Before enabling CKPT, be sure that your operating system can support an additional process. You may also need to increase the value of the initialization parameter PROCESSES.

Note that only current or consistent read blocks are checkpointed. By contrast, sort blocks are not checkpointed.

To reduce the performance impact of checkpoints, make sure that DBWR writes enough during periods of normal (non-peak) activity. DBWR activity sometimes has a pattern of sharp peaks and valleys. If DBWR is tuned to be more aggressive in writing, then the average level of its activity will be raised, and it will not fall behind.

DB_BLOCK_CHECKPOINT_BATCH specifies the number of blocks the DBWR process must write out as part of a checkpoint. If the internal Oracle write batch size is 512 buffers, and 8 of those are required for a checkpoint, then the checkpoint may take a very long time. Because you only write 8 blocks a t a time, the start and end of a checkpoint will take a long time, going through 512 buffers, 8 at a time.

Tuning LGWR and DBWR I/O

This section describes how to tune I/O for two background processes:

Tuning LGWR I/O

Applications with a lot of INSERTs, or with LONG/RAW activity may benefit from tuning LGWR I/O. The size of each I/O write depends on the size of the log buffer, which is set by the initialization parameter LOG_BUFFER. It is thus important to choose the right log buffer size. LGWR starts writing if the buffer is one third full, or when it is posted by a foreground process such as a COMMIT. Too large a log buffer size might delay the writes. Too small a log buffer might also be inefficient, resulting in frequent, small I/Os.

If the average size of the I/O becomes quite large, the log file could become a bottleneck. To avoid this problem, you can stripe the redo log files, going in parallel to several disks. You must use an operating system striping tool, because manual striping is not possible in this situation.

Stripe size is likewise important. You can figure an appropriate value by dividing the average redo I/O size by the number of disks over which you want to stripe the buffer.

If you have a large number of datafiles or are in a high OLTP environment, you should always have the CKPT initialization parameter set to TRUE. Using the CKPT process ensures that during a checkpoint LGWR keeps on writing redo information, while CKPT updates the datafile headers.

Tuning DBWR I/O

DBWR uses the internal write batch size, which is set to the lowest of the following three values.

Value A is calculated as follows:
Value B is the port-specific limit. (See your Oracle platform-specific documentation.)
Value C is one-fourth the value of DB_BLOCK_BUFFERS.

Setting the write batch too large may result in very uneven response times.

For best results, take the following approach:

Determine the files to which you must write, and the number of disks on which those files reside.
Determine the number of I/Os you can perform against these disks.
Determine the number of writes that your transactions require.
Make sure you have enough disks to sustain this rate.
Adjust the current write batch size so that it equals the number of writes you need plus the number of writes needed for checkpointing (specified by DB_BLOCK_CHECKPOINT_BATCH). You can calculate the desired internal write batch size (IWBS) as follows:
newIWBS = DB_BLOCK_CHECKPOINT_BATCH + currentIWBS

Prev Next

Library

Product

Contents

Index

Disk Speed:	File System	Raw Devices
Reads per second	fast	slow
Writes per second	slow	fast

Block Size	Speed of Sequential Read on
Block Size	Raw Device	Block Device	UNIX File System
512 bytes	1.4	0.6	0.4
1K	1.4	0.6	0.3
2K	1.5	1.1	0.6
4K	1.6	1.8	1.0
8K	2.7	3.0	1.5
16K	5.1	5.3	3.7
32K	10.1	10.3	8.1
64K	20.0	20.3	18.0
128K	40.4	41.3	36.1
256K	80.7	80.3	61.3

File Type	Where to Find Statistics
Database Files	V$FILESTAT
Log Files	V$SYSSTAT, V$SYSTEM_EVENT, V$SESSION_EVENT
Archive Files	V$SYSTEM_EVENT, V$SESSION_EVENT
Control Files	V$SYSTEM_EVENT, V$SESSION_EVENT

Disk Access	Minimum Stripe Size
Random reads and writes	The minimum stripe size is twice the Oracle block size.
Sequential reads	The minimum stripe size is twice the value of DB_FILE_MULTIBLOCK_READ_COUNT.

Tuning I/O

Disk Speed:

File System

Raw Devices

Disk Speed:

File System

Raw Devices

Disk Speed:

File System

Raw Devices

Object

If Stored on File System

If Stored on Raw Devices

R/W Needed per Sec.

Disk R/W Capabil. per Sec.

Disks Needed

R/W Needed per Sec.

Disk R/W Capabil. per Sec.

Disks Needed

Object

If Stored on File System

If Stored on Raw Devices

R/W Needed per Sec.

Disk R/W Capabil. per Sec.

Disks Needed

R/W Needed per Sec.

Disk R/W Capabil. per Sec.

Disks Needed

Operation

Process

LGWR

DBWR

ARCH

SMON

PMON

CKPT

Fore- ground

PQ Slave

Block Size

Speed of Sequential Read on

Raw Device

Block Device

UNIX File System

Block Size

Speed of Sequential Write on

Raw Device

Block Device

UNIX File System

Block Size

Speed of Random Read on

Raw Device

Block Device

UNIX File System

Block Size

Speed of Random Write on

Raw Device

Block Device

UNIX File System

Block Size

Advantages

Disadvantages

Small (2K-4K)

Medium (8K)

Large (16K-32K)

File Type Where to Find Statistics Database Files V$FILESTAT Log Files V$SYSSTAT, V$SYSTEM_EVENT, V$SESSION_EVENT Archive Files V$SYSTEM_EVENT, V$SESSION_EVENT Control Files V$SYSTEM_EVENT, V$SESSION_EVENT Where to Find Statistics about Oracle Files

File Type

Where to Find Statistics

File

Process

LGWR

DBWR

ARCH

SMON

PMON

CKPT

Fore- ground

PQ Slave

Disk Access

Minimum Stripe Size

Block Size

R/W
Needed per Sec.

R/W
Needed per Sec.

R/W
Needed per Sec.

R/W
Needed per Sec.

File Type

Where to Find Statistics

Database Files

V$FILESTAT

Log Files

V$SYSSTAT, V$SYSTEM_EVENT, V$SESSION_EVENT

Archive Files

V$SYSTEM_EVENT, V$SESSION_EVENT

Control Files

V$SYSTEM_EVENT, V$SESSION_EVENT

Where to Find Statistics about Oracle Files