Oracle7 Server Concepts Manual

Library

Product

Contents

Index

Recovery Structures

An Introduction to Database Recovery and Recovery Structures
The Online Redo Log
The Archived Redo Log
Control Files
Survivability

These unhappy times call for the building of plans...

Franklin Delano Roosevelt

This chapter introduces the structures of Oracle that are used during database recovery. It includes:

An Introduction to Database Recovery and Recovery Structures

The Online Redo Log

The Archived Redo Log

Control Files

Survivability

The procedures necessary to create and maintain these structures are discussed in the Oracle7 Server Administrator's Guide.

An Introduction to Database Recovery and Recovery Structures

A major responsibility of the database administrator is to prepare for the possibility of hardware, software, network, process, or system failure. If such a failure affects the operation of a database system, you must usually recover the databases and return to normal operations as quickly as possible. Recovery should protect the databases and associated users from unnecessary problems and avoid or reduce the possibility of having to duplicate work manually.

Recovery processes vary depending on the type of failure that occurred, the structures affected, and the type of recovery that you perform. If no files are lost or damaged, recovery may amount to no more than restarting an instance. If data has been lost, recovery requires additional steps, as described in Chapter 24, "Database Recovery".

Errors and Failures

Several problems can halt the normal operation of an Oracle database or affect database I/O to disk. The following sections describe the most common types. For some of these problems, recovery is automatic and requires little or no action on the part of the database user or database administrator.

A network failure might interrupt normal execution of a client application and cause a process failure to occur. In this case, the Oracle background process PMON detects and resolves the aborted server process for the disconnected user process, as described in the previous section.

A network failure might interrupt the two-phase commit of a distributed transaction. Once the network problem is corrected, the Oracle background process RECO of each involved database server automatically resolves any distributed transactions not yet resolved at all nodes of the distributed database system. Distributed database systems are discussed in Chapter 21, "Distributed Databases".

Database Instance Failure

Database instance failure occurs when a problem arises that prevents an Oracle database instance (SGA and background processes) from continuing to work. An instance failure can result from a hardware problem, such as a power outage, or a software problem, such as an operating system crash.

Recovery from instance failure is relatively automatic. For example, in configurations that do not use the Oracle Parallel Server, the next instance startup automatically performs instance recovery. When using the Oracle Parallel Server, other instances perform instance recovery. For additional information about instance recovery, see Chapter 24, "Database Recovery".

Media (Disk) Failure

An error can arise when trying to write or read a file that is required to operate an Oracle database. This occurrence is called media failure because there is a physical problem reading or writing physical files needed for normal database operation.

A common example of a media failure is a disk head crash, which causes the loss of all files on a disk drive. All files associated with a database are vulnerable to a disk crash, including datafiles, redo log files, and control files. The appropriate recovery from a media failure depends on the files affected; see Chapter 24, "Database Recovery", for a discussion of media recovery.

How Media Failures Affect Database Operation Media failures can affect one or all types of files necessary for the operation of an Oracle database, including datafiles, online redo log files, and control files.

Database operation after a media failure of online redo log files or control files depends on whether the online redo log or control file is multiplexed, as recommended. A multiplexed online redo log or control file simply means that a second copy of the file is maintained. If a media failure damages a single disk, and you have a multiplexed online redo log, the database can usually continue to operate without significant interruption. Damage to a non-multiplexed online redo log causes database operation to halt and may cause permanent loss of data. Damage to any control file, whether it is multiplexed or non-multiplexed, halts database operation once Oracle attempts to read or write the damaged control file.

Media failures that affect datafiles can be divided into two categories: read errors and write errors. In a read error, Oracle discovers it cannot read a datafile and an operating system error is returned to the application, along with an Oracle error indicating that the file cannot be found, cannot be opened, or cannot be read. Oracle continues to run, but the error is returned each time an unsuccessful read occurs. At the next checkpoint, a write error will occur when Oracle attempts to write the file header as part of the standard checkpoint process.

If Oracle discovers that it cannot write to a datafile and Oracle archives filled online redo log files, Oracle returns an error in the DBWR trace file, and Oracle takes the datafile offline automatically. Only the datafile that cannot be written to is taken offline; the tablespace containing that file remains online.

If the datafile that cannot be written to is in the SYSTEM tablespace, the file is not taken offline. Instead, an error is returned and Oracle shuts down the database. The reason for this exception is that all files in the SYSTEM tablespace must be online in order for Oracle to operate properly. For the same reason, the datafiles of a tablespace containing active rollback segments must remain online.

If Oracle discovers that it cannot write to a datafile, and Oracle is not archiving filled online redo log files, DBWR fails and the current instance fails. If the problem is temporary (for example, the disk controller was powered off), instance recovery usually can be performed using the online redo log files, in which case the instance can be restarted. However, if a datafile is permanently damaged and archiving is not used, the entire database must be restored using the most recent backup.

Structures Used for Database Recovery

Several structures of an Oracle database safeguard data against possible failures. The following sections briefly introduce each of these structures and its role in database recovery.

Database Backups

A database backup consists of operating system backups of the physical files that constitute an Oracle database. To begin database recovery from a media failure, Oracle uses file backups to restore damaged datafiles or control files.

Oracle offers several options in performing database backups; see Chapter 23, "Database Backup", for more information.

The Redo Log

The redo log, present for every Oracle database, records all changes made in an Oracle database. The redo log of a database consists of at least two redo log files that are separate from the datafiles (which actually store a database's data). As part of database recovery from an instance or media failure, Oracle applies the appropriate changes in the database's redo log to the datafiles, which updates database data to the instant that the failure occurred.

A database's redo log can be comprised of two parts: the online redo log and the archived redo log, discussed in the following sections.

The Online Redo Log Every Oracle database has an associated online redo log. The online redo log works with the Oracle background process LGWR to immediately record all changes made through the associated instance. The online redo log consists of two or more pre-allocated files that are reused in a circular fashion to record ongoing database changes; see "The Online Redo Log" for more information.

The Archived (Offline) Redo Log Optionally, you can configure an Oracle database to archive files of the online redo log once they fill. The online redo log files that are archived are uniquely identified and make up the archived redo log. By archiving filled online redo log files, older redo log information is preserved for more extensive database recovery operations, while the pre-allocated online redo log files continue to be reused to store the most current database changes; see "The Archived Redo Log" page 22-16 for more information.

Rollback Segments

Rollback segments are used for a number of functions in the operation of an Oracle database. In general, the rollback segments of a database store the old values of data changed by ongoing transactions (that is, uncommitted transactions). Among other things, the information in a rollback segment is used during database recovery to "undo" any "uncommitted" changes applied from the redo log to the datafiles. Therefore, if database recovery is necessary, the data is in a consistent state after the rollback segments are used to remove all uncommitted data from the datafiles; see "Rollback Segments"

for more information.

Control Files

In general, the control file(s) of a database store the status of the physical structure of the database. Certain status information in the control file (for example, the current online redo log file, the names of the datafiles, and so on) guides Oracle during instance or media recovery; see "Control Files"

for more information.

The Online Redo Log

Every instance of an Oracle database has an associated online redo log to protect the database in case the database experiences an instance failure. An online redo log consists of two or more pre-allocated files that store all changes made to the database as they occur.

Online Redo Log File Contents

Online redo log files are filled with redo entries. Redo entries record data that can be used to reconstruct all changes made to the database, including the rollback segments stored in the database buffers of the SGA. Therefore, the online redo log also protects rollback data.

Note: Redo entries store low-level representations of database changes that cannot be mapped to user actions. Therefore, the contents of an online redo log file should never be edited or altered, and cannot be used for any application purposes such as auditing.

Redo entries are buffered in a "circular" fashion in the redo log buffer of the SGA and are written to one of the online redo log files by the Oracle background process Log Writer (LGWR). Whenever a transaction is committed, LGWR writes the transaction's redo entries from the redo log buffer of the SGA to an online redo log file, and a system change number (SCN) is assigned to identify the redo entries for each committed transaction.

However, redo entries can be written to an online redo log file before the corresponding transaction is committed. If the redo log buffer fills, or another transaction commits, LGWR flushes redo log entries in the redo log buffer to an online redo log file, of which some redo entries may not be committed. See "The Redo Log Buffer" for more information.

How Online Redo Log Files Are Written

The online redo log of a database consists of two or more online redo log files. Oracle requires two files to guarantee that one is always available for writing while the other is being archived, if desired.

LGWR writes to online redo log files in a circular fashion; when the current online redo log file is filled, LGWR begins writing to the next available online redo log file. When the last available online redo log file is filled, LGWR returns to the first online redo log file and writes to it, starting the cycle again. Figure 22 - 1 illustrates the circular writing of the online redo log file. The numbers next to each line indicate the sequence in which LGWR writes to each online redo log file.

Filled online redo log files are "available" to LGWR for reuse depending on whether archiving is enabled:

If archiving is disabled, a filled online redo log file is available once the checkpoint involving the online redo log file has completed.

If archiving is enabled, a filled online redo log file is available to LGWR once the checkpoint involving the online redo log file has completed and once the file has been archived.

Figure 22 - 1. Circular Use of Online Redo Log Files by LGWR

Active (Current) and Inactive Online Redo Log Files

At any given time, Oracle uses only one of the online redo log files to store redo entries written from the redo log buffer. The online redo log file actively being written by LGWR is called the current online redo log file.

Online redo log files that are required for instance recovery are called active online redo log files. Online redo log files that are not required for instance recovery are called inactive.

If archiving is enabled, an active online log file cannot be reused or overwritten until its contents are archived. If archiving is disabled, when the last online redo log file fills, writing continues by overwriting the first available active file. For more information about archiving options for the redo log, see "Database Archiving Modes" on page 22-18.

Log Switches and Log Sequence Numbers

The point at which Oracle ends writing to one online redo log file and begins writing to another is called a log switch. A log switch always occurs when the current online redo log file is completely filled and writing must continue to the next online redo log file. The database administrator can also force log switches if the current redo log file is closed for some operation (for example, archiving).

Oracle assigns each online redo log file a new log sequence number every time that a log switch occurs and LGWR begins writing to it. If online redo log files are archived, the archived redo log file retains its log sequence number. The online redo log file that is cycled back for use is given the next available log sequence number.

Each redo log file (including online and archived) is uniquely identified by its log sequence number. During instance or media recovery, Oracle properly applies redo log files in ascending order by using the log sequence number of necessary archived and online redo log files.

Checkpoints

An event called a checkpoint occurs when an Oracle background process, DBWR, writes all the modified database buffers in the SGA, including committed and uncommitted data, to the data files. Checkpoints are implemented for the following reasons:

Checkpoints ensure that data segment blocks in memory that change frequently are written to datafiles regularly. Because of the least-recently-used algorithm of DBWR, a data segment block that changes frequently might never qualify as the least recently used block and thus might never be written to disk if checkpoints did not occur.

Because all database changes up to the checkpoint have been recorded in the datafiles, redo log entries before the checkpoint no longer need to be applied to the datafiles if instance recovery is required. Therefore, checkpoints are useful because they can expedite instance recovery.

Though some overhead is associated with a checkpoint, Oracle does not halt activity nor are current transactions affected. Because DBWR continuously writes database buffers to disk, a checkpoint does not necessarily require many data blocks to be written all at once. Rather, the completion of a checkpoint simply guarantees that all data blocks modified since the previous checkpoint are actually written to disk.

Checkpoints occur whether or not filled online redo log files are archived. If archiving is disabled, a checkpoint affecting an online redo log file must complete before the online redo log file can be reused by LGWR. If archiving is enabled, a checkpoint must complete and the filled online redo log file must be archived before it can be reused by LGWR.

Checkpoints can occur for all datafiles of the database (called database checkpoints) or can occur for only specific datafiles. The following list explains when checkpoints occur and what type happens in each situation:

A database checkpoint automatically occurs at every log switch. If a previous database checkpoint is currently in progress, a checkpoint forced by a log switch overrides the current checkpoint.

An initialization parameter, LOG_CHECKPOINT_INTERVAL, can be set to force a database checkpoint when a predetermined number of redo log blocks have been written to disk relative to the last database checkpoint. You can set another parameter, LOG_CHECKPOINT_TIMEOUT, to force a database checkpoint a specific number of seconds after the previous database checkpoint started. These options are useful when extremely large redo log files are used and additional checkpoints are desired between log switches. Database checkpoints signaled to start by these initialization parameters are not performed until the previous checkpoint has completed.

When the beginning of an online tablespace backup is indicated, a checkpoint is forced only on the datafiles that constitute the tablespace being backed up. A checkpoint at this time overrides any previous checkpoint still in progress. Also, since this type of checkpoint only affects the datafiles being backed up, it does not reduce the amount of redo that would be needed for instance recovery.

If the administrator takes a tablespace offline with normal or temporary priority, a checkpoint is forced only on the online datafiles of the associated tablespace.

If the database administrator shuts down an instance (NORMAL or IMMEDIATE), Oracle forces a database checkpoint to complete before the instance is shut down. A database checkpoint forced by instance shutdown overrides any previously running checkpoint.

The database administrator can force a database checkpoint to happen on demand. A checkpoint forced on demand overrides any previously running checkpoint.

Note: Checkpoints also occur at other times if the Oracle Parallel Server is used; see Oracle7 Parallel Server Concepts & Administration for more information.

The Mechanics of a Checkpoint When a checkpoint occurs, the checkpoint background process (CKPT) remembers the location of the next entry to be written in an online redo log file and signals the database writer background process (DBWR) to write the modified database buffers in the SGA to the datafiles on disk. CKPT then updates the headers of all control files and datafiles to reflect the latest checkpoint.

When a checkpoint is not happening, DBWR only writes the least-recently-used database buffers to disk to free buffers as needed for new data. However, as a checkpoint proceeds, DBWR writes data to the data files on behalf of both the checkpoint and ongoing database operations. DBWR writes a number of modified data buffers on behalf of the checkpoint, then writes the least recently used buffers, as needed, and then writes more dirty buffers for the checkpoint, and so on, until the checkpoint completes.

Depending on what signals a checkpoint to happen, the checkpoint can be either "normal" or "fast". With a normal checkpoint, DBWR writes a small number of data buffers each time it performs a write on behalf of a checkpoint. With a fast checkpoint, DBWR writes a large number of data buffers each time it performs a write on behalf of a checkpoint.

Therefore, by comparison, a normal checkpoint requires more I/Os to complete than a fast checkpoint. Because a fast checkpoint requires fewer I/Os, the checkpoint completes very quickly. However, a fast checkpoint can also detract from overall database performance if DBWR has a lot of other database work to complete. Events that trigger normal checkpoints include log switches and checkpoint intervals set by initialization parameters; events that trigger fast checkpoints include online tablespace backups, instance shutdowns, and database administrator-forced checkpoints.

Until a checkpoint completes, all online redo log files written since the last checkpoint are needed in case a database failure interrupts the checkpoint and instance recovery is necessary. Additionally, if LGWR cannot access an online redo log file for writing because a checkpoint has not completed, database operation suspends temporarily until the checkpoint completes and an online redo log file becomes available. In this case, the normal checkpoint becomes a fast checkpoint, so it completes as soon as possible.

For example, if only two online redo log files are used, and LGWR requires another log switch, the first online redo log file is unavailable to LGWR until the checkpoint for the previous log switch completes.

Note: The information that is recorded in the datafiles and control files as part of a checkpoint varies if the Oracle Parallel Server configuration is used; see Oracle7 Parallel Server Concepts & Administration.

You can set the initialization parameter LOG_CHECKPOINTS_TO_ALERT to determine if checkpoints are occurring at the desired frequency. The default value of NO for this parameter does not log checkpoints. When you set the parameter to YES, information about each checkpoint is recorded in the ALERT file.

Multiplexed Online Redo Log Files

Oracle provides the capability to multiplex an instance's online redo log files to safeguard against damage to its online redo log files. With multiplexed online redo log files, LGWR concurrently writes the same redo log information to multiple identical online redo log files, thereby eliminating a single point of online redo log failure. Figure 22 - 2 illustrates duplexed (two sets of) online redo log files.

Figure 22 - 2. Multiplexed Online Redo Log Files

The corresponding online redo log files are called groups. Each online redo log file in a group is called a member. Notice that all members of a group are concurrently active (concurrently written to by LGWR), as indicated by the identical log sequence numbers assigned by LGWR. If a multiplexed online redo log is used, each member in a group must be the exact same size.

The Mechanics of a Multiplexed Online Redo Log LGWR always addresses all members of a group, whether the group contains one or many members. For example, LGWR always tries to write to all members of a given group concurrently, then to switch and concurrently write to all members of the next group, and so on. LGWR never writes concurrently to one member of a given group and one member of another group.

LGWR reacts differently when certain online redo log members are unavailable, depending on the reason for the file(s) being unavailable:

If LGWR can successfully write to at least one member in a group (either at a log switch or as writing to the group is proceeding), writing to the accessible members of the group proceeds as normal; LGWR simply writes to the available members of a group and ignores the unavailable members.

If LGWR cannot access the next group at a log switch because the group needs to be archived, database operation is temporarily halted until the group becomes available (in other words, until the group is archived).

If all members of the next group are inaccessible to LGWR at a log switch because of one or more disk failures, an error is returned and the database instance is immediately shut down. In this case, the database may need media recovery from the loss of an online redo log file; see Chapter 24, "Database Recovery", for more information about such recovery. However, if the database checkpoint has moved beyond the lost log (this is not the current log in this example), media recovery is not necessary. Simply drop the inaccessible log group. If the log was not archived, archiving might need to be disabled before the log can be dropped.

If all members of a group suddenly become inaccessible to LGWR as they are being written, an error is returned and the database instance is immediately shut down. In this case, the database might need media recovery from the loss of an online redo log file; see Chapter 24, "Database Recovery", for more information about such recovery. If the media containing the log is not actually lost -- for example, if the drive for the log was inadvertently turned off -- media recovery might not be needed. In this example, all that is necessary is to turn the drive back on and do instance recovery.

Whenever LGWR cannot write to a member of a group, Oracle marks that member as stale and writes an error message to the LGWR trace file and to the database's ALERT file to indicate the problem with the inaccessible file(s).

To always safeguard against a single point of online redo log failure, a multiplexed online redo log should be completely symmetrical: all groups of the online redo log should have the same number of members. However, Oracle does not require that a multiplexed online redo log be symmetrical. For example, one group can have only one member, while other groups can have two members. Oracle allows this behavior to provide for situations that temporarily affect some online redo log members but leave others unaffected (for example, a disk failure). The only requirement for an instance's online redo log, multiplexed or non-multiplexed, is that it be comprised of at least two groups. Figure 22 - 3 shows a legal and illegal multiplexed online redo log configuration.

Figure 22 - 3. Legal and Illegal Multiplexed Online Redo Log Configuration

"Threads" of Online Redo Log and the Oracle Parallel Server

Each database instance has its own online redo log groups. These online redo log groups, multiplexed or not, are called an instance's "thread" of online redo. In typical configurations, only one database instance accesses an Oracle database, thus only one thread is present. However, when running the Oracle Parallel Server, two or more instances concurrently access a single database; each instance in this type of configuration has its own thread.

This manual describes how to configure and manage the online redo log when the Oracle Parallel Server is not used. The thread number can be assumed to be 1 in all discussions and examples of commands. For complete information about configuring the online redo log with the Oracle Parallel Server, see Oracle7 Parallel Server Concepts & Administration.

The Archived Redo Log

Oracle allows the optional archiving of filled groups of online redo log files, which creates archived (offline) redo logs. The archiving of filled groups has two key advantages relating to database backup and recovery options:

A database backup, together with online and archived redo log files, guarantees that all committed transactions can be recovered in the event of an operating system or disk failure.

Online backups, taken while the database is open and in normal system use, can be used if an archived log is kept permanently.

The choice of whether or not to enable the archiving of filled groups of online redo log files depends on the backup and recovery scheme for an Oracle database. If you cannot afford to lose any data in your database in the event of a disk failure, an archived log must be present. However, the archiving of filled online redo log files can require the database administrator to perform extra administrative operations.

The Mechanics of Archiving

Depending on how you configure archiving, the mechanics of archiving redo log groups are performed by either the optional Oracle background process ARCH (when automatic archiving is used) or the user process that issues a statement to archive a group manually. For more information, see "Automatic Archiving and the ARCH (Archiver) Background Process"

and "Manual Archiving"

Note: For simplicity, the remainder of this section assumes that archiving is enabled and the ARCH background process is responsible for archiving filled online redo log groups.

ARCH can archive a group once the group becomes inactive and the log switch to the next group has completed. The ARCH process can access any members of the group, as needed, to complete the archiving of the group. For example, if ARCH attempts to open a member of a group and it is not accessible (for example, due to a disk failure), ARCH automatically tries to use another member of the group, and so on, until it finds a member of the group that is available for archiving. If ARCH is archiving a member of a group, and the information in the member is detected as invalid or a disk failure occurs as archiving proceeds, ARCH automatically switches to another member of the group to continue archiving the group where it was interrupted.

A group of online redo log files does not become available to LGWR for reuse until ARCH has archived the group. This restriction is important because it guarantees that LGWR cannot accidentally write over a group that has not been archived, which would prevent the use of all subsequent archived redo log files during a database recovery.

When archiving is used, an archiving destination is specified. This destination is usually a storage device separate from the disk drives that hold the datafiles, online redo log files, and control files of the database. Typically, the archiving destination is another disk drive of the database server This way, archiving does not contend with other files required by the instance and completes quickly so the group can become available to LGWR. Ideally, archived redo log files (and corresponding database backups) should be moved permanently to inexpensive offline storage media, such as tape, that can be stored in a safe place, separate from the database server.

At log switch time, when no more information will be written to a redo log, a record is created in the database's control file. Each record contains the thread number, log sequence number, low SCN for the group, and next SCN after the archived log file; this information is used during database recovery in Parallel Server configurations to automate the application of redo log files. See Oracle7 Parallel Server Concepts & Administration for additional information.

Archived Redo Log File Contents

An archived redo log file is a simple copy of the identical filled members that constitute an online redo log group. Therefore, an archived redo log file includes the redo entries present in the identical members of a group at the time the group was archived. The archived redo log file also preserves the group's log sequence number.

If archiving is enabled, LGWR is not allowed to reuse an online redo log group until it has been archived. Therefore, it is guaranteed that the archived redo log contains a copy of every group (uniquely identified by log sequence numbers) created since archiving was enabled.

Database Archiving Modes

A database can operate in two distinct modes: NOARCHIVELOG mode (media recovery disabled) or ARCHIVELOG mode (media recovery enabled).

NOARCHIVELOG Mode (Media Recovery Disabled)

If a database is used in NOARCHIVELOG mode, the archiving of the online redo log is disabled. Information in the database's control file indicates that filled groups are not required to be archived. Therefore, once a filled group becomes inactive and the checkpoint at the log switch completes, the group is available for reuse by LGWR.

NOARCHIVELOG mode protects a database only from instance failure, not from disk (media) failure. Only the most recent changes made to the database, stored in the groups of the online redo log, are available for instance recovery.

ARCHIVELOG Mode (Media Recovery Enabled)

If an Oracle database is operated in ARCHIVELOG mode, the archiving of the online redo log is enabled. Information in a database control file indicates that a group of filled online redo log files cannot be reused by LGWR until the group is archived. A filled group is immediately available to the process performing the archiving once a log switch occurs (when a group becomes inactive); the process performing the archiving does not have to wait for the checkpoint of a log switch to complete before it can access the inactive group for archiving. Figure 22 - 4 illustrates how the database's online redo log files are used in ARCHIVELOG mode and how the archived redo log is generated by the process archiving the filled groups (for example, ARCH in this illustration).

Figure 22 - 4. Online Redo Log File Use in ARCHIVELOG Mode

ARCHIVELOG mode permits complete recovery from disk failure as well as instance failure, because all changes made to the database are permanently saved in an archived redo log.

Automatic Archiving and the ARCH (Archiver) Background Process An instance can be configured to have an additional background process, Archiver (ARCH), automatically archive groups of online redo log files once they become inactive. Therefore, automatic archiving frees the database administrator from having to keep track of, and archive, filled groups manually. For this convenience alone, automatic archiving is the choice of most database systems that have an archived redo log.

If you request automatic archiving at instance startup by setting the LOG_ARCHIVE_START initialization parameter, Oracle starts ARCH during instance startup. Otherwise, ARCH is not started during instance startup.

However, the database administrator can interactively start or stop automatic archiving at any time. If automatic archiving was not specified to start at instance startup, and the administrator subsequently starts automatic archiving, the ARCH background process is created. ARCH then remains for the duration of the instance, even if automatic archiving is temporarily turned off and turned on again.

ARCH always archives groups in order, beginning with the lowest sequence number. ARCH automatically archives filled groups as they become inactive. A record of every automatic archival is written in the ARCH trace file by the ARCH process. Each entry shows the time the archive started and stopped.

If ARCH encounters an error when attempting to archive a group (for example, due to an invalid or filled destination), ARCH continues trying to archive the group. An error is also written in the ARCH trace file and the ALERT file. If the problem is not resolved, eventually all online redo log groups become full, yet not archived, and the system halts because no group is available to LGWR. Therefore, if problems are detected, you should either resolve the problem so that ARCH can continue archiving (such as by changing the archive destination) or manually archive groups until the problem is resolved.

Manual Archiving If a database is operating in ARCHIVELOG mode, the database administrator can manually archive the filled groups of inactive online redo log files, as necessary, whether or not automatic archiving is enabled or disabled. If automatic archiving is disabled, the database administrator is responsible for archiving all filled groups.

For most systems, automatic archiving is chosen because the administrator does not have to watch for a group to become inactive and available for archiving. Furthermore, if automatic archiving is disabled and manual archiving is not performed fast enough, database operation can be suspended temporarily whenever LGWR is forced to wait for an inactive group to become available for reuse. The manual archiving option is provided so that the database administrator can

archive a group when automatic archiving has been stopped because of a problem (for example, the offline storage device specified as archived redo log destination has experienced a failure or become full)

archive a group in a non-standard fashion (for example, one group to one offline storage device, the next group to a different offline storage device, and so on)

re-archive a group if the original archived version is lost or damaged

When a group is archived manually, the user process issuing the statement to archive a group actually performs the process of archiving the group. Even if the ARCH background process is present for the associated instance, it is the user process that archives the group of online redo log files.

Control Files

The control file of a database is a small binary file necessary for the database to start and operate successfully. A control file is updated continuously by Oracle during database use, so it must be available for writing whenever the database is open. If for some reason the control file is not accessible, the database will not function properly.

Each control file is associated with only one Oracle database.

Control File Contents

A control file contains information about the associated database that is required for the database to be accessed by an instance, both at startup and during normal operation. A control file's information can be modified only by Oracle; no database administrator or end-user can edit a database's control file.

Among other things, a control file contains information such as

the database name

the timestamp of database creation

the names and locations of associated databases and online redo log files

the current log sequence number

checkpoint information

The database name and timestamp originate at database creation. The database's name is taken from either the name specified by the initialization parameter DB_NAME or the name used in the CREATE DATABASE statement.

Each time that a datafile or an online redo log file is added to, renamed in, or dropped from the database, the control file is updated to reflect this physical structure change. These changes are recorded so that

Oracle can identify the datafiles and online redo log files to open during database startup.

Oracle can identify files that are required or available in case database recovery is necessary.

Therefore, if you make a change to your database's physical structure, you should immediately make a backup of your control file. See "Control File Backups"

for information about backing up a database's control file.

Control files also record information about checkpoints. When a checkpoint starts, the control file records information to remember the next entry that must be entered into the online redo log. This information is used during database recovery to tell Oracle that all redo entries recorded before this point in the online redo log group are not necessary for database recovery; they were already written to the datafiles. See "Checkpoints" .

Multiplexed Control Files

As with online redo log files, Oracle allows multiple, identical control files to be open concurrently and written for the same database. By storing multiple control files for a single database on different disks, you can safeguard against a single point of failure with respect to control files. If a single disk that contained a control file crashes, the current instance fails when Oracle attempts to access the damaged control file. However, other copies of the current control file are available on different disks, so an instance can be restarted easily without the need for database recovery.

The permanent loss of all copies of a database's control file is a serious problem to safeguard against. If all control files of a database are permanently lost during operation (several disks fail), the instance is aborted and media recovery is required. Even so, media recovery is not straightforward if an older backup of a control file must be used because a current copy is not available. Therefore, it is strongly recommended that multiplexed control files be used with each database, with each copy stored on a different physical disk.

Survivability

In the event of a power failure, hardware failure, or any other system-interrupting disaster, Oracle7 release 7.3 offers the standby database feature. The standby database is intended for sites where survivability and disaster recovery are of paramount importance.

For information about creating and maintaining a standby database, see the Oracle7 Server Administrator's Guide.

Planning for Disaster Recovery

The only way to ensure rapid recovery from a system failure or other disaster is to plan carefully. You must have a set plan with detailed procedures. Whether you are implementing a standby database or you have a single system, you must have a plan for what to do in the event of a catastrophic failure.

See Chapter 24, "Database Recovery", for information about recovering a database.

Standby Database

Release 7.3 provides a reliable and supported mechanism for implementing a standby database system to facilitate quick disaster recovery. The scheme uses a secondary system on duplicate hardware, maintained in a constant state of media recovery through the application of log files archived at the primary site. In the event of a primary system failure, the standby can be activated with minimal recovery, providing immediate system availability. There are new commands and internal verifications for operations involved in the creation and maintenance of the standby system, improving the reliability of the disaster recovery scheme.

A standby database involves two databases: a primary database and a standby database. The primary database is the production database that is in use. The standby database is a copy of the production database, ideally located on a separate machine. The standby database runs in recovery mode until there is a failure at the primary site. At the time of a failure, the standby database performs recovery operations and comes online as the primary database.

A standby database uses the archived log information from the primary database, so it is ready to perform recovery and go online at any time. When the primary database archives its redo logs, the logs must be transferred to the remote site and applied to the standby database. The standby database is therefore always behind the primary database in time and transaction history.

The physical hardware on which the standby database resides should be used only as a disaster recovery system; no other applications should run on it. Because the standby database is designed for disaster recovery, it ideally resides in a separate physical location than the primary. The standby database exists not only to guard against power failures and hardware failures, but also to protect your data in the event of a physical disaster such as a fire or an earthquake.

Prev Next