Oracle7 Parallel Server Concepts and Administrator's Guide

Recovering the Database

This chapter describes Oracle recovery features on a parallel server. It covers the following topics:

Overview

Recovery from Instance Failure

Recovery from Media Failure

Parallel Recovery

Overview

This chapter discusses three types of recovery:

Type of Recovery	Definition
Instance failure	Occurs when a software or hardware problem prevents an instance from continuing work.
Media failure	Occurs when the storage medium for Oracle files is damaged. This usually prevents Oracle from reading or writing data.
Parallel recovery	One process reads the log files sequentially and dispatches redo information to several recovery processes, which apply the changes from the log files to the datafiles.

Table 23 - 1. Types of Recovery

Recovery from Instance Failure

The following sections describe the recovery performed after failure of instances accessing the database in shared mode.

Single-node Failure

Multiple-node Failure

Access to Datafiles for Instance Recovery

Phases of DLM and Oracle Recovery

After instance failure, Oracle uses the online redo log files to perform automatic recovery of the database. For a single instance running in exclusive mode, instance recovery occurs as soon as the instance starts up again after it has failed or shut down abnormally, as described in the "Recovering a Database" chapter of Oracle7 Server Administrator's Guide.

When instances accessing the database in shared mode fail, online instance recovery is performed automatically. Instances that continue running on other nodes are not affected as long as they only read buffers already in the buffer cache. If a surviving instance attempts to write, its transaction stops. All database operations other than reads of already-cached buffers are suspended until cache recovery of the failed instance is complete.

Single-node Failure

A parallel server performs online instance recovery by coordinating recovery operations through the SMON processes of the different instances. If one instance fails, the SMON process of another instance notices the failure and automatically performs instance recovery for the failed instance.

Online instance recovery does not include restarting the failed instance or any applications that were running on that instance.

When one instance performs recovery for another instance that has failed, the surviving instance reads the redo log entries generated by the failed instance, and uses that information to ensure that all committed transactions are reflected in the database. No data from committed transactions is lost.

The instance that is performing recovery rolls back any transactions that were active at the time of the failure and releases any resources being used by those transactions.
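
The two steps described above, applying the failed instance's redo and then rolling back transactions that were still active, can be illustrated with a toy model. The following Python sketch is not Oracle code: the record layout (transaction, key, old value, new value), the dictionary standing in for the datafiles, and the function name recover are all assumptions made for illustration.

```python
def recover(datafile, redo, committed):
    """Toy model of instance recovery for one failed redo thread.

    datafile  -- dict of key -> value as found on disk (hypothetical layout)
    redo      -- list of (txn, key, old_value, new_value) in log order;
                 old_value is None if the key did not exist before
    committed -- set of transaction ids that committed before the failure
    """
    db = dict(datafile)

    # Roll forward: reapply every logged change, committed or not.
    for txn, key, old_value, new_value in redo:
        db[key] = new_value

    # Roll back: undo changes of uncommitted transactions, newest first.
    for txn, key, old_value, new_value in reversed(redo):
        if txn not in committed:
            if old_value is None:
                db.pop(key, None)
            else:
                db[key] = old_value
    return db


redo_log = [
    ("T1", "a", 1, 2),
    ("T2", "b", None, 9),   # T2 never committed
    ("T1", "c", None, 3),
]
result = recover({"a": 1}, redo_log, committed={"T1"})
print(result)
```

In this model the committed transaction T1 survives in full, while the change made by the uncommitted transaction T2 is removed, which mirrors the guarantee that no data from committed transactions is lost.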

Multiple-node Failure

As long as one instance continues running, its SMON process performs online instance recovery for any other instances that fail in a parallel server.

If all instances of a parallel server fail, instance recovery is performed automatically the next time an instance opens the database. The instance does not have to be one of the instances that failed, and it can mount the database in either shared or exclusive mode from any node of the parallel server. This recovery procedure is the same for Oracle running in shared mode as it is for Oracle in exclusive mode, except that one instance performs instance recovery for all of the instances that failed.

Access to Datafiles for Instance Recovery

An instance that performs recovery for another instance must have access to all of the online datafiles that the failed instance was accessing. When instance recovery fails because a datafile fails verification, the instance that attempted to perform recovery does not fail, but a message is written to the ALERT file.

After you correct the problem that prevented access to the database files, you must use the SQL statement ALTER SYSTEM CHECK DATAFILES to make the files available to the instance.

See Also: "Datafiles".

Phases of DLM and Oracle Recovery

Figure 23 - 1 illustrates the degree of database availability during each phase of DLM and Oracle recovery.

Figure 23 - 1. Phases of DLM and Oracle Recovery

Phases of recovery are these:

1. Oracle Parallel Server is running on multiple nodes.

2. Node failure is detected.

3. The DLM is reconfigured; resource and lock management is redistributed onto the set of surviving nodes. One call will get persistent resources, if supported by the DLM. Lock value block is marked as dubious for locks held in exclusive or protected write mode. Lock requests are queued, for some DLM implementations.

5. Roll forward. Redo logs of the dead thread(s) are applied to the database.

LCK

7. Roll back. Rollback segments are applied to the database for all uncommitted transactions.

8. Instance recovery is complete, and all data is accessible.

During phase 5 (forward application of the redo log), database access is limited by the transitional state of the buffer cache. The following data access restrictions exist for all user data in all datafiles, regardless of whether you are using hashed or fine grain locking, or any particular features:

No writes to any of the surviving buffer caches will succeed while the access is limited.

No disk I/O of any sort via the buffer cache and direct path can be done from any of the surviving instances.

No lock requests will be made to the DLM for any user data.

Reads of buffers already in the cache with the correct global lock can be done, since they do not involve any I/O or lock operations.

The transitional state of the buffer cache begins at the conclusion of the initial lock scan phase when instance recovery is first started by scanning for dead redo threads. Subsequent lock scans are made if new dead threads are discovered. This state lasts while the redo log is applied (cache recovery) and ends when the redo logs have been applied and the file headers have been updated. Cache recovery operations conclude with validation of the invalid locks, which occurs after the buffer cache state is normalized.

Recovery from Media Failure

After a media failure that results in the loss of one or more database files, you must use backup copies of the datafiles to recover the database. You might also need to apply archived redo log files to the database or use a backup copy of the control file. This section describes:

Complete Media Recovery

Incomplete Media Recovery

Mounting Redo Log Files for Recovery

Disaster Recovery

See Also: "Recovering a Database" in the Oracle7 Server Administrator's Guide for recovery procedures for various kinds of media failure.

Complete Media Recovery

You can perform complete media recovery in either exclusive or shared mode. The following table shows what the status of the database must be, for you to recover particular database objects.

To Recover	Database Status
An entire database or the SYSTEM tablespace	The database must be mounted but not opened by any instance.
A tablespace other than the SYSTEM tablespace	The database must be opened by the instance performing the recovery and the tablespace must be offline.
A datafile	The database can be open with the datafile offline, or the database can be mounted but not opened by any instance. (For a datafile in the SYSTEM tablespace, the database must be mounted but not open.)

Table 23 - 2. Database Status for Media Recovery
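
The rules in Table 23 - 2 can be restated as a small decision function. This is only a reading aid written in Python, not an Oracle interface: the function name can_recover and all of its parameters are hypothetical, and the flags simply encode the conditions listed in the table.

```python
def can_recover(target, mounted, open_here, open_elsewhere=False,
                offline=False, system=False):
    """Return True if Table 23 - 2 permits recovering `target`.

    target         -- "database", "tablespace", or "datafile"
    mounted        -- database is mounted by the recovering instance
    open_here      -- database is open in the recovering instance
    open_elsewhere -- database is open in some other instance
    offline        -- the tablespace or datafile is offline
    system         -- the object belongs to the SYSTEM tablespace
    """
    if target not in ("database", "tablespace", "datafile"):
        raise ValueError("unknown recovery target: %r" % (target,))

    # "Mounted but not opened by any instance."
    closed = mounted and not (open_here or open_elsewhere)

    if target == "database" or system:
        return closed
    if target == "tablespace":
        return open_here and offline
    # Non-SYSTEM datafile: open with the file offline, or mounted and closed.
    return closed or ((open_here or open_elsewhere) and offline)


print(can_recover("tablespace", mounted=True, open_here=True, offline=True))
```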

You can use the Server Manager Recover dialog box with the Database radio button, or the RECOVER DATABASE command to recover a database that is mounted in shared mode, but not open. Only one instance can issue this command in a parallel server.

To perform online recovery of tablespaces or datafiles in shared mode, you can use either the Server Manager Recover dialog box with the Tablespace or Datafile radio button or the command RECOVER TABLESPACE or RECOVER DATAFILE.

You can recover multiple datafiles or tablespaces on multiple instances simultaneously.

Note: The recommended method of recovering a database is to use Server Manager. Direct use of the ALTER DATABASE RECOVER SQL command is not recommended.

Incomplete Media Recovery

Incomplete media recovery can be performed while the database is mounted in shared or exclusive mode, but not open by any instance, using the following database recovery options:

UNTIL CANCEL

UNTIL CHANGE integer

UNTIL TIME date

See Also: The "Recovering a Database" chapter in Oracle7 Server Administrator's Guide for information about these options.

Mounting Redo Log Files for Recovery

Media recovery of a database accessed by a parallel server may require multiple archived log files to be mounted and open at the same time. Because each instance writes redo log data to a separate thread of redo, recovery may require as many as one archived log file per thread.

However, if a thread's online redo log contains enough recovery information, mounting any archived log files for that thread will be unnecessary.

When recovering using Server Manager, you are prompted for the archived log files as they are needed. Messages supply information about the required files, and Server Manager prompts you for the filename.

For example, if the log history is enabled and the filename format is LOG_T%t_SEQ%s, where %t is the thread and %s is the log sequence number, then you might receive these messages to begin recovery with SCN 9523 in thread 8:

ORA-00279: Change 9523 generated at 27/09/91 11:42:54 needed for thread 8

ORA-00289: Suggestion : LOG_T8_SEQ438

ORA-00280: Change 9523 for thread 8 is in sequence 438

Specify log: {<RET> = suggested | filename | AUTO | FROM | CANCEL}
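
The suggested filename in the messages above follows directly from the format string. As a minimal sketch, assuming only the two substitutions described in the text (%t for the thread and %s for the log sequence number), the expansion can be reproduced in Python; the helper name archived_log_name is hypothetical.

```python
def archived_log_name(fmt, thread, sequence):
    """Expand %t (thread number) and %s (log sequence number) in fmt."""
    return fmt.replace("%t", str(thread)).replace("%s", str(sequence))


name = archived_log_name("LOG_T%t_SEQ%s", thread=8, sequence=438)
print(name)  # LOG_T8_SEQ438
```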

If you use the ALTER DATABASE command with the RECOVER clause instead of Server Manager, you receive these messages but not the prompt. Redo log files may be required for each enabled thread in the parallel server. When a log file is no longer needed, Oracle issues a message indicating that you can dismount the file. The next log file for that thread is then requested, unless the thread was disabled or recovery is finished.

If recovery reaches a time when an additional thread was enabled, Oracle simply requests the archived log file for that thread. Whenever an instance enables a thread, it writes a redo entry that records the change; therefore, all necessary information about threads is available from the redo log files during recovery.

If recovery reaches a time when a thread was disabled, Oracle informs you that the log file for that thread is no longer needed and does not request any further log files for the thread.
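
One way to see why a log file from each enabled thread may have to be available at the same time is to model each thread as its own SCN-ordered stream and merge the streams. The Python sketch below is a simplified illustration under that assumption, not a description of Oracle internals; the record layout and the function name merge_threads are hypothetical.

```python
import heapq


def merge_threads(threads):
    """Merge per-thread redo streams into one list ordered by SCN.

    threads -- dict of thread number -> list of (scn, change),
               each list already sorted by scn
    """
    streams = [
        [(scn, thread, change) for scn, change in records]
        for thread, records in threads.items()
    ]
    # heapq.merge reads from every stream at once, so each thread's
    # current log must be open while the merge is in progress.
    return list(heapq.merge(*streams))


merged = merge_threads({
    1: [(100, "a"), (130, "c")],
    2: [(110, "b"), (140, "d")],
})
print(merged)
```

A thread that is enabled or disabled partway through corresponds, in this model, to a stream that starts late or ends early.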

Note: If Oracle reconstructs the names of archived redo log files, the format that LOG_ARCHIVE_FORMAT specifies for the instance doing recovery must be the same as the format specified for the instances that archived the files. All instances should use the same value of LOG_ARCHIVE_FORMAT in a parallel server, and the instance performing recovery should also use that value. You can specify a different value of LOG_ARCHIVE_DEST during recovery if the archived redo log files are not at their original archive destinations.

Disaster Recovery

Disaster recovery is used when a failure makes a whole site unavailable. In this case, you can recover at an alternate site using offline or online backups. (To recover up to the latest point in time, all logs must be available at a remote site; otherwise some work may be lost.) Use the following procedure.

To Perform Disaster Recovery in OPS

1. Restore the last full backup at the disaster recovery site, as described in Oracle7 Server Administrator's Guide.

2. Start up Server Manager.

3. Connect as SYSDBA.

4. Start and mount the database with the STARTUP MOUNT dialog box.

5. Initiate an incomplete recovery using the RECOVER dialog box with the appropriate UNTIL option.

The following command is an example of the line mode equivalent:

		RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL

6. When prompted with a suggested redo log file name for a specific thread, use that filename.

If the suggested archive log is not in the archive directory, specify where the file can be found. If redo information is needed for a thread and a file name is not suggested, try using archive log files for the thread in question.

7. Repeat step 6 until all archive log files have been applied.

8. Stop the recovery operation using the CANCEL button.

9. Issue the ALTER DATABASE OPEN RESETLOGS command.

Parallel Recovery

The goal of the parallel recovery feature is to use compute and I/O parallelism to reduce the elapsed time required to perform crash recovery, single-instance recovery, or media recovery. Parallel recovery is most effective at reducing recovery time when several datafiles on several disks are being recovered concurrently.

You can parallelize instance and media recovery in two ways:

Setting the RECOVERY_PARALLELISM Parameter

Specifying RECOVER Command Options

The Oracle Server can use one process to read the log files sequentially and dispatch redo information to several recovery processes to apply the changes from the log files to the datafiles. The recovery processes are started automatically by Oracle, so there is no need to use more than one session to perform recovery.
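
The reader-and-dispatcher arrangement described above can be sketched as follows. This Python model is illustrative only: the source does not say how redo is divided among recovery processes, so assigning work by datafile number is an assumption, as are the record layout and the function name dispatch. Plain lists stand in for the recovery processes.

```python
def dispatch(redo, n_workers):
    """Read redo sequentially and hand each record to one worker queue.

    redo      -- list of (file_no, block_no, change) in log order
    n_workers -- number of recovery processes (assumed >= 1)
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    queues = [[] for _ in range(n_workers)]
    for file_no, block_no, change in redo:
        # Assumed policy: all changes for one datafile go to the same
        # worker, so they are applied in their original log order.
        queues[file_no % n_workers].append((file_no, block_no, change))
    return queues


queues = dispatch(
    [(1, 10, "x"), (2, 5, "y"), (1, 11, "z"), (3, 1, "w")],
    n_workers=2,
)
print(queues)
```

With a policy like this, the benefit grows when the datafiles sit on different disks, which matches the observation that parallel recovery helps most when several datafiles on several disks are recovered concurrently.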

Setting the RECOVERY_PARALLELISM Parameter

The RECOVERY_PARALLELISM initialization parameter specifies the number of redo application slave processes that participate in instance or media recovery. A value of 0 or 1 indicates that recovery is to be performed serially by one process. The value of this parameter cannot exceed the value of the PARALLEL_MAX_SERVERS parameter.
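
The two rules in the preceding paragraph can be checked mechanically. The Python sketch below only restates those rules; the function name recovery_mode is hypothetical, and rejecting negative values is an added assumption that the text does not address.

```python
def recovery_mode(recovery_parallelism, parallel_max_servers):
    """Classify a RECOVERY_PARALLELISM setting as serial or parallel."""
    if recovery_parallelism < 0:
        # Assumption: negative values are treated as invalid here.
        raise ValueError("RECOVERY_PARALLELISM must not be negative")
    if recovery_parallelism > parallel_max_servers:
        raise ValueError(
            "RECOVERY_PARALLELISM cannot exceed PARALLEL_MAX_SERVERS"
        )
    # A value of 0 or 1 means one process performs recovery serially.
    return "serial" if recovery_parallelism <= 1 else "parallel"


print(recovery_mode(4, parallel_max_servers=8))  # parallel
```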

Specifying RECOVER Command Options

When you use the RECOVER command to parallelize instance and media recovery, the allocation of recovery processes to instances is operating system specific. The DEGREE keyword of the PARALLEL clause can either signify the number of processes on each instance of a parallel server or the number of processes to spread across all instances.

See Also: Your Oracle system-specific documentation for more information on the allocation of recovery processes to instances.

Oracle7 Server Concepts for more information on parallel recovery.
