Oracle Instance Recovery Mechanisms and Procedures

In the operation of Oracle Database, “Instance Failure” is an unavoidable occurrence. When a database goes down due to a sudden power outage or process termination, Instance Recovery is the mechanism that automatically restores the database while maintaining consistency.

This article explains the fundamental knowledge that DBAs should know, from the causes of instance failure to the internal mechanisms of Redo/Rollback and the parameters that control recovery time.

Quick Execution Steps for Instance Recovery
1. What is Instance Failure? Causes and Impact
1. Major Causes
2. Impact on the Database
2. Mechanism of Instance Recovery: Two Phases
1. (1) Roll Forward (Forward Recovery)
2. (2) Rollback (Backward Recovery)
3. Settings to Shorten Recovery Time (FAST_START_MTTR_TARGET)
4. How to Confirm via the Alert Log
5. Operational Precautions and Prevention
6. FAQ: Frequently Asked Questions
Summary

Quick Execution Steps for Instance Recovery

Instance recovery typically occurs automatically during startup after a failure. The shortest steps a Japanese administrator should take are as follows:

Start the Instance: Execute the STARTUP command from SQL*Plus or a similar tool.
Wait for Automatic Recovery: Oracle verifies the consistency of the control files and automatically initiates recovery.
Confirm Completion: Monitor the alert log for messages such as Completed redo application or Database opened.

1. What is Instance Failure? Causes and Impact

Instance failure refers to a state where the SGA (memory structure) or background processes stop unexpectedly, making the database inaccessible.

Major Causes

Power Issues: Sudden power failure of the server.
OS Failure: System crashes such as a kernel panic.
Hardware Failure: Abnormalities in memory or CPU.
Inappropriate Operations: SHUTDOWN ABORT by an administrator or forcing a process termination (kill) at the OS level.

Impact on the Database

Because the system stops before data in memory (Buffer Cache) is written to disk (Data Files), “inconsistency” occurs where committed data may be lost or uncommitted data remains. The role of recovery is to resolve this.

2. Mechanism of Instance Recovery: Two Phases

Oracle automatically restores consistency during the next startup using “Online Redo Logs” and “UNDO Data.”

(1) Roll Forward (Forward Recovery)

Purpose: To reproduce committed data up to the moment of failure.
Mechanism: Applies all change history recorded in the REDO log files to the data files. At this point, uncommitted data is also included.

(2) Rollback (Backward Recovery)

Purpose: To undo uncommitted transactions that had not finished at the time of the failure.
Mechanism: Uses information from the UNDO tablespace to return incomplete processes to their pre-failure state. This ensures complete consistency.

3. Settings to Shorten Recovery Time (FAST_START_MTTR_TARGET)

In large-scale systems, recovery can take time. This is controlled by the FAST_START_MTTR_TARGET parameter.

Unit: Seconds
Default: 0 (Automatically calculated)
How it works: By setting this value to a shorter duration, Oracle increases the frequency of Checkpoints (writing from memory to disk) to reduce the amount of REDO logs that must be read during recovery.

Configuration Example (Targeting a 30-second recovery):

-- Apply to the entire system (dynamically changeable)
ALTER SYSTEM SET FAST_START_MTTR_TARGET = 30;

Note: If the target value is set too short, the write load of the DBWR (Database Writer) increases, which may impact performance during normal operations.

4. How to Confirm via the Alert Log

Recovery progress can be checked in real-time in the alert log.

Log Check Command (Linux Example):

tail -f $ORACLE_BASE/diag/rdbms/<DB_NAME>/<INSTANCE_NAME>/trace/alert_<INSTANCE_NAME>.log

5. Operational Precautions and Prevention

Instance recovery assumes that the “disks (data files) are not corrupted.”

Preparing for Media Failure: If the disk itself fails, instance recovery cannot fix it. Ensure you obtain backups using RMAN (Recovery Manager) or similar tools.
Resource Monitoring: Thoroughly monitor OS resources to prevent instance termination due to memory exhaustion.
Restoration: While recovery itself is automatic, if specific data is unintentionally missing (because it was uncommitted) after recovery, you must consider Point-in-Time Recovery from a backup.

6. FAQ: Frequently Asked Questions

Q: Can I execute instance recovery manually? A: No. Oracle executes it automatically when it detects inconsistency during startup. There is no command to start it manually.

Q: Can the database be used during rollback? A: Yes. In recent Oracle versions (including 19c), “Fast-Start Fault Recovery” functions, allowing the database to open once the roll forward is complete while performing the rollback in the background. This enables early service resumption.

Q: Does recovery run even after SHUTDOWN IMMEDIATE? A: No. Since IMMEDIATE or NORMAL performs a checkpoint during a clean shutdown, recovery does not occur during the next startup.

Summary

Instance recovery is performed automatically during startup after an abnormal termination.
It consists of two stages: Roll Forward (Redo application) and Rollback (Undo application).
The target time can be adjusted with FAST_START_MTTR_TARGET, but be mindful of the overhead.
It cannot handle disk failures, so regular backups are essential.

This article is based on Oracle Database 19c (screens and default values may differ in other versions).

[reference]
Oracle Database Backup and Recovery Reference, 19c