In the operation of Oracle Database, “Instance Failure” is an unavoidable occurrence. When a database goes down due to a sudden power outage or process termination, Instance Recovery is the mechanism that automatically restores the database while maintaining consistency.
This article explains the fundamental knowledge that DBAs should know, from the causes of instance failure to the internal mechanisms of Redo/Rollback and the parameters that control recovery time.
- Quick Execution Steps for Instance Recovery
- 1. What is Instance Failure? Causes and Impact
- 2. Mechanism of Instance Recovery: Two Phases
- 3. Settings to Shorten Recovery Time (FAST_START_MTTR_TARGET)
- 4. How to Confirm via the Alert Log
- 5. Operational Precautions and Prevention
- 6. FAQ: Frequently Asked Questions
- Summary
Quick Execution Steps for Instance Recovery
Instance recovery typically occurs automatically during startup after a failure. The shortest steps a Japanese administrator should take are as follows:
- Start the Instance: Execute the
STARTUPcommand from SQL*Plus or a similar tool. - Wait for Automatic Recovery: Oracle verifies the consistency of the control files and automatically initiates recovery.
- Confirm Completion: Monitor the alert log for messages such as
Completed redo applicationorDatabase opened.
1. What is Instance Failure? Causes and Impact
Instance failure refers to a state where the SGA (memory structure) or background processes stop unexpectedly, making the database inaccessible.
Major Causes
- Power Issues: Sudden power failure of the server.
- OS Failure: System crashes such as a kernel panic.
- Hardware Failure: Abnormalities in memory or CPU.
- Inappropriate Operations:
SHUTDOWN ABORTby an administrator or forcing a process termination (kill) at the OS level.
Impact on the Database
Because the system stops before data in memory (Buffer Cache) is written to disk (Data Files), “inconsistency” occurs where committed data may be lost or uncommitted data remains. The role of recovery is to resolve this.
2. Mechanism of Instance Recovery: Two Phases
Oracle automatically restores consistency during the next startup using “Online Redo Logs” and “UNDO Data.”
(1) Roll Forward (Forward Recovery)
- Purpose: To reproduce committed data up to the moment of failure.
- Mechanism: Applies all change history recorded in the REDO log files to the data files. At this point, uncommitted data is also included.
(2) Rollback (Backward Recovery)
- Purpose: To undo uncommitted transactions that had not finished at the time of the failure.
- Mechanism: Uses information from the UNDO tablespace to return incomplete processes to their pre-failure state. This ensures complete consistency.
3. Settings to Shorten Recovery Time (FAST_START_MTTR_TARGET)
In large-scale systems, recovery can take time. This is controlled by the FAST_START_MTTR_TARGET parameter.
- Unit: Seconds
- Default: 0 (Automatically calculated)
- How it works: By setting this value to a shorter duration, Oracle increases the frequency of Checkpoints (writing from memory to disk) to reduce the amount of REDO logs that must be read during recovery.
Configuration Example (Targeting a 30-second recovery):
-- Apply to the entire system (dynamically changeable)
ALTER SYSTEM SET FAST_START_MTTR_TARGET = 30;
Note: If the target value is set too short, the write load of the DBWR (Database Writer) increases, which may impact performance during normal operations.
4. How to Confirm via the Alert Log
Recovery progress can be checked in real-time in the alert log.
Log Check Command (Linux Example):
tail -f $ORACLE_BASE/diag/rdbms/<DB_NAME>/<INSTANCE_NAME>/trace/alert_<INSTANCE_NAME>.log
Major Messages Output to the Log: | Message | Meaning | | :— | :— | | Instance Recovery started | Start of instance recovery | | Redo applied up to change number <SCN> | Roll forward (Redo application) is in progress | | Incomplete recovery (Warning) | Recovery failed. Manual intervention may be required | | Instance Recovery completed | Recovery completed successfully | | Database opened | Database opened successfully and is available |
5. Operational Precautions and Prevention
Instance recovery assumes that the “disks (data files) are not corrupted.”
- Preparing for Media Failure: If the disk itself fails, instance recovery cannot fix it. Ensure you obtain backups using RMAN (Recovery Manager) or similar tools.
- Resource Monitoring: Thoroughly monitor OS resources to prevent instance termination due to memory exhaustion.
- Restoration: While recovery itself is automatic, if specific data is unintentionally missing (because it was uncommitted) after recovery, you must consider Point-in-Time Recovery from a backup.
6. FAQ: Frequently Asked Questions
Q: Can I execute instance recovery manually? A: No. Oracle executes it automatically when it detects inconsistency during startup. There is no command to start it manually.
Q: Can the database be used during rollback? A: Yes. In recent Oracle versions (including 19c), “Fast-Start Fault Recovery” functions, allowing the database to open once the roll forward is complete while performing the rollback in the background. This enables early service resumption.
Q: Does recovery run even after SHUTDOWN IMMEDIATE? A: No. Since IMMEDIATE or NORMAL performs a checkpoint during a clean shutdown, recovery does not occur during the next startup.
Summary
- Instance recovery is performed automatically during startup after an abnormal termination.
- It consists of two stages: Roll Forward (Redo application) and Rollback (Undo application).
- The target time can be adjusted with
FAST_START_MTTR_TARGET, but be mindful of the overhead. - It cannot handle disk failures, so regular backups are essential.
This article is based on Oracle Database 19c (screens and default values may differ in other versions).
[reference]
Oracle Database Backup and Recovery Reference, 19c


コメント