System Recovery | Database Management System

A local failure affects only the transaction in which the failure occurred. Recovery from such failures has been covered above.

A global failure affects all transactions in progress at the time of the failure. Such failures fall into two broad categories:

  1. System Failure (e.g., power failure): This affects all transactions currently in progress but does not physically damage the database. A system failure is called a soft crash.
  2. Media Failure (e.g., disk head crash): This damages the database, or some portion of it, and affects those transactions currently using that portion of the database. A media failure is called a hard crash.

Recovery from System Failures

During a system failure, the contents of main memory, in particular the database buffers, are lost. The precise state of any transaction that was in progress at the time of the failure is therefore no longer known. Such transactions must be rolled back (undone) when the system restarts.

Some transactions may have committed before the system failure without their updates having been transferred from the database buffers to the physical database. Such transactions must be redone.

How does the system know at restart which transactions to UNDO and which transactions to REDO?

At prescribed intervals, typically whenever a prescribed number of entries has been written to the log, the system automatically takes a CHECKPOINT. Taking a CHECKPOINT involves:

  1. Physically writing (force-writing) the contents of the database buffers out to the physical database.
  2. Physically writing a special CHECKPOINT RECORD out to the physical log. This CHECKPOINT RECORD lists the transactions that were in progress at the time the CHECKPOINT was taken.
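
The two checkpoint steps above can be sketched in a few lines of Python. This is only an illustrative toy model, not how any particular DBMS implements checkpoints: the class `ToySystem`, its fields, and the tuple format of the log record are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToySystem:
    """Hypothetical in-memory model of a DBMS, used only for illustration."""
    buffers: dict = field(default_factory=dict)  # database buffers (main memory)
    disk: dict = field(default_factory=dict)     # the physical database
    log: list = field(default_factory=list)      # the physical log
    active: set = field(default_factory=set)     # transactions in progress

    def take_checkpoint(self) -> None:
        # Step 1: force-write the buffer contents to the physical database.
        self.disk.update(self.buffers)
        self.buffers.clear()
        # Step 2: write a checkpoint record listing the transactions
        # that are in progress at this moment.
        self.log.append(("CHECKPOINT", sorted(self.active)))
```

After `take_checkpoint()` returns, everything that was in the buffers is on disk, and the last log entry names the transactions that were still running.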

At the time of a failure, every transaction falls into one of three categories:

  1. Transactions that began and committed before the CHECKPOINT. These need no action during RESTART after a failure.
  2. Transactions that began before or after the CHECKPOINT and COMMITTED after the CHECKPOINT but prior to the failure. These need a REDO operation at restart.
  3. Transactions that began before or after the CHECKPOINT and were still NOT COMMITTED at the time of the failure. These need an UNDO operation at restart.
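
The classification above depends only on when (and whether) a transaction committed relative to the most recent checkpoint. A minimal sketch, assuming commit and checkpoint times are represented as integer positions in the log and that `None` means "not committed at the time of failure"; the function name `restart_action` is hypothetical:

```python
from typing import Optional


def restart_action(commit_pos: Optional[int], checkpoint_pos: int) -> str:
    """Return the restart action for one transaction.

    commit_pos is the log position of the transaction's COMMIT entry,
    or None if it had not committed when the failure occurred.
    """
    if commit_pos is None:
        return "UNDO"   # still in progress at the failure
    if commit_pos < checkpoint_pos:
        return "NONE"   # committed before the checkpoint: already on disk
    return "REDO"       # committed after the checkpoint, before the failure
```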

Recovery Procedure

At restart time, the system goes through the following procedure:

  1. Start with two lists: the UNDO list and the REDO list. Initialize the UNDO list to the list of transactions given in the most recent CHECKPOINT RECORD. Initialize the REDO list to empty.
  2. Search forward through the log, starting from the most recent CHECKPOINT RECORD.
  3. If a BEGIN TRANSACTION log entry is found for transaction T, add T to the UNDO list.
  4. If a COMMIT log entry is found for transaction T, move T from the UNDO list to the REDO list.
  5. When the end of the log is reached, the UNDO and REDO lists are final.
  6. The system now works backward through the log, undoing the updates of the transactions in the UNDO list. This is called BACKWARD RECOVERY.
  7. Then, the system works forward through the log, redoing the updates of the transactions in the REDO list. This is called FORWARD RECOVERY.
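
The restart procedure above can be sketched as follows. This is a simplified illustration under several assumptions that do not come from the text: the log is a Python list of tuples, `("WRITE", T, item, old, new)` records carry both the old and new value, the log contains at least one checkpoint record, and the function names `build_lists` and `recover` are hypothetical.

```python
def build_lists(log):
    """Steps 1 to 5: derive the UNDO and REDO lists from the log."""
    # Locate the most recent checkpoint record.
    cp = max(i for i, rec in enumerate(log) if rec[0] == "CHECKPOINT")
    undo = set(log[cp][1])   # transactions active at the checkpoint
    redo = set()
    for rec in log[cp + 1:]:  # search forward from the checkpoint
        if rec[0] == "BEGIN":
            undo.add(rec[1])
        elif rec[0] == "COMMIT":
            undo.discard(rec[1])
            redo.add(rec[1])
    return undo, redo


def recover(log, db):
    """Steps 6 and 7: backward recovery, then forward recovery."""
    undo, redo = build_lists(log)
    for rec in reversed(log):            # backward pass: restore old values
        if rec[0] == "WRITE" and rec[1] in undo:
            db[rec[2]] = rec[3]
    for rec in log:                      # forward pass: reapply new values
        if rec[0] == "WRITE" and rec[1] in redo:
            db[rec[2]] = rec[4]
    return db


example_log = [
    ("BEGIN", "T1"), ("WRITE", "T1", "a", 0, 1),
    ("BEGIN", "T2"), ("WRITE", "T2", "b", 0, 5),
    ("CHECKPOINT", ["T1", "T2"]),
    ("WRITE", "T1", "a", 1, 2), ("COMMIT", "T1"),
    ("BEGIN", "T3"), ("WRITE", "T3", "c", 0, 9),
]
```

With `example_log`, T1 ends up in the REDO list and T2 and T3 in the UNDO list. Redoing a write that already reached the disk is harmless in this sketch because each write simply sets a value, so applying it twice gives the same result.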

Media Recovery

A media failure is a failure, such as a disk head crash or a disk controller failure, in which some portion of the database is physically destroyed. Recovery from such a failure involves reloading the database from a backup copy (dump) and then using the log (both the active and the archive portions) to REDO all transactions that completed since the backup copy was taken. There is no need to UNDO transactions that were in progress at the time of the failure, since their updates have been lost anyway.
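
Media recovery can be sketched in the same toy style. The example below assumes a log of tuples written since the backup was taken, with `("WRITE", T, item, old, new)` and `("COMMIT", T)` records; this record format and the function name `media_recover` are assumptions for illustration only.

```python
def media_recover(backup, log):
    """Reload the backup copy, then REDO every transaction that committed
    since the backup was taken. Uncommitted transactions are skipped:
    nothing needs to be undone."""
    db = dict(backup)  # reload from the dump without modifying it
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in log:    # forward pass over the active and archived log
        if rec[0] == "WRITE" and rec[1] in committed:
            db[rec[2]] = rec[4]
    return db
```

For instance, if T1 wrote and committed after the dump while T2 wrote but never committed, only T1's update appears in the recovered database.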