Crash Recovery
It is imperative that the pipeline is robust against potential crashes at any point during processing. Here we outline the mechanism (to be) used to accomplish this.
Dangerous situations arise when a crash occurs in the middle of an operation that:
- creates or updates a file
- must create/update multiple files in a consistent way
- creates/updates file(s) and marks that in the database
- must perform several database modifications in a consistent way
The recovery system maintains recovery information in a database table named
RecoveryInformation, as well as original copies of files being updated in a
temporary directory.
Before any risky operation starts, the table is updated with information on how
to restore the filesystem to its state before the operation began. Database
updates are performed only at the very end of the operation, through a single
database session that also removes the relevant recovery information from
RecoveryInformation.
In order to allow recovery from both new files and file updates,
RecoveryInformation maintains a list of updated files as well as the location
of a copy of each original file in a temporary location. For new files the
original file entry is NULL.
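As a rough illustration of what such a table could look like, here is a minimal sketch using Python's built-in `sqlite3` module. The column names (`updated_file`, `original_copy`), the helper `register_files`, and the example paths are all hypothetical; the pipeline's actual schema and database backend may differ.

```python
import sqlite3

# Hypothetical schema: the column names are assumptions made for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS RecoveryInformation (
    id INTEGER PRIMARY KEY,
    updated_file TEXT NOT NULL,  -- final location of the new or updated file
    original_copy TEXT           -- temporary copy of the original, NULL if new
)
"""


def register_files(connection, entries):
    """Record (updated_file, original_copy) pairs in a single transaction."""
    # The connection context manager commits on success and rolls back if an
    # exception is raised, so either all entries are stored or none are.
    with connection:
        connection.executemany(
            "INSERT INTO RecoveryInformation (updated_file, original_copy) "
            "VALUES (?, ?)",
            entries,
        )


connection = sqlite3.connect(":memory:")
connection.execute(SCHEMA)
register_files(
    connection,
    [
        # An update: the original was copied to the temporary directory.
        ("/data/products/image1.dat", "/data/recovery_tmp/image1.dat"),
        # A new file: there is no original, so the entry is NULL (None).
        ("/data/products/image2.dat", None),
    ],
)
rows = connection.execute(
    "SELECT updated_file, original_copy FROM RecoveryInformation ORDER BY id"
).fetchall()
```

Storing NULL for new files lets recovery distinguish "restore the original" from "there was nothing here before the operation".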
The order of operations is as follows:
1. The original files are copied to the temporary location.
2. The list of final and original copies is added to the database in a single session.
3. The operation is performed.
4. The database is updated as required by the operation and, in the same session, the entries in RecoveryInformation created during step 2 are deleted.
5. The temporary copies of the original files are deleted.
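The five steps above can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline's implementation: it uses `sqlite3` in place of whatever database layer the pipeline uses, and the function `run_protected`, its arguments, the `Products` table, and the column names are all hypothetical.

```python
import os
import shutil
import sqlite3
import tempfile
import uuid


def run_protected(connection, temp_dir, targets, operation, db_update):
    """Run operation() on targets following the five steps (hypothetical)."""
    # Step 1: copy existing originals to the temporary location.
    entries = []
    for target in targets:
        backup = None  # stays None (NULL) for files that do not exist yet
        if os.path.exists(target):
            backup = os.path.join(temp_dir, uuid.uuid4().hex)
            shutil.copy2(target, backup)
        entries.append((target, backup))

    # Step 2: register all (final, original) pairs in one transaction.
    with connection:
        connection.executemany(
            "INSERT INTO RecoveryInformation (updated_file, original_copy) "
            "VALUES (?, ?)",
            entries,
        )

    # Step 3: perform the operation that creates or updates the files.
    operation()

    # Step 4: apply the operation's database changes and delete the recovery
    # entries in the same transaction, so both happen or neither does.
    with connection:
        db_update(connection)
        connection.executemany(
            "DELETE FROM RecoveryInformation WHERE updated_file = ?",
            [(target,) for target in targets],
        )

    # Step 5: the temporary copies are no longer needed.
    for _, backup in entries:
        if backup is not None:
            os.remove(backup)


# Usage example with throwaway files and an in-memory database.
work_dir = tempfile.mkdtemp()
temp_dir = os.path.join(work_dir, "recovery_tmp")
os.mkdir(temp_dir)
target = os.path.join(work_dir, "result.txt")
with open(target, "w") as stream:
    stream.write("old")

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE RecoveryInformation "
    "(updated_file TEXT NOT NULL, original_copy TEXT)"
)
connection.execute("CREATE TABLE Products (filename TEXT)")


def write_new_version():
    with open(target, "w") as stream:
        stream.write("new")


def mark_in_database(conn):
    conn.execute("INSERT INTO Products (filename) VALUES (?)", (target,))


run_protected(
    connection, temp_dir, [target], write_new_version, mark_in_database
)
```

Merging the operation's database changes and the deletion of the recovery entries into one transaction is what makes step 4 atomic: after a crash the database is either entirely in its old state (with recovery entries) or entirely in its new state (without them).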
When the pipeline starts, recovery proceeds as follows:
1. Any files in the temporary directory not listed in the RecoveryInformation table are deleted.
2. Any updated files listed in RecoveryInformation are overwritten by their originals.
3. The RecoveryInformation table is emptied.
4. The temporary directory is emptied.
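The recovery procedure above might look roughly like the sketch below. It relies on several assumptions that do not come from the pipeline itself: `sqlite3` as the database, the hypothetical column names `updated_file` and `original_copy`, a temporary directory that contains only regular files, and paths stored in the table in exactly the form `os.path.join(temp_dir, name)` produces. Removing a partially created new file (NULL original) is also an assumption about the intended behavior.

```python
import os
import shutil
import sqlite3
import tempfile


def recover(connection, temp_dir):
    """Undo any interrupted operation (hypothetical sketch)."""
    rows = connection.execute(
        "SELECT updated_file, original_copy FROM RecoveryInformation"
    ).fetchall()
    listed = {original for _, original in rows if original is not None}

    # Recovery step 1: delete temporary files not listed in the table.
    for name in os.listdir(temp_dir):
        path = os.path.join(temp_dir, name)
        if path not in listed:
            os.remove(path)

    # Recovery step 2: overwrite updated files with their originals.
    # Copying (rather than moving) keeps this step safe to repeat if the
    # recovery itself is interrupted.
    for updated, original in rows:
        if original is not None:
            shutil.copy2(original, updated)
        elif os.path.exists(updated):
            # Assumption: a new file left behind by the crash is removed.
            os.remove(updated)

    # Recovery step 3: empty the RecoveryInformation table.
    with connection:
        connection.execute("DELETE FROM RecoveryInformation")

    # Recovery step 4: empty the temporary directory.
    for name in os.listdir(temp_dir):
        os.remove(os.path.join(temp_dir, name))


# Simulate the state left behind by a crash while the operation was running:
# a half-written target, its registered backup, and an unlisted stray file.
work_dir = tempfile.mkdtemp()
temp_dir = os.path.join(work_dir, "recovery_tmp")
os.mkdir(temp_dir)
updated = os.path.join(work_dir, "result.txt")
backup = os.path.join(temp_dir, "result.txt.orig")
stray = os.path.join(temp_dir, "unlisted.tmp")
for path, content in [(updated, "partial"), (backup, "old"), (stray, "junk")]:
    with open(path, "w") as stream:
        stream.write(content)

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE RecoveryInformation "
    "(updated_file TEXT NOT NULL, original_copy TEXT)"
)
with connection:
    connection.execute(
        "INSERT INTO RecoveryInformation VALUES (?, ?)", (updated, backup)
    )

recover(connection, temp_dir)
```

After `recover` returns, the target file holds its original content again, the table is empty, and the temporary directory contains no files.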
This way, if an interruption occurs during:
- step 1: any copies of original files already made will be deleted by recovery step 1, and the state of the database and file system will be exactly as it was before the operation started.
- step 2: the database will automatically roll back the incomplete session and recovery step 1 will delete the temporary copies, so again the file system and database will be in their original states.
- step 3: any updates to the file system will be undone by recovery step 2, and recovery step 3 will restore the database.
- step 4: the database will automatically roll back the incomplete update, and we will be back to the step 3 case above.
- step 5: the database will already have the final state for the operation, and recovery step 1 will delete the leftover temporary copies, taking the file system to the desired final state.