This is a course note of an online course I took relating to the Operating System on educative.io.
Chapter 5 Introduction to FSCK and Journaling
Crash Consistency
One major challenge faced by a file system is how to update persistent data structures despite the system crash.
Crash scenario
When appending to the file, there are 3 on-disk structures: the inode (which must point to the new block and record the new larger size due to the append), the new data block, and a new version of the data bitmap.
With only a single write succeeds, there are 3 possible outcomes:
- Just the data block is written to disk. In this case, the data is on disk, but there is no inode that points to it and no bitmap saying the block is allocated.
- Just the updated inode is written to disk. In this case, the inode points to the disk address where the data block was about to be written, but the data block has not yet been written there. Thus, if we trust that pointer, we will read garbage data from the disk. Further, we have a new problem, which we call a file-system inconsistency. The on-disk bitmap is telling us that data block has not been allocated, but the inode is saying that it has.
- Just the updated bitmap is written to disk. In this case, the bitmap indicates that block is allocated, but there is no inode that points to it. Thus the file system is inconsistent again; if left unresolved, this write would result in a space leak, as block would never be used by the file system.
With two writes succeed and the last one fails:
- The inode and bitmap are written to disk, but not data. In this case, the file system metadata is completely consistent, but the data block has garbage in it again.
- The inode and the data block are written, but not the bitmap . In this case, the inode points to the correct data on disk, but have an inconsistency between the inode and the old version of the bitmap.
- The bitmap and data block are written, but not the inode. In this case, there is inconsistency between the inode and the data bitmap.
Summary:
- inconsistency in file system data structures.
- space leaks
- garbage data
Solution: FSCK(file system consistency check)
The system utility fsck (file system consistency check) is a tool for checking the consistency of a file system in Unix and Unix-like operating systems, such as Linux, macOS, and FreeBSD.
Here is a basic summary of what fsck
does:
- Superblock Checks
- File System and Inode List Size Checks
- Free Block Checks
- Free Inode Checks
- Inodes
- Link Count Checks: Each inode contains a count of the number of directory entries linked to it. The fsck program verifies the link count of each inode by examining the entire directory structure, starting from the root directory, and calculating an actual link count for each inode.
- Duplicate Block Checks
- And more!
As you can see, fsck
are very complex and slow. With a very large disk volume, scanning the entire disk can take hours.
Solution 2: Journaling (or Write-Ahead Logging)
The basic idea is as such: when updating the disk, before overwriting the structures in place, first write down a little note describing what you are about to do. This is the “write ahead”, and it is structured as a “log”; hence, write-ahead logging. By writing the note to disk, if a crash takes places during the update, you can refer to the note you made and try again.
Data journaling: Journaling is a technique for fault tolerance in file systems. It works by keeping track of all changes in a log (a “journal”) before committing the changes themselves to disk.
- Journal write: Write the contents of the transaction to the log; wait for these writes to complete.
- Journal commit: Write the transaction commit block to the log; wait for write to complete; the transaction is said to be committed.
- Checkpoint: Write the contents of the update to their final on-disk locations.
Other alternatives
Soft Updates
Soft updates work by tracking and enforcing dependencies among updates to file system meta-data. For example, by writing a pointed-to data block to disk before the inode that points to it, we can ensure that the inode never points to garbage, in this way, the only inconsistency is a storage leak. To recover from these leaks, the free space map is reconciled against a full walk of the file system at next mount.
Copy-on-writes
Full copy-on-write file systems avoid in-place changes to file data by writing out the data in newly allocated blocks, followed by updated metadata that would point to the new data and disown the old ones. This has the same correctness-preserving properties as a journal, without the write-twice overhead.
Backpointer-based consistency
In backpointer-based consistency, no ordering is enforced between writes. To achieve consistency, an additional back pointer is added to every block in the system; for example, each data block has a reference to the inode to which it belongs. When accessing a file, the file system can determine if the file is consistent by checking if the forward pointer (e.g., the address in the inode or direct block) points to a block that refers back to it.
Optimistic crash consistency
This new approach issues as many writes to disk as possible by using a generalised form of the transaction checksum, and includes a few other techniques to detect inconsistencies should they arise.
That’s so much of it!
Happy Reading!