These are my notes from an online course on operating systems that I took on educative.io.
Chapter 3. File and Directory
A persistent-storage device, such as a classic hard disk drive or a more modern solid-state storage device, stores information permanently. Unlike memory, whose contents are lost when power is lost, a persistent-storage device keeps its data intact.
Two key abstractions have developed over time in the virtualisation of storage.
- File: A file is simply a linear array of bytes, each of which you can read or write. Each file has a low-level name, often referred to as its inode number.
- Directory: A directory, like a file, also has a low-level name (i.e., an inode number), but its contents are quite specific: it contains a list of (user-readable name → low-level name) pairs. Each entry in a directory refers to either a file or another directory. By placing directories within other directories, users are able to build an arbitrary directory tree.
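To make the "name → inode number" idea concrete, here is a minimal Python sketch. It assumes a POSIX-like system; the helper `list_directory` and the file names are made up for illustration and are not from the course.

```python
import os
import tempfile

def list_directory(path):
    """Return {user-readable name: inode number} for the entries in path."""
    return {
        name: os.lstat(os.path.join(path, name)).st_ino
        for name in os.listdir(path)
    }

# Build a tiny throwaway tree so the example does not touch real files.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "docs"))          # a directory entry
    with open(os.path.join(root, "notes.txt"), "w") as f:
        f.write("hello")                          # a file entry

    listing = list_directory(root)
    for name, inode in sorted(listing.items()):
        print(f"{name} -> inode {inode}")
```

The actual inode numbers depend on the file system, so they will differ from run to run.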
File Descriptor
A file descriptor is an integer that identifies an open file within a process. It is an index into the process's file descriptor table, whose entries point into the kernel's system-wide open file table. An open file table entry records information such as the file's inode, the current byte offset, and the access mode for that data stream (read-only, write-only, etc.). Thus, once a file is opened, you use the file descriptor to read or write the file.
Inode: Short for index node, an inode is a data structure that describes a file or directory: its metadata (such as size, owner, and permissions) and the location of its data blocks on disk.
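The following is a small sketch of working with a file descriptor directly, using Python's thin wrappers around the system calls (`os.open`, `os.write`, `os.lseek`, `os.read`, `os.fstat`). The file name is hypothetical; the point is that the byte offset moves as you read and write, and that the inode is reachable through the descriptor.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "data.txt")

    # os.open returns a small integer: the file descriptor.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    os.write(fd, b"hello world")

    # The offset is tracked for the open file, so it advanced past the write.
    offset_after_write = os.lseek(fd, 0, os.SEEK_CUR)

    # Rewind, then read back through the same descriptor.
    os.lseek(fd, 0, os.SEEK_SET)
    first = os.read(fd, 5)
    offset_after_read = os.lseek(fd, 0, os.SEEK_CUR)

    # fstat reaches the inode through the descriptor; no path is needed.
    inode_via_fd = os.fstat(fd).st_ino
    inode_via_path = os.stat(path).st_ino

    os.close(fd)

print(fd, offset_after_write, first, offset_after_read)
```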
Hard Links vs. Symbolic Links
Hard links and symbolic links are two different ways to refer to a file on disk.
- Hard link: A hard link is a direct reference to a file via its inode. You can change the original file's contents, or rename or move it within the same volume, and the hard link still reaches the same data because it points to the same inode. Hard links can only refer to files within the same volume; to link across volumes you need a symbolic link. The inode is only deleted when all links to it have been deleted.
- Symbolic link: A symbolic link is essentially a shortcut that refers to a file by its path name rather than by its inode. Symbolic links can point to directories and can cross hard disks/volumes. If the original file is moved, renamed, or deleted, the symbolic link breaks and becomes a dangling link.
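The difference shows up clearly in a short experiment. This sketch assumes a POSIX system (creating symbolic links may need extra privileges elsewhere) and uses made-up file names inside a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    original = os.path.join(root, "original.txt")
    hard = os.path.join(root, "hard.txt")
    soft = os.path.join(root, "soft.txt")
    moved = os.path.join(root, "moved.txt")

    with open(original, "w") as f:
        f.write("data")

    os.link(original, hard)      # a second name for the same inode
    os.symlink(original, soft)   # a separate file that stores the path

    same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
    link_count = os.stat(original).st_nlink   # two names, one inode

    # Renaming the original leaves the hard link usable...
    os.rename(original, moved)
    with open(hard) as f:
        hard_content = f.read()

    # ...but the symbolic link still names the old path, so it dangles.
    symlink_dangling = os.path.islink(soft) and not os.path.exists(soft)

    # Removing one name only lowers the link count; the data survives.
    os.remove(moved)
    count_after_remove = os.stat(hard).st_nlink

print(same_inode, link_count, hard_content, symlink_dangling, count_after_remove)
```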
Access Control List vs. Permission Bits
Most file systems have mechanisms to enable and disable sharing. A rudimentary form of such control is provided by POSIX permission bits, whereas the more sophisticated access control lists allow more precise control over exactly who can access and manipulate information.
POSIX permissions allow you to set permissions only for the Owner, one Group, and Others. ACLs give you the additional option to set permissions for multiple individuals and multiple groups for a shared item. ACLs also have more types of permissions.
In computer security, an access-control list (ACL) is a list of permissions associated with a system resource (object).
- Filesystem ACLs: filter access to files and/or directories. Filesystem ACLs tell the operating system which users can access a file or directory, and what privileges those users have.
- Networking ACLs: filter access to the network. Networking ACLs tell routers and switches which types of traffic can access the network, and which activity is allowed.
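Here is a minimal sketch of the POSIX permission bits described above, assuming a POSIX system. The file name and the `describe` helper are hypothetical. ACLs are not shown, since they are usually managed with platform-specific tools rather than Python's standard library.

```python
import os
import stat
import tempfile

def describe(path):
    """Render a file's mode the way `ls -l` does, e.g. '-rw-r-----'."""
    return stat.filemode(os.stat(path).st_mode)

with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "report.txt")
    with open(path, "w") as f:
        f.write("quarterly numbers")

    # Owner: read+write, group: read, others: nothing.
    os.chmod(path, 0o640)
    perms = describe(path)

    mode = os.stat(path).st_mode
    group_can_read = bool(mode & stat.S_IRGRP)
    others_can_read = bool(mode & stat.S_IROTH)

print(perms, group_can_read, others_can_read)
```

Note that there is only one owner slot, one group slot, and one "others" slot, which is exactly the limitation ACLs lift.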
Mounting and Unmounting File Systems
Before you can access the files on a file system, you need to mount the file system. Mounting a file system attaches that file system to a directory (mount point) and makes it available to the system. The root (/) file system is always mounted. Any other file system can be attached to or detached from the root (/) file system.
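As a rough illustration, you can find which mount point a path lives under by walking up the tree until you hit a directory that is a mount point. This is only a sketch for a POSIX system; `find_mount_point` is a helper name I made up, built on the standard `os.path.ismount`.

```python
import os

def find_mount_point(path):
    """Walk upward from path until reaching a directory that is a mount point."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path

root_mount = find_mount_point("/")
cwd_mount = find_mount_point(os.getcwd())
print("root is mounted at:", root_mount)
print("current directory lives under:", cwd_mount)
```

The loop always terminates because the root directory is itself a mount point.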
File System Structure
- Blocks: A block is a sequence of bytes or bits, usually containing some whole number of records. Blocked data is normally stored in a data buffer and read or written a whole block at a time. Blocking reduces overhead and speeds up handling of the data stream.
- Data Region: The data region is the portion of the disk that holds user data, that is, the contents of files and directories.
- Inode Table: To store inodes, a portion of the disk is reserved for the inode table, which simply holds an array of on-disk inodes.
- Allocation structures: A way to track whether inodes or data blocks are free or allocated. Bitmaps are used for this, one for the data region (data bitmap) and one for the inode table (inode bitmap). A bitmap is a simple structure: each bit indicates whether the corresponding object/block is free (0) or in use (1).
- Superblock: The superblock contains information about this particular file system. When mounting a file system, the operating system reads the superblock first to initialise various parameters, and then attaches the volume to the file-system tree.
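The bitmap idea from the allocation structures above can be sketched in a few lines. This is a toy in-memory version with hypothetical names (`Bitmap`, `allocate`, `free`) and a simple first-fit policy that I assumed for illustration; a real file system keeps these bitmaps on disk and may search for free space differently.

```python
class Bitmap:
    """One bit per object: 0 means free, 1 means in use."""

    def __init__(self, size):
        self.size = size
        self.bits = bytearray((size + 7) // 8)   # all bits start at 0 (free)

    def is_used(self, index):
        return bool(self.bits[index // 8] & (1 << (index % 8)))

    def allocate(self):
        """Mark the first free object as in use and return its index."""
        for index in range(self.size):
            if not self.is_used(index):
                self.bits[index // 8] |= 1 << (index % 8)
                return index
        raise RuntimeError("no free objects left")

    def free(self, index):
        """Clear the bit so the object can be handed out again."""
        self.bits[index // 8] &= ~(1 << (index % 8)) & 0xFF


inode_bitmap = Bitmap(8)    # tracks slots in the inode table
data_bitmap = Bitmap(16)    # tracks blocks in the data region

inode = inode_bitmap.allocate()
first_block = data_bitmap.allocate()
second_block = data_bitmap.allocate()

data_bitmap.free(first_block)
reused_block = data_bitmap.allocate()   # first-fit picks the freed block again

print(inode, first_block, second_block, reused_block)
```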
Caching and Buffering
Reading and writing files can be expensive, incurring many I/Os to the disk. To remedy this, most file systems use system memory (DRAM) to cache important blocks.
Dynamic random-access memory (dynamic RAM or DRAM) is a type of random-access semiconductor memory that stores each bit of data in a memory cell consisting of a tiny capacitor and a transistor.
Speeding up reads: Early file systems thus introduced a fixed-size cache to hold popular blocks.
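A fixed-size cache of popular blocks might look roughly like the sketch below. The least-recently-used eviction policy, the `BlockCache` class, and the `fake_disk_read` stand-in are all my assumptions for illustration; the course note only says the cache has a fixed size.

```python
from collections import OrderedDict

class BlockCache:
    """Fixed-size block cache with least-recently-used eviction (assumed policy)."""

    def __init__(self, capacity, read_block_from_disk):
        self.capacity = capacity
        self.read_block_from_disk = read_block_from_disk
        self.blocks = OrderedDict()   # block number -> data, oldest first
        self.disk_reads = 0

    def read(self, block_number):
        if block_number in self.blocks:
            # Cache hit: no disk I/O, just mark the block as recently used.
            self.blocks.move_to_end(block_number)
            return self.blocks[block_number]

        # Cache miss: pay for a disk read, then remember the block.
        self.disk_reads += 1
        data = self.read_block_from_disk(block_number)
        self.blocks[block_number] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict the least recently used
        return data


def fake_disk_read(block_number):
    """Stand-in for a slow disk read."""
    return f"block-{block_number}".encode()


cache = BlockCache(capacity=2, read_block_from_disk=fake_disk_read)
for n in [1, 2, 1, 3, 1, 2]:
    cache.read(n)

print("disk reads:", cache.disk_reads, "cached blocks:", list(cache.blocks))
```

Six reads cost only four trips to the (fake) disk here, because block 1 is served from memory twice.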
Speeding up writes: Write buffering is used. First, by delaying writes, the file system can batch some updates. Also, some writes are avoided altogether by delaying them, e.g. when a file is created and then deleted. The trade-off is that buffered writes can be lost if the system crashes before they reach the disk. Applications that cannot accept this simply force writes to disk by calling fsync(), by using direct I/O interfaces that work around the cache, or by using the raw disk interface and avoiding the file system altogether.
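Here is a minimal sketch of forcing a write to disk with fsync(), using Python's `os.fsync` wrapper. The `durable_write` helper and the file name are hypothetical. Making a newly created file fully durable may also require syncing its parent directory, which this sketch leaves out.

```python
import os
import tempfile

def durable_write(path, data):
    """Write data to path and ask the OS to flush it to the storage device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        while written < len(data):
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, data[written:])
        # Without this call the data may sit in the in-memory buffer for a while.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written

with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "journal.log")
    bytes_written = durable_write(path, b"commit 42\n")
    with open(path, "rb") as f:
        contents = f.read()

print(bytes_written, contents)
```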
Disk Failure Modes
- Latent-sector errors (LSEs): LSEs arise when a disk sector has been damaged. LSEs are a critical factor in data reliability, since a single LSE can lead to data loss when encountered during RAID reconstruction after a disk failure.
- Block corruption: There are also cases where a disk block becomes corrupt in a way not detectable by the disk itself.
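Since the disk cannot detect this kind of corruption itself, a common approach is for software above the disk to store a checksum with each block and verify it on every read. The sketch below is purely illustrative and not from the course: `store_block` and `is_intact` are made-up helpers, and CRC32 (via Python's `zlib.crc32`) is just one possible checksum.

```python
import zlib

def store_block(data):
    """Keep a checksum next to the block contents (toy in-memory 'disk block')."""
    return {"data": bytes(data), "checksum": zlib.crc32(data)}

def is_intact(block):
    """Recompute the checksum and compare it with the stored one."""
    return zlib.crc32(block["data"]) == block["checksum"]

block = store_block(b"important file contents")
ok_before = is_intact(block)

# Simulate silent corruption: flip a single bit without updating the checksum.
corrupted = bytearray(block["data"])
corrupted[0] ^= 0x01
block["data"] = bytes(corrupted)
ok_after = is_intact(block)

print(ok_before, ok_after)
```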
That’s all for this chapter!
Happy Reading!