Operating Systems 101: Virtualisation — Swapping(vi)
This is a course note of an online course I took relating to the Operating System on educative.io.
Memory swapping is a memory reclamation method wherein memory contents not currently in use are swapped to a disk to make the memory available for other applications or processes.
In operating systems, we generally refer to such disk space as swap space, because we swap pages out of memory to it and swap pages into memory from it. To do so, the OS will need to remember the disk address of a given page.
The way the hardware determines if a page is in disk or in memory is through present bit.
The act of accessing a page that is not in physical memory is commonly referred to as a page fault. Upon a page fault, the OS is invoked to service the page fault using a page-fault handler.
Swap daemon or page daemon
To avoid reach the memory capacity, most OS system has some kind of high watermark (HW) and low watermark (LW) metrics to help decide when to start evicting pages from memory.
- when the OS notices that there are fewer than LW pages available, a background thread that is responsible for freeing memory runs.
- The thread evicts pages until there are HW pages available.
- The background thread — swap daemon or page daemon, then goes to sleep.
When memory comes to full, the OS might evoke one or more pages to make space for the new page. The process of picking a page to kick out is known as the page-replacement policy. The goal in picking a replacement policy for this cache is to minimise the number of cache misses.
- A compulsory miss ( cold-start miss) occurs because the cache is empty, and this is the first reference to the item
- A capacity miss occurs because the cache runs out of space and has to evict an item to bring a new item into the cache.
- A conflict miss occurs in hardware because of limits on where an item can be placed in a hardware cache, due to something known as set-associativity. It does not arise in the OS page cache because such caches are always fully-associative, i.e., there are no restrictions on where in memory a page can be placed.
Replacement Policy Strategy
- FIFO: Pages were simply placed in a queue when they entered the system; when a replacement occurs, the page on the tail of the queue is evicted.
- Random: Simply picks a random page to replace under memory pressure.
- LRU: Any policy as simple as FIFO or Random is likely to have a common problem: it might kick out an important page. And thus, we can use Least-Frequently-Used (LFU) policy to replace the least-frequently-used page when an eviction takes place.
Reference bit: Whenever a page is referenced, the reference bit is set by hardware to 1 from 0.
Dirty bit: We also need to consider whether a page has been modified or not while in memory. If a page has been modified and is thus dirty, it must be written back to disk to evict it, which is expensive. If it has not been modified and is thus clean, the eviction is free. To support this behavior, a dirty bit is used. This bit is set any time a page is written.
Clock algorithm: A clock hand points to some particular page, to begin with. When a replacement must occur, the OS checks if the currently-pointed to page P has a ref bit of 1 or 0.
- If 1, this implies that page P was recently used/modified and thus is not a good candidate for replacement.
- Thus, the use bit for PP set to 0 (cleared), and the clock hand is incremented to the next page (P + 1)(P+1).
- The algorithm continues until it finds a use bit that is set to 0, implying this page has not been recently used.
The clock algorithm could be changed to scan for pages that are both unused and clean to evict first; failing to find those, then for unused pages that are dirty, and so forth.
Other VM Policies
- Page selection policy: the OS also has to decide when to bring a page into memory. For most pages, the OS simply uses demand paging, which means the OS brings the page into memory when it is accessed “on demand”.
- Clustering of writes: Another policy determines how the OS writes pages out to disk. Many systems collect a number of pending writes together in memory and write them to disk in one write. This is called clustering.
This problem arises when memory is simply oversubscribed, and the memory demands of the set of running processes simply exceed the available physical memory. In this case, the system will constantly be paging, a condition sometimes referred to as thrashing.
Admission control: For example, given a set of processes, a system could decide not to run a subset of processes, with the hope that the reduced set of processes’ working sets fit in memory and thus can make progress. This approach, generally known as admission control.
Some other systems take a more drastic approach, aka. out-of-memory killer when memory is oversubscribed; this daemon chooses a memory-intensive process and kills it.
The Linux Virtual Memory System
The Linux Address Space
Much like other modern operating systems, a Linux virtual address space consists of a user portion and a kernel portion. Like those other systems, upon a context switch, the user portion of the currently-running address space changes; the kernel portion is the same across processes. Like those other systems, a program running in user mode cannot access kernel virtual pages; only by trapping into the kernel and transitioning to privileged mode can such memory be accessed.
The Linux page cache is unified, keeping pages in memory from three primary sources:
- memory-mapped files,
- file data and metadata from devices ( accessed by directing
- heap and stack pages that comprise each process (anonymous memory, as there is no named file underneath of it, but rather swap space).
These entities are kept in a page cache hash table, allowing for quick lookup when said data is needed. The page cache tracks if entries are clean or dirty. Dirty data is periodically written to the backing store by background threads (called
pdflush), thus ensuring that modified data eventually is written back to persistent storage. This background activity either takes place after a certain time period or if too many pages are considered dirty.
2Q replacement policy
To decide which pages to kick out of memory, Linux uses a modified form of 2Q replacement, as standard LRU replacement is effective, but can be subverted by certain common access patterns. For example, if a process repeatedly accesses a large file, LRU will kick every other file out of memory.
The Linux version of the 2Q replacement algorithm solves this problem by keeping two lists, and dividing memory between them. When accessed for the first time, a page is placed on one queue ( inactive list); when it is re-referenced, the page is promoted to the other queue (active list ). When replacement needs to take place, the candidate for replacement is taken from the inactive list. Linux also periodically moves pages from the bottom of the active list to the inactive list, keeping the active list to about two-thirds of the total page cache size.
That’s so much of it!