Operating Systems 101: Virtualisation — Memory Allocation, Segmentation, Paging and TLBs
These are my notes from an online Operating Systems course I took on educative.io.
Free Space Management
Low-level mechanisms
- Splitting: the allocator finds a free chunk of memory that can satisfy the request and splits it into two. It returns the first chunk to the caller; the second chunk remains on the list.
- Coalescing: when a chunk of memory is freed, look at the address of the chunk being returned as well as the nearby chunks of free space; if the newly-freed space sits right next to an existing free chunk, merge them into a single larger free chunk.
- Best fit: search through the free list and find the chunks of free memory that are as big as or bigger than the requested size, then return the smallest one in that group of candidates. The exhaustive search makes this costly.
- Worst fit: the opposite of best fit. Find the largest chunk, return the requested amount, and keep the remaining chunk on the free list. Like best fit, the full search makes this costly.
- First fit: find the first chunk that is big enough and return the requested amount; the remaining free space is kept on the list. First fit has the advantage of speed, since no exhaustive search of the entire free list is needed, but it can pollute the beginning of the free list with small unused chunks.
- Next fit: instead of always starting the search at the beginning of the list, next fit keeps an extra pointer to the location in the list where it last looked. The idea is to spread the searches for free space more uniformly throughout the list, thus avoiding pollution at the head.
- Buddy allocation: some approaches are designed around making coalescing simple. One good example is the binary buddy allocator. In such a system, free memory is initially one big space of size 2^N. When a request for memory is made, the search for free space recursively divides the free space in two until a block just big enough to accommodate the request is found, at which point that block is returned. When a block is freed, the allocator checks whether its "buddy" is also free; if so, it coalesces the two blocks into a bigger block. This recursive coalescing continues up the tree, either restoring the entire free space or stopping when a buddy is found to be in use.
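The splitting, coalescing, and fit policies above can be sketched with a toy free list of `(start, size)` pairs. This is an illustrative simulation, not a real allocator (real allocators embed headers in the managed memory itself):

```python
# Toy free list of (start, size) chunks, illustrating first fit with
# splitting on allocation and coalescing on free.

def alloc_first_fit(free_list, size):
    """Return (start, new_free_list) using first fit with splitting."""
    for i, (start, length) in enumerate(free_list):
        if length >= size:
            rest = free_list[:i] + free_list[i + 1:]
            if length > size:                  # split: keep the remainder
                rest.insert(i, (start + size, length - size))
            return start, rest
    return None, free_list                     # no chunk large enough

def free_chunk(free_list, start, size):
    """Return the chunk to the list, coalescing with adjacent neighbours."""
    chunks = sorted(free_list + [(start, size)])
    merged = [chunks[0]]
    for s, length in chunks[1:]:
        prev_start, prev_len = merged[-1]
        if prev_start + prev_len == s:         # adjacent: merge into one chunk
            merged[-1] = (prev_start, prev_len + length)
        else:
            merged.append((s, length))
    return merged

free = [(0, 100)]
a, free = alloc_first_fit(free, 30)    # a = 0,  free = [(30, 70)]
b, free = alloc_first_fit(free, 20)    # b = 30, free = [(50, 50)]
free = free_chunk(free, a, 30)         # freeing a: [(0, 30), (50, 50)]
free = free_chunk(free, b, 20)         # b touches both: back to [(0, 100)]
```

Swapping the loop in `alloc_first_fit` for a scan that tracks the smallest (or largest) qualifying chunk turns the same skeleton into best fit (or worst fit).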
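The buddy scheme can likewise be sketched in a few lines. The trick is that a block's buddy address differs from its own in exactly one bit (the bit equal to the block size), so finding the buddy is a single XOR. All names here are illustrative:

```python
# Minimal binary buddy allocator over a 2^N-byte region (illustrative).
# Free blocks are tracked per size; freeing coalesces with the buddy
# whenever the buddy is also free, recursively up the tree.

def round_up_pow2(n):
    p = 1
    while p < n:
        p *= 2
    return p

class BuddyAllocator:
    def __init__(self, total):              # total must be a power of two
        self.total = total
        self.free = {total: [0]}            # size -> list of free block starts

    def alloc(self, size):
        size = round_up_pow2(size)
        s = size
        while s <= self.total and not self.free.get(s):
            s *= 2                          # find a big-enough free block
        if s > self.total:
            return None
        start = self.free[s].pop()
        while s > size:                     # recursively split down to size
            s //= 2
            self.free.setdefault(s, []).append(start + s)
        return start

    def free_block(self, start, size):
        size = round_up_pow2(size)
        while size < self.total:
            buddy = start ^ size            # buddy differs in exactly one bit
            peers = self.free.get(size, [])
            if buddy in peers:              # buddy free: coalesce upward
                peers.remove(buddy)
                start = min(start, buddy)
                size *= 2
            else:
                break
        self.free.setdefault(size, []).append(start)

b = BuddyAllocator(64)
x = b.alloc(7)        # rounded up to 8; splits 64 -> 32 -> 16 -> 8
y = b.alloc(8)        # takes x's buddy
b.free_block(x, 8)
b.free_block(y, 8)    # buddies coalesce all the way back to one 64-byte block
```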
Segmentation
How do we support a large address space without wasting a lot of free space between the stack and the heap?
The solution: have a base and bounds pair per logical segment of the address space, with three logically different segments: code, heap, and stack.
With a base and bounds pair per segment, we can place each segment independently in physical memory. So only used memory is allocated space in physical memory.
When an illegal address beyond the end of a segment is referenced, the hardware detects it and traps into the OS, likely leading to the termination of the offending process; hence the terms segmentation violation and segmentation fault.
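Per-segment base-and-bounds translation is simple enough to sketch directly. The base and bounds values below are made up for illustration:

```python
# Per-segment base-and-bounds translation (values are illustrative).
# An offset past a segment's bounds models a segmentation fault.

SEGMENTS = {
    "code":  {"base": 32 * 1024, "bounds": 2 * 1024},
    "heap":  {"base": 34 * 1024, "bounds": 3 * 1024},
    "stack": {"base": 28 * 1024, "bounds": 2 * 1024},
}

def translate_segment(segment, offset):
    seg = SEGMENTS[segment]
    if offset >= seg["bounds"]:
        raise MemoryError("segmentation fault")  # hardware traps into the OS
    return seg["base"] + offset                  # physical = base + offset

paddr = translate_segment("heap", 100)   # 34 * 1024 + 100
```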
Which Segment Are We Referring To?
The hardware uses segment registers during translation. How does it know the offset into a segment, and to which segment an address refers?
- Explicit approach: one common approach is to chop up the address space into segments based on the top few bits of the virtual address. E.g., if the top two bits are 00, the virtual address is in the code segment; if they are 01, it is in the heap.
- Implicit approach: the hardware determines the segment by noticing how the address was formed. E.g., if the address was generated from the program counter (an instruction fetch), it is in the code segment; if it is based off the stack or base pointer, it is in the stack segment.
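For the explicit approach, extracting the segment number and offset is just masking and shifting. Here is a sketch assuming a 14-bit virtual address with the top two bits selecting the segment (the sizes are illustrative):

```python
# Explicit approach: top 2 bits of a 14-bit virtual address select the
# segment; the remaining 12 bits are the offset within the segment.
SEG_SHIFT = 12
SEG_MASK = 0x3000
OFFSET_MASK = 0xFFF

def split_address(vaddr):
    segment = (vaddr & SEG_MASK) >> SEG_SHIFT
    offset = vaddr & OFFSET_MASK
    return segment, offset

# 0x1068 = 0b01_0000_0110_1000: segment 01 (heap), offset 0x068
seg, off = split_address(0x1068)
```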
Segmentation for stack
Apart from base and bounds values, a third piece of hardware support is needed: a bit indicating which direction the segment grows, since the stack grows backwards, toward lower addresses.
Support for Sharing
Code sharing: To save memory, it is useful to share certain memory segments between address spaces.
Protection bits: to support sharing, we need a few extra bits per segment indicating whether a program can read or write the segment, or perhaps execute code that lies within it.
Fine-Grained vs. Coarse-Grained Segmentation
- Coarse-grained: the address space is split into a few relatively large segments, such as code, heap, and stack.
- Fine-grained: the address space is split into many relatively small segments. Supporting many segments may require a segment table stored in memory.
OS Support for Segmentation
- Context switch: the OS must save and restore the segment registers when switching between processes.
- Growing or shrinking a segment dynamically: a program may call malloc() to allocate an object. If the heap segment needs to grow, the memory-allocation library performs a system call such as sbrk(); the OS provides more space and updates the segment size register to the new size; the library then allocates space for the new object.
- Management of external fragmentation: segmentation can lead to free space getting chopped into small, odd-sized pieces, making it difficult to find a contiguous region of physical memory for a new segment. There are two approaches:
1. Compacting physical memory, i.e., rearranging the existing segments so that the free space becomes contiguous.
2. Free-list management algorithms that try to keep large chunks of memory available for allocation. There are different approaches, such as best fit (return the free chunk closest in size to the request), worst fit, first fit, etc.
Paging
To solve the fragmentation problem of segmentation, we can instead split the address space into fixed-sized pieces: paging. Each fixed-sized unit is a page, physical memory is an array of fixed-sized slots called page frames, and the pages of the virtual address space can be placed at different locations throughout physical memory.
A page table stores virtual-to-physical address translations, e.g. (Virtual Page 0 → Physical Frame 3), (VP 1 → PF 7), (VP 2 → PF 5), and (VP 3 → PF 2). Because each address space requires such translations, in general there is one page table per process in the system.
To translate a virtual address that the process generated, we first split it into two components: the virtual page number (VPN) and the offset within the page. The VPN is then translated into a physical frame number (PFN), while the offset stays the same.
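The split-and-translate step above can be sketched for the tiny example page table just given, assuming a 64-byte address space with 16-byte pages (so 2 VPN bits and 4 offset bits):

```python
# Paging translation for a 64-byte address space with 16-byte pages.
PAGE_SIZE = 16
VPN_SHIFT = 4                  # log2(PAGE_SIZE)
OFFSET_MASK = PAGE_SIZE - 1

page_table = {0: 3, 1: 7, 2: 5, 3: 2}   # VPN -> PFN, as in the text

def translate_page(vaddr):
    vpn = vaddr >> VPN_SHIFT             # top bits: virtual page number
    offset = vaddr & OFFSET_MASK         # bottom bits: unchanged offset
    pfn = page_table[vpn]                # look up the frame
    return (pfn << VPN_SHIFT) | offset

# virtual address 21 = VPN 1, offset 5 -> PFN 7 -> physical address 117
paddr = translate_page(21)
```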
Storage of the page table: page tables can be large, so rather than keeping them in dedicated hardware, the OS stores one in memory for each process.
Page Table Entry
A linear page table is just an array. The OS indexes the array by the virtual page number (VPN) and looks up the page-table entry (PTE) at that index in order to find the desired physical frame number (PFN).
A typical PTE will contain:
- Page frame number (PFN): identifies which physical frame holds the page; its width depends on the size of physical memory divided by the frame size.
- Valid bit: indicates whether the particular translation is valid. For example, when a program starts running, all the unused space between the heap and the stack is marked invalid.
- Protection bits: indicate whether the page can be read from, written to, or executed from.
- Present bit: indicates whether this page is in physical memory or on disk (i.e., swapped out).
- Dirty bit: indicates whether the page has been modified since it was brought into memory.
- Reference bit (a.k.a. accessed bit): tracks whether a page has been accessed; useful for determining which pages are popular and should be kept in memory.
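A PTE is just these bits packed into one word. The bit positions below are made up for illustration (real ISAs define their own layouts):

```python
# Illustrative PTE layout: flag bits in the low word, PFN above them.
# Bit positions here are invented, not those of any real architecture.
VALID    = 1 << 0
READ     = 1 << 1
WRITE    = 1 << 2
EXEC     = 1 << 3
PRESENT  = 1 << 4
DIRTY    = 1 << 5
ACCESSED = 1 << 6
PFN_SHIFT = 7

def make_pte(pfn, flags):
    return (pfn << PFN_SHIFT) | flags

def pte_pfn(pte):
    return pte >> PFN_SHIFT

pte = make_pte(42, VALID | READ | WRITE | PRESENT)
# pte_pfn(pte) recovers 42; (pte & EXEC) == 0, so the page is not executable
```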
Translation-Lookaside Buffers (TLB)
Paging can introduce high performance overhead, since every memory reference also requires a page-table lookup. To speed up address translation, we use a translation-lookaside buffer, or TLB.
A TLB is part of the chip’s memory-management unit (MMU), and is simply a hardware cache of virtual-to-physical address translations. Upon each virtual memory reference, the hardware first checks the TLB to see if the desired translation is held therein; if so, the translation is performed without having to consult the page table (which holds all translations).
TLB basic algorithm
- TLB hit: first, extract the virtual page number (VPN) from the virtual address and check whether the TLB holds the translation for this VPN. If it does, we have a TLB hit and can form the physical address directly.
- TLB miss: if the translation is not in the TLB, the hardware consults the page table, updates the TLB with the translation, and retries the instruction, which then hits in the TLB.
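The hit/miss control flow above can be sketched as a small simulation (page size, page-table contents, and the hit/miss counters are all illustrative):

```python
# TLB lookup sketch: hit -> use cached PFN; miss -> walk the page table,
# install the translation, then complete the access.
PAGE_SHIFT = 12
OFFSET_MASK = (1 << PAGE_SHIFT) - 1

page_table = {0: 8, 1: 9, 7: 3}     # VPN -> PFN (made-up values)
tlb = {}                            # the cache: VPN -> PFN

def access(vaddr, stats):
    vpn = vaddr >> PAGE_SHIFT
    if vpn in tlb:                  # TLB hit
        stats["hits"] += 1
    else:                           # TLB miss: consult page table, update TLB
        stats["misses"] += 1
        tlb[vpn] = page_table[vpn]
    return (tlb[vpn] << PAGE_SHIFT) | (vaddr & OFFSET_MASK)

stats = {"hits": 0, "misses": 0}
access(0x0004, stats)   # miss: first touch of VPN 0
access(0x0008, stats)   # hit: same page as before
```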
TLB performance and locality
- Spatial locality:
The idea is that if a program accesses memory at address x, it will likely soon access memory near x. The elements of an array, for example, are packed tightly into pages, so once the translation for one page is in the TLB, accesses to neighbouring items on the same page will hit. The bigger the page size, the fewer TLB misses.
- Temporal locality:
The idea is that an instruction or data item that has been accessed recently will likely be re-accessed soon in the future. This quick re-referencing of memory items in time keeps the TLB hit rate high, especially when items are accessed repeatedly.
However, there is a trade-off between the TLB size and lookup speed: in general, if you want a fast cache, it has to be small.
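The spatial-locality effect is easy to quantify with a sketch. Assuming (for illustration) 4-byte integers and a 16-byte page, a sequential scan misses only on the first touch of each page:

```python
# Hit-rate sketch for a sequential array scan: with 4-byte ints and a
# 16-byte page, 4 ints share a page, so 1 in 4 accesses is a miss
# (the first touch of each page) and the hit rate is 75%.
PAGE_SIZE = 16
INT_SIZE = 4

def scan_hit_rate(n_ints):
    seen_pages, hits = set(), 0
    for i in range(n_ints):
        page = (i * INT_SIZE) // PAGE_SIZE
        if page in seen_pages:
            hits += 1                  # translation already cached
        else:
            seen_pages.add(page)       # first touch of a page: TLB miss
    return hits / n_ints

rate = scan_hit_rate(40)               # 10 misses, 30 hits -> 0.75
```

A second scan of the same array would hit on every access, which is temporal locality at work (assuming the TLB is big enough to hold all ten translations).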
Fully associative cache
A TLB is a fully associative cache: a cache where data from any address can be stored in any cache location. The whole address is used as the tag, and all tags are compared simultaneously (associatively) with the requested address; if one matches, its associated data is accessed. This requires associative memory to hold the tags, which makes this form of cache more expensive. It does, however, solve the problem of contention for cache locations (cache conflicts), since a block need only be flushed when the whole cache is full, and then the block to flush can be selected in a more efficient way.
When switching from one process to another, the hardware and OS must be careful to ensure that the about-to-be-run process does not accidentally use translations left in the TLB by some previously run process. There are two approaches:
- Flushing: One approach is to simply flush the TLB on context switches, thus emptying it before running the next process. However, there is a cost: each time a process runs, it must incur TLB misses. If the OS switches between processes frequently, this cost may be high.
- Address space identifier (ASID): to reduce this overhead, some systems add an address space identifier (ASID) field, a few extra bits per TLB entry that record which process a translation belongs to, so the hardware can tell whether an entry matches the currently running process.
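The ASID idea amounts to tagging each entry with the owning process, which can be sketched by keying the cache on an (asid, vpn) pair (names and values here are illustrative):

```python
# ASID-tagged TLB sketch: entries are keyed by (asid, vpn), so two
# processes can cache the same VPN without flushing on context switch.
tagged_tlb = {}

def tlb_insert(asid, vpn, pfn):
    tagged_tlb[(asid, vpn)] = pfn

def tlb_lookup(asid, vpn):
    return tagged_tlb.get((asid, vpn))   # None models a TLB miss

tlb_insert(1, 10, 100)   # process 1: VPN 10 -> PFN 100
tlb_insert(2, 10, 170)   # process 2: same VPN, different frame, no conflict
```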
When a new entry is added to a full TLB, an old one must be replaced; the question is, which one?
One common approach is to evict the least-recently-used or LRU entry. LRU tries to take advantage of locality, assuming an entry that has not recently been used is a good candidate for eviction.
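LRU replacement can be sketched with Python's `OrderedDict`, which keeps entries in insertion order and lets us move a touched entry to the back:

```python
# LRU-replacement sketch: a fixed-capacity TLB that evicts the
# least-recently-used entry when a new one must be inserted.
from collections import OrderedDict

class LruTlb:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()        # VPN -> PFN, oldest first

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)   # mark as most recently used
            return self.entries[vpn]
        return None                         # miss

    def insert(self, vpn, pfn):
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry
        self.entries[vpn] = pfn

t = LruTlb(2)
t.insert(0, 10)
t.insert(1, 11)
t.lookup(0)        # touch VPN 0, so VPN 1 becomes least recently used
t.insert(2, 12)    # evicts VPN 1
```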
We can have the best of both worlds by using segmented paging, where memory is divided into segments, which are then divided into fixed-size pages.
In this way, each Segment has a page table, and there might be three page tables for each process, one for the code, heap, and stack parts of the address space.
The logical address is represented as a Segment Number, a Page number and page offset (an offset within the page frame).
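Splitting such a logical address is again masking and shifting; the bit widths below (2-bit segment number, 4-bit page number, 10-bit offset) are made up for illustration:

```python
# Segmented-paging address split: segment number | page number | offset.
SEG_BITS, PAGE_BITS, OFFSET_BITS = 2, 4, 10   # illustrative widths

def split_logical(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    page = (addr >> OFFSET_BITS) & ((1 << PAGE_BITS) - 1)
    segment = addr >> (OFFSET_BITS + PAGE_BITS)
    return segment, page, offset

# segment 1, page 3, offset 5 packed into one logical address
addr = (1 << 14) | (3 << 10) | 5
parts = split_logical(addr)
```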
Multi-level Page Tables
A different design turns the linear page table into a multi-level page table. This tackles the problem of the page table wasting space on large invalid regions of the address space.
The basic idea behind a multi-level page table is simple.
- First, split up the page table into page-sized units;
- Then, if an entire page of page-table entries (PTEs) is invalid, don’t allocate that page of the page table at all.
- To track whether a page of the page table is valid, use a new structure, called the page directory. The page directory thus either can be used to tell you where a page of the page table is, or that the entire page of the page table contains no valid pages.
The page directory, in a simple two-level table, contains one entry per page of the page table. It consists of a number of page directory entries (PDEs). A PDE has a valid bit and a page frame number (PFN), similar to a PTE.
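A two-level lookup can be sketched as follows. The VPN split (top bits index the directory, low bits index a page of PTEs) and all values are illustrative:

```python
# Two-level page-table lookup sketch: the page directory maps the top
# VPN bits to a page of PTEs, or to nothing when the region is invalid.
PAGE_SHIFT = 12
PT_BITS = 10                  # made-up split: low 10 VPN bits index the PTE page

page_directory = {
    0: {0: 7, 1: 8},          # PDE 0 -> a page of PTEs (low VPN bits -> PFN)
    # all other PDEs are invalid: those page-table pages are never allocated
}

def translate_two_level(vaddr):
    vpn = vaddr >> PAGE_SHIFT
    pde_index = vpn >> PT_BITS                # index into the page directory
    pte_index = vpn & ((1 << PT_BITS) - 1)    # index into the page of PTEs
    pt_page = page_directory.get(pde_index)
    if pt_page is None or pte_index not in pt_page:
        raise MemoryError("invalid translation")  # trap into the OS
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    return (pt_page[pte_index] << PAGE_SHIFT) | offset

paddr = translate_two_level((1 << PAGE_SHIFT) | 0x34)   # VPN 1 -> PFN 8
```

Note the memory saving: only the one page of PTEs reachable from PDE 0 exists; every other region of the address space costs no page-table space at all.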