Virtual Memory: Paging and Address Translation

Virtual memory is a foundational technology that allows modern computers to run multiple programs efficiently and securely. It creates the illusion of a vast, private address space for each process while using a limited amount of physical RAM, enabling process isolation and simplifying program development. At its core, this illusion is maintained by translating virtual addresses, generated by a running program, into physical addresses in actual computer memory through a structured mapping process.

The Core Idea: Paging

The most common implementation of virtual memory uses a technique called paging. In this scheme, both the virtual address space (seen by the process) and the physical memory (RAM) are divided into fixed-size blocks. A virtual block is called a page, and a physical block is called a frame. The system's page size is a power of two, typically 4 KiB (4,096 bytes), though larger sizes like 2 MiB are also used.

The operating system maintains a mapping for each process, specifying which virtual page is stored in which physical frame. This mapping is stored in a data structure called a page table. A crucial point is that not all pages of a process need to be in physical memory at once; pages can be stored on disk (in a swap file or swap space) and brought into RAM only when needed. This is what allows a system with 8 GB of RAM to run programs that collectively require 20 GB of memory.

When a process accesses memory using a virtual address, the hardware must translate it to a physical address. The virtual address is split into two parts: a virtual page number (VPN) and a page offset. The offset indicates the specific byte within the page. Since pages and frames are the same size, the offset from the virtual address is used directly as the offset in the physical frame—no translation is needed for those bits.
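
The split described above can be sketched with a shift and a mask. This is a minimal illustration assuming 4 KiB pages; the names `PAGE_SIZE`, `OFFSET_BITS`, and `split_virtual_address` are made up for this example.

```python
PAGE_SIZE = 4096                          # assumed 4 KiB pages
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 offset bits for 4 KiB pages
OFFSET_MASK = PAGE_SIZE - 1               # 0xFFF: selects the low 12 bits


def split_virtual_address(vaddr: int) -> tuple[int, int]:
    """Return (vpn, offset) for a virtual address."""
    vpn = vaddr >> OFFSET_BITS      # upper bits: virtual page number
    offset = vaddr & OFFSET_MASK    # lower bits: byte within the page
    return vpn, offset


vpn, offset = split_virtual_address(0x12345678)
print(hex(vpn), hex(offset))  # 0x12345 0x678
```

Because the page size is a power of two, the split needs no division: the offset is simply the low-order bits of the address.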

The Translation Process and the Page Table

The page table is the master lookup table for address translation. Each process has its own page table. The VPN from the virtual address acts as an index into this table. Each entry in the page table, called a Page Table Entry (PTE), contains the corresponding Physical Frame Number (PFN), along with important status bits.

These status bits are critical for system operation:

Valid Bit: Indicates whether the page is currently in physical memory (valid) or on disk (invalid, causing a page fault).
Protection Bits: Specify if the page is readable, writable, and/or executable.
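Present Bit: Often synonymous with the valid bit (x86 uses this name). Some texts distinguish the two, using "valid" for pages that belong to the address space and "present" for pages currently in RAM.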
Dirty Bit: Indicates if the page has been modified since it was loaded from disk.
Accessed Bit: Tracks if the page has been read from or written to, used for page replacement algorithms.
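
One way to picture a PTE is as an integer with the PFN in the upper bits and the status bits in the lower bits. The sketch below uses a hypothetical bit layout chosen for illustration; it does not match any particular architecture, and `PteFlags`, `make_pte`, and `decode_pte` are invented names.

```python
from enum import IntFlag


class PteFlags(IntFlag):
    """Hypothetical status-bit positions, for illustration only."""
    VALID = 1 << 0
    READ = 1 << 1
    WRITE = 1 << 2
    EXEC = 1 << 3
    ACCESSED = 1 << 4
    DIRTY = 1 << 5


PFN_SHIFT = 12                      # assumed: the PFN sits above the low 12 bits
FLAG_MASK = (1 << PFN_SHIFT) - 1


def make_pte(pfn: int, flags: PteFlags) -> int:
    """Pack a frame number and status bits into one integer."""
    return (pfn << PFN_SHIFT) | int(flags)


def decode_pte(pte: int) -> tuple[int, PteFlags]:
    """Unpack an integer PTE into (pfn, flags)."""
    return pte >> PFN_SHIFT, PteFlags(pte & FLAG_MASK)


pte = make_pte(0x1A3, PteFlags.VALID | PteFlags.READ | PteFlags.WRITE)
pfn, flags = decode_pte(pte)
print(hex(pfn), PteFlags.DIRTY in flags)  # 0x1a3 False
```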

The simplified translation formula is: Physical Address = (Physical Frame Number) $\times$ Page Size + Offset

For example, with a 4 KiB ( $2^{12}$ byte) page size, a 32-bit virtual address is split. The lower 12 bits are the offset, and the upper 20 bits are the VPN. If the PTE for a given VPN contains PFN 0x1A3, and the offset is 0x3F4, the physical address is calculated as: (0x1A3 $\times$ 4096) + 0x3F4 = 0x1A3000 + 0x3F4 = 0x1A33F4.
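
The worked example can be reproduced in a few lines. This sketch models the page table as a plain dictionary from VPN to PFN, which is a simplification; the VPN `0x42` is a made-up value, since the text only specifies the PFN and the offset.

```python
PAGE_SIZE = 4096   # 4 KiB pages, as in the example above
OFFSET_BITS = 12


def translate(page_table: dict[int, int], vaddr: int) -> int:
    """Translate a virtual address using a flat VPN -> PFN mapping."""
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & (PAGE_SIZE - 1)
    pfn = page_table[vpn]            # a KeyError here stands in for a fault
    return pfn * PAGE_SIZE + offset


page_table = {0x42: 0x1A3}           # hypothetical VPN mapped to PFN 0x1A3
paddr = translate(page_table, 0x423F4)
print(hex(paddr))  # 0x1a33f4
```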

Multi-Level Page Tables and Their Size

A single, flat page table for a large address space is inefficient. For a 32-bit address space with 4 KiB pages, a flat table would have $2^{20}$ (over 1 million) entries. If each PTE is 4 bytes, the table would consume 4 MiB of physical memory for every process, even if the process only uses a few pages.
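
The arithmetic behind the 4 MiB figure can be checked directly. The helper name `flat_table_bytes` is invented for this sketch, and the 100-process figure is only an illustrative assumption.

```python
def flat_table_bytes(address_bits: int, page_size: int, pte_bytes: int) -> int:
    """Size in bytes of a single-level page table covering the whole address space."""
    offset_bits = page_size.bit_length() - 1       # 12 for 4 KiB pages
    entries = 1 << (address_bits - offset_bits)    # one PTE per virtual page
    return entries * pte_bytes


per_process = flat_table_bytes(address_bits=32, page_size=4096, pte_bytes=4)
print(per_process // 2**20, "MiB per process")           # 4 MiB per process
print(100 * per_process // 2**20, "MiB for 100 processes")  # 400 MiB for 100 processes
```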

The solution is a multi-level (hierarchical) page table. Think of it like a book's index: you first look up the chapter (first level), then the section within that chapter (second level), and so on. A common two-level scheme splits the 20-bit VPN into, for instance, a 10-bit index into the outer page directory and a 10-bit index into an inner page table.
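
A two-level lookup with the 10/10/12 split might look like the following sketch. Nested dictionaries stand in for the page directory and inner tables, and the function names and example mapping are assumptions made for illustration.

```python
def split_two_level(vaddr: int) -> tuple[int, int, int]:
    """Split a 32-bit address into (outer index, inner index, offset)."""
    offset = vaddr & 0xFFF           # low 12 bits
    inner = (vaddr >> 12) & 0x3FF    # next 10 bits: index into the inner table
    outer = (vaddr >> 22) & 0x3FF    # top 10 bits: index into the page directory
    return outer, inner, offset


def walk(page_directory: dict[int, dict[int, int]], vaddr: int) -> int:
    """Walk both levels and return the physical address."""
    outer, inner, offset = split_two_level(vaddr)
    inner_table = page_directory.get(outer)
    if inner_table is None or inner not in inner_table:
        raise LookupError(f"no mapping for {vaddr:#x}")
    return inner_table[inner] * 4096 + offset


# Only one inner table exists; the other 1,023 directory slots stay empty.
page_directory = {1: {1: 0x1A3}}
print(hex(walk(page_directory, 0x004013F4)))  # 0x1a33f4
```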

The major advantage is that if a large range of virtual pages is unused (e.g., the entire "chapter" is empty), the entire inner page table for that region need not be allocated. This saves significant physical memory. Calculating the size of a multi-level page table requires knowing the number of actually allocated inner tables, not the total possible number.

The size of a single-level page table is straightforward: (Number of Virtual Pages) $\times$ (Size of a PTE). For a multi-level table, you calculate the size of the top-level directory plus the sum of the sizes of all allocated second-level tables.
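
The multi-level size rule can be written as a short calculation. This sketch assumes the two-level 10/10 split and 4-byte PTEs used earlier; `two_level_table_bytes` is an invented helper name.

```python
def two_level_table_bytes(allocated_inner_tables: int,
                          outer_bits: int = 10,
                          inner_bits: int = 10,
                          pte_bytes: int = 4) -> int:
    """Directory size plus the size of every allocated inner table."""
    directory = (1 << outer_bits) * pte_bytes      # always present
    inner_table = (1 << inner_bits) * pte_bytes    # size of one inner table
    return directory + allocated_inner_tables * inner_table


print(two_level_table_bytes(3))      # 16384 bytes: 16 KiB for a sparse process
print(two_level_table_bytes(1024))   # 4198400 bytes: every inner table allocated
```

Note the second result: when every inner table is allocated, the two-level structure is slightly larger than the 4 MiB flat table, because the directory is extra. The saving comes entirely from sparse address spaces.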

Enabling Protection and Isolation

Beyond providing more memory, virtual memory is essential for memory protection and process isolation. Because each process has its own page table, it can only generate physical addresses that are mapped in its table. Process A cannot name a physical frame assigned to Process B, because that frame's PFN does not appear in Process A's page table. The operating system kernel manages all page tables, acting as a trusted referee.
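
To see isolation in miniature, consider two processes that use the same virtual address. In this sketch the process names, VPNs, and PFNs are all made-up values, and each page table is a plain dictionary.

```python
PAGE_SIZE = 4096

# One page table per process; the kernel is assumed to have set these up.
page_tables = {
    "A": {0x10: 0x200},
    "B": {0x10: 0x35F},
}


def translate(pid: str, vaddr: int) -> int:
    """Translate vaddr using the page table that belongs to process pid."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    return page_tables[pid][vpn] * PAGE_SIZE + offset


vaddr = 0x10008
print(hex(translate("A", vaddr)))  # 0x200008
print(hex(translate("B", vaddr)))  # 0x35f008
```

The same virtual address lands in different frames, and neither table contains the other process's PFN, so neither process can produce an address inside the other's frame.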

The protection bits in each PTE enforce security policies at the hardware level. A page marked as read-only will cause a hardware exception if a process tries to write to it. A page marked as non-executable cannot be executed as code, a key defense against code-injection exploits that try to run data placed on the stack or heap. This hardware-enforced separation is what prevents a buggy or malicious application from crashing or corrupting other applications or the operating system itself.

Common Pitfalls

Forgetting the Offset in Calculations: A frequent error is to count page table entries from the full address width instead of from the VPN bits alone. The number of entries equals the number of virtual pages, which is $2^{\text{VPN bits}}$. The offset bits determine the page size ( $2^{\text{offset bits}}$ bytes). For a 32-bit address with a 12-bit offset, there are $2^{20}$ virtual pages, not $2^{32}$.

Misunderstanding the TLB's Role: The Translation Lookaside Buffer (TLB) is a hardware cache for recent page table translations. A common mistake is thinking the TLB replaces the page table. It does not; a TLB hit only skips the page table lookup for that access, making the translation faster. The page table is the complete, authoritative mapping stored in memory. The TLB is just a small, performance-critical cache of recently used entries from it.
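
The relationship between the TLB and the page table can be sketched as a small cache in front of a dictionary. This is a software model only: the class name `Tlb` and the least-recently-used eviction policy are assumptions, and real hardware may use a different replacement policy.

```python
from collections import OrderedDict


class Tlb:
    """Tiny translation cache with LRU eviction (an assumed policy)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: OrderedDict[int, int] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn: int, page_table: dict[int, int]) -> int:
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)      # mark as most recently used
            return self.entries[vpn]
        self.misses += 1
        pfn = page_table[vpn]                  # miss: fall back to the page table
        self.entries[vpn] = pfn
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used
        return pfn


page_table = {0: 10, 1: 11, 2: 12}
tlb = Tlb(capacity=2)
for vpn in (0, 0, 1, 0):
    tlb.lookup(vpn, page_table)
print(tlb.hits, tlb.misses)  # 2 2
```

Every miss still consults `page_table`, which is why the page table remains necessary no matter how high the hit rate is.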

Ignoring Page Table Size Overhead: When asked about memory overhead, students often focus only on the process's data pages. You must also account for the memory consumed by the page table structures themselves, especially when considering many simultaneous processes. Multi-level page tables are a direct optimization to reduce this exact overhead.

Confusing Page Faults with Protection Faults: Both cause traps to the operating system, but for different reasons. A page fault occurs when a valid virtual address references a page not currently in physical memory (Valid bit = 0). A protection fault (or segmentation fault) occurs when the access violates the permission bits (e.g., a write to a read-only page) for a page that is in memory.
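
The distinction comes down to which check fails first. The sketch below assumes a simplified PTE with only a valid bit and a writable bit; the `Pte` class and `classify_access` function are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pte:
    """Simplified PTE: only the bits needed to tell the two faults apart."""
    valid: bool
    writable: bool


def classify_access(pte: Pte, is_write: bool) -> str:
    """Return the outcome of an access to the page described by pte."""
    if not pte.valid:
        return "page fault"          # page is not in physical memory
    if is_write and not pte.writable:
        return "protection fault"    # page is resident, but the write is not allowed
    return "ok"


print(classify_access(Pte(valid=False, writable=True), is_write=False))  # page fault
print(classify_access(Pte(valid=True, writable=False), is_write=True))   # protection fault
print(classify_access(Pte(valid=True, writable=False), is_write=False))  # ok
```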

Summary

Virtual memory via paging provides each process with a large, uniform, and private address space by mapping fixed-size virtual pages onto limited physical RAM and disk storage.
Address translation is the hardware-mediated process of converting a virtual address (VPN + offset) into a physical address (PFN + offset) using a per-process page table.
Multi-level page tables dramatically reduce the physical memory overhead of the page tables themselves by only allocating memory for mappings that are in use.
The system enables process isolation and memory protection because a process can only access physical frames listed in its own page table, with hardware enforcing the permissions (read, write, execute) stored in each Page Table Entry.
Performance relies on caching translations in the TLB and efficiently handling page faults to bring data from disk into RAM when needed.
