Memory Management

Operating systems:

  • uses intelligently size containers
    • memory pages of segments
  • Not all parts are needed at once
    • tasks operate on subset of memory
  • Optimized for performance
    • reduce time to access state in memory
      • leads to better performance!

Memory Management Goals


Virtual vs Physical memory

  • Allocate
    • allocation, replacement
  • Arbitrate
    • address translation and validation

Page-based Memory Management

  • Allocate => pages => page frames
  • Arbitrate => page tables

Segment-based Memory Management

  • Allocate => segments
  • Arbitrate => segment registers

Hardware Support


Memory Management Unit (MMU)

  • translate virtual to physical address
  • reports faults (illegal access, permission, not present in memory)


  • pointers to page tables
  • base and limit size, number of segments


  • Translation lookaside buffer
  • Valid VA-PA translations using TLB


  • Actual PA generation done in hardware

Page Tables


  • OS creates page table per process
  • On context switch, switch to valid page table
  • Updates register that points to correct page table. E.g CR3 on x86 architecture

Page Table Entry (PTE)



  • Present (valid/invalid)
  • Dirty (written to)
  • Accessed (for read or write)
  • Protection bits => RWX

Page Table Entry on x86



  • Present
  • Dirty
  • Accessed
  • R/W permission bit 0: R only, 1: R/W
  • U/S permission bit 0: usermode, 1: superviser mode only
  • others: caching related info (write through, caching disabled)
  • unused: for future use

Page faults


Page Table Size


  • 32 bit architecture
    • Page Table Entry (PTE) = 4 Bytes, including PFN + flags
    • Virtual Page Number (VPN) = 2^32/page_size
    • Page size = 4KB (...8KB, 2MB, 4MB, 1GB)

Therefore Page Table Size = (2^32 * 2^12)*4B = 4MB (per process)

  • for 64 bit architecture
    • Page Table Entry (PTE) = 8 Bytes
    • Page size = 4KB

Page Table Size = (2^64 * 2^12)*8B = 32PB (per process!)

  • processes don't use entire address space
  • even on 32 bit architecture, it will not always use all 4GB

But Page Table assumes an entry per VPN regardless, of whether corresponding virtual memory is needed or not.

Hierarchical Page Tables


On malloc, a new internal page table may be allocated.

Address split:

Page Number offset
P1 P2 d
12 10 10
  • inner table addresses => 2^10 * page_size = 2^10*2^10 = 1MB
  • don't need an inner table for each 1MB virtual memory gap

Additional Layers

  • page table directory pointer (3rd level)
  • page table directory map (4th level)

  • Important on 64 bit architectures

  • larger and more sparse => larger gaps would save more internal page table components


Tradeoffs of Multilevel Page Tables


  • Smaller internal page tables/directories
  • Granularity of coverage
    • Potentially reduced page table size


  • More memory accesses required for translation
  • increased translation latency

Overheads of Address Translation

For each memory reference :

Single level page table Four level page table
x1 access to PTE x4 accesses to PTE
x1 access to mem x1 access to mem

which results in slowdown.

Page Table Cache


Translation Lookaside Buffer

  • MMU level address translation cache
  • On TLB miss => page table access from memory
  • has protection/validity bits
  • small number of cached address => high TLB hit rate

    • temporal and spatial locality
  • Example

    • x86 Core i7
      • per core : 64-entry data TLB
        128-entry instruction TLB
      • 512-entry shared second-level TLB

Inverted Page Tables


Hashing Page Tables



Segmentation is the process of mapping virtual to physical memory using segments.

  • Segments: arbitrary granularity (size)
    • e.g. code, heap, data, stack..
    • address = segment - selector + offset
  • Segment
    • contiguous physical memory
    • segment size = segment base + limit registers


Segmentation + Paging


Page Size

  • 10 bit offset => 1 KB page size [2^10]
  • 12 bit offset => 4 KB page size [2^12]

In real world examples,

  • Linux/x86 : 4 KB, 2MB, 1GB
  • Solaris/Sparse: 8kB, 4MB, 2GB
page size 2 MB
offset bits 21 bits
reduction factor on page table size x512


  • larger pages
    • fewer page table entries, smaller page tables, more TLB hits


  • internal fragmentation => wastes memory

Memory Allocation

  • Memory allocator

    • determines VA to PA mapping
    • address translation, page tables => simply determine PA from VA and check validity/permissions
  • Kernel Level Allocators

    • kernel state, static process state
  • User Level Allocators
    • dynamic process state (heap), malloc/free
    • e.g. d/malloc, jemalloc, Hoard, tcmalloc

Demand Paging

  • Virtual Memory >> Physical Memory
    • virtual memory page is not always in physical memory
    • physical page frame saved and restored to/from secondary storage

Demand paging:

  • pages swapped in/out of memory & a swap partition (e.g. on a disk)


  • Original PA != PA after swapping
    • if page is "pinned", swapping is disabled

When pages should be swapped?

  • page(out) daemon
  • when memory usage is above threshold
  • when CPU usage is below threshold

Which page should be swapped out?

  • pages that won't be used
  • history based prediction
    • Least Recently Used (LRU policy). Access bit tracks if page is referenced.
  • page that don't need to be written out
    • Dirty bit to track if modified
  • avoid non-swappable pages


  • Failure and Recovery management technique
    • periodically save process state
    • failure may be unavoidable but can restart from checkpoint, so recovery would be faster

Simple Approach

  • pause and save

Better Approach

  • write-protect and copy everything at once
  • copy diffs of dirties pages for incremental checkpoints
    • rebuild from multiple diffs, or in background

Checkpointing can also be used in other services:

  • Debugging

    • Rewind-Replay
    • rewind = restart from checkpoint
    • gradually go back to earlier checkpoints until error is found
  • Migration

  • continue on another machine
  • disaster recovery
  • consolidation
  • repeated checkpoints in a fast loop until pause and copy becomes acceptable (or unavoidable)