Handling the Hardware Cache and the TLB

  • Hardware caches and Translation Lookaside Buffers (TLBs) are two mechanisms that boost efficiency considerably by reducing the time needed for address lookups


Handling the Hardware Cache

  • The hardware cache is organized in units of cache lines
    • The L1_CACHE_BYTES macro yields the size of a cache line in bytes
      • Pentium 4: 128; earlier models: 32
  • Cache hits can be maximized in two ways
    1. Place the most frequently used fields of a data structure at low offsets, so they are cached in the same line
    2. Allocate large data structures so that all cache lines are used uniformly
  • On x86, cache synchronization between CPUs is performed by the hardware, not the kernel, so no explicit cache flushes are needed
  • On architectures whose hardware does not keep caches synchronized, the kernel must perform cache synchronization itself


Handling the TLB

  • The kernel decides when a mapping between a linear and a physical address is no longer valid, so the kernel itself must perform TLB flushing
  • Consequently, processors cannot synchronize their TLBs automatically


  • Processors usually offer only very limited TLB-flushing primitives (Intel offers two)
    1. All Pentium models flush all TLB entries for non-global pages when cr3 is reloaded
    2. Pentium Pro and later models provide the invlpg assembly instruction to invalidate the single TLB entry mapping a given linear address
  • The kernel wraps these assembly instructions in a handful of macros
    • These macros are the building blocks of the architecture-independent TLB-flushing methods

On x86 Intel, the Linux kernel uses invlpg in the following functions

  • The flush_tlb_pgtables method is missing from Table 2-12: on the 80x86 architecture nothing has to be done when a page table is unlinked from its parent table, so the function implementing this method is empty


TLB Flushing

  • The CPU running the flushing function sends an interprocessor interrupt to all other CPUs, forcing them to run the TLB-invalidating function as well
  • In general, a process switch implies flushing the TLB entries of the old process's local page tables
  • When the kernel assigns a page frame to a User Mode process and stores its physical address in a Page Table entry, it must flush any local TLB entry that refers to the corresponding linear address
  • There are some cases where the kernel does not flush the TLB
    1. Switching between two User Mode processes that share the same page tables
    2. Switching between a User Mode process and a kernel thread
      • Kernel threads do not have their own set of page tables (Chapter 9)
      • Kernel threads never access the User Mode address space


Lazy TLB Mode

  • When several CPUs share the same page tables, lazy TLB mode delays TLB flushing on the CPUs running kernel threads for as long as possible
  • Case: User (non-lazy) → kernel thread (lazy) → User with different page tables (non-lazy)
    1. When a CPU starts running a kernel thread, the kernel puts it in lazy TLB mode
    2. When the CPU switches back to a regular process with a different set of page tables, the cr3 reload makes the hardware flush the TLB automatically
    3. The kernel then puts the CPU back in non-lazy TLB mode
  • Case: kernel thread (lazy) → User with the same page tables (non-lazy)
    1. The new User Mode process has the same page tables the kernel thread was borrowing
    2. Any deferred TLB invalidation must therefore be performed by the kernel
    3. The kernel achieves this by invalidating all non-global TLB entries (a cr3 reload)


Data structures for Lazy TLB Mode

  • cpu_tlbstate: an array of NR_CPUS elements (NR_CPUS defaults to 32), one per CPU
    • The state field is set to TLBSTATE_LAZY when the CPU enters lazy TLB mode
    • The state field is set to TLBSTATE_OK when the CPU leaves lazy TLB mode
    • Each element also has an active_mm field pointing to the memory descriptor of the current process
  • cpu_vm_mask: a field of the active memory descriptor
    • Stores the indices of all CPUs currently using the descriptor's page tables, including any CPU that is entering lazy TLB mode
    • When a CPU wants to invalidate the TLB entries of every CPU using a set of page tables, it sends an interprocessor interrupt to all CPUs recorded in this field


  • When a CPU receives an interprocessor interrupt for TLB flushing, it first verifies that the flush affects the page tables of its current process
  • It then checks whether the state field of its cpu_tlbstate element is equal to TLBSTATE_LAZY
    • If so, the kernel refuses to invalidate the TLB entries and removes the CPU index from the cpu_vm_mask field of the memory descriptor. This has two consequences:
      1. As long as the CPU remains in lazy TLB mode, it will not receive further interprocessor interrupts related to TLB flushing
      2. If the CPU switches to another process that uses the same set of page tables as the kernel thread being replaced, the kernel invokes __flush_tlb() to invalidate all non-global TLB entries of the CPU
