- Hardware caches and Translation Lookaside Buffers (TLBs) both improve performance significantly: caches shorten memory accesses, and TLBs shorten linear-to-physical address translation
Handling the Hardware Cache
- Hardware cache is addressed by cache lines
- L1_CACHE_BYTES macro yields the size of a cache line in bytes
- It yields 128 on a Pentium 4 and 32 on earlier Intel models
- The kernel improves the cache hit rate in two ways
- The most frequently used fields of a data structure are placed at low offsets within the structure, so they can be cached in the same line
- When a large set of data structures is allocated, they are placed in memory so that all cache lines are used uniformly
- On 80x86 processors, cache synchronization is performed automatically by the hardware, so the kernel does no cache flushing there
- For processors that do not synchronize their caches, the kernel provides cache flushing interfaces
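The "hot fields at low offsets" rule can be checked with `offsetof`. The sketch below is only an illustration: `struct conn`, its fields, and the 32-byte `CACHE_LINE_BYTES` value are hypothetical, and the check assumes the structure itself starts on a cache-line boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed line size for illustration; the kernel uses L1_CACHE_BYTES. */
#define CACHE_LINE_BYTES 32

/* Hypothetical structure: frequently used fields first, rarely used last. */
struct conn {
    unsigned int  state;      /* hot: read on every operation */
    unsigned int  flags;      /* hot: read on every operation */
    unsigned long last_seen;  /* hot: updated on every operation */
    char          name[64];   /* cold: only used when logging */
    unsigned long created;    /* cold: only used when logging */
};

/* Index of the cache line that a byte offset falls in, assuming the
 * structure starts on a cache-line boundary. */
static size_t cache_line_of(size_t offset)
{
    return offset / CACHE_LINE_BYTES;
}

/* Returns 1 when every hot field lies in the first cache line. */
static int hot_fields_share_line(void)
{
    size_t last_hot_byte = offsetof(struct conn, last_seen)
                         + sizeof(unsigned long) - 1;

    return cache_line_of(offsetof(struct conn, state)) == 0
        && cache_line_of(last_hot_byte) == 0;
}

/* Aborts if someone reorders the fields and breaks the layout. */
static void check_conn_layout(void)
{
    assert(hot_fields_share_line());
}
```

Calling `check_conn_layout()` once at startup is enough to catch a field reordering that pushes a hot field out of the first line.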
Handling the TLB
- Processors cannot synchronize their own TLB automatically, because the kernel, not the hardware, decides when a mapping between a linear and a physical address is no longer valid
- The kernel is therefore responsible for TLB flushing
- Processors usually offer only a few TLB-invalidating techniques; Intel processors offer two
- All Pentium models automatically flush the TLB entries of non-global pages when a value is loaded into cr3
- On Pentium Pro and later models, the invlpg assembly instruction invalidates the single TLB entry that maps a given linear address
- Linux wraps these two hardware techniques in low-level macros
- These macros are the building blocks of the architecture-independent TLB flushing methods

- The flush_tlb_pgtables method is missing from Table 2-12: on the 80x86 architecture nothing has to be done when a page table is unlinked from its parent table, so the function implementing this method is empty
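The two hardware techniques differ in scope, which a toy model can make concrete. The sketch below is a user-space simulation, not kernel code: `struct toy_tlb` and the `toy_*` functions are hypothetical names, and the real instructions (a cr3 load and invlpg) are privileged and cannot be run from an ordinary program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TLB_SLOTS  8
#define PAGE_SHIFT 12            /* 4 KB pages */

struct tlb_entry {
    bool          valid;
    bool          global;        /* models the Global flag of the page */
    unsigned long vpn;           /* linear address >> PAGE_SHIFT */
};

struct toy_tlb {
    struct tlb_entry slot[TLB_SLOTS];
};

/* Models a cr3 load: every non-global entry is dropped. */
static void toy_flush_nonglobal(struct toy_tlb *tlb)
{
    assert(tlb != NULL);
    for (size_t i = 0; i < TLB_SLOTS; i++)
        if (!tlb->slot[i].global)
            tlb->slot[i].valid = false;
}

/* Models invlpg: drop only the entry that maps the given linear address. */
static void toy_flush_single(struct toy_tlb *tlb, unsigned long linear_addr)
{
    unsigned long vpn = linear_addr >> PAGE_SHIFT;

    assert(tlb != NULL);
    for (size_t i = 0; i < TLB_SLOTS; i++)
        if (tlb->slot[i].valid && tlb->slot[i].vpn == vpn)
            tlb->slot[i].valid = false;
}

/* Number of entries that are still valid. */
static size_t toy_count_valid(const struct toy_tlb *tlb)
{
    size_t n = 0;

    for (size_t i = 0; i < TLB_SLOTS; i++)
        if (tlb->slot[i].valid)
            n++;
    return n;
}
```

The single-entry flush is the cheaper choice when only one mapping changed; the cr3-style flush discards every non-global translation and leaves global (typically kernel) entries in place.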
TLB Flushing
- On multiprocessor systems, the CPU running a TLB-invalidating function sends an interprocessor interrupt to all other CPUs, which forces them to run the proper TLB-invalidating function as well
- As a general rule, a process switch changes the set of active page tables, so the local TLB entries for the old page tables must be flushed
- When the kernel assigns a page frame to a User Mode process and stores its physical address in a Page Table entry, it must flush any local TLB entry that refers to the corresponding linear address
- On a process switch, the kernel skips the TLB flush in these cases
- Switching between 2 user mode processes that share the same page tables
- Switching between a user mode process and a kernel thread
- Kernel threads do not have their own set of page tables (see Chapter 9)
- No kernel thread accesses the User Mode address space
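The general rule and its two exceptions reduce to a small decision function. This is a minimal sketch with hypothetical types (`struct mm`, `struct task`); it ignores lazy TLB mode and only answers whether a switch changes the active page tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory descriptor (one set of page tables). */
struct mm {
    int id;
};

/* Hypothetical task: mm is NULL for a kernel thread. */
struct task {
    const char *name;
    struct mm  *mm;
};

/* Decide whether switching to `next` requires flushing the local TLB.
 * `loaded` is the set of page tables currently in use on this CPU. */
static bool switch_needs_tlb_flush(const struct mm *loaded,
                                   const struct task *next)
{
    assert(next != NULL);

    if (next->mm == NULL)      /* kernel thread: keeps using the loaded tables */
        return false;
    if (next->mm == loaded)    /* same page tables as before */
        return false;
    return true;               /* different page tables: old entries are stale */
}
```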
Lazy TLB Mode
- If several CPUs use the same page tables and a TLB entry must be flushed on all of them, lazy TLB mode postpones the flush on CPUs running kernel threads for as long as possible
- Case: User (non-lazy) -> Kernel Thread (lazy) -> User (different page tables, non-lazy)
- When a CPU begins running a kernel thread, it enters lazy TLB mode
- When the CPU switches back to a regular process with a different set of page tables, the hardware flushes the TLB entries automatically
- The kernel sets the CPU back to non-lazy TLB mode
- Case: Kernel Thread (lazy) -> User (same page tables, non-lazy)
- The new user process uses the same page tables as the kernel thread it replaces
- Any deferred TLB invalidation must be applied by the kernel
- The kernel achieves this by invalidating all non-global TLB entries (reloading cr3)
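The two cases above can be traced with a small state machine for a single CPU. This is a simulation under stated assumptions, not the kernel's code: `struct cpu`, `MODE_OK`/`MODE_LAZY`, and the `flush_deferred` flag are hypothetical, and page tables are identified by a plain integer (a negative value stands for a kernel thread).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum tlb_mode { MODE_OK, MODE_LAZY };   /* hypothetical names */

struct cpu {
    enum tlb_mode mode;
    int  loaded_tables;    /* id of the page tables currently loaded */
    bool flush_deferred;   /* a flush request was skipped while lazy */
    int  flushes;          /* number of non-global flushes performed */
};

/* Another CPU asks this one to flush entries for its loaded tables. */
static void cpu_flush_request(struct cpu *c)
{
    assert(c != NULL);
    if (c->mode == MODE_LAZY)
        c->flush_deferred = true;   /* postpone: a kernel thread is running */
    else
        c->flushes++;               /* flush right away */
}

/* Switch to a task; next_tables < 0 means "kernel thread". */
static void cpu_switch_to(struct cpu *c, int next_tables)
{
    assert(c != NULL);
    if (next_tables < 0) {
        c->mode = MODE_LAZY;        /* keep the current tables loaded */
        return;
    }
    if (next_tables != c->loaded_tables) {
        /* Loading new tables flushes non-global entries as a side effect. */
        c->loaded_tables = next_tables;
        c->flushes++;
    } else if (c->flush_deferred) {
        /* Same tables: the postponed invalidation must be applied now. */
        c->flushes++;
    }
    c->flush_deferred = false;
    c->mode = MODE_OK;
}
```

The benefit shows up in the first case: if the CPU ends up switching to different page tables anyway, the deferred flush costs nothing extra, because the table switch already discards the stale entries.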
Data structures for Lazy TLB Mode
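- cpu_tlbstate: a static array of NR_CPUS structures, one per CPU (NR_CPUS defaults to 32)
- The state field is set to TLBSTATE_LAZY when the CPU enters lazy TLB mode
- The state field is set to TLBSTATE_OK when the CPU leaves lazy TLB mode
- Each element consists of an active_mm field pointing to the memory descriptor of the current process and the state flag (TLBSTATE_OK or TLBSTATE_LAZY)
- cpu_vm_mask: a field of each memory descriptor that stores the indices of the CPUs that should receive interprocessor interrupts for TLB flushing
- When a CPU starts running a kernel thread, the cpu_vm_mask of the active memory descriptor holds the indices of all CPUs in the system, including the one that is entering lazy TLB mode
- When a CPU wants to invalidate the TLB entries of all CPUs for a given set of page tables, it sends an interprocessor interrupt to every CPU listed in the cpu_vm_mask field of the corresponding memory descriptor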
- When a CPU receives an interprocessor interrupt for TLB flushing and verifies that it affects the set of page tables of its current process, it checks whether the state field of its cpu_tlbstate element is equal to TLBSTATE_LAZY
- If so, the kernel refuses to invalidate the TLB entries and removes the CPU index from the cpu_vm_mask field of the memory descriptor. This has two consequences:
- As long as the CPU remains in lazy TLB mode, it will not receive other interprocessor Interrupts related to TLB flushing
- If the CPU switches to another process that is using the same set of page tables as the kernel thread that is being replaced, the kernel invokes __flush_tlb() to invalidate all non-global TLB entries of the CPU
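The interrupt handling described above can be sketched with simplified versions of the two data structures. Everything here is an approximation for illustration: `cpu_vm_mask` is modelled as a plain `unsigned long` bitmask rather than the kernel's CPU mask type, the enum values are arbitrary, and `handle_flush_ipi` is a hypothetical name that only reports what the handler would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 32

enum { TLBSTATE_OK, TLBSTATE_LAZY };   /* values chosen for illustration */

/* Simplified memory descriptor: bit i set means CPU i should
 * receive interprocessor interrupts for TLB flushing. */
struct mm_struct {
    unsigned long cpu_vm_mask;
};

/* One element per CPU, as in the cpu_tlbstate array. */
struct tlb_state {
    struct mm_struct *active_mm;
    int               state;
};

static struct tlb_state cpu_tlbstate[NR_CPUS];

/* Simulates the handler for a TLB-flush interrupt on `cpu`.
 * Returns true when the CPU would invalidate its TLB entries. */
static bool handle_flush_ipi(int cpu, struct mm_struct *target_mm)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    assert(target_mm != NULL);

    /* The request concerns page tables this CPU is not using. */
    if (cpu_tlbstate[cpu].active_mm != target_mm)
        return false;

    if (cpu_tlbstate[cpu].state == TLBSTATE_LAZY) {
        /* Refuse to flush and stop receiving further flush interrupts. */
        target_mm->cpu_vm_mask &= ~(1UL << cpu);
        return false;
    }

    return true;   /* non-lazy: the real handler invalidates entries here */
}
```

Because the lazy CPU clears its own bit, later flush requests for the same memory descriptor skip it entirely until it leaves lazy TLB mode.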