Category: Linux Kernel

Translation Lookaside Buffers (TLB)

  • TLB: Another cache used to speed up linear->physical address translation
  • Each CPU has its own TLB and TLBs do not need to be kept in sync
    • This is because each CPU may associate the same linear address with a different physical one
  • Changing the cr3 register (page directory base address) invalidates all TLB entries

 

Usage

  • When a linear address is first accessed it gets translated into a physical address (through paging as discussed before)
  • Afterwards it is stored into the TLB such that further accesses will pull the physical address from the TLB instead of computing it again
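The lookup/insert/flush cycle above can be sketched in C as a tiny direct-mapped translation cache. This is purely illustrative: a real TLB is a set-associative hardware structure, and the entry count here is a made-up parameter.

```c
#include <stdint.h>
#include <stddef.h>

#define TLB_ENTRIES 64          /* made-up size for the sketch */
#define PAGE_SHIFT  12          /* 4 KB pages */

struct tlb_entry {
    uint32_t vpn;               /* virtual (linear) page number */
    uint32_t pfn;               /* physical frame number */
    int      valid;
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Invalidate every entry -- conceptually what a write to cr3 does. */
void tlb_flush(void)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++)
        tlb[i].valid = 0;
}

/* Return 1 on a TLB hit, filling *pfn; 0 on a miss. */
int tlb_lookup(uint32_t vaddr, uint32_t *pfn)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];

    if (e->valid && e->vpn == vpn) {
        *pfn = e->pfn;
        return 1;
    }
    return 0;
}

/* After a full page-table walk, cache the translation for next time. */
void tlb_insert(uint32_t vaddr, uint32_t pfn)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];

    e->vpn = vpn;
    e->pfn = pfn;
    e->valid = 1;
}
```

On a miss the hardware walks the page tables as described earlier, then inserts the result so the next access to the same page hits.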

Hardware Cache (sRAM)

  • Introduced due to the clock speed discrepancy between DRAM (dynamic RAM, the main memory) and the CPU
  • Based on the locality principle
    • Because of the cyclic structure of programs and of data packed into linear arrays, addresses close to recently used ones have a high probability of being used in the near future
  • Thus static RAM (sRAM) is implemented as on-chip memory
  • The cache is subdivided into subsets of lines.  There are several ways of mapping
    • Direct mapping. A line in main memory is always stored at one exact location in the cache
    • Fully associative. A line in main memory can be stored in any location in the cache
    • N-Way Associative. A line in main memory can be stored in any of N places in the cache
  • CD flag of the cr0 processor register disables caching when set (1)
  • NW flag of cr0 selects between write-through and write-back
  • Pentium cache allows each page frame to have its own cache management policy
    • Therefore the page directory and page table entry has 2 extra flags
      • PCD: Page cache disable
      • PWT: Page write-through
    • Linux always enables caching for page frames and always uses write-back
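The control bits above can be tested with simple masks. The bit positions (CD = bit 30 and NW = bit 29 of cr0, PWT = bit 3 and PCD = bit 4 of a page-table entry) are taken from the IA-32 architecture; the helper functions themselves are just an illustrative sketch.

```c
#include <stdint.h>

#define CR0_NW  (1u << 29)      /* cr0: not write-through */
#define CR0_CD  (1u << 30)      /* cr0: cache disable */
#define PTE_PWT (1u << 3)       /* page table entry: page write-through */
#define PTE_PCD (1u << 4)       /* page table entry: page cache disable */

/* Caching is on only while CD is clear. */
int caching_enabled(uint32_t cr0)  { return !(cr0 & CR0_CD); }

/* Per-page policy: Linux keeps both of these bits clear. */
int page_cached(uint32_t pte)      { return !(pte & PTE_PCD); }
int page_write_back(uint32_t pte)  { return !(pte & PTE_PWT); }
```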

 

Cache Unit: Two Parts

  1. Cache Controller: Essentially the page directory of the cache.  For each line it stores a few flags and a tag, which identifies the memory location the line holds.
    • A physical address is split into 3 parts, in order: tag, subset index, offset within the line
  2. sRAM (hardware cache): Where the actual data is stored

Cache Process

  1. CPU attempts to access RAM.  CPU extracts subset index from physical address.
  2. CPU compares the tags of all lines in the subset with the high order bits of physical address
  3. If a line with the same tag is found the CPU has a cache hit (otherwise a cache miss)
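The address split used in steps 1 and 2 can be sketched with shifts and masks. The geometry below (32-byte lines, 128 subsets) is a hypothetical example, not taken from any particular CPU.

```c
#include <stdint.h>

#define LINE_SHIFT   5          /* 2^5  = 32-byte cache line (assumed) */
#define SUBSET_BITS  7          /* 2^7  = 128 subsets (assumed) */

/* Offset within the line: the lowest LINE_SHIFT bits. */
uint32_t cache_offset(uint32_t paddr)
{
    return paddr & ((1u << LINE_SHIFT) - 1);
}

/* Subset index: the next SUBSET_BITS bits, used to select the subset. */
uint32_t cache_subset(uint32_t paddr)
{
    return (paddr >> LINE_SHIFT) & ((1u << SUBSET_BITS) - 1);
}

/* Tag: the remaining high-order bits, compared against every line
 * in the selected subset to decide hit or miss. */
uint32_t cache_tag(uint32_t paddr)
{
    return paddr >> (LINE_SHIFT + SUBSET_BITS);
}
```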

Cache Hits and Misses

  • Cache Hit: When searching for a portion of RAM the CPU finds it in the cache.
    • For reads, the data is extracted from sRAM and put in a register; DRAM is not accessed
    • For writes, the CPU uses one of two strategies: write-through or write-back
      • Write-through:  The controller always writes into both RAM and the cache line, effectively switching off the cache for write operations
      • Write-back:  More immediately efficient; only the cache line is updated and the contents of RAM are left unchanged. RAM is only written when the line must be flushed, e.g. when the CPU executes a FLUSH instruction or the line is evicted after a cache miss
  • Cache Miss: When the CPU does not find the requested portion of RAM in the cache
    • The cache line will be written back into RAM and, if necessary, the correct line will be fetched from main memory.

Multiprocessor System Cache

  • Each CPU has its own cache
  • If two or more CPUs cache the same memory, an update by one CPU must be propagated to every other CPU holding that memory
    • This is done using cache snooping

Levels of Cache

  • L1 is the fastest and often the smallest cache; L2, L3 and onwards are slower and usually larger than L1
  • Linux assumes only one cache

Paging for 64-Bit Architecture

  • Using 4 KB page frames again, where the 4 KB offset is represented by 2^12 (12 bits)
  • In a 64-bit architecture there are 64 bits.  Below is an example paging format
    • Choose to use 48 of those bits if we use 2 levels and 4 KB pages
    • 12 bits for the offset
    • 18 bits per page level (2 levels, 2^18 = 262,144 entries per level)
  • Different architectures make use of the bits differently; below are some examples
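The arithmetic behind this example split can be checked directly: 12 offset bits plus two 18-bit levels uses 48 of the 64 bits.

```c
#include <stdint.h>

#define OFFSET_BITS 12          /* 2^12 = 4 KB pages */
#define LEVEL_BITS  18          /* bits per paging level in the example */
#define LEVELS       2

/* Entries per page-table level: 2^18 = 262,144. */
uint64_t entries_per_level(void)
{
    return 1ull << LEVEL_BITS;
}

/* Total linear address bits actually translated: 12 + 2*18 = 48. */
unsigned address_bits_used(void)
{
    return OFFSET_BITS + LEVELS * LEVEL_BITS;
}
```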

[Figure: Capture.PNG — paging formats used by different 64-bit architectures]

The Physical Address Extension (PAE) Paging Mechanism

  • 32 bits theoretically allow up to 4 GB of RAM; however, due to linear address space requirements only about 1 GB can actually be used (discussed later)
  • This was solved by increasing the number of address pins on the processor from 32 to 36; in addition, PAE was introduced to take advantage of the new space
    • 2^36 = 64 GB of RAM can now be addressed
    • 32 linear address bits address 36 physical address bits
  • Enabled by setting the PAE flag in the cr4 register; the PS flag in a page directory entry now selects a 2 MB page frame

Format of PAE

  • 64 GB of RAM are split into 2^24 distinct page frames
    • Physical address field of Page Table entries has been expanded from 20 to 24 bits.
    • PAE Page Table entry must include the 12 flag bits (section “Regular Paging”) , 24 physical address bits, for a grand total of 36
    • Page Table entry size has been doubled from 32 bits to 64 bits.
      • 4-KB PAE Page Table includes 512 entries instead of 1,024
  • Added one more level of paging using the Page Directory Pointer Table (PDPT)
    • This is a table with 4, 64 bit entries 
  • cr3 now contains the 27 bit base address for the PDPT and not the page directory
    • PDPTs are stored in the first 4 GB of RAM and aligned to a multiple of 32 bytes (2^5); since 32 − 5 = 27, 27 bits are sufficient to represent the base address of such tables
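The PAE figures quoted above all follow from small powers of two, as this sketch verifies.

```c
#include <stdint.h>

/* 64 GB of RAM divided into 4 KB frames: 2^36 / 2^12 = 2^24 frames. */
uint64_t pae_page_frames(void)
{
    return (1ull << 36) / (1ull << 12);
}

/* A 4 KB table of 64-bit (8-byte) entries holds 512 entries. */
unsigned pae_table_entries(void)
{
    return 4096 / 8;
}

/* PDPTs live in the first 4 GB (32 address bits) aligned to 32 bytes
 * (2^5), so 32 - 5 = 27 bits identify the base address. */
unsigned pdpt_base_bits(void)
{
    return 32 - 5;
}
```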

Linear Address Format Using PAE for 4 KB Pages

  • cr3: Points to PDPT
  • bits 31-30: 1 of 4 possible entries in PDPT
  • bits 29-21: 1 of 512 possible entries in the page directory
  • bits 20-12: 1 of 512 possible entries in the page table
  • bits 11-0: offset within the 4 KB page
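The four fields above can be extracted from a 32-bit linear address with shifts and masks; this is just the bit layout from the list restated in C.

```c
#include <stdint.h>

/* PAE, 4 KB pages: 2 + 9 + 9 + 12 bits. */
unsigned pae_pdpt_index(uint32_t la) { return (la >> 30) & 0x3; }   /* bits 31-30 */
unsigned pae_pd_index(uint32_t la)   { return (la >> 21) & 0x1FF; } /* bits 29-21 */
unsigned pae_pt_index(uint32_t la)   { return (la >> 12) & 0x1FF; } /* bits 20-12 */
unsigned pae_offset(uint32_t la)     { return la & 0xFFF; }         /* bits 11-0 */
```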

 

[Figure: X86 Paging PAE 4K.svg — PAE linear address translation for 4 KB pages]

 

Linear Address Format Using PAE for 2 MB Pages

  • cr3: points to PDPT
  • bits 31-30: 1 of 4 possible entries in PDPT
  • bits 29-21: 1 of 512 possible entries in page directory
  • bits 20-0: offset within the 2 MB page
[Figure: X86 Paging PAE 2M.svg — PAE linear address translation for 2 MB pages]

Limitations of PAE

  • It is important to note that linear addresses remain 32 bits, so the linear address space stayed the same size while the physical address space was enlarged
    • Therefore linear addresses must be reused when mapping different parts of RAM
    • This is only a hack allowing 32-bit linear addresses to reach 64 GB of RAM

Hardware Protection Scheme 

  • Unlike segmentation, paging uses a single User/Supervisor flag to check access privileges.
  • If the flag is 0 the page can only be accessed when CPL < 3; otherwise it can always be accessed
  • Read/Write and Read are the only two modes of access, marked by the Read/Write flag in the page directory or page table entry
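The protection check described above can be sketched as a small predicate. The flag bit positions mirror the x86 page-table entry (R/W = bit 1, U/S = bit 2); the function itself is an illustration, not hardware behavior.

```c
#include <stdint.h>

#define PTE_RW (1u << 1)        /* 1 = read/write allowed, 0 = read-only */
#define PTE_US (1u << 2)        /* 1 = user page, 0 = supervisor page */

/* Return 1 if the access is allowed, 0 if it would fault. */
int page_access_ok(uint32_t pte, unsigned cpl, int is_write)
{
    /* Supervisor page (U/S = 0): only accessible when CPL < 3. */
    if (!(pte & PTE_US) && cpl >= 3)
        return 0;

    /* Writes require the Read/Write flag; reads are always allowed
     * once the privilege check passes. */
    if (is_write && !(pte & PTE_RW))
        return 0;

    return 1;
}
```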

Extended Paging

  • Extended paging allows for much larger page frames: 4 MB instead of 4 KB
  • Allows large ranges of contiguous linear addresses to be converted into large ranges of contiguous physical addresses.  The kernel can do so without intermediate page tables, thus saving memory
  • Allows for preservation of TLB entries, since fewer entries cover more memory

One less level of paging.

  • This changes the format of the linear address
    • Most significant ten bits: entry in the page directory (selects the page frame)
    • Least significant twenty-two bits: offset within the page frame
  • Enabled using the PSE flag of the cr4 register; extended and regular paging can be used at the same time
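With only two fields, the extended-paging split is especially simple, as restated below in C.

```c
#include <stdint.h>

/* Extended (4 MB) paging: a 10-bit directory index and a 22-bit offset. */
unsigned ext_pd_index(uint32_t la) { return la >> 22; }         /* top 10 bits */
uint32_t ext_offset(uint32_t la)   { return la & 0x3FFFFF; }    /* low 22 bits */
```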