Category: Linux Kernel

The Physical Address Extension (PAE) Paging Mechanism

  • 32 address bits theoretically allow up to 4 GB of RAM; in practice, because of the linear address space requirements of User Mode processes, the kernel can directly address only about 1 GB of it (discussed later)
  • This was solved by increasing the number of address pins on the memory bus from 32 to 36 and by adding PAE, a paging mechanism that takes advantage of the extra physical address bits
    • 2^36 bytes = 64 GB of RAM can now be addressed
    • 32-bit linear addresses are translated into 36-bit physical addresses
  • Enabled by setting the PAE flag in the cr4 register. With PAE enabled, the PS (Page Size) flag of a Page Directory entry selects a 2 MB page frame
  • 4 address pins were added to the processor (32->36)

Format of PAE

  • 64 GB of RAM are split into 2^24 distinct page frames
    • Physical address field of Page Table entries has been expanded from 20 to 24 bits.
    • A PAE Page Table entry must hold the 12 flag bits (see “Regular Paging”) plus 24 physical address bits, for a total of 36 bits
    • Because 36 bits do not fit in a 32-bit entry, the Page Table entry size has been doubled from 32 bits to 64 bits
      • A 4 KB PAE Page Table therefore holds 512 entries instead of 1,024
  • Added one more level of paging using the Page Directory Pointer Table (PDPT)
    • This table has 4 entries of 64 bits each
  • cr3 now contains a 27-bit base address field for the PDPT instead of the base address of the Page Directory
    • PDPTs are stored in the first 4 GB of RAM and aligned to a multiple of 32 bytes (2^5), so the low 5 bits of the base address are always 0 and 32 - 5 = 27 bits are sufficient to represent it
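
The sizes quoted in this section all follow from a few powers of two. The sketch below just redoes that arithmetic in C; the macro and function names are made up for illustration and do not come from any kernel header.

```c
#include <stdint.h>

/* Illustrative constants taken from the notes above (hypothetical names). */
#define PAE_PHYS_BITS    36u  /* physical address width with PAE */
#define PAGE_OFFSET_BITS 12u  /* 4 KB page frames */
#define PAE_ENTRY_BYTES  8u   /* each PAE table entry is 64 bits */
#define PDPT_ENTRIES     4u   /* the PDPT has 4 entries */
#define PDPT_ALIGN_BITS  5u   /* PDPT aligned to 32 = 2^5 bytes */

/* Addressable physical memory in bytes: 2^36 = 64 GB. */
static inline uint64_t pae_phys_bytes(void)
{
    return (uint64_t)1 << PAE_PHYS_BITS;
}

/* Number of 4 KB page frames: 2^(36 - 12) = 2^24. */
static inline uint64_t pae_page_frames(void)
{
    return (uint64_t)1 << (PAE_PHYS_BITS - PAGE_OFFSET_BITS);
}

/* Entries in one 4 KB table of 8-byte entries: 4096 / 8 = 512. */
static inline unsigned pae_entries_per_table(void)
{
    return (1u << PAGE_OFFSET_BITS) / PAE_ENTRY_BYTES;
}

/* Size of the whole PDPT in bytes: 4 * 8 = 32. */
static inline unsigned pdpt_size_bytes(void)
{
    return PDPT_ENTRIES * PAE_ENTRY_BYTES;
}

/* Bits needed for a 32-byte-aligned address below 4 GB: 32 - 5 = 27. */
static inline unsigned pdpt_base_bits(void)
{
    return 32u - PDPT_ALIGN_BITS;
}
```

Note that the PDPT size (32 bytes) matches its alignment, which is why the low 5 bits of its base address never need to be stored.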

Linear Address Format Using PAE for 4 KB Pages

  • cr3: points to the PDPT
  • bits 31-30: 1 of 4 possible entries in the PDPT
  • bits 29-21: 1 of 512 possible entries in the Page Directory
  • bits 20-12: 1 of 512 possible entries in the Page Table
  • bits 11-0: offset within the 4 KB page

 

File:X86 Paging PAE 4K.svg

for 4 KB pages
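
The bit ranges listed above for 4 KB PAE pages can be sketched as a few shifts and masks. This is only an illustration of the field layout: the struct and function names are hypothetical, and no real page tables are consulted.

```c
#include <stdint.h>

/* Fields of a 32-bit linear address under PAE with 4 KB pages. */
struct pae_4k_fields {
    unsigned pdpt_index;   /* bits 31-30: entry in the PDPT (0..3) */
    unsigned dir_index;    /* bits 29-21: entry in the Page Directory (0..511) */
    unsigned table_index;  /* bits 20-12: entry in the Page Table (0..511) */
    unsigned offset;       /* bits 11-0:  byte offset within the 4 KB page */
};

/* Split a linear address into its four PAE fields. */
static inline struct pae_4k_fields pae_split_4k(uint32_t linear)
{
    struct pae_4k_fields f;
    f.pdpt_index  = (linear >> 30) & 0x3u;
    f.dir_index   = (linear >> 21) & 0x1ffu;
    f.table_index = (linear >> 12) & 0x1ffu;
    f.offset      = linear & 0xfffu;
    return f;
}
```

For example, pae_split_4k(0xC0101234) gives PDPT entry 3, Page Directory entry 0, Page Table entry 257 (0x101), and offset 0x234.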

 

Linear Address Format Using PAE for 2 MB Pages

  • cr3: points to the PDPT
  • bits 31-30: 1 of 4 possible entries in the PDPT
  • bits 29-21: 1 of 512 possible entries in the Page Directory
  • bits 20-0: offset within the 2 MB page
File:X86 Paging PAE 2M.svg

for a 2 MB page
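
With 2 MB pages the Page Table level disappears and its 9 index bits join the offset, giving a 21-bit offset. A minimal sketch of that split, again with hypothetical names:

```c
#include <stdint.h>

/* Fields of a 32-bit linear address under PAE with 2 MB pages. */
struct pae_2m_fields {
    unsigned pdpt_index;  /* bits 31-30: entry in the PDPT (0..3) */
    unsigned dir_index;   /* bits 29-21: entry in the Page Directory (0..511) */
    uint32_t offset;      /* bits 20-0:  byte offset within the 2 MB page */
};

/* Split a linear address into its three fields for a 2 MB PAE page. */
static inline struct pae_2m_fields pae_split_2m(uint32_t linear)
{
    struct pae_2m_fields f;
    f.pdpt_index = (linear >> 30) & 0x3u;
    f.dir_index  = (linear >> 21) & 0x1ffu;
    f.offset     = linear & 0x1fffffu;  /* 2^21 - 1 */
    return f;
}
```

For example, pae_split_2m(0x40200000) gives PDPT entry 1, Page Directory entry 1, and offset 0, since that address is exactly 1 GB plus 2 MB.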

Limitations of PAE

  • Linear addresses are still 32 bits wide, so the linear address space stayed the same size while the physical address space was enlarged
    • Linear addresses must therefore be reused (remapped) to reach different parts of RAM
    • PAE is only a workaround that lets 32-bit linear addresses reach 64 GB of RAM

Hardware Protection Scheme 

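  • Unlike segmentation, which allows four privilege levels, paging uses a single User/Supervisor flag to decide who may access a page, plus a Read/Write flag for the access rights
  • If the User/Supervisor flag is 0, the page can be accessed only when CPL < 3 (Kernel Mode); if it is 1, the page can always be accessed
  • Read and read/write are the only access modes; the mode is marked by the Read/Write flag of the Page Directory or Page Table entry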
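
The two checks described above can be sketched as a small predicate. This is a simplified model, not the hardware algorithm: it assumes the standard 32-bit entry layout (Read/Write in bit 1, User/Supervisor in bit 2), uses made-up names, and treats read-only pages as write-protected at every privilege level, which corresponds to running with the WP flag of cr0 set.

```c
#include <stdbool.h>
#include <stdint.h>

#define PG_RW   (1u << 1)  /* Read/Write flag: 1 = read/write, 0 = read-only */
#define PG_USER (1u << 2)  /* User/Supervisor flag: 1 = any CPL, 0 = CPL < 3 */

/*
 * Return true if an access at privilege level `cpl` is allowed by the
 * flags in `entry`. `is_write` selects a write access instead of a read.
 */
static inline bool page_access_allowed(uint32_t entry, unsigned cpl, bool is_write)
{
    /* A supervisor-only page rejects every access made at CPL 3. */
    if (cpl == 3 && !(entry & PG_USER))
        return false;

    /* A read-only page rejects writes (simplification: WP assumed set). */
    if (is_write && !(entry & PG_RW))
        return false;

    return true;
}
```

For instance, a User Mode write to an entry with only PG_USER set is refused, while a read of the same page is allowed.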

Extended Paging

  • Extended paging allows much larger page frames: 4 MB instead of 4 KB
  • It is used to translate large contiguous linear address ranges into correspondingly large contiguous physical ones. The kernel can then do without intermediate Page Tables, thus saving memory
  • It also preserves TLB entries, since a single entry covers 4 MB
  • There is one less level of paging, which changes the format of the linear address
    • Most significant 10 bits: entry in the Page Directory (selects the 4 MB page frame)
    • Least significant 22 bits: offset within the page frame
  • Enabled by setting the Page Size flag of a Page Directory entry; the PSE flag of the cr4 register allows extended and regular paging to be used at the same time
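
A rough sketch of the 10 + 22 bit split, and of how a 4 MB Page Directory entry turns directly into a physical address without a Page Table. The helper names are hypothetical, and the sketch assumes the plain 32-bit layout in which bits 31-22 of the entry hold the page frame base.

```c
#include <stdint.h>

#define EXT_OFFSET_BITS 22u          /* 4 MB = 2^22 bytes */
#define EXT_OFFSET_MASK 0x003fffffu  /* low 22 bits */
#define EXT_FRAME_MASK  0xffc00000u  /* bits 31-22 of a 4 MB directory entry */

/* Index of the Page Directory entry: the 10 most significant bits. */
static inline unsigned ext_dir_index(uint32_t linear)
{
    return linear >> EXT_OFFSET_BITS;
}

/* Offset within the 4 MB page frame: the 22 least significant bits. */
static inline uint32_t ext_offset(uint32_t linear)
{
    return linear & EXT_OFFSET_MASK;
}

/* Physical address = 4 MB frame base from the entry + 22-bit offset. */
static inline uint32_t ext_translate(uint32_t dir_entry, uint32_t linear)
{
    return (dir_entry & EXT_FRAME_MASK) | ext_offset(linear);
}
```

For example, linear address 0xC0101234 selects Page Directory entry 768 with offset 0x101234; if that entry points at the frame starting at 0x00400000, the physical address is 0x00501234.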

Regular Paging

  • Starting with the 80386, the paging unit of all x86 processors handles 4 KB pages

 

Linear Address Format (32 Bits): 3 Fields

  • Directory: Most significant 10 bits
  • Table: Middle 10 bits
  • Offset: Least significant 12 bits
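
The three fields above (10 + 10 + 12 bits) can be extracted with shifts and masks, as in this small sketch. The struct and function names are invented for the example.

```c
#include <stdint.h>

/* Fields of a 32-bit linear address under regular (4 KB) paging. */
struct linear_fields {
    unsigned directory;  /* bits 31-22: entry in the Page Directory (0..1023) */
    unsigned table;      /* bits 21-12: entry in the Page Table (0..1023) */
    unsigned offset;     /* bits 11-0:  byte offset within the page (0..4095) */
};

/* Split a linear address into Directory, Table and Offset. */
static inline struct linear_fields split_linear(uint32_t linear)
{
    struct linear_fields f;
    f.directory = linear >> 22;
    f.table     = (linear >> 12) & 0x3ffu;
    f.offset    = linear & 0xfffu;
    return f;
}
```

For example, split_linear(0xC0101234) yields Directory 768, Table 257, and Offset 0x234.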

 

Two Level Translation Scheme

  • This is done in order to reduce the amount of RAM required for per-process page tables
    1. First, the Directory bits select an entry in the Page Directory, which holds the physical address of a Page Table
    2. Then the Table bits select an entry in that Page Table, which holds the physical address of the page frame
    3. Finally, the Offset is added to the page frame address to obtain the physical address
  • A one-level paging scheme would need a table with all 2^20 possible entries (at 4 bytes each, 4 MB per process), whether or not the process uses its whole address space
    • Two-level paging allocates Page Tables only for the regions actually in use, thus saving memory
  • The actual process looks like the image below. Note that the cr3 register contains the physical address of the Page Directory

 

  • Since memory is byte-addressable, the 12-bit Offset selects one of 2^12 = 4096 bytes within a page
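
The three translation steps can be mimicked in user space with ordinary C structures. This is a simulation under stated assumptions, not kernel code: a real Page Directory stores the physical address of each Page Table, whereas here a directory slot is simply a C pointer, NULL stands for "not present", and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ENTRIES    1024u        /* entries per directory or table */
#define PG_PRESENT 0x1u         /* Present flag, bit 0 */
#define FRAME_MASK 0xfffff000u  /* 20 most significant bits: frame base */

struct page_table { uint32_t entry[ENTRIES]; };
struct page_dir   { struct page_table *table[ENTRIES]; };

/*
 * Translate `linear` using `dir`. Returns 0 and stores the result in
 * *phys on success, or -1 where the hardware would raise a Page Fault.
 */
static inline int walk(const struct page_dir *dir, uint32_t linear, uint32_t *phys)
{
    /* Step 1: Directory bits pick the Page Table. */
    const struct page_table *pt = dir->table[linear >> 22];
    if (pt == NULL)
        return -1;

    /* Step 2: Table bits pick the entry holding the page frame address. */
    uint32_t pte = pt->entry[(linear >> 12) & 0x3ffu];
    if (!(pte & PG_PRESENT))
        return -1;

    /* Step 3: add the Offset to the page frame base. */
    *phys = (pte & FRAME_MASK) | (linear & 0xfffu);
    return 0;
}
```

Only the Page Tables that are actually populated need to exist. In this model a process touching a single 4 MB region needs one directory and one table, rather than a flat table of 2^20 entries.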

Structure of Page Directory and Page Table Entries (same format)

  • Present flag: If 1, the page (or Page Table) is in main memory; otherwise it is not (for example, it may have been swapped out to disk). If the entry of a Page Table or Page Directory needed to perform an address translation has the Present flag set to 0, the paging unit stores the linear address in control register cr2 and generates exception 14: the Page Fault exception.
  • Twenty MSB of page frame physical address: The base address of a page frame (in a Page Table entry) or of a Page Table (in a Page Directory entry). Because page frames are 4 KB aligned, the 12 least significant bits of the address are always 0.
  • Accessed flag: Set every time the paging unit accesses the page frame; cleared only by the OS, never by the paging unit itself. This flag may be used by the operating system when selecting pages to be swapped out.
  • Dirty flag: Same as above, but applies only to Page Table entries and is set only when a write occurs.
  • Read/Write flag: Access rights of the page or Page Table: read-only (0) or read/write (1)
  • User/Supervisor flag: Marks the privilege level required to access the page or Page Table
  • PCD and PWT flags: Control how the page or Page Table is handled by the hardware cache
    • Setting PCD disables caching; setting PWT selects write-through instead of write-back
  • Page Size flag: Applies only to Page Directory entries. If set, the entry refers to a 2 MB or 4 MB page frame (2 MB with PAE, 4 MB without; see “Extended Paging” and the PAE section)
  • Global flag: Applies only to Page Table entries. Prevents frequently used pages from being flushed from the Translation Lookaside Buffers (TLB). It works only if the Page Global Enable (PGE) flag of register cr4 is set. (discussed later)
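
As a compact summary of the fields above, the sketch below lists the flag bits of a 32-bit Page Directory or Page Table entry and shows how an entry is composed from a frame base and flags. The bit positions follow the x86 entry layout; the macro and function names are invented for this example and are not the kernel's own.

```c
#include <stdint.h>

#define PG_PRESENT  (1u << 0)  /* Present */
#define PG_RW       (1u << 1)  /* Read/Write */
#define PG_USER     (1u << 2)  /* User/Supervisor */
#define PG_PWT      (1u << 3)  /* Page Write-Through */
#define PG_PCD      (1u << 4)  /* Page Cache Disable */
#define PG_ACCESSED (1u << 5)  /* Accessed */
#define PG_DIRTY    (1u << 6)  /* Dirty (Page Table entries only) */
#define PG_PSIZE    (1u << 7)  /* Page Size (Page Directory entries only) */
#define PG_GLOBAL   (1u << 8)  /* Global (Page Table entries only) */

#define PG_FRAME_MASK 0xfffff000u  /* 20 most significant bits */
#define PG_FLAGS_MASK 0x00000fffu  /* 12 flag bits */

/* Build an entry from a 4 KB-aligned frame base and a set of flags. */
static inline uint32_t make_entry(uint32_t frame_base, uint32_t flags)
{
    return (frame_base & PG_FRAME_MASK) | (flags & PG_FLAGS_MASK);
}

/* Physical base address stored in an entry. */
static inline uint32_t entry_frame(uint32_t entry)
{
    return entry & PG_FRAME_MASK;
}

/* Nonzero if the Present flag is set. */
static inline int entry_present(uint32_t entry)
{
    return (entry & PG_PRESENT) != 0;
}
```

For example, make_entry(0x00123000, PG_PRESENT | PG_RW | PG_USER) produces 0x00123007.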
File:X86 Paging 4K.svg

for 4 KB pages

File:X86 Paging 4M.svg

for 4 MB pages

Paging in Hardware

  • Paging Unit: Translates linear addresses into physical addresses. As part of the translation it checks the requested access type against the access rights of the linear address; if the access is not valid, it generates a Page Fault exception
  • Paging is turned on by setting the PG flag of control register cr0 to 1; when PG = 0, linear addresses are interpreted as physical addresses
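
The control register flags mentioned in these notes sit at fixed bit positions (PG is bit 31 of cr0; PSE, PAE and PGE are bits 4, 5 and 7 of cr4). The sketch below classifies the paging mode from register values passed in as plain integers; reading the real registers requires privilege level 0, so nothing here touches hardware. The enum and function names are hypothetical.

```c
#include <stdint.h>

#define CR0_PG  (1u << 31)  /* cr0: paging enabled */
#define CR4_PSE (1u << 4)   /* cr4: 4 MB pages allowed (extended paging) */
#define CR4_PAE (1u << 5)   /* cr4: Physical Address Extension */
#define CR4_PGE (1u << 7)   /* cr4: Global flag honored */

enum paging_mode {
    PAGING_OFF,    /* linear addresses are used as physical addresses */
    PAGING_32BIT,  /* regular two-level paging, optionally with 4 MB pages */
    PAGING_PAE     /* three-level PAE paging */
};

/* Classify the paging mode implied by the given cr0 and cr4 values. */
static inline enum paging_mode paging_mode_of(uint32_t cr0, uint32_t cr4)
{
    if (!(cr0 & CR0_PG))
        return PAGING_OFF;
    return (cr4 & CR4_PAE) ? PAGING_PAE : PAGING_32BIT;
}
```

For example, paging_mode_of(0x80000001, 0x20) reports PAGING_PAE, while the same cr0 value with cr4 = 0 reports PAGING_32BIT.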

Data Layout

  • Linear addresses are grouped into fixed-length intervals called pages; the contiguous linear addresses of a page are mapped to contiguous physical addresses. A page frame is the fixed-length block of main memory (physical addresses) that holds a page
  • The data structures that map linear to physical addresses are called page tables; they are stored in main memory

The Linux LDTs

  • Default LDT: This is the one specified in the GDT. Most processes do not use an LDT, so the kernel keeps a default one with 5 entries that they all share
    • Stored in default_ldt_array
      • Only 2 of the entries are actually used: call gates for iBCS and Solaris/x86 executables
        • Call Gates: Allow the privilege level of the CPU to change while invoking a predefined function
  • Some applications, such as Wine, need their own LDT because they run segment-oriented Windows applications
    • modify_ldt() allows a process to set up a new LDT
    • When a processor starts executing a process that has a custom LDT, the LDT entry in the CPU-specific copy of the GDT is changed accordingly

The Linux GDT

  • GDT (Global Descriptor Table): Contains segment descriptors for system-wide use. Each GDT holds 18 segment descriptors plus 14 null, unused, or reserved entries. There is one GDT per processor. Each GDT has its own TSS segment, and the LDT/TLS/APM/PnP entries vary with the process being executed or the BIOS code being invoked
    • Unused entries are inserted on purpose so that Segment Descriptors usually accessed together are kept in the same 32-byte line of the hardware cache
    • 4 code and data segments: user code, user data, kernel code, and kernel data
    • Task State Segment (TSS) (1 per processor): its linear address space is contained in the kernel data segment
      • Stored sequentially in an init_tss array
      • Base: nth component (nth CPU) in init_tss
      • G: Cleared
      • Limit: 0xeb (segment is 236 bytes long)
      • Type: 9 or 11
      • DPL: 0 (only accessible in kernel mode)
    • Local Descriptor Table (1): The segment holding the default LDT. In general all processes share this LDT, which is the one specified in the GDT; it is not to be confused with the LDTs that some processes create for themselves
    • Thread-Local Storage Segment (3): Allows each thread to have its own data segments
    • Advanced Power Management (3): The BIOS uses these as code/data segments when the Linux APM driver invokes BIOS functions
    • Plug and Play (PnP) BIOS Services (5): Same as above, but used for PnP devices
    • TSS for Double Fault Exceptions (1): A special TSS used to handle double fault exceptions

Segmentation in Linux

  • Segmentation was meant to encourage splitting a program into logically related entities such as subroutines
  • Linux uses segmentation in a very limited way; it prefers paging, which makes segmentation largely redundant
    • Memory management is simpler with paging
    • Paging is more portable across architectures (RISC architectures have little or no support for segmentation)
  • The 2.6 kernel uses segmentation only when required by the x86 architecture
  • All User Mode processes share the same user code and user data segments
    • All Kernel Mode processes share the same kernel code and kernel data segments
    • The corresponding Segment Selectors are defined by the macros __USER_CS, __USER_DS, __KERNEL_CS, __KERNEL_DS
      • To address one of these segments, load the value of its macro into the corresponding segmentation register
      • ss (stack segment) always refers to the data segment of the current privilege level, which is where the stack lives
    • Notice that the base address of every one of these segments is 0
      • This means that the 32-bit offset of a logical address always coincides with the linear address
      • This is fine because the paging unit is responsible for the linear->physical translation, so we do not need to worry about conflicting address values
      • In C, when you write
         int *ptr = &x;

        the value you get is actually the linear address, not the physical address. Therefore the elements of an array are contiguous in the linear address space but may not be contiguous in physical memory

Segmentation Unit

  • Segmentation Unit: Converts a logical address (segment selector + offset) into a linear address
    1. Check the TI (Table Indicator) bit of the segment selector to see whether the segment descriptor is in the GDT (TI = 0; base address in the gdtr register) or in the active LDT (TI = 1; base address in the ldtr register)
    2. Compute the address of the segment descriptor: multiply the index field of the segment selector by 8 (the size of a descriptor) and add the result to the contents of gdtr or ldtr
    3. Add the offset of the logical address to the Base field of the segment descriptor to obtain the linear address
    • Note: Thanks to the nonprogrammable register associated with each segmentation register, which caches the segment descriptor, the first 2 steps are needed only when the segmentation register is changed
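
The steps above can be sketched with plain integer arithmetic. The layout of a segment selector (13-bit index, TI bit, 2-bit RPL) is the architectural one; the function names are invented here, and the table base and segment base are passed in as ordinary numbers because the real gdtr contents and descriptors are not readable from a user program this way.

```c
#include <stdint.h>

/* Fields of a 16-bit segment selector. */
struct selector_fields {
    unsigned index;  /* bits 15-3: descriptor index within the GDT or LDT */
    unsigned ti;     /* bit 2: Table Indicator, 0 = GDT, 1 = LDT */
    unsigned rpl;    /* bits 1-0: requested privilege level */
};

/* Step 1: decode the selector, including the TI bit. */
static inline struct selector_fields split_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 0x1u;
    f.rpl   = sel & 0x3u;
    return f;
}

/* Step 2: descriptor address = table base + index * 8 (descriptor size). */
static inline uint32_t descriptor_address(uint32_t table_base, uint16_t sel)
{
    return table_base + (uint32_t)(sel >> 3) * 8u;
}

/* Step 3: linear address = Base field of the descriptor + offset. */
static inline uint32_t logical_to_linear(uint32_t segment_base, uint32_t offset)
{
    return segment_base + offset;
}
```

As a usage example, the selector value 0x73 (assumed here to be the 2.6 value of __USER_CS) decodes to index 14, TI = 0 (GDT), and RPL = 3, so its descriptor sits 112 bytes into the GDT. With a segment base of 0, as in the Linux segments, logical_to_linear(0, offset) simply returns the offset.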