
2.3.2 Basic 64-bit Execution Environment

  • Recall we only use 48 bits for addresses
  • Below are the most important registers
    • Sixteen 64 bit general purpose registers
    • 64 bit flags register (RFLAGS)
    • 64 bit instruction pointer (RIP)
    • Eight 64 bit MMX registers
    • Sixteen 128 bit XMM registers (versus 8 in 32 bit)
  • General Purpose Registers
    • In 64-bit mode the default operand size is still 32 bits and only eight general purpose registers are available
    • Adding a REX prefix to an instruction allows 64-bit operands and gives access to all sixteen general purpose registers
    • A single instruction cannot access a high-byte register (AH, BH, CH, DH) together with the low byte of one of the new registers (such as DIL)
    • The upper 32 bits of RFLAGS are not used; the lower 32 bits are the same flags as in EFLAGS

2.3.1 64-Bit Operation Modes

  • New IA-32e mode which contains 2 sub modes
    • Compatibility Mode
      • Allows 16 bit and 32 bit applications to run without being recompiled
    • 64-bit Mode
      • Runs applications in a 64 bit linear address space

2.3 64 bit x86-64 Processors

  • Important Features
    • Backwards compatible with x86 instruction set
    • Allows addressing 2^64 bytes (currently only 48 bits are used)
    • Uses 64 bit general purpose registers
    • Eight more general purpose registers over 32 bit
    • A 48-bit address space, which covers up to 256 terabytes of memory
    • 16-bit real mode and virtual 8086 mode are not available while the processor runs in 64-bit mode
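
As a quick check of the 256 terabyte figure, the arithmetic can be worked through in a few lines of Python. The variable names are illustrative, and a terabyte is taken here in the binary sense (2^40 bytes), which is an assumption about what the notes intend.

```python
# Worked arithmetic: how much memory do 48 address bits cover?
ADDRESS_BITS = 48
TERABYTE = 2 ** 40  # assumption: binary terabyte (2^40 bytes)

addressable_bytes = 2 ** ADDRESS_BITS
terabytes = addressable_bytes // TERABYTE

print(terabytes)  # 256
```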

2.2.3 x86 Memory Management

  • Real Address Mode
    • 1 MB of memory can be addressed (0x00000 to 0xFFFFF)
    • Only one program can run at a time
    • The running program can be interrupted briefly to service requests (interrupts) from peripherals
    • Programs can access any memory location
  • Protected Mode
    • 4 GB of memory can be addressed
    • Each program has 4 GB of virtual memory
  • Virtual 8086
    • Runs in protected mode with 1 MB of memory
    • Simulates a virtual 8086 machine

2.2.2 Basic Execution Environment

  • Address Space
    • 32-bit protected mode allows addressing of 4 GB of memory
    • PAE allows addressing up to 64 GB of memory
    • Real address mode allows addressing of 1 MB of memory
  • Basic Program Execution Registers
    • General Purpose Registers
      • EAX: Automatically used by multiply/divide instructions; called the “extended accumulator register”
      • ECX: Automatically used as a loop counter; the “extended counter register”
      • ESP: Contains the address of the top of the stack in the current execution context; the “extended stack pointer register”
      • EBP: Contains the base of the stack frame for the current execution context; the “extended base pointer register”
      • ESI and EDI: Used for high speed memory transfer; the “extended source/destination index registers”
      • EBX, EDX
    • Segment Registers
      • Real Address Mode: The 16-bit segment registers hold the base addresses of preassigned memory segments
      • Protected Mode: Hold pointers to segment descriptor tables; segments contain code (instructions), data (variables), or the stack (local variables and parameters)
    • Instruction Pointer: EIP contains the address of the next instruction
    • EFLAGS Register: Individual bits act as flags representing things like carry, interrupts, etc.
  • MMX Registers
    • Used to improve the performance of multimedia applications (video/communication)
    • Eight 64 bit registers
    • Support for special SIMD (single instruction, multiple data) instructions
    • Allows parallel processing of the data
    • The eight MMX registers are aliases for the registers used for floating point calculations
  • XMM Registers
    • Eight 128 bit registers
  • Floating Point Unit
    • Registers are labeled ST(0) to ST(7)

2.2.1 Modes of Operation

  • 3 primary modes of operation
    1. Protected mode
      • Native state of processor
      • Programs are given separate memory areas called segments
      • Programs are restricted from accessing memory outside their own segments
    2. Real-address mode
      • Implements the early 8086 programming environment
      • Useful if a program requires direct access to system memory and hardware devices
    3. System management mode
      • Often used by computer manufacturers to implement low level functions such as system security or power management
    • Sub-mode: virtual-8086, a special case of protected mode
      • Allows very early MS-DOS programs to run safely; a crash does not take down the rest of the system

2.1.3 Reading From Memory

  • 4 simple steps to read from memory
    1. Place the address of the value you want to read on the address bus
    2. Assert the processor’s RD (read) pin
    3. Wait one clock cycle for the memory chips to respond
    4. Copy the data that has now appeared on the data bus
  • Caches on the CPU
    • Used to store the most recently used instructions and data
    • This is done because memory access, rather than the CPU, is often the true bottleneck
    • Caches come in levels: a lower number means faster and closer to the CPU core
    • Level 1 is directly on the CPU
    • Level 2 is connected to the CPU by a high speed memory bus
    • Caches are built from static RAM (SRAM), which is faster and more expensive than the dynamic RAM (DRAM) used for main memory

ECE391 Lecture Notes: Set 1

x86 instruction set reference

The Basics: Registers, Data Types and Memory

  • Most of the x86 complexity comes from backwards compatibility
  • Modern x86=IA32
    • 8 general purpose 32 bit integer registers
      • eax, ebx, ecx, edx, esi, edi, ebp, esp
      • Each can be accessed as the full 32 bits (e_x) or the lower 16 bits (_x); eax, ebx, ecx, and edx also expose bits 8-15 (_h) and bits 0-7 (_l)
    • 2 special purpose 32 bit registers, instruction pointer and flags
    • % denotes a register in assembly code
  • x86 ISA supports:
    • 2’s complement and unsigned integers in widths of 32, 16, and 8 bits
    • IEEE single and double precision floating point
    • 80-bit intel floating point
    • ASCII Strings
    • BCD
  • x86 is byte addressable: every memory address (32 bits) points to a single byte
  • x86 uses little endian
    • 0x12345678 is stored as
      • 0x78, 0x56, 0x34, 0x12 in consecutive increasing memory locations
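
The byte order and the register naming scheme above can be checked with a small Python sketch. It only models the layout; `sub_registers` is a hypothetical helper written for these notes, not part of any library.

```python
import struct

# Little endian: the least significant byte goes to the lowest address.
value = 0x12345678
stored = struct.pack("<I", value)  # "<I" = little-endian unsigned 32-bit
print([hex(b) for b in stored])    # ['0x78', '0x56', '0x34', '0x12']


def sub_registers(eax):
    """Split a 32-bit value the way eax is split into ax, ah and al."""
    return {
        "ax": eax & 0xFFFF,       # lower 16 bits
        "ah": (eax >> 8) & 0xFF,  # bits 8-15
        "al": eax & 0xFF,         # bits 0-7
    }


parts = sub_registers(value)
```

With `value` set to 0x12345678, `parts` holds 0x5678 for ax, 0x56 for ah, and 0x78 for al.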

LC-3 to x86

Operate Instructions

  • x86 has many more operations than LC-3
    • Arithmetic Operators:
      • ADD, SUB, NEG, INC, DEC
    • Logical Operators:
      • AND, OR, XOR, NOT
    • Shift Operators:
      • SHL, SAR, SHR, ROL, ROR
  • x86 instructions typically take 2 operands, with the second one acting as both a source and the destination
    • addl %eax, %ebx # EBX<-EBX+EAX
    • Opcodes can optionally have “l”, “w”, or “b” appended to give the operand size
      • l=32 bits (long)
      • w=16 bits (word)
      • b=8 bits (byte)
  • Immediate values have to be preceded by the $ sign such as
    • addl $20, %esp # ESP<-ESP+20
    • $0x____=hex
    • $0____=octal
    • $ followed by a digit 1-9=decimal (a leading 0 would be read as octal)
  • If the $ sign is not used a number is treated as a memory reference
    • addl 20, %esp # ESP<-ESP+M[20]
  • Addresses can be calculated in an operand as follows
    • displacement(SR1, SR2, scale): displacement+SR1+(SR2*scale)
      • displacement defaults to 0 if not specified
      • scale defaults to 1 if not specified and can be 1, 2, 4, or 8
      • The format lets one point at a data structure (SR1), step to a field inside it (displacement), and, if that field is an array, pick an element (index SR2 times element size scale)
      • To access the Nth element of an array of 32-bit integers, put a pointer to the base of the array into EBX and the index N into ESI, then execute
        • movl (%ebx, %esi, 4), %eax # EAX<-M[EBX+ESI*4]
        • If the array started at the 28th byte of a structure that EBX points to
        • movl 28(%ebx, %esi, 4), %eax # EAX<-M[EBX+ESI*4+28]
    • LEA computes the address without accessing memory: leal (%eax, %ebx), %ecx # ECX<-EAX+EBX
  • Interesting instruction: xorl %edx, %edx # EDX<-0
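
The address calculation above can be modeled in Python to check the array examples. This is a sketch of the arithmetic only; `effective_address` and its parameter names are made up for illustration, and the 0x1000 base address is an arbitrary assumption.

```python
def effective_address(displacement=0, base=0, index=0, scale=1):
    """Model displacement(base, index, scale) for 32-bit addresses."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    # Addresses wrap around at 32 bits.
    return (displacement + base + index * scale) & 0xFFFFFFFF


EBX = 0x1000  # assumed base address of the array or structure
ESI = 3       # index N

# (%ebx, %esi, 4): element 3 of an array of 32-bit integers
plain = effective_address(base=EBX, index=ESI, scale=4)

# 28(%ebx, %esi, 4): same element when the array starts 28 bytes into a structure
in_struct = effective_address(28, EBX, ESI, 4)

print(hex(plain), hex(in_struct))  # 0x100c 0x1028
```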

Data Movement Instructions

  • LC-3 had three addressing modes for load and stores
    • PC relative: LD/ST
    • Indirect: LDI/STI
    • Base+Offset: LDR/STR
    • LEA
  • x86 unifies the above (except LEA) into the mov instruction
    • Indirect addressing (LDI/STI) has no direct equivalent but can be done with 2 instructions
    • PC relative addressing (LD/ST) is technically unavailable, but the same effect comes from direct addressing, where the address is specified as an immediate value in the instruction

Condition Codes: Only 5 will be mentioned

  • Sign Flag (SF): Records whether the last result represented a negative 2’s complement integer (MSB set)
  • Zero Flag (ZF): Set if the last result was exactly zero
  • Carry Flag (CF): Set if the last result generated a carry or required a borrow. Some instructions use this flag for specialty purposes
    • such as to check if the high word of multiplication is nonzero or not
  • Overflow Flag (OF): Set if the last result overflowed when interpreted as a 2’s complement operation; also has specialty purposes
  • Parity Flag (PF): Set if the low byte of the last result has an even number of 1s, else cleared
  • Not all instructions set flags
    • eg. mov, lea, not
    • To set flags based on values these instructions produce, follow them with a CMP (compare) or TEST instruction
      • cmp: Performs a subtraction (second arg - first arg), sets the flags, and discards the result
      • test: Performs an AND of the two operands, sets the flags (OF and CF are cleared; SF, ZF, PF are set according to the result), and discards the result
  • Instructions that set flags do not have to change all flags
    • eg. rol, ror which only affect OF and CF
    • eg. inc and dec which affect all but CF
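
To make the TEST behavior concrete, here is a minimal Python model of the flags it leaves behind for 32-bit operands. `flags_after_test` is a hypothetical helper; the one detail beyond the notes is that PF looks only at the low 8 bits of the result.

```python
def flags_after_test(a, b, bits=32):
    """Flags produced by TEST: AND the operands, keep only the flags."""
    mask = (1 << bits) - 1
    result = (a & b) & mask
    low_byte = result & 0xFF
    return {
        "CF": 0,                             # always cleared by TEST
        "OF": 0,                             # always cleared by TEST
        "ZF": int(result == 0),              # result exactly zero
        "SF": (result >> (bits - 1)) & 1,    # copy of the MSB
        "PF": int(bin(low_byte).count("1") % 2 == 0),  # even number of 1s in low byte
    }


negative = flags_after_test(0x80000003, 0xFFFFFFFF)
disjoint = flags_after_test(0x0F, 0xF0)
```

Here `negative` has SF and PF set and ZF clear, while `disjoint` (no bits in common) has ZF set.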

Conditional Branches

  • There are 8 basic branch conditions and their inverses in x86
  • Branches are listed below along with the conditions under which the branch is taken

  • The table should be used as follows. After a comparison such as
    • cmp %ebx,%esi # set flags based on (ESI – EBX)
  • Choose the operator to place between ESI and EBX, based on the data type.
  • If ESI and EBX hold unsigned values, and the branch should be taken if ESI ≤ EBX, use either JBE or JNA.
  • If ESI and EBX hold signed values, and the branch should be taken if ESI > EBX, use either JG or JNLE.
  • For branches other than JE/JNE based on instructions other than CMP, you should check the stated branch conditions rather than trying to use the table.
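
The choice between the unsigned and signed branches can be illustrated with a Python model of cmp followed by JBE or JG. The helper names are hypothetical; the conditions used (JBE taken when CF or ZF is set, JG taken when ZF is clear and SF equals OF) are the standard x86 definitions rather than something stated in these notes.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(first, second):
    """Flags for `cmpl first, second`, which computes second - first."""
    first &= MASK32
    second &= MASK32
    result = (second - first) & MASK32

    def sign(v):
        return v >> 31

    return {
        "ZF": int(result == 0),
        "SF": sign(result),
        "CF": int(second < first),  # a borrow was needed
        # Signed overflow: operand signs differ and the result sign flipped.
        "OF": int(sign(second) != sign(first) and sign(result) != sign(second)),
    }


def jbe(flags):
    """Unsigned below or equal."""
    return bool(flags["CF"] or flags["ZF"])


def jg(flags):
    """Signed greater."""
    return (not flags["ZF"]) and flags["SF"] == flags["OF"]


# cmp %ebx,%esi with EBX = 0xFFFFFFFF and ESI = 1
flags = cmp_flags(first=0xFFFFFFFF, second=1)
unsigned_taken = jbe(flags)  # 1 <= 4294967295 as unsigned values
signed_taken = jg(flags)     # 1 > -1 as signed values
```

The same bit patterns give both branches a true condition here, which is why the data type has to decide which mnemonic is used.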

Other Control Instructions

  • x86 uses INT instead of LC-3’s TRAP
  • x86 also replaces JSR and JSRR with CALL which can be both direct and indirect
    • indirect operands are preceded by an asterisk
    • CALL pushes the return address onto the stack before changing EIP
    • RET pops the return address off the stack into EIP

Labels, Comments, Directives, Pseudo-ops

  • Labels can begin with any letter, a period, or an underscore and must be terminated by a colon
  • gcc generated assembly code uses a period to start its compiler-generated labels
  • Use # for comments, ; to put more than one instruction on a line, and /* */ as in C for multi-line comments
  • .ORIG/.END equivalents are inserted by the compiler, so there is no need to write them yourself
  • .GLOBAL/.EXTERN declare symbols to be visible externally and to be defined externally, respectively
    • The assembler treats every undefined symbol as external, which implies that it cannot flag a misspelled or undefined name itself

Other directives that you may use

  • .INCLUDE directive tells the assembler to read in the contents of another file and to insert it in place of the directive (C’s #include)

Input and Output

  • x86 can use both memory-mapped and instruction based I/O
  • Uses specific registers for the I/O instructions
    • Data to be written must be put in the EAX register (or AX or AL)
    • Data from ports is also read into EAX (or AX or AL)
    • Port number can be specified as either an 8 bit immediate or loaded into DX

Other Useful Instructions

Stack Operations

  • The stack abstraction is supported directly by native PUSH and POP instructions
  • ESP contains the address of the element on top of the stack
  • Stack grows downward in addresses (smaller address on top of stack)
    • pushl %eax # M[ESP-4]<-EAX, ESP<-ESP-4
      • Equivalently, apart from subl also setting the flags
      • movl %eax, -4(%esp) # M[ESP-4]<-EAX
      • subl $4, %esp # ESP<-ESP-4

  • Other than a pop into the EFLAGS register, push and pop do not set flags
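
The push and pop behavior described above can be sketched as a tiny Python model. `Stack32` and the starting ESP of 0x1000 are assumptions for illustration; memory is a plain dictionary keyed by address.

```python
class Stack32:
    """Toy model of the x86 stack: grows toward smaller addresses."""

    def __init__(self, esp=0x1000):
        self.esp = esp     # address of the element on top of the stack
        self.memory = {}   # address -> 32-bit value

    def push(self, value):
        # pushl: M[ESP-4] <- value, ESP <- ESP-4
        self.esp -= 4
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        # popl: value <- M[ESP], ESP <- ESP+4
        value = self.memory[self.esp]
        self.esp += 4
        return value


stack = Stack32()
stack.push(0xAABBCCDD)
esp_after_push = stack.esp  # 0xFFC, four bytes below the start
popped = stack.pop()        # 0xAABBCCDD, ESP back at 0x1000
```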

Multiplication and Division

  • Both signed and unsigned are available
  • Unsigned multiply MUL requires EAX (or AX or AL) to be one of the operands
    • Places the high bits of the product in EDX and the low bits in EAX (or DX:AX, or AX for the 8-bit form)
    • Only the CF and OF flags have meaning (for both imul and mul); all other flags are undefined (not unaffected, they may change)
      • For mul, both CF and OF are set if the high bits of the result are nonzero
  • Signed multiply IMUL also allows 2 and 3 operand forms
    • The high bits are discarded in the 2 and 3 operand forms
    • imull %ebx,%eax # EAX ← EAX * EBX
    • imull $1000,%ebx,%eax # EAX ← 1000 * EBX
  • Division works similarly to multiplication
    • The dividend must be placed in EDX:EAX (or DX:AX or AX)
    • EAX (or AX, or AL) holds the quotient
    • EDX (or DX, or AH) holds the remainder
    • A quotient too large for the destination register causes an exception
    • Flags are undefined after division (no meaning subject to change)
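
A short Python model may help with the register pairing in the 32-bit unsigned forms. `mul32` and `div32` are hypothetical helpers, and Python exceptions stand in for the processor's divide exception.

```python
MASK32 = 0xFFFFFFFF


def mul32(eax, operand):
    """Unsigned MUL: returns (EDX, EAX, CF/OF) for EAX * operand."""
    product = (eax & MASK32) * (operand & MASK32)
    edx = product >> 32           # high 32 bits
    eax_out = product & MASK32    # low 32 bits
    carry = int(edx != 0)         # CF and OF: high bits nonzero
    return edx, eax_out, carry


def div32(edx, eax, divisor):
    """Unsigned DIV: divides EDX:EAX, returns (quotient, remainder)."""
    divisor &= MASK32
    if divisor == 0:
        raise ZeroDivisionError("divide error")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK32:
        # The quotient does not fit in EAX.
        raise OverflowError("divide error: quotient overflows EAX")
    return quotient, remainder


small = mul32(3, 4)             # (0, 12, 0)
wide = mul32(0x10000, 0x10000)  # (1, 0, 1): the product needs EDX
quotient, remainder = div32(0, 17, 5)  # 3 remainder 2
```

Calling `div32(1, 0, 1)` raises `OverflowError`, since a dividend of 2^32 divided by 1 cannot fit in 32 bits.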

Data Type Conversions

  • Variants of MOV can convert small integers into bigger ones
    • MOVS(orig size)(new size): Sign extension
    • MOVZ(orig size)(new size): Zero extension
      • eg. MOVZBL: zero extend a byte into a long
  • There are variations of this for the EAX register
    • CBTW sign extends the byte AL into the word AX
    • CWTL sign extends the word AX into the long word EAX
    • CLTD sign extends the long word EAX into the 64-bit pair EDX:EAX
      • Useful for preparing the dividend for IDIV
    • CWTD (caution): sign extends AX into DX:AX, not into EAX
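
The difference between sign and zero extension can be shown with a small Python sketch. The function names are made up for illustration and only mirror what MOVS, MOVZ, and CLTD do to the bit patterns.

```python
def zero_extend(value, from_bits):
    """MOVZ-style: keep the low bits, fill the rest with zeros."""
    return value & ((1 << from_bits) - 1)


def sign_extend(value, from_bits, to_bits):
    """MOVS-style: copy the sign bit into all the new upper bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):   # sign bit set: value is negative
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def cltd(eax):
    """CLTD: fill EDX with the sign bit of EAX, giving EDX:EAX."""
    eax &= 0xFFFFFFFF
    edx = 0xFFFFFFFF if eax & 0x80000000 else 0
    return edx, eax


byte = 0x80                            # -128 as a signed byte
as_unsigned = zero_extend(byte, 8)     # 0x00000080
as_signed = sign_extend(byte, 8, 32)   # 0xFFFFFF80
print(hex(as_unsigned), hex(as_signed))  # 0x80 0xffffff80
```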

 

The Calling Convention

Caller-saved registers (AKA volatile registers) are used to hold temporary quantities that need not be preserved across calls.
For that reason, it is the caller’s responsibility to push these registers onto the stack if it wants to restore this value after a procedure call.

Callee-saved registers (AKA non-volatile registers) are used to hold long-lived values that should be preserved across calls.
When the caller makes a procedure call, it can expect that those registers will hold the same value after the callee returns, making it the responsibility of the callee to save them and restore them before returning to the caller.

Parameters, return values, and registers

  • Parameters passed to a function are pushed onto the stack right to left, so the first parameter always ends up closest to the top of the stack; this is what makes a variable number of parameters workable
  • For pointers and integers of up to 32 bits, the return value is placed in the EAX register
  • For values wider than 32 bits and up to 64 bits, EDX stores the high bits and EAX stores the low bits
  • Floating point values are returned on top of the floating point stack
  • Most registers are caller saved (eg ECX, EFLAGS), but ESP and EBP must hold the same values on return as at the call
  • EBX, ESI, and EDI are callee saved

subl $20, %esp # subtracts 20 from the stack pointer (ESP), which allocates 20 bytes of stack space that can be used for local variables

Caller Side: When a_func is called

  1. Push the function parameters (20, 10) onto the stack; inside a_func these are called formals
  2. CALL is executed, which pushes EIP (pointing to the ADD after the call)
  3. EIP is changed to the start of a_func
  4. a_func executes
  5. Remove the parameters from the stack by using ADD on ESP
  6. Store the result (returned in EAX) in value

Callee Side

  1. Set up the stack frame by pushing the old frame pointer (EBP) onto the stack
  2. Copy ESP into EBP (the old EBP is now at the bottom of the new frame)
  3. If a_func uses any callee-saved registers (EBX, ESI, EDI), their values are pushed onto the stack
  4. The stack pointer is updated to make room for local variables
  5. Tear down the stack frame using the LEAVE instruction, which restores the old values of EBP and ESP
    • ESP<-EBP+4, EBP<-M[EBP] (both computed from the EBP value before LEAVE)
  6. If callee-saved registers must be restored, use LEA to point ESP at the uppermost saved register; a sequence of POPs (the last into EBP) then restores the original stack state
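
To see where everything lands on the stack, the caller and callee steps can be replayed in a small Python simulation. All addresses (return address 0x400, initial ESP 0x1000, initial EBP 0x2000) and the name `simulate_call` are assumptions for illustration; the resulting offsets of 4 and 8 from EBP follow directly from the push order.

```python
def simulate_call(args, return_address=0x400, esp=0x1000, ebp=0x2000):
    """Replay the pushes of a call and report the resulting frame layout."""
    memory = {}

    def push(value):
        nonlocal esp
        esp -= 4
        memory[esp] = value

    for arg in reversed(args):  # caller: push parameters right to left
        push(arg)
    push(return_address)        # CALL: push the return address
    push(ebp)                   # callee: pushl %ebp
    ebp = esp                   # callee: movl %esp, %ebp

    return {
        "old_ebp": memory[ebp],             # at 0(%ebp)
        "return_address": memory[ebp + 4],  # at 4(%ebp)
        "params": [memory[ebp + 8 + 4 * i] for i in range(len(args))],
    }


frame = simulate_call([10, 20])  # models value = a_func(10, 20)
print(frame["params"])  # [10, 20]
```

Because the parameters are pushed right to left, the first one (10) sits at 8(%ebp) and the second (20) at 12(%ebp).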

 

Miscellaneous

  • Byte addressable (1 address=1 byte)
  • 1 byte=8 bits (therefore long is 4 addresses)
  • The x86 ISA does not require alignment (alignment would mean 16-bit values only at even addresses and 32-bit values only at addresses that are multiples of 4)
  • Arrays of data should be properly aligned
    • Use the .ALIGN directive to do this

2.1.2 Instruction Execution Cycle

  • An Execution Cycle often requires more than one clock cycle to complete and has multiple steps to complete execution
    1. Fetch Instruction: Fetch instruction from the instruction queue
    2. Decode: Look at binary instruction and decode based upon args and opcode
    3. Execute: Executes the instruction and updates any flag bits
    4. Store Result: Store result in cache or main memory

Basic CPU block diagram

  • A more detailed description of this now follows
    1. Address of instruction placed on address bus
    2. Instruction is fetched from memory and put on the data bus; the code is now available in the code cache
    3. Instruction pointer determines which instruction to execute next (PC)
    4. Instruction decoder decodes and tells the control unit what type of instruction this is
    5. Control unit performs the operations needed to compute something reading data as it needs and writing when complete

2.1.1 Basic Microprocessor Design

  • A CPU has registers, clock, control unit, and arithmetic logic unit
    • Registers: Store data
    • Clock: Synchronizes internal operations of CPU with other system components
    • ALU: Performs arithmetic operations, such as addition and subtraction, and Boolean operations

Basic Microcomputer block diagram

  • Memory Storage Unit: Stores instructions and data during execution
    • The MMU manages this.
  • The control bus uses binary signals to synchronize actions of all devices attached to the system bus.
  • The address bus holds the addresses of instructions and data when the currently executing instruction transfers data between the CPU and memory.