A.1 Introduction
A.2 Classifying Instruction Set Architectures
- Type of internal storage in a processor
- Stack: Operands are implicitly on top of the stack
- A stack machine must evaluate an expression in one fixed order (operand order cannot be rearranged)
- Accumulator: One operand is implicitly in the accumulator
- Registers: Only explicit operands either in memory or registers
- Operations can be done in multiple orders, e.g., by parenthesization
- (A*B)-(B*C)-(A*D) multiplications can be done in any order
- Registers are general purpose, faster than memory, and reduce memory traffic, which speeds up execution
- Code density improves since naming a register costs fewer bits than naming a memory address
- Register-memory: Can access memory as part of any instruction
- Load-store: Can access memory only with load/store instructions
- Operations required for: C=A+B
- Two major instruction set characteristics divide GPR architectures for ALU instructions
- 2 vs 3 operands (two-operand: one operand is both a source and the destination; three-operand: one destination, two sources)
- How many operands may be memory addresses
- Figures A.3 and A.4 show common combinations of register and memory operands
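As a rough sketch (Python standing in for assembly; the register names and comment mnemonics are invented for illustration), here is how each storage class computes C = A + B:

```python
# Memory holds the named variables.
mem = {"A": 3, "B": 4, "C": 0}

# Stack machine: operands implicitly on top of the stack.
stack = []
stack.append(mem["A"])                    # PUSH A
stack.append(mem["B"])                    # PUSH B
stack.append(stack.pop() + stack.pop())   # ADD  (pop two, push result)
mem["C"] = stack.pop()                    # POP C

# Accumulator machine: one operand implicitly in the accumulator.
acc = mem["A"]              # LOAD A
acc = acc + mem["B"]        # ADD  B
mem["C"] = acc              # STORE C

# Register-memory: ALU operations may take a memory operand.
r1 = mem["A"]               # LOAD  R1, A
r1 = r1 + mem["B"]          # ADD   R1, B   (one operand in memory)
mem["C"] = r1               # STORE C, R1

# Load-store: memory is touched only by explicit loads and stores.
r1 = mem["A"]               # LOAD  R1, A
r2 = mem["B"]               # LOAD  R2, B
r3 = r1 + r2                # ADD   R3, R1, R2
mem["C"] = r3               # STORE C, R3

print(mem["C"])  # 7
```

Counting the instructions and memory references each style needs for the same statement is exactly the comparison the classification is about.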
A.3 Memory Addressing
- Interpreting Memory Addresses
- Little Endian vs Big Endian
- Little endian: the least-significant byte is stored at the lowest address
- Big endian: the most-significant byte is stored at the lowest address
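A quick stdlib illustration of the two byte orders (the value is chosen arbitrarily):

```python
# The same 32-bit value serialized in both byte orders.
value = 0x0A0B0C0D

little = value.to_bytes(4, "little")  # least-significant byte first
big    = value.to_bytes(4, "big")     # most-significant byte first

print(little.hex())  # 0d0c0b0a
print(big.hex())     # 0a0b0c0d
```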
- Memory Alignment: Done to optimize access speed; a misaligned access may take multiple aligned memory references
- Aligned if: address mod size = 0
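The alignment rule above is a one-line check (a sketch; `aligned` is a made-up helper name):

```python
def aligned(addr: int, size: int) -> bool:
    # An access of `size` bytes at address `addr` is aligned
    # iff the address is a multiple of the access size.
    return addr % size == 0

print(aligned(8, 4))   # True  -- 4-byte word at address 8
print(aligned(6, 4))   # False -- may cost multiple aligned references
```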
- Addressing Modes: https://en.wikipedia.org/wiki/Addressing_mode
- Addressing modes specify constants and registers in addition to locations in memory
- Effective Address: When a memory location is used, the actual memory address specified by the addressing mode is called the effective address
- Figure A.6 lists addressing modes and their names
- PC-Relative Addressing: Addressing modes that depend on the program counter
- Mostly used for specifying code addresses or control transfer instructions
- Richer addressing modes can significantly reduce instruction counts, but their complexity can increase the average clock cycles per instruction (CPI)
- Displacement Addressing Mode
+------+-----+-----+----------------+
| load | reg | base| offset         |   reg := RAM[base + offset]
+------+-----+-----+----------------+
(Effective address = offset + contents of the specified base register)
- Immediate or Literal Addressing Mode
+------+-----+-----+----------------+
| add  | reg1| reg2| constant       |   reg1 := reg2 + constant
+------+-----+-----+----------------+
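The two formats diagrammed above can be sketched as a tiny interpreter (Python as pseudocode; the register and function names are invented):

```python
ram  = {1000: 10, 1004: 20, 1008: 30, 1012: 40}
regs = {"r1": 0, "base": 1000}

def load(dst, base_reg, offset):
    # Displacement mode: effective address = base register + offset.
    ea = regs[base_reg] + offset
    regs[dst] = ram[ea]

def add_imm(dst, src, constant):
    # Immediate mode: the constant is part of the instruction itself.
    regs[dst] = regs[src] + constant

load("r1", "base", 8)     # r1 := RAM[1000 + 8] = 30
add_imm("r1", "r1", 12)   # r1 := 30 + 12
print(regs["r1"])         # 42
```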
- Summary: Memory Addressing
- New architectures should support the minimum of
- Displacement, immediate, and register indirect addressing modes
A.4 Type and Size of Operands
A.5 Operations in the Instruction Set
- The most commonly executed instructions are the simple operations of an instruction set
A.6 Instructions for Control Flow
There are four kinds of control flow change: conditional branches, jumps, procedure calls, and procedure returns.
- Addressing Modes for Control Flow Instructions
- Destination address of control flow instruction must always be specified
- Most common way is to offset PC
- Allows position independence: the code runs correctly wherever it is loaded, since targets do not depend on absolute addresses
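A sketch of why PC-relative targets give position independence: the encoded offset is fixed, and the target moves with the code wherever it is loaded (the addresses are made up):

```python
offset = 24          # branch offset encoded in the instruction
targets = []
for load_address in (0x1000, 0x8000):   # two hypothetical load addresses
    pc = load_address + 0x40            # the branch sits at the same code offset
    targets.append(pc + offset)         # target = PC + offset, moves with the code
print([hex(t) for t in targets])        # ['0x1058', '0x8058']
```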
- If the target is not known at compile time, it must be specified dynamically
- One way is to name register that will contain address
- This way of jumping allows for important features
- Case or switch statements
- Virtual functions or methods
- Higher-order functions/function pointers, where functions are passed as arguments
- Dynamically shared libraries where libraries are loaded and linked
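The register-indirect jump behind these features can be sketched with Python functions standing in for code addresses (the names are invented):

```python
def case_a(): return "a"
def case_b(): return "b"
def case_c(): return "c"

# A jump table: code addresses indexed by a selector -- the mechanism
# behind switch statements, virtual methods, and function pointers.
jump_table = [case_a, case_b, case_c]

selector = 1
reg = jump_table[selector]   # load the target address into a "register"
result = reg()               # indirect jump through the register
print(result)                # b
```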
- Conditional Branch Options
- There are different ways to decide when to branch
- Condition Code: Test special bits
- Advantage: Sometimes condition is set for free
- Disadvantage: CC is extra state. Condition codes can constrain instruction ordering, since they pass information from one instruction to a branch
- Condition Register: Test arbitrary register with result of comparison
- Advantage: Simple
- Disadvantage: Uses a register
- Compare and Branch: Compare is part of branch, often compare is limited to subset
- Advantage: One instruction rather than two for a branch
- Disadvantage: May be too much work per instruction for pipelined execution
- Procedure Invocation Options
- Callee Saving: Called procedure must save registers it wants to use
- Caller Saving: Calling procedure must save registers that it wants preserved
- Caller saving must be used when a variable (e.g., a global) is accessed by both the callee and the caller
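A minimal sketch of the two conventions, with a dict standing in for the register file (everything here is invented for illustration):

```python
regs = {"r1": 10, "r2": 20}

def callee_save_proc():
    saved = {"r1": regs["r1"]}   # callee saves the register it will use...
    regs["r1"] = 999             # ...clobbers it while working...
    regs.update(saved)           # ...and restores it before returning

def caller_clobbers():
    regs["r1"] = 999             # free to clobber: saving was the caller's job

callee_save_proc()               # callee saving: the caller just calls
assert regs["r1"] == 10

saved = regs["r1"]               # caller saving: caller preserves what it needs
caller_clobbers()
regs["r1"] = saved
print(regs["r1"])                # 10
```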
A.7 Encoding an Instruction Set
The number of registers and addressing modes affects instruction encoding, since bits must be allocated in each instruction to specify them. One must also consider how instruction size affects the ease of pipelined implementations.
- Reduced Code Size in RISCs
- Because RISC instructions are simpler, RISC programs tend to require more instructions and thus more code
- As RISC processors began to be used in embedded systems, code size became a problem
- Now some hybrids offer both 16 and 32 bit instructions (40% reduction)
- IBM compresses its standard instruction set and adds hardware to decompress instructions when they are fetched from memory on a cache miss
A.8 Crosscutting Issues: The Role of Compilers
- The Structure of Recent Compilers
- Complexity of writing a correct compiler limits optimization
- Multiple pass structure helps reduce complexity
- But passes must apply some transformations before others
- This is known as the phase-ordering problem
- One optimization is global common subexpression elimination
- If an assignment or result occurs multiple times save the result and reuse it later
- It is better to save it in a register, because a later load from memory may be slow
- However, if holding the value ties up a scarce register, it might even be slower
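Common subexpression elimination done by hand on a two-statement example (a sketch, not compiler output):

```python
b, c, d, e = 2, 3, 4, 5

# Before: the subexpression b * c is evaluated twice.
x = b * c + d
y = b * c + e

# After CSE: compute it once, keep the result (ideally in a register),
# and reuse it instead of recomputing or reloading it.
t = b * c
x2 = t + d
y2 = t + e

print(x == x2 and y == y2)  # True
```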
- Styles of transformation
- High-Level Optimizations: Done on the source; the output is fed to later optimization passes
- Local Optimizations: Optimize code within a straight-line code fragment (a basic block)
- Global Optimizations: Extend local optimizations across branches and apply transformations aimed at optimizing loops
- Register Allocation: Associates registers with operands
- Processor-Dependent Optimizations: Take advantage of features specific to the architecture
- Register Allocation
- Important in speeding up code and in making other optimizations useful; it is thus one of the most important optimizations
- For 16+ registers and additional FP registers use Graph coloring
- Graph coloring is NP-complete, so exact algorithms take exponential time
- The idea is to build an interference graph and color it so that no two adjacent nodes (variables live at the same time) get the same color (register)
- Because exact coloring is slow, near-linear heuristic coloring algorithms are used, especially when the number of registers is small
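A minimal greedy (heuristic, near-linear) coloring of a small interference graph, with nodes as variables, edges joining variables live at the same time, and colors as registers (the graph is made up):

```python
interference = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

registers = {}
for var in sorted(interference):          # fixed order for determinism
    used = {registers[n] for n in interference[var] if n in registers}
    color = 0
    while color in used:                  # lowest register no neighbor uses
        color += 1
    registers[var] = color

print(registers)  # {'a': 0, 'b': 1, 'c': 2, 'd': 0}
```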
- Impact of Optimizations on Performance
Figure note: N.M. = not measured
- The Impact of Compiler Technology on the Architect’s Decisions
- How are variables allocated and addressed?
- How many registers are needed to allocate variables appropriately?
- Look at where high-level programs allocate data
- Stack: For local variables, addressed relative to stack pointer, grown or shrunk on procedure call or return, used for activation record. Mostly scalars or single variables
- Global data area: Allocate statically declared objects. Usually arrays or other aggregate data structures.
- Heap: Allocate dynamic objects that do not adhere to stack discipline. Accessed with pointers and usually not scalars
- Register allocation is much more effective for stack allocated objects than for global variables
- And is essentially impossible for heap allocated objects because of pointer access
- Aliased variables: Multiple ways to refer to the address of a variable
- This makes them illegal to allocate to a register
- Most heap variables are effectively aliased
p = &a    -- gets address of a in p
a = ...   -- assigns to a directly
*p = ...  -- uses p to assign to a
...a...   -- accesses a
- Keeping a in a register across the assignment through *p would generate incorrect code
- Aliasing makes it hard to decide what a pointer points to; some compilers will not allocate local variables to registers if pointers are present
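The hazard can be mimicked in Python with a one-element list standing in for the aliased storage (a loose analogy, not real register allocation):

```python
a = [1]         # a's storage
p = a           # p = &a : p aliases the same storage

cached = a[0]   # the compiler's "copy of a in a register"
p[0] = 99       # *p = ... : a store through the alias
print(a[0], cached)   # 99 1 -- the cached copy is stale, hence incorrect code
```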
- How the Architect Can Help the Compiler Writer
- Large, complex programs are hard to compile for because the compiler must decide, one step at a time, which code sequence is best
- The best advice is “Make the frequent cases fast and rare cases correct”
- Some guidelines on designing an architecture
- Provide Regularity: Operations, data types, and addressing modes should be orthogonal
- Two aspects of an architecture are orthogonal if they are independent
- Eg. Every operation can be used with any addressing mode
- The opposite would be certain registers can be used only for certain instructions
- Provide Primitives, not solutions: Don't specialize your architecture for a specific high-level language; the solution may not work for other languages
- Simplify trade-offs among alternatives: Help compiler writers decide between alternative solutions.
- E.g., in register-memory architectures, deciding how many times a variable must be referenced before it is worth moving to a register
- Provide instructions that bind the quantities known at compile time as constants
A.9 Putting It All Together: The MIPS Architecture
GOOD FOR PRACTICE READING SKIP FOR NOW
A.10 Fallacies and Pitfalls
- Erroneous beliefs
- Pitfalls
- Designing a “high-level” instruction set feature specifically oriented to supporting a high-level language structure
- These instructions often do more work than is required and may not match the requirements of some languages
- Innovating at the instruction set architecture to reduce code size without accounting for the compiler
- Often compilers can do a much better job at compressing code than ISA innovations can
- If any optimizations are to be made, consider optimizing the output of compilers
- Fallacies
- There is such a thing as a typical program
- Programs vary widely, and real programs do not behave like synthetic benchmarks
- Optimizing or providing instructions for the “typical program” may cause the instruction to be underused
- An architecture with flaws cannot be successful
- Appendix K
- x86… nuff said
- You can design a flawless architecture
- Tradeoffs exist and are very real