(2.3.5) ARM ISA

John Goodacre and Andrew N. Sloss, "Parallelism and the ARM Instruction Set Architecture," IEEE Computer, July 2005 (IEEE Xplore).


Typical performance and efficiency techniques covered:
     Variable execution time
     Subword parallelism
     DSP extensions (specialization)
     Thread-level parallelism (TLP) and exception handling
     Multiprocessing

Variable execution time
     Multiple loads/stores in a single instruction (LDM/STM)
          used for subroutine prologues and epilogues
          improves code density
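A minimal C sketch of what one STMDB/LDMIA pair does in a subroutine prologue and epilogue (the PUSH {r4, r5, lr} / POP {r4, r5, pc} idiom). The Cpu struct, the 64-word memory and the function names are hypothetical toy-model choices; real LDM/STM also deal with alignment, aborts and the base register appearing in the list, which this ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: 16 registers and a small word-addressed memory. */
#define NREGS 16
#define SP 13
#define LR 14
#define PC 15
#define MEM_WORDS 64

typedef struct {
    uint32_t r[NREGS];
    uint32_t mem[MEM_WORDS];   /* byte address a maps to mem[a / 4] */
} Cpu;

static int popcount16(uint16_t mask) {
    int n = 0;
    for (; mask; mask &= (uint16_t)(mask - 1)) n++;
    return n;
}

/* STMDB Rn!, {list} (STMFD / PUSH): store multiple, decrement before.
   The lowest-numbered register always goes to the lowest address. */
static void stmdb_wb(Cpu *c, int rn, uint16_t list) {
    uint32_t addr = c->r[rn] - 4u * (uint32_t)popcount16(list);
    uint32_t new_base = addr;
    for (int i = 0; i < NREGS; i++) {
        if (list & (1u << i)) {
            assert(addr / 4 < MEM_WORDS);
            c->mem[addr / 4] = c->r[i];
            addr += 4;
        }
    }
    c->r[rn] = new_base;   /* base writeback */
}

/* LDMIA Rn!, {list} (LDMFD / POP): load multiple, increment after. */
static void ldmia_wb(Cpu *c, int rn, uint16_t list) {
    uint32_t addr = c->r[rn];
    for (int i = 0; i < NREGS; i++) {
        if (list & (1u << i)) {
            assert(addr / 4 < MEM_WORDS);
            c->r[i] = c->mem[addr / 4];
            addr += 4;
        }
    }
    c->r[rn] = addr;       /* base writeback */
}
```

Popping the saved LR straight into the PC restores the callee-saved registers and returns in a single instruction, which is where the code-density gain comes from.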

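Inline barrel shifter: the second operand can be shifted or rotated in the same instruction (e.g. ADD r0, r1, r2, LSL #2 computes r1 + (r2 << 2))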
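A sketch of the barrel shifter as a C function, assuming shift amounts 1 to 31 only (ARM gives the #0 and #32 encodings special meanings that are not modeled here). ShiftType, barrel_shift and add_shifted are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { LSL, LSR, ASR, ROR } ShiftType;

/* Simplified shifter for immediate shift amounts 1..31. */
static uint32_t barrel_shift(uint32_t v, ShiftType t, unsigned n) {
    assert(n >= 1 && n <= 31);
    switch (t) {
    case LSL: return v << n;
    case LSR: return v >> n;
    /* Arithmetic shift done on unsigned values: replicate the sign bit by hand. */
    case ASR: return (v >> n) | ((v & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u);
    case ROR: return (v >> n) | (v << (32 - n));
    }
    return v;
}

/* Models ADD rd, rn, rm, <shift> #n: the shift costs no extra instruction. */
static uint32_t add_shifted(uint32_t rn, uint32_t rm, ShiftType t, unsigned n) {
    return rn + barrel_shift(rm, t, n);
}
```

For example, add_shifted(base, i, LSL, 2) mirrors the single-instruction address computation base + 4*i for an array of words.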
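Conditional execution (predication): most ARM instructions carry a 4-bit condition field and execute only if the flags satisfy it, which removes short branches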
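A sketch of predication in C: cmp models the flag-setting CMP, and cond_passed evaluates a subset of the ARM condition codes. The C `if` is only the host-side model of the predicated MOV; the point is that the ARM sequence in the comment contains no branch. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } Flags;
typedef enum { EQ, NE, HS, LO, GE, LT, GT, LE, AL } Cond;

/* CMP a, b: set flags from a - b without keeping the result. */
static Flags cmp(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    Flags f;
    f.n = (r >> 31) != 0;
    f.z = (r == 0);
    f.c = (a >= b);                           /* carry = no borrow */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;   /* signed overflow */
    return f;
}

static bool cond_passed(Cond cond, Flags f) {
    switch (cond) {
    case EQ: return f.z;
    case NE: return !f.z;
    case HS: return f.c;
    case LO: return !f.c;
    case GE: return f.n == f.v;
    case LT: return f.n != f.v;
    case GT: return !f.z && f.n == f.v;
    case LE: return f.z || f.n != f.v;
    case AL: return true;
    }
    return false;
}

/* Signed max with no branch in the ARM code:
       CMP   r0, r1
       MOVLT r0, r1   ; executes only if r0 < r1 (signed) */
static uint32_t predicated_max(uint32_t r0, uint32_t r1) {
    Flags f = cmp(r0, r1);
    if (cond_passed(LT, f)) r0 = r1;   /* models the predicated MOV */
    return r0;
}
```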
16-bit Thumb instruction set (read separately!)

Data-level parallelism
     Subword SIMD: split a 32-bit register into 4 x 8-bit or 2 x 16-bit lanes and operate on all lanes in parallel (e.g. UADD8, QADD16)
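A C model of two ARMv6 SIMD operations on packed 32-bit values, assuming two's-complement integer conversions (true on mainstream compilers). uadd8 uses the common SWAR masking trick so that carries never cross lane boundaries; qadd16 saturates each 16-bit lane. Only the results are modeled, not the GE flags that the real UADD8 also sets.

```c
#include <assert.h>
#include <stdint.h>

/* Models UADD8: four independent 8-bit adds, each wrapping modulo 256.
   Bit 7 of each lane is added separately so no carry leaks into the next lane. */
static uint32_t uadd8(uint32_t a, uint32_t b) {
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    return low7 ^ ((a ^ b) & 0x80808080u);
}

/* Models QADD16: two signed 16-bit lanes, each saturated to [-32768, 32767]. */
static uint32_t qadd16(uint32_t a, uint32_t b) {
    uint32_t out = 0;
    for (int lane = 0; lane < 2; lane++) {
        int shift = 16 * lane;
        int32_t x = (int16_t)(uint16_t)(a >> shift);
        int32_t y = (int16_t)(uint16_t)(b >> shift);
        int32_t s = x + y;
        if (s > INT16_MAX) s = INT16_MAX;
        if (s < INT16_MIN) s = INT16_MIN;
        out |= (uint32_t)(uint16_t)s << shift;
    }
    return out;
}
```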
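Thread-level parallelism
     Goal: improve exception handling
     TLP increases complexity in the interrupt handler, scheduler and context switch
     > Special instructions (ARMv6):
          CPS : Change Processor State (mode and interrupt masks)
          RFE : Return From Exception (loads PC and CPSR)
          SRS : Store Return State (saves LR and SPSR to a chosen mode's stack)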
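A toy C model of the SRS/RFE pairing: on exception entry, SRS pushes the return address and saved status register in one instruction, and RFE later restores both at once. The struct fields (lr_exc, spsr, sp_target) are hypothetical, and mode switching and CPS are not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 32

/* Hypothetical toy state; byte address a maps to stack[a / 4]. */
typedef struct {
    uint32_t pc, cpsr;        /* current execution state */
    uint32_t lr_exc, spsr;    /* values captured on exception entry */
    uint32_t sp_target;       /* stack pointer of the mode SRS writes to */
    uint32_t stack[STACK_WORDS];
} Cpu;

/* SRSDB sp!: store LR (lower address) and SPSR, decrement before, writeback. */
static void srsdb(Cpu *c) {
    c->sp_target -= 8;
    assert(c->sp_target / 4 + 1 < STACK_WORDS);
    c->stack[c->sp_target / 4] = c->lr_exc;
    c->stack[c->sp_target / 4 + 1] = c->spsr;
}

/* RFEIA sp!: load PC and CPSR together, increment after, writeback. */
static void rfeia(Cpu *c) {
    c->pc = c->stack[c->sp_target / 4];
    c->cpsr = c->stack[c->sp_target / 4 + 1];
    c->sp_target += 8;
}
```

Restoring PC and CPSR in the same instruction avoids a window where the handler has switched mode but not yet returned.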
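Multiprocessor atomic instructions
     LDREX (load exclusive) / STREX (store exclusive): STREX writes only if no other write hit the location since the LDREX, and returns 0 on success or 1 on failure, so software retries in a loop
     >> Physically tagged caches instead of virtually tagged: 20% improvement in overall performance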
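A single-threaded C simulation of the LDREX/STREX retry loop, using a hypothetical one-address Monitor struct in place of the hardware exclusive monitor; other_cpu_store stands in for a write by another processor. In real code you would use compiler atomics, which on ARMv6/v7 are typically lowered to this kind of loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-location monitor (real hardware tracks this per CPU). */
typedef struct {
    bool exclusive;           /* set by LDREX, cleared by a conflicting write */
    const uint32_t *tagged;   /* address recorded by the last LDREX */
} Monitor;

static uint32_t ldrex(Monitor *m, const uint32_t *addr) {
    m->exclusive = true;
    m->tagged = addr;
    return *addr;
}

/* Returns 0 on success and 1 on failure, like STREX's status register. */
static int strex(Monitor *m, uint32_t *addr, uint32_t value) {
    bool ok = m->exclusive && m->tagged == addr;
    m->exclusive = false;     /* STREX always clears the monitor */
    if (!ok) return 1;
    *addr = value;
    return 0;
}

/* A plain store from "another CPU" breaks the reservation. */
static void other_cpu_store(Monitor *m, uint32_t *addr, uint32_t value) {
    if (m->tagged == addr) m->exclusive = false;
    *addr = value;
}

/* Atomic increment built from the LDREX/STREX retry loop. */
static uint32_t atomic_inc(Monitor *m, uint32_t *addr) {
    uint32_t old;
    do {
        old = ldrex(m, addr);
    } while (strex(m, addr, old + 1) != 0);
    return old + 1;
}
```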
Instruction-level parallelism

2004: the cost-performance-through-MHz wall
Why multicore?
     Higher MHz > costly
     Extracting more ILP is complex and costly
     Programming multiple independent processors > non-portable and inefficient
ARM11 MPCore: multiprocessor
     Generic interrupt controller
     Snoop Control Unit (SCU)

Enhanced atomic instructions
Lock-free synchronization > spin locks that sleep while waiting and are woken up (WFE/SEV)
CPU number and context (thread ID) registers, with access controlled by privilege level
Weakly ordered memory consistency
     wmb() > write memory barrier
     rmb() > read memory barrier
     DSB > data synchronization barrier (drains the store buffer)
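A sketch of where the write and read barriers go in a producer/consumer handoff, written with C11 fences instead of the Linux wmb()/rmb() macros (a swapped-in portable equivalent; release/acquire fences are somewhat stronger than wmb()/rmb()). On ARMv7, compilers typically emit a DMB for these fences. The names payload, ready, publish and try_consume are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Shared between a producer CPU and a consumer CPU. */
static int payload;          /* plain data */
static atomic_int ready;     /* flag: 0 = nothing published yet */

/* Producer: the data write must be visible before the flag write.
   The release fence plays the role of wmb(). */
static void publish(int value) {
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Consumer: the flag read must be ordered before the data read.
   The acquire fence plays the role of rmb(). */
static bool try_consume(int *out) {
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

Without the fences, a weakly ordered CPU may let the consumer see ready == 1 while still reading a stale payload.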
    
SMP performance
     Cache coherence (SCU runs at CPU speed)
     Inter-processor communication (software-initiated, asynchronous inter-processor interrupts)
     Load-balanced interrupt handling
     SCU:
          keeps a duplicate copy of each CPU's physical address tags for fast snoop lookups
          migratory lines > if a line went from Shared to written (Modified) and another processor then requests it, the SCU assumes that processor will eventually write it too, so the line moves directly to it in the Modified state; useful for locks and similar read-modify-write data
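A toy two-CPU MESI model of the migratory-line optimization, assuming a simplified detection rule (a line is flagged migratory once a write upgrades a Shared copy). The real SCU bookkeeping is not described in these notes, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } State;   /* MESI */

#define NCPU 2

typedef struct {
    State st[NCPU];   /* per-CPU state of one cache line */
    bool migratory;   /* hypothetical flag: set when a write upgrades a Shared copy */
} Line;

static void cpu_write(Line *l, int cpu) {
    if (l->st[cpu] == SHARED) l->migratory = true;   /* S -> M upgrade observed */
    for (int i = 0; i < NCPU; i++) l->st[i] = (i == cpu) ? MODIFIED : INVALID;
}

static void cpu_read(Line *l, int cpu) {
    if (l->st[cpu] != INVALID) return;               /* read hit */
    int owner = -1;
    for (int i = 0; i < NCPU; i++)
        if (l->st[i] == MODIFIED) owner = i;
    if (owner >= 0 && l->migratory) {
        /* Migratory line: assume the reader will write next, so hand over
           ownership in M instead of leaving two Shared copies. */
        l->st[owner] = INVALID;
        l->st[cpu] = MODIFIED;
        return;
    }
    bool others = false;   /* ordinary MESI read miss */
    for (int i = 0; i < NCPU; i++) {
        if (i != cpu && l->st[i] != INVALID) {
            l->st[i] = SHARED;
            others = true;
        }
    }
    l->st[cpu] = others ? SHARED : EXCLUSIVE;
}
```

In a lock handoff, the second CPU's read already leaves the line in M, so its following write (taking the lock) needs no separate upgrade request.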