CIS 451 Week 7
Superscalar / Superpipelining
- Instructions form a partial order
- Think about building a house.
    a = 0; b = 3; a = a + c; b = b + c; d = a + b
- Having multiple pipelines allows us to take advantage of this instruction-level parallelism
- Modern CPUs have multiple functional units – as many as six or eight!
- Ideally an “x-way” pipeline will get work done x times as fast.
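A minimal sketch of the partial order in the five-statement example above. It assumes only true (read-after-write) dependences constrain the order and that there are enough functional units for every ready statement. The names `struct stmt`, `example`, and `dependence_levels` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One statement of the form dst = src1 op src2.
   A source of '\0' means "constant or unused". */
struct stmt { char dst, src1, src2; };

/* The five statements from the notes (illustrative encoding). */
static const struct stmt example[5] = {
    { 'a', '\0', '\0' },  /* a = 0     */
    { 'b', '\0', '\0' },  /* b = 3     */
    { 'a', 'a',  'c'  },  /* a = a + c */
    { 'b', 'b',  'c'  },  /* b = b + c */
    { 'd', 'a',  'b'  },  /* d = a + b */
};

/* Level of the most recent statement before index i that writes var,
   or -1 if no earlier statement writes it. */
static int producer_level(const struct stmt *s, const int *level,
                          size_t i, char var)
{
    if (var == '\0')
        return -1;
    for (size_t j = i; j-- > 0; )
        if (s[j].dst == var)
            return level[j];
    return -1;
}

/* Hypothetical helper: fills level[i] with the earliest step at which
   statement i could run if only true (read-after-write) dependences
   constrain the order. Returns the number of steps needed. */
int dependence_levels(const struct stmt *s, size_t n, int *level)
{
    assert(s != NULL && level != NULL);
    int steps = 0;
    for (size_t i = 0; i < n; i++) {
        int p1 = producer_level(s, level, i, s[i].src1);
        int p2 = producer_level(s, level, i, s[i].src2);
        level[i] = (p1 > p2 ? p1 : p2) + 1;
        if (level[i] + 1 > steps)
            steps = level[i] + 1;
    }
    return steps;
}
```

Calling `dependence_levels(example, 5, level)` puts the two initializations at step 0, the two additions of `c` at step 1, and `d = a + b` at step 2. Under these assumptions two units finish the five statements in three steps, a speedup of 5/3 rather than the ideal 2, because the dependence chain is three statements long.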
- What goes wrong?
- Hazards have a larger effect on wider pipelines.
- In a regular pipeline, a load followed immediately by a use of the loaded value requires 1 stall cycle.
- How many stalls are required in a 2-way pipeline? Why?
- How about an n-way processor?
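One way to reason about the questions above, as a sketch under stated assumptions: issue is in order, and the loaded value can be forwarded without a stall only to instructions issued at least two cycles after the load (the one-cycle load-use delay of the classic five-stage pipeline). The function name and its `width` and `pos` parameters are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: number of instruction slots after a load that
   cannot use the loaded value without stalling.
   width = instructions issued per cycle
   pos   = position of the load within its issue group (0 = first) */
int load_shadow_slots(int width, int pos)
{
    assert(width >= 1 && pos >= 0 && pos < width);
    int same_cycle = width - 1 - pos;  /* issued with the load, after it in program order */
    int next_cycle = width;            /* the whole following issue group */
    return same_cycle + next_cycle;
}
```

With `width = 1` this gives the single slot of the regular pipeline. With `width = 2` it gives 2 or 3 slots depending on where the load sits in its group, so the compiler or hardware has to find more independent work to hide the same one-cycle delay.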
- What about control hazards?
- How can we schedule instructions to functional units?
- Statically (VLIW)
- Dynamically
- What are the advantages and disadvantages of each?
- Static
- Can be simpler and, therefore, faster; but,
- Can’t react to stalls and other dynamic events.
- Dynamic
- More complex
- Can react to dynamic events (e.g., cache misses)
- Presents common interface. (Small implementation changes don’t require recompilation)
- Which is more common today? Why?
- Itanium was a joint HP/Intel attempt at static scheduling.
- Performance wasn’t good enough to justify the cost.
- Dynamic scheduling introduces new data hazards
- Standard hazard is RAW – Read after Write
- This is what forwarding and stalls address in a standard pipeline.
- If a CPU has out-of-order issue (it looks at a window of instructions and chooses the next one that is ready to run), it must also worry about WAR (Write after Read) hazards.
    lw  $t0, 40($s0)
    or  $t3, $s5, $s6
    sw  $s7, 80($t3)
    add $t1, $t0, $s1
    sub $t0, $s2, $s3    # be careful when moving this instruction up: WAR hazard
    and $t2, $s4, $t0
- WAR hazards are called “fake” hazards. Why?
- They can be eliminated by register renaming.
- Also called “anti-dependence” or “name dependence”
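A minimal sketch of register renaming applied to the six-instruction example above. It assumes an unlimited supply of physical registers (real hardware recycles them from a free list) and uses the standard MIPS register numbers ($t0 = 8, $s0 = 16, and so on). The names `struct instr`, `example`, and `rename_registers` are made up for illustration.

```c
#include <assert.h>

#define NUM_ARCH_REGS 32
enum { NO_REG = -1 };

/* Register fields only; NO_REG marks an unused field. */
struct instr { int dst, src1, src2; };

/* The example from the notes, using MIPS register numbers. */
static const struct instr example[6] = {
    {  8,     16, NO_REG },  /* lw  $t0, 40($s0)   */
    { 11,     21, 22     },  /* or  $t3, $s5, $s6  */
    { NO_REG, 23, 11     },  /* sw  $s7, 80($t3)   */
    {  9,      8, 17     },  /* add $t1, $t0, $s1  */
    {  8,     18, 19     },  /* sub $t0, $s2, $s3  */
    { 10,     20,  8     },  /* and $t2, $s4, $t0  */
};

/* Hypothetical helper: rewrites in[0..n-1] into out[0..n-1] so that every
   write gets a fresh physical register. Sources are read through the
   current mapping, so true (RAW) dependences are preserved. */
void rename_registers(const struct instr *in, struct instr *out, int n)
{
    int map[NUM_ARCH_REGS];
    int next_phys = NUM_ARCH_REGS;   /* assumption: unlimited physical registers */

    for (int r = 0; r < NUM_ARCH_REGS; r++)
        map[r] = r;

    for (int i = 0; i < n; i++) {
        assert(in[i].dst < NUM_ARCH_REGS && in[i].src1 < NUM_ARCH_REGS
               && in[i].src2 < NUM_ARCH_REGS);
        /* Rename sources first: they refer to the old value of dst. */
        out[i].src1 = (in[i].src1 == NO_REG) ? NO_REG : map[in[i].src1];
        out[i].src2 = (in[i].src2 == NO_REG) ? NO_REG : map[in[i].src2];
        if (in[i].dst == NO_REG) {
            out[i].dst = NO_REG;
        } else {
            map[in[i].dst] = next_phys++;
            out[i].dst = map[in[i].dst];
        }
    }
}
```

After renaming, the `sub` writes a different physical register than the one the `add` reads, so the WAR hazard on $t0 is gone, while the `and` still reads the register the `sub` writes. This is the sense in which the hazard comes from the register name rather than from the data flow.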
- Most CPUs require in-order completion. (Instructions are held and committed in the original order.)
- What is the challenge with out-of-order completion?
- Unwinding exceptions.
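A sketch of how instructions can finish out of order yet commit in the original order. It uses a reorder buffer, a common mechanism that these notes do not name, modeled here as a small circular queue. The names `struct rob`, `rob_issue`, `rob_finish`, and `rob_commit` and the size `ROB_SIZE` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define ROB_SIZE 8   /* illustrative capacity */

struct rob {
    bool done[ROB_SIZE];  /* has the instruction in this slot finished executing? */
    int head;             /* oldest instruction not yet committed */
    int count;            /* number of instructions in flight */
};

void rob_init(struct rob *r)
{
    r->head = 0;
    r->count = 0;
}

/* Allocate a slot in program order. Returns the slot index, or -1 if full. */
int rob_issue(struct rob *r)
{
    if (r->count == ROB_SIZE)
        return -1;
    int slot = (r->head + r->count) % ROB_SIZE;
    r->done[slot] = false;
    r->count++;
    return slot;
}

/* Mark a slot finished; this may happen in any order. */
void rob_finish(struct rob *r, int slot)
{
    assert(slot >= 0 && slot < ROB_SIZE);
    r->done[slot] = true;
}

/* Commit finished instructions from the head only, so results become
   visible in program order. Returns the number committed. */
int rob_commit(struct rob *r)
{
    int committed = 0;
    while (r->count > 0 && r->done[r->head]) {
        r->head = (r->head + 1) % ROB_SIZE;
        r->count--;
        committed++;
    }
    return committed;
}
```

If three instructions are issued and the third finishes first, `rob_commit` returns 0 until the first one finishes. If an older instruction raises an exception, nothing younger has committed yet, so there is no architectural state to unwind.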
- Out-of-order completion raises the possibility of WAW (Write after Write) data hazards.
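The three hazard types can be told apart by comparing register fields. Below is a minimal sketch that classifies a pair of instructions from the earlier six-instruction example, using the standard MIPS register numbers ($t0 = 8, $s0 = 16, and so on). The names `struct instr`, `example`, and `classify` are made up for illustration. The check is pairwise only and ignores writes by instructions in between.

```c
#include <assert.h>

enum { NO_REG = -1 };
enum hazard { HAZ_NONE = 0, HAZ_RAW = 1, HAZ_WAR = 2, HAZ_WAW = 4 };

/* Register fields only; NO_REG marks an unused field. */
struct instr { int dst, src1, src2; };

/* The example from the notes, using MIPS register numbers. */
static const struct instr example[6] = {
    {  8,     16, NO_REG },  /* lw  $t0, 40($s0)   */
    { 11,     21, 22     },  /* or  $t3, $s5, $s6  */
    { NO_REG, 23, 11     },  /* sw  $s7, 80($t3)   */
    {  9,      8, 17     },  /* add $t1, $t0, $s1  */
    {  8,     18, 19     },  /* sub $t0, $s2, $s3  */
    { 10,     20,  8     },  /* and $t2, $s4, $t0  */
};

/* Does this instruction read the given register? */
static int reads(const struct instr *in, int reg)
{
    return reg != NO_REG && (in->src1 == reg || in->src2 == reg);
}

/* Hypothetical helper: returns a bitmask of the hazards between an
   earlier and a later instruction in program order. */
unsigned classify(const struct instr *earlier, const struct instr *later)
{
    unsigned h = HAZ_NONE;
    if (reads(later, earlier->dst))
        h |= HAZ_RAW;   /* later reads what earlier writes */
    if (reads(earlier, later->dst))
        h |= HAZ_WAR;   /* later writes what earlier reads */
    if (earlier->dst != NO_REG && earlier->dst == later->dst)
        h |= HAZ_WAW;   /* both write the same register */
    return h;
}
```

For the example, `classify(&example[3], &example[4])` reports the WAR hazard between the `add` and the `sub`, and `classify(&example[0], &example[4])` reports a WAW hazard: if the `sub` completed before the `lw`, $t0 would be left holding the stale loaded value.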
- Compiler optimizations can help reduce stalls / unused functional units.
    for (int i = 0; i < MAX; i++) {
        c[i] = a[i] + b[i];
    }

    ; The base address of array a is in r1
    ; The base address of array b is in r2
    ; The base address of array c is in r3
          addi r4, r0, 4     ; Set r4 (the loop counter) to MAX (MAX = 4 here)
    LOOP: lw   r5, 0(r1)     ; Load from a into r5
          lw   r6, 0(r2)     ; Load from b into r6
          add  r7, r5, r6    ; r7 = r5 + r6
          sw   0(r3), r7     ; Store result back in array c
          addi r1, r1, 4     ; Increment pointers for arrays a, b, and c
          addi r2, r2, 4
          addi r3, r3, 4
          subi r4, r4, 1     ; Decrement loop counter
          bnez r4, LOOP      ; Branch
          nop                ; Branch delay slot
          sw   0(r3), r0     ; Store 0 at the address in r3, now just past the end of c
                             ; (not important, just something to do after the loop)
          nop
          nop
          nop
          nop
          trap #0
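The notes do not say which optimization is meant. Loop unrolling is one common choice, sketched here at the C level for the loop above. The function name is hypothetical, the unroll factor of 4 is arbitrary, and `restrict` states the assumption that `c` does not overlap `a` or `b`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: c[i] = a[i] + b[i], unrolled by 4.
   Assumes c does not overlap a or b (hence restrict). */
void add_arrays_unrolled4(const int *restrict a, const int *restrict b,
                          int *restrict c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    size_t i = 0;

    /* Main body: four independent additions per iteration, so the
       scheduler has other work to place behind each load, and the
       pointer updates and branch are paid once per four elements. */
    for (; i + 4 <= n; i += 4) {
        int s0 = a[i]     + b[i];
        int s1 = a[i + 1] + b[i + 1];
        int s2 = a[i + 2] + b[i + 2];
        int s3 = a[i + 3] + b[i + 3];
        c[i]     = s0;
        c[i + 1] = s1;
        c[i + 2] = s2;
        c[i + 3] = s3;
    }

    /* Cleanup loop for the last n % 4 elements. */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}
```

In the assembly version, each iteration has an `add` that depends on the `lw` just before it and four instructions of loop overhead. Unrolling gives the compiler (static scheduling) or the issue window (dynamic scheduling) several independent load/add/store chains to interleave.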