CIS 451 Week 2
CISC vs. RISC
- Re-read Stallings 15.4
- Fewer slower instructions vs. more faster instructions
- What is potential benefit of fewer slower instructions?
- Less repeated work (e.g., instruction fetch)
- What is balance today?
- Mostly RISC, with a few favorite CISC instructions.
- Arguments for CISC
- Compiler simplification: Compiler has less work to do since it
needs to generate fewer instructions.
- What is the counter-argument? It is hard to program compilers to take advantage of esoteric instructions.
- CISC programs are smaller.
- What is the counter-argument? Fewer instructions yes; but, those instructions tend to be larger.
- Why do the instructions tend to be larger?
- More addressing modes and more use of memory access tend to require more bits.
- More op-codes also require more bits.
Performance
- Suppose you have a RISC and a CISC processor and are asked to decide which is “better?”
What makes one CPU “better” than another?
- Purchase Cost
- Total cost of ownership
- Throughput
- Energy efficiency
- Reliability
- Speed / Time
- What are the different times that can be measured when timing a program?
- Wall clock
- CPU time (time when the program has the CPU, excluding time when the process is blocked on I/O or cache misses)
- Our model will be “wall clock” time since we’ll assume only one program running on CPU
- For CPUs we make it simple: latency — How fast can a CPU complete a given task. (Task may be one user’s specific task, or processing a batch of a million Amazon orders.)
- Throughput also common.
- What is the danger of optimizing for throughput?
- Starvation for minority
- Performance is the inverse of time: less time <===> better performance
- Try to avoid using the terms “increasing” and “decreasing”. Notice that increasing performance means decreasing execution time (latency). Although well-defined, this can cause confusion. “Improve” is a better term.
- Avoid using percents
- CPU A completes a task in 10 seconds
- CPU B completes the same task in 15 seconds
- How much faster is A than B?
- 50%? ((15 - 10) / 10 is 50%)
- 33%? ((15 - 10) / 15 is 33%)
- It’s better to use the term “Speedup”.
- Speedup = original_time / new_time
- Switching from B to A results in a speedup of 1.5.
- Switching from A to B results in a speedup of about 0.667.
- A speedup of < 1 represents a slowdown.
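The percent ambiguity and the speedup definition above can be checked with a few lines of Python. This is only a sketch using the 10-second and 15-second times from the example; the function and variable names are made up for illustration.

```python
def speedup(original_time, new_time):
    """Speedup = original_time / new_time; a value below 1 is a slowdown."""
    return original_time / new_time


time_a = 10.0  # seconds for CPU A (from the example above)
time_b = 15.0  # seconds for CPU B (from the example above)

# The two "percent faster" answers differ only in which time is used as the base.
pct_base_a = (time_b - time_a) / time_a * 100
pct_base_b = (time_b - time_a) / time_b * 100

# Speedup names the direction of the switch, so there is only one answer each way.
b_to_a = speedup(original_time=time_b, new_time=time_a)
a_to_b = speedup(original_time=time_a, new_time=time_b)

print(f"percent with A as base: {pct_base_a:.0f}%")
print(f"percent with B as base: {pct_base_b:.0f}%")
print(f"speedup switching B -> A: {b_to_a:.3f}")
print(f"speedup switching A -> B: {a_to_b:.3f}")
```

Note that the two speedups are reciprocals of each other, which is why stating the direction of the switch removes the ambiguity.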
Benchmarks
- So, now, we time the RISC processor and time the CISC processor and see which one is faster.
- What’s missing?
- What programs do we time?
- Ideally we could exactly reproduce what the user intends to do on that machine.
- Called the “workload”
- In practice, we need to choose an approximation of the user’s real workload called a benchmark.
- Benchmarks: A set of programs and input that represent a workload.
- Not a perfect representation; but, designed to be a reasonably close approximation.
- Several types:
- Kernels (short, focused tasks like matrix multiply)
- Synthetic benchmarks (made up lists of tasks designed to stress the CPU)
- TPC-C Database activities for transaction processing.
- TPC-H Decision support queries
- Dhrystone
- Benchmark suites – sets of real programs
- SpecWeb
- SpecInt (our “go-to” benchmark)
- SpecInt is a suite of programs that typify what we might do in EOS. (See Figure 1.17 on slide or in textbook.)
- How should we present results in a single number?
- What is the problem with the arithmetic mean? (i.e., “average”)
- It overweights longer programs. Consider a benchmark with two programs:
- P1 that takes about 10 hours to run on CPU A
- P2 takes 1.5 hours to run on CPU A.
- Suppose CPU A is 20% faster than B on P1 (i.e., B takes 12 hours), but
- CPU A is 50% slower than CPU B on P2 (i.e., B takes 1 hour).
- Which CPU should be considered “faster”?
- Solution: Use ratios (present each program as “speedup of CPU A over CPU Y”).
(time_a / time_y) / (time_b / time_y) = time_a / time_b
- That gives 0.83 (10 / 12 for P1) and 1.5 (1.5 / 1 for P2).
- (Notice how the time of the reference machine Y becomes irrelevant?)
- Now, what do we do with those scores?
- Arithmetic mean isn’t the right average. Consider
- CPU A has tasks that take 2 hrs and 3 hrs respectively.
- CPU B has tasks that take 3 hrs and 2 hrs respectively.
- Ratios are 0.667 and 1.5. Which CPU “wins” depends on which one ends up in the denominator.
- Geometric mean gives the expected result: The CPUs are equal.
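Both examples in this section can be worked through numerically. The sketch below uses only the times given above (2/3 hours and 3/2 hours for the symmetric case; 10/1.5 hours and 12/1 hours for P1/P2). The helper names are assumptions made for illustration, and the ratios are time ratios, so a value below 1 favors the CPU in the numerator.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # n-th root of the product of n values
    return math.prod(values) ** (1 / len(values))


def time_ratios(times_x, times_y):
    """Per-program time of X divided by time of Y (below 1 means X is faster)."""
    return [x / y for x, y in zip(times_x, times_y)]


# Symmetric example: the two CPUs should come out equal.
cpu_a = [2.0, 3.0]
cpu_b = [3.0, 2.0]
a_over_b = time_ratios(cpu_a, cpu_b)
b_over_a = time_ratios(cpu_b, cpu_a)

# The arithmetic mean says each CPU is slower than the other.
print("arithmetic:", arithmetic_mean(a_over_b), arithmetic_mean(b_over_a))
# The geometric mean is the same whichever CPU is in the denominator.
print("geometric: ", geometric_mean(a_over_b), geometric_mean(b_over_a))

# P1/P2 example: total time is dominated by the long program P1.
p_times_a = [10.0, 1.5]
p_times_b = [12.0, 1.0]
print("total hours:", sum(p_times_a), sum(p_times_b))
print("geometric mean of A/B ratios:",
      geometric_mean(time_ratios(p_times_a, p_times_b)))
```

In the P1/P2 case the total time favors CPU A (11.5 hours vs. 13 hours), while the geometric mean of the ratios comes out above 1 and so favors CPU B. That disagreement is exactly the weighting question raised above.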
ISA (Instruction Set Architecture)
Let’s think about how the design of an ISA (think “machine language”) matters.
- Suppose you are designing a CPU from scratch.
- What is the first thing/feature/parameter you choose?
- Probably the word size.
- How does the word size affect the rest of the CPU?
- What is the most common word size today?
- What drove the move from 32 to 64 bit machines?
- Address space
- Suppose moving to a 64-bit word lengthens the clock period by 5%. What % of instructions must be eliminated to break even? 5%?
- No. Remember 50% loss and 50% gain don’t cancel.
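A quick way to see why the answer is not 5% is to write execution time as instruction count × cycles per instruction × clock period and solve for the break-even point. The sketch below assumes cycles per instruction stay the same after the change, which the notes do not state; the function name is made up for illustration.

```python
def breakeven_instruction_cut(clock_period_increase):
    """Fraction of instructions to eliminate so execution time is unchanged.

    Assumes time = instruction_count * CPI * clock_period with CPI held fixed
    (an assumption for this sketch), so we need
    (1 - cut) * (1 + clock_period_increase) == 1.
    """
    return 1 - 1 / (1 + clock_period_increase)


cut = breakeven_instruction_cut(0.05)
print(f"instructions to eliminate: {cut:.2%}")

# Same effect in the 50% example: a 50% loss followed by a 50% gain.
after_loss_then_gain = 1.0 * (1 - 0.50) * (1 + 0.50)
print(f"after 50% loss then 50% gain: {after_loss_then_gain}")
```

Under that assumption the break-even cut is a little under 5%, because the percentages multiply rather than add.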
- How many registers should the CPU have?
- What are the consequences of many/few registers?
- Performance (less RAM access)
- More registers require more resources.
- Let’s design a basic instruction: add
- How many parameters?
- How few parameters can you have?
- What are benefits of many vs. few?
- What are benefits of one parameter?
- What are the benefits of zero parameters?
- One parameter add is an accumulator based architecture.
- Notice how this computer emphasized the simplicity of the machine over performance.
- Compare this mindset to the typical mindset today.
- Zero parameter add is a stack-based architecture.
- Not a bad idea, but never took off.
- But, there is a stack based architecture today.
- What is it? (Java Virtual Machine)
- Notice how ideas can come back in a new context.
- He who anticipates when the time is right for an idea to come back has the potential to make millions.
- Again, you may not use Computer Architecture directly in your first job; but, developing a sense of the big picture will pay off later in your career. (Possibly even early in your career.)
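To make the one-parameter and zero-parameter versions of add concrete, here is a toy sketch of both machine styles computing z = x + y. The mnemonics (load, add, store, push, pop) and the dictionary used as memory are hypothetical, chosen only for illustration; they are not the instruction set of any real machine.

```python
def run_accumulator(program, memory):
    """One-parameter add: the other operand and the result live in the accumulator."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = memory[addr]
        elif op == "add":
            acc = acc + memory[addr]
        elif op == "store":
            memory[addr] = acc
    return memory


def run_stack(program, memory):
    """Zero-parameter add: both operands come from the top of the stack."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "push":
            stack.append(memory[instr[1]])
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "pop":
            memory[instr[1]] = stack.pop()
    return memory


accumulator_program = [("load", "x"), ("add", "y"), ("store", "z")]
stack_program = [("push", "x"), ("push", "y"), ("add",), ("pop", "z")]

acc_result = run_accumulator(accumulator_program, {"x": 4, "y": 6, "z": 0})
stack_result = run_stack(stack_program, {"x": 4, "y": 6, "z": 0})
print(acc_result["z"], stack_result["z"])
```

The add itself gets simpler as parameters are removed, but more instructions are needed around it to move operands into place.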
- Instruction width:
- Design a machine instruction that does add r1 <= r2 + 6
- Which version of add needs more bits?
- Fixed vs. variable width instructions.
- What are the benefits of each?
- fixed: Simpler hardware => potentially faster
- variable: more flexible. Possibly fewer instructions?
- How do you know which one (fixed/variable) is “better”?
- We’ll get back to that.
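One way to explore the "which version of add needs more bits" question is to pick field widths and count. The widths below (6-bit opcode, 5-bit register numbers, 16-bit constant) and the opcode value are assumptions chosen for illustration, not a design given in these notes.

```python
# Hypothetical field widths, chosen only for this sketch.
OPCODE_BITS = 6
REGISTER_BITS = 5      # enough to name 32 registers
IMMEDIATE_BITS = 16    # room for a constant such as 6

ADD_IMMEDIATE_OPCODE = 0b001000  # made-up opcode value


def bits_needed(register_operands, has_immediate):
    """Total bits for one instruction with the field widths above."""
    total = OPCODE_BITS + register_operands * REGISTER_BITS
    if has_immediate:
        total += IMMEDIATE_BITS
    return total


def encode_add_immediate(dest, src, immediate):
    """Pack opcode | dest | src | immediate into one integer, high bits first."""
    assert 0 <= dest < 2 ** REGISTER_BITS
    assert 0 <= src < 2 ** REGISTER_BITS
    assert 0 <= immediate < 2 ** IMMEDIATE_BITS
    word = ADD_IMMEDIATE_OPCODE
    word = (word << REGISTER_BITS) | dest
    word = (word << REGISTER_BITS) | src
    word = (word << IMMEDIATE_BITS) | immediate
    return word


register_form_bits = bits_needed(register_operands=3, has_immediate=False)
immediate_form_bits = bits_needed(register_operands=2, has_immediate=True)
print("add r1 <= r2 + r3 needs", register_form_bits, "bits")
print("add r1 <= r2 + 6  needs", immediate_form_bits, "bits")

word = encode_add_immediate(dest=1, src=2, immediate=6)
print(f"encoded add r1 <= r2 + 6: {word:#010x}")
```

With these widths the constant form is the wider one. A fixed-width design would pad the register form up to the same size, while a variable-width design could keep it shorter at the cost of more complex decoding.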