CIS 451 Week 2

CISC vs. RISC

Re-read Stallings 15.4
Fewer slower instructions vs. more faster instructions
What is potential benefit of fewer slower instructions?
- Less repeated work (e.g., instruction fetch)
What is balance today?
- Mostly RISC, with a few favorite CISC instructions.
Arguments for CISC
1. Compiler simplification: Compiler has less work to do since it needs to generate fewer instructions.
  - What is the counter-argument? It is difficult to program compilers to take advantage of esoteric instructions.
2. CISC programs are smaller.
  - What is the counter-argument? Fewer instructions yes; but, those instructions tend to be larger.
  - Why do the instructions tend to be larger?
    - More addressing modes and more use of memory access tend to require more bits.
    - More opcodes also require more bits.

Performance

Suppose you have a RISC and a CISC processor and are asked to decide which is “better?”
What makes one CPU “better” than another?
- Purchase Cost
- Total cost of ownership
- Throughput
- Energy efficiency
- Reliability
- Speed / Time
What are the different times that can be measured when timing a program?
- Wall clock
- CPU time (time when the program has the CPU, excluding time when the process is blocked on I/O or cache misses)
- Our model will be “wall clock” time since we’ll assume only one program running on CPU
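To see the difference between the two times, here is a minimal Python sketch. The helper names measure and blocked_task are made up for illustration, and time.sleep stands in for a process blocked on I/O. It times the same task with a wall-clock timer and a CPU-time timer.

```python
import time

def measure(task):
    """Return (wall_seconds, cpu_seconds) for one call to task()."""
    wall_start = time.perf_counter()   # wall clock: includes time spent waiting
    cpu_start = time.process_time()    # CPU time charged to this process only
    task()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

def blocked_task():
    # Stand-in for a process blocked on I/O: sleeping uses almost no CPU.
    time.sleep(0.2)

wall, cpu = measure(blocked_task)
print(f"wall clock: {wall:.3f} s, CPU time: {cpu:.3f} s")
```

The wall-clock figure should be at least the 0.2 s sleep, while the CPU time should be close to zero. With only one program on the CPU and no blocking, the two would be nearly the same.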
For CPUs we make it simple: latency — How fast can a CPU complete a given task. (Task may be one user’s specific task, or processing a batch of a million Amazon orders.)
Throughput is also a common measure.
- What is danger of throughput?
  - Starvation for minority
Performance is the inverse of time: less time <===> better performance
Try to avoid using the terms “increasing” and “decreasing”. Notice that increasing performance means decreasing execution time (latency). Although well-defined, it can cause confusion. “Improve” is a better term.
Avoid using percents
- CPU A completes a task in 10 seconds
- CPU B completes the same task in 15 seconds
- How much faster is A than B?
  - 50% ? ((15-10) / 10 is 50%)
  - 33% ? ((15 - 10)/ 15 is 33%)
- It’s better to use the term “Speedup”.
  - Speedup = original_time / new_time
  - Switching from B to A results in a speedup of 1.5.
  - Switching from A to B results in a speedup of .666.
  - A speedup of < 1 represents a slowdown.
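The percent ambiguity and the speedup definition above can be checked with a short Python sketch. The function name speedup and the variable names are ours, chosen for illustration. The times are the 10 s and 15 s from the example.

```python
def speedup(original_time, new_time):
    """Speedup = original_time / new_time (values below 1 are slowdowns)."""
    return original_time / new_time

time_a = 10.0  # seconds, CPU A
time_b = 15.0  # seconds, CPU B

# The two "percent faster" answers differ only in the choice of denominator.
percent_vs_a = (time_b - time_a) / time_a   # 0.5
percent_vs_b = (time_b - time_a) / time_b   # about 0.333

b_to_a = speedup(time_b, time_a)   # original = B, new = A
a_to_b = speedup(time_a, time_b)   # original = A, new = B

print(f"percent: {percent_vs_a:.0%} or {percent_vs_b:.0%}")
print(f"B -> A speedup: {b_to_a:.3f}")
print(f"A -> B speedup: {a_to_b:.3f}")
```

Unlike the percentages, the two speedups are reciprocals of each other, so there is no ambiguity about what is being compared.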

Benchmarks

So, now, we time the RISC processor and time the CISC processor and see which one is faster.
What’s missing?
What programs do we time?
Ideally we could exactly reproduce what the user intends to do on that machine.
- Called the “workload”
In practice, we need to choose an approximation of the user’s real workload called a benchmark.
Benchmarks: A set of programs and input that represent a workload.
- Not a perfect representation; but, designed to be a reasonably close approximation.
- Several types:
  - Kernels (short, focused tasks like matrix multiply)
  - Synthetic benchmarks (made up lists of tasks designed to stress the CPU)
    - TPC-C Database activities for transaction processing.
    - TPC-H Decision support queries
    - Dhrystone
  - Benchmark suites – sets of real programs
    - SpecWeb
    - SpecInt (our “go-to” benchmark)
SpecInt is a suite of programs that typify what we might do in EOS. (See Figure 1.17 on slide or in textbook.)
How should we present results in a single number?
What is the problem with the arithmetic mean? (i.e., “average”)
- It overweights longer programs. Consider a benchmark with two programs:
  - P1 that takes about 10 hours to run on CPU A
  - P2 takes 1.5 hours to run on CPU A.
  - Suppose CPU A is 20% faster than CPU B on P1 (i.e., B takes 12 hours), but
  - CPU A is 50% slower than CPU B on P2 (i.e., B takes 1 hour).
  - Which CPU should be considered “faster”?
- Solution: Use ratios (present each program as the ratio of CPU A’s time to the time on a reference CPU Y).
  - (time_a / time_y) / (time_b / time_y) = time_a / time_b
  - That gives .83 (10 / 12) and 1.5 (1.5 / 1)
  - (Notice how the time of the reference machine Y becomes irrelevant?)
- Now, what do we do with those scores?
- Arithmetic mean isn’t the right average. Consider
  - CPU A has tasks that take 2 hrs and 3 hrs respectively.
  - CPU B has tasks that take 3 hrs and 2 hrs respectively.
  - Ratios are .666 and 1.5. Which CPU “wins” depends on which one ends up in the denominator.
  - Geometric mean gives the expected result: The CPUs are equal.
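The two examples above can be worked through in a short Python sketch. The helper names arithmetic_mean and geometric_mean are ours. The times are the ones given in the notes.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # n-th root of the product, computed through logarithms
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Second example: the CPUs swap 2 hr and 3 hr tasks.
times_a = [2.0, 3.0]
times_b = [3.0, 2.0]
a_over_b = [a / b for a, b in zip(times_a, times_b)]
b_over_a = [b / a for a, b in zip(times_a, times_b)]

# The arithmetic mean says each CPU is slower than the other.
print(f"{arithmetic_mean(a_over_b):.3f} {arithmetic_mean(b_over_a):.3f}")  # 1.083 1.083
# The geometric mean says they are equal, whichever is the denominator.
print(f"{geometric_mean(a_over_b):.3f} {geometric_mean(b_over_a):.3f}")    # 1.000 1.000

# First example: P1 and P2 in hours on CPU A and CPU B.
p_times_a = [10.0, 1.5]
p_times_b = [12.0, 1.0]
total_a, total_b = sum(p_times_a), sum(p_times_b)
p_ratios = [a / b for a, b in zip(p_times_a, p_times_b)]
print(f"totals: A={total_a} h, B={total_b} h")
print(f"geometric mean of time_a / time_b: {geometric_mean(p_ratios):.3f}")
```

In the P1/P2 case the totals favor CPU A because the long program dominates, while the geometric mean of the ratios comes out above 1, which favors CPU B. That is the overweighting issue in numbers.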

ISA (Instruction Set Architecture)

Let’s think about how the design of an ISA (think “machine language”) matters.

Suppose you are designing a CPU from scratch.
- What is the first thing/feature/parameter you choose?
- Probably the word size.
  - How does the word size affect the rest of the CPU?
  - What is the most common word size today?
  - What drove the move from 32 to 64 bit machines?
    - Address space
  - Suppose moving to a 64-bit word lengthens the clock period by 5%. What % of instructions must be eliminated to break even? 5%?
    - No. Remember 50% loss and 50% gain don’t cancel.
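A quick way to check the break-even point is the Python sketch below. It assumes the usual model time = instruction_count * CPI * clock_period, with CPI unchanged and "5%" meaning the period becomes 1.05 times as long. Both are assumptions for the sketch, not something the notes state. The function name is made up.

```python
def break_even_fraction_removed(period_increase):
    """Fraction of instructions to remove so total run time is unchanged.

    Assumes time = instruction_count * CPI * clock_period with CPI fixed,
    so count_new * (1 + period_increase) must equal count_old.
    """
    return 1.0 - 1.0 / (1.0 + period_increase)

frac = break_even_fraction_removed(0.05)
print(f"must eliminate {frac:.2%} of instructions")  # 4.76%, not 5%

# Same effect with bigger numbers: a 50% loss followed by a 50% gain.
after_loss_and_gain = 0.5 * 1.5
print(f"50% loss then 50% gain leaves {after_loss_and_gain:.2f} of the original")  # 0.75
```

The percentages multiply rather than add, which is why the answer is close to 5% but not equal to it.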
- How many registers should the CPU have?
- What are the consequences of many/few registers?
  - Performance (less RAM access)
  - More registers require more resources.
Let’s design a basic instruction: add
- How many parameters?
- How few parameters can you have?
- What are benefits of many vs. few?
- What are benefits of one parameter?
- What are the benefits of zero parameters?
One parameter add is an accumulator based architecture.
- Notice how this computer emphasized the simplicity of the machine over performance.
- Compare this mindset to the typical mindset today.
Zero parameter add is a stack-based architecture.
- Not a bad idea, but never took off.
- But, there is a stack based architecture today.
- What is it? (Java Virtual Machine)
- Notice how ideas can come back in a new context.
- He who anticipates when the time is right for an idea to come back has the potential to make millions.
- Again, you may not use Computer Architecture directly in your first job; but, developing a sense of the big picture will pay off later in your career. (Possibly even early in your career.)
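To make the one-parameter and zero-parameter styles concrete, here is a toy Python sketch of both machines computing c = a + b. The mnemonics (LOAD, ADD, STORE, PUSH, POP) and the dictionary used as memory are invented for illustration. They are not the instruction set of any real accumulator machine or of the JVM.

```python
def run_accumulator(program, memory):
    """One-operand machine: every instruction implicitly uses the accumulator."""
    acc = 0
    for op, addr in program:
        if op == "LOAD":
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]        # acc <= acc + memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        else:
            raise ValueError(f"unknown op {op}")
    return memory

def run_stack(program, memory):
    """Stack machine: ADD names no operands; it pops two values and pushes the sum."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":
            stack.append(memory[instr[1]])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "POP":
            memory[instr[1]] = stack.pop()
        else:
            raise ValueError(f"unknown op {op}")
    return memory

acc_program = [("LOAD", "a"), ("ADD", "b"), ("STORE", "c")]
stack_program = [("PUSH", "a"), ("PUSH", "b"), ("ADD",), ("POP", "c")]

acc_mem = run_accumulator(acc_program, {"a": 2, "b": 3, "c": 0})
stack_mem = run_stack(stack_program, {"a": 2, "b": 3, "c": 0})
print(acc_mem["c"], stack_mem["c"])  # 5 5
```

Notice the trade: the add itself gets simpler as it names fewer operands, but more instructions are needed around it to move data into place.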
Instruction width:
- Design a machine instruction that does add r1 <= r2 + 6
- Which version of add needs more bits?
- Fixed vs. variable width instructions.
- What are the benefits of each?
  - fixed: Simpler hardware => potentially faster
  - variable: more flexible. Possibly fewer instructions?
- How do you know which one (fixed/variable) is “better”?
  - We’ll get back to that.
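As one possible answer to the design exercise above, the Python sketch below packs add r1 <= r2 + 6 into bits. Every field width and opcode value here is hypothetical, chosen only to make the bit counting concrete. A real ISA would make different choices.

```python
# Hypothetical field widths and opcodes, chosen only for illustration.
OPCODE_BITS = 4
REG_BITS = 3       # enough to name 8 registers
IMM_BITS = 6       # immediates 0..63

OP_ADD = 0b0001    # made-up opcode: rd <= rs + rt
OP_ADDI = 0b0010   # made-up opcode: rd <= rs + immediate

def check(value, bits):
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{value} does not fit in {bits} bits")
    return value

def encode_add(rd, rs, rt):
    """Pack opcode | rd | rs | rt and return (word, width_in_bits)."""
    word = OP_ADD
    for reg in (rd, rs, rt):
        word = (word << REG_BITS) | check(reg, REG_BITS)
    return word, OPCODE_BITS + 3 * REG_BITS

def encode_addi(rd, rs, imm):
    """Pack opcode | rd | rs | immediate and return (word, width_in_bits)."""
    word = OP_ADDI
    word = (word << REG_BITS) | check(rd, REG_BITS)
    word = (word << REG_BITS) | check(rs, REG_BITS)
    word = (word << IMM_BITS) | check(imm, IMM_BITS)
    return word, OPCODE_BITS + 2 * REG_BITS + IMM_BITS

addi_word, addi_width = encode_addi(1, 2, 6)   # add r1 <= r2 + 6
add_word, add_width = encode_add(1, 2, 3)      # add r1 <= r2 + r3
print(f"addi: {addi_word:0{addi_width}b} ({addi_width} bits)")
print(f"add:  {add_word:0{add_width}b} ({add_width} bits)")
```

With these made-up widths the immediate form needs 16 bits and the register form needs 13. A fixed-width design would pad the shorter one, while a variable-width design could keep both sizes at the cost of more complex decoding.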