Analysis of Algorithms
- Show selection sort, bubble sort, and merge sort.
- Given these sorting algorithms, what criteria might you use to decide which one is “better” than the other?
- Speed
- Memory usage
- Energy usage
- A combination of these
- Simplicity / maintainability
- What is the difference between throughput and latency?
- Which is more important?
- These big questions are important; but, we’ll leave them for other courses (where they are more directly applicable).
- For now, we’ll focus solely on time (i.e., latency) as the measure of performance.
- What if one algorithm is faster for small arrays and the other is faster for large arrays?
- How can this even happen?
- Imagine this situation: My 15-year-old has a snow-shoveling business. It takes about 30 minutes per driveway. (Plot 30*x.) My uncle has a snow-plowing business. It takes only 10 minutes per driveway; but his snowplow requires 1 hour of maintenance each day. (Plot 10*x + 60.)
- In almost all cases, we are most interested in the long-term behavior (i.e., the speed when given the largest inputs). That’s usually where any innovations are most beneficial.
- Can the CPU determine which algorithm is faster? If so, how?
- Think about sorting an array at a very high level: You have (more or less) two activities: (a) comparing elements, and (b) moving them around. You can take two approaches: (a) do lots of comparisons and think carefully about where to move things, then make as few movements as possible; or (b) make quick decisions, even if you have to make extra movements. Which approach is faster depends on the relative speed of comparing elements vs. moving them (i.e., ALU speed vs. memory speed).
- With the plowing example, the machine “wins” because it is faster than the human. In general, which wins depends on the relative speed of the human and the plow.
- A sufficiently fast human can beat a sufficiently slow machine.
- Graph with different constants (e.g., a human that can shovel a driveway in 7 minutes).
- Similarly, you can “cheat” and pick whether selection sort or bubble sort wins by putting the winner on a faster CPU.
- However, our shoveling model fails to consider fatigue: When the human shovels, each driveway takes longer than the last. Watch what happens when we modify the model to something like 7*x^1.1. (A small sketch that tabulates these models appears at the end of this section.)
- Because the human slows down but the machine doesn’t, the machine is eventually faster.
- Make the human faster and show that he still eventually loses.
- Notice that the graphs have different shapes: They increase at different rates. (The human’s time increases at a faster rate.)
- It is the growth rate that is important when comparing algorithms. By looking at the growth rate (i.e., the shape of the curve), we can focus on the performance of the algorithm independently of the quirks/speed of the particular CPU and/or programming language.
- At a high level, we want to do this:
- Graph the running time of each algorithm (on any CPU using any programming language).
- Examine the shape of the curve.
- If the curves have the same shape, then we say they are “equal” regardless of which one looks faster.
- If the curves have different shapes, then the curve that stays lower as the size becomes infinitely large is faster.
- The trick, of course, is to formalize “same shape”.
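- The sketch below (my own illustration; the class name and the range of values are made up, and the constants come from the shoveling example above) tabulates the three driveway models so the crossover is visible in numbers rather than on a plotted graph.

```java
// Tabulate the driveway-clearing models discussed above (constants from the lecture example;
// the class name and range of x values are illustrative choices, not course code).
public class ShovelVsPlow {
    public static void main(String[] args) {
        System.out.println(" driveways | shovel 30x | plow 10x+60 | tired shoveler 7x^1.1");
        for (int x = 10; x <= 150; x += 20) {
            double shovel = 30.0 * x;               // 30 minutes per driveway, no overhead
            double plow = 10.0 * x + 60;            // 10 minutes each, plus 1 hour of maintenance
            double tired = 7.0 * Math.pow(x, 1.1);  // fast shoveler who slows down with fatigue
            System.out.printf("%10d | %10.1f | %11.1f | %22.1f%n", x, shovel, plow, tired);
        }
    }
}
```

- With these constants, the plow passes the plain shoveler after just three driveways, and it passes even the fatigued (but much faster) shoveler somewhere around 75 to 80 driveways; after that it never falls behind again.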
Big-O
- Goal: A system for comparing algorithms independent from the machines they run on and the size of any specific input.
- Idea: Count lines of code executed (not lines of source code in the .java file!) and present as a function of input size (i.e., T(n))
- Go through examples
  - Examples#sumArray
    - How many operations are in the for statement?
    - How many operations are in answer = answer + values[i]; ?
  - Examples#multiplyArray
    - Same T(n); but, sum should be faster, right?
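- The Examples class itself isn’t reproduced in these notes; the sketch below shows plausible versions of sumArray and multiplyArray, annotated with the operation counts the questions above are asking about.

```java
// Plausible versions of Examples#sumArray and Examples#multiplyArray (not the course's exact code),
// annotated in the "count every operation" style.
public static int sumArray(int[] values) {
    int answer = 0;                               // 1 operation
    for (int i = 0; i < values.length; i++) {     // init: 1, test: n+1, increment: n
        answer = answer + values[i];              // body runs n times
    }
    return answer;                                // 1 operation
}

public static int multiplyArray(int[] values) {
    int answer = 1;                               // same loop bookkeeping as sumArray
    for (int i = 0; i < values.length; i++) {
        answer = answer * values[i];              // runs n times; is * slower than +? (CPU-dependent)
    }
    return answer;
}
```

- Both work out to roughly cn + d operations, which is why they end up with the same T(n) even though a multiplication may not cost exactly the same as an addition.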
- What are the limitations?
- Different input sizes.
- Not all lines of code take the same amount of time. Give some examples: x + y vs. x - y.
- Bottom line: The value you get for T(n) is necessarily going to be noisy.
- How do we address these limitations?
- Consider the right side of the graph (large inputs)
- Assume any two single operations differ by at most a constant factor (e.g., that multiplication is at most c times slower than addition) and count them all as “1”.
- This means that you remove constants from your T(n) (because they are just approximations).
- If you get a result of, say, T(n) = 15n, just call it n. The 15 is just a guess, and you know you have an upper bound of cn.
  - This is reasonable, because it is the closest you can come without considering CPU-specific details.
  - In other words, it’s the closest you can come considering the algorithm only.
- What are the advantages and disadvantages of this decision?
- Advantage: Don’t have to worry about minor details like
  - the relative speed of + and *
  - precisely how many operations are in statements like array[i]
- Disadvantage: You can’t determine formally whether addSum or addMultiply is faster.
- But, this “disadvantage” isn’t really a big deal.
  - First, I could bring you two reasonable machines where multiply on one was faster than add on the other. (One machine would be much older and slower than the other, but I could do it.)
  - Second, when looking at the “big picture”, the differences are minor.
- Compute T(n) for the two mode methods. Then graph the results.
  - Once you get to n = 100, the differences between the two n functions are negligible compared to the difference from the n^2 functions; the coefficient isn’t needed to “tell the story”.
  - The n function will eventually always be faster than the n^2 function, regardless of the coefficients. (Unlike two n functions, where the constants determine which is faster.)
- This is the key idea of comparing algorithms: We say Algorithm A is faster than Algorithm B if A is eventually always faster than B, regardless of the machines chosen.
- Goal isn’t to totally order algorithms, but to put them in different “buckets”
- If you can choose which algorithm runs faster by playing games with the CPUs, we put the two algorithms in the same “bucket”.
Take 2 – Bottom-Up Approach
- We describe an algorithm’s performance with a growth function T(n) that gives a visual representation of how time grows as the input grows.
- For example:
  - Finding the maximum value in an array would look something like this: (Show linear function)
    - This shows that searching an array of 200 takes about twice as long as searching an array of 100.
  - Sorting an array looks something like this: (Show n log n function)
    - This shows that sorting an array of 200 takes about 2.3 times as long as sorting an array of 100.
  - Deciding whether an array of integers can be broken into two groups with equal sums looks something like this: (Show 2^(n/2) function)
    - This shows that adding just two more elements to the array will double the amount of time taken!
- The purpose of this function is to show growth (the pattern) not the precise time taken.
- It is difficult to identify the precise time needed for any given operation (e.g., exactly how much slower is multiplication than addition?).
- (The reasons for this are even more complex/subtle than you might imagine. We explain why in CIS 451.)
- We want to focus on properties of the algorithm itself and downplay differences that are caused by the CPU, programming language, and/or compiler.
- The precise difference in speed between addition and multiplication is CPU dependent. We don’t want that difference to factor into our analysis of the algorithm (general approach to solving the problem).
- To avoid both of these issues, when determining T(n), each constant-time operation (or set of operations) counts as 1, regardless of the actual time taken. For example:
  - a + b
  - a * b
  - a == b
  - array.length
  - array[x] = array[y] + array[z]
- By “constant time”, we mean an operation that takes the same amount of time each time it is executed regardless of the input or its size. (Note how everything above is constant time.)
- Remember, we count it all as 1 because the actual differences in time between such operations depend on the details of the CPU and/or compiler; but, we want to consider only the properties of the algorithm itself.
- Because the constants are not precise, we put algorithms into groups by the “order” of the T(n).
  - All linear functions go in a group called O(n), or “big O of n”.
  - All quadratic functions go in a group called O(n^2), or “big O of n^2”.
- There is a precise mathematical definition for when two different T functions go in the same order; but, for now, the key idea is that curves with the same shape go in the same group.
- Think about what happens when the input size increases:
- If doubling the input doubles the output, then put it in the O(n) group.
- If doubling the input quadruples the output, then put it in the O(n^2) group.
- If adding just one more input doubles the output, then put it in the O(2^n) (i.e., “exponential”) group.
  - (Side note: A pet peeve of mine is when people say “exponential growth” for something that is really just quadratic. In fact, I’m beginning to see people use the term “exponential” for anything that is more than linear!)
- The general trick for finding the right group:
- Remove all the constants
- Look at the biggest term (e.g., largest exponent in a polynomial)
- General approach to finding T(n):
  - Work from the inside out.
  - Just count each block of constant-time operations as a single “1”. (After all, we are throwing out the constants.)
  - When you come to a loop, multiply the inside by the number of times the loop runs.
- Look at sumArray, multiplyArray, sumArray_v2, locationOfMax. (A sketch of locationOfMax, annotated this way, appears below.)
  - Notice that they all get simplified to n. This means they are in the same class: differences in performance could possibly be caused by things not related to the algorithm (compiler, CPU, etc.).
    - In other words, within this group, I can choose which one looks the fastest by picking (perhaps unfairly) which machine each algorithm runs on.
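- The sketch below is one plausible version of locationOfMax (not necessarily the course’s exact code), analyzed inside-out the way the bullets above describe.

```java
// A plausible locationOfMax, analyzed inside-out: the loop body is a constant-time block ("1"),
// and the loop runs n - 1 times, so T(n) simplifies to n.
public static int locationOfMax(int[] values) {
    int location = 0;                          // constant-time block: "1"
    for (int i = 1; i < values.length; i++) {  // runs n - 1 times
        if (values[i] > values[location]) {    // the whole if statement counts as "1"
            location = i;
        }
    }
    return location;                           // "1"
}
```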
- Analyze fastMode. (This one has a subtlety that we can ignore for now.)
- Analyze slowMode. (Sketches of both appear below.)
  - What do we do about the if statement?
    - In this case, nothing special. We can simply count the whole block as “1”.
  - T(n) for slowMode is O(n^2).
    - This means that there is a fundamental difference between the two that goes beyond CPU speed.
    - We can’t make slowMode look faster by playing games with the CPU or compiler.
    - The shape of the growth function curves is different: slowMode grows much faster.
    - Actually, the more relevant insight is that fastMode is eventually faster regardless of any CPU or compiler changes, because the time for slowMode grows so much faster.
      - Given enough input, the fast algorithm will always come out ahead in the end.
      - This is what the formal mathematical definition of “big-O” captures.
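- The course’s actual fastMode and slowMode aren’t shown in these notes. The sketches below are one plausible pair with the same shapes: a nested-loop mode finder that is O(n^2), and a counting-based one that is linear (its “subtlety” is the assumption that every value is a non-negative int below some known limit).

```java
// Plausible stand-ins for Examples#slowMode and Examples#fastMode (assumes values.length > 0).
public static int slowMode(int[] values) {
    int mode = values[0];
    int bestCount = 0;
    for (int i = 0; i < values.length; i++) {      // outer loop: n times
        int count = 0;
        for (int j = 0; j < values.length; j++) {  // inner loop: n times for each i
            if (values[j] == values[i]) {          // "1"
                count++;
            }
        }
        if (count > bestCount) {                   // "1"
            bestCount = count;
            mode = values[i];
        }
    }
    return mode;                                   // roughly n * n work: O(n^2)
}

public static int fastMode(int[] values, int limit) {
    int[] counts = new int[limit];                 // subtlety: assumes 0 <= values[i] < limit
    for (int value : values) {                     // one pass over the input: n iterations
        counts[value]++;
    }
    int mode = values[0];
    for (int v = 0; v < limit; v++) {              // second pass, over the counts
        if (counts[v] > counts[mode]) {
            mode = v;
        }
    }
    return mode;                                   // linear in n (plus limit): O(n)
}
```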
Context for Big-O
- In general, Computer Science focuses on big-O because this is where the most fundamental / significant / interesting differences are.
- The complexity class an algorithm belongs to determines how it can be used / scaled.
- An O(n) algorithm can typically be scaled as a problem/business grows: If you have twice as much input, you need twice as much computing power. If input correlates to income, you are in good shape.
- An O(n^2) algorithm does not scale nicely. If you have twice as much input, you need four times as much computing power. If input correlates to income, the size of your business is limited.
  - You can still handle big problems with current technology; but, it gets expensive fast.
- An O(2^n) algorithm is generally considered “intractable”: Reasonable-sized problems (more than a few thousand inputs) are generally considered unsolvable in any practical sense.
  - This isn’t simply an economic limit, but a physical one: real-world problems of reasonable size would either require a computer bigger than the known universe, or would take far more time than the age of the universe to solve.
- Within a complexity class, there are certainly faster and slower algorithms.
- Consider sumAndProduct. (A sketch appears below.)
  - It is O(n).
  - But, it is clearly slower than either sum or product.
  - The difference in speed, however, is minor and obvious.
  - The algorithm can be used in the same context as other O(n) algorithms.
  - Studying it isn’t going to teach us anything fundamental about Computer Science that we won’t learn studying other O(n) algorithms.
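- sumAndProduct isn’t shown in the notes either; here is one plausible version that makes the point: it does roughly twice the work per pass of sum alone, but its growth rate is unchanged.

```java
// A plausible sumAndProduct: two constant-time statements inside one loop,
// so T(n) is roughly 2n + c, which is still O(n).
public static long[] sumAndProduct(int[] values) {
    long sum = 0;
    long product = 1;
    for (int i = 0; i < values.length; i++) {  // loop runs n times
        sum += values[i];                      // "1"
        product *= values[i];                  // "1" -- twice the work per pass, same growth rate
    }
    return new long[] { sum, product };
}
```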
- There is certainly a place for optimizing code from an “engineering” perspective: make it go faster and save money; but,
- Doing so is more “work” than “learning”.
- The details are often implementation-dependent, so the lessons don’t necessarily generalize in a useful way.
- That’s why we don’t do a lot of this in the undergrad curriculum.
- Consider
Best/Worst/Average case
- In almost all cases, when we analyze algorithms, we are interested in the worst case.
- Why?
- Sometimes the average case is more relevant; but, the analysis is much more difficult.
More Examples
- linearSearch (a sketch appears below)
- trick1
- trick2
- trick3
- meetEverybody
- sumSome
- binarySearch
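- The trick and other methods above are specific to the course’s Examples file and aren’t reproduced here, but linearSearch is standard enough to sketch: in the worst case (the target is absent) it examines all n elements, so it is O(n).

```java
// A plausible linearSearch sketch. Worst case the loop runs n times: O(n).
public static int linearSearch(int[] values, int target) {
    for (int i = 0; i < values.length; i++) {  // up to n iterations
        if (values[i] == target) {             // constant-time check
            return i;                          // best case: found on the first try
        }
    }
    return -1;                                 // not found
}
```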
O(log n)
- An algorithm with O(log n) time is a big win.
  - As a practical matter, you can’t do better. Why?
    - Just about the only thing faster than log n is constant. An algorithm whose time doesn’t grow as the input grows can’t really be considering all the input. (Or it has a hard upper limit on the input size.)
  - Some people consider log n complexity “effectively constant” (not a serious comment). Why?
    - There is a practical upper limit on how big the value of log n can get. In order for log n to reach a few hundred, n would have to be bigger than the number of particles in the universe.
- Why do we say O(log n) and not O(log_10 n) or O(log_2 n)?
  - Logs with different bases differ by only a constant factor (by the change-of-base formula, log_2 n = log_10 n / log_10 2), and we ignore constants.
Subtleties
- What is the big-O running time of the addition algorithm you learned in first grade?
  - O(n)
  - Think carefully: What is the size of the input?
    - The size of the input is the length of the binary string. Thus, when you enter an integer i, the input size is really log i.
  - In CIS 351, we will learn an O(log n) algorithm.
- With that in mind, what is the big-O running time of the simplest algorithm for primality testing?
  - O(sqrt(2^n)) = O(2^(n/2))
- If the addition algorithm is O(log n), why can we count addition statements as constant?
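- To see where O(sqrt(2^n)) = O(2^(n/2)) comes from, here is the simplest trial-division primality test (my sketch, not from the course notes). The key is that the “input size” n is the number of bits used to write the number, not the number’s value.

```java
// Simplest primality test: trial division up to sqrt(m).
// Measured in the VALUE m, this is about sqrt(m) iterations; but the input SIZE is the number
// of bits n = log2(m), so sqrt(m) = sqrt(2^n) = 2^(n/2) -- exponential in the input size.
public static boolean isPrime(long m) {
    if (m < 2) {
        return false;
    }
    for (long d = 2; d * d <= m; d++) {  // roughly sqrt(m) iterations in the worst case
        if (m % d == 0) {
            return false;                // found a divisor
        }
    }
    return true;
}
```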
Formalities
- Formally, big-O is a way of comparing the growth rate of functions.
- We often use it for running time; but, it can be applied to anything (including memory used or cost)
- Formally: When we say f is O(g), we mean that f does not grow faster than g.
  - We often get lazy and assume this means that f and g grow at the same rate.
  - This is wrong. Big-O is more like ≤ than ==.
  - For example, it is technically correct to say that n is O(n^2); although we don’t often do this.
- The formal definition is: f is O(g) if there exist constants N and c such that f(x) < c*g(x) for all x > N.
  - The N is the “eventually” part: We only care about very large values of x.
  - The c takes care of the constant we throw out. Or, if you prefer, it represents putting the fast algorithm on the slowest machine possible.
  - If there is a c that makes the inequality work, that means that f does not grow faster than g.
  - If f did grow faster than g, then it doesn’t matter how large c is: f(x) will eventually become larger than c*g(x) because it grows faster.
Monday 29 June
- Interface clarification
- “Interface” can have a general and specific meaning:
- There is a formal Java construct called an interface that provides a list of methods that implementing classes must implement.
- However, when you use the inheritance mechanism extends Foo (whether it be from a “concrete” or “abstract” class), you are also applying the general concept of an interface: the subclass inherits the parent class’s interface.
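- A tiny illustration of the two senses (the class names here are made up):

```java
// Hypothetical classes, just to illustrate the two senses of "interface".
interface Shape {                        // the formal Java construct: methods implementers must provide
    double area();
}

class Rectangle implements Shape {       // must implement everything Shape lists
    double width, height;
    public double area() { return width * height; }
}

class Square extends Rectangle {
    // the general sense: Square inherits Rectangle's interface, so a Square
    // can be used anywhere a Rectangle (or Shape) is expected
}
```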
- Review:
- Trying to describe time taken by an algorithm given the size of the input
- Trying to estimate time taken by counting operations
- It is impractical to be precise because
- What counts as a single operation is machine/OS/programming-language dependent.
- Different operations take different amounts of time.
- Instead, we
- Count all groups of constant-time operations as “1”
- Throw out constants and keep only high-order term
- This puts algorithms in groups by their growth rate
- Within a given group, there may be faster and slower algorithms, but they all grow at the same rate (e.g., same thing happens to time when input size doubles).
- Within a given group we can use “tricks” to choose which one looks fastest.
- Given two algorithms in different groups, one will eventually always be faster.
- Exponential: constant^n (where n describes the input size)
- Polynomial: n^constant (the exponent is constant)
- Basic approach
- Work from the inside out.
- Groups of constant-time operations count as “1”.
- if-else: Take the max of the two branches (for a worst-case analysis).
- Loops: multiply.
- method calls: Use big-O of the method.
- Be sure to look at the size of the input. It may not be the same as the method you are analyzing.
- “Tricks” to watch for
- Loops that run a constant amount of time
- if-else where one side is taken almost all the time
- The 1 + 2 + 3 + 4 + 5 + ... + n pattern (the sum is n(n+1)/2, so it is O(n^2); see the sketch below)
- Loops that run in O(log n) time (e.g., binary search)
  - Remember: When using big-O, don’t include the base.
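- A sketch of the 1 + 2 + ... + n pattern (the method name is made up): the inner loop’s bound depends on the outer index, so the total number of inner iterations is the triangular sum, which is O(n^2).

```java
// The 1 + 2 + 3 + ... + n pattern: the inner loop runs i times, not n times.
// Total inner iterations: 0 + 1 + 2 + ... + (n-1) = n(n-1)/2, which is O(n^2).
public static int countAscendingPairs(int[] values) {
    int count = 0;
    for (int i = 0; i < values.length; i++) {
        for (int j = 0; j < i; j++) {       // only up to i, giving the triangular sum
            if (values[j] < values[i]) {    // constant-time block
                count++;
            }
        }
    }
    return count;
}
```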
- Binary search
- How it works
- How to analyze
- Re-write recursively
- Generate graph with big-O lab
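- A standard iterative binary search (a sketch; the course’s version and the recursive rewrite may differ): each pass discards half of the remaining range, so the loop runs about log_2(n) times, giving O(log n).

```java
// Iterative binary search over a SORTED array. Each iteration halves the remaining range,
// so the loop runs about log2(n) times: O(log n).
public static int binarySearch(int[] sorted, int target) {
    int low = 0;
    int high = sorted.length - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  // midpoint (written to avoid overflow of low + high)
        if (sorted[mid] == target) {
            return mid;
        } else if (sorted[mid] < target) {
            low = mid + 1;                 // discard the lower half
        } else {
            high = mid - 1;                // discard the upper half
        }
    }
    return -1;                             // not found
}
```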
- Finding big-O for recursive problems
  - Write T(n) with a recursive definition.
  - Either
    - “Unwind” the definition until you see a pattern, or
    - use one of the standard patterns you will learn in MTH 225/325 or CIS 263.
  - Note: String#substring is an O(n) operation. This makes recursive calls that use this method quite expensive (often O(n^2)).
    - It’s better to use helper methods with an index parameter (like we did for arrays). (See the sketch below.)
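- A hypothetical example of the substring pitfall (counting vowels; not from the course code): the substring version copies the rest of the string on every call, so the calls cost n + (n-1) + ... + 1, which is O(n^2); the index-based helper does constant work per call, so it is O(n).

```java
// Hypothetical illustration of the String#substring pitfall.
public static int countVowelsSlow(String s) {
    if (s.isEmpty()) {
        return 0;
    }
    int here = "aeiou".indexOf(s.charAt(0)) >= 0 ? 1 : 0;
    return here + countVowelsSlow(s.substring(1));   // substring copies: O(n) work per call -> O(n^2) total
}

public static int countVowels(String s) {
    return countVowelsHelper(s, 0);                  // same recursion, but pass an index instead
}

private static int countVowelsHelper(String s, int index) {
    if (index == s.length()) {
        return 0;
    }
    int here = "aeiou".indexOf(s.charAt(index)) >= 0 ? 1 : 0;
    return here + countVowelsHelper(s, index + 1);   // constant work per call -> O(n) total
}
```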
- Time recursive Fibonacci.
- Why is it so slow?
- How to fix it (without reverting to the iterative method)? (A memoized sketch appears below.)
- What is the running time of the recursive maze solver?
- (It looks exponential; but, it’s not. Why?)
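- The plain recursive Fibonacci is slow because fib(n-1) and fib(n-2) recompute the same subproblems over and over, so the number of calls grows exponentially. One fix that keeps the recursion is memoization; a minimal sketch is below (the long[] cache and method names are my own). A similar idea is behind the maze question: if the solver marks squares as visited, each square is explored at most once, so the running time stays polynomial rather than exponential.

```java
// Memoized Fibonacci (a sketch): same recursive structure, but each fib(k) is computed once
// and cached, so the running time drops from exponential to O(n).
public static long fib(int n) {
    long[] memo = new long[n + 1];   // 0 means "not computed yet" (works because fib(k) > 0 for k >= 2)
    return fib(n, memo);
}

private static long fib(int n, long[] memo) {
    if (n <= 1) {
        return n;                    // base cases: fib(0) = 0, fib(1) = 1
    }
    if (memo[n] == 0) {              // only recurse the first time fib(n) is needed
        memo[n] = fib(n - 1, memo) + fib(n - 2, memo);
    }
    return memo[n];
}
```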