GVSU CIS 263
Week 11 / Day 1
- Offline bin packing
- Obviously, if we can see all the items first, we can do better.
- (Trivially, we can find the optimum by exhaustive search: O(n!))
- What approach would you take to approximate? Why?
- Place the biggest boxes first — they seem to create the most problems in the online version.
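- First fit decreasing (sort the items largest first, then apply first fit):
- Worst case: (4M + 1)/3 bins, where M is the optimal number of bins.
- Easier to think of as M + ceil((M-1)/3): about 1/3 “extra”.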
- Lemma: All the items placed in “extra” bins have size of at most 1/3.
- Suppose to the contrary that the first item placed in box M+1, call it s_i, has size > 1/3.
- All previous items must also have size > 1/3, since items are placed largest first.
- Boxes 1..M must have at most two items each.
- In fact, the first j boxes must have one item each, and the rest must have two.
- Why must all the one-item boxes come before the two-item boxes?
- Remember, items are placed in decreasing size order.
- Now, since M is the optimal number of bins, there must be a way to arrange the first i-1 packages in M bins so that package i also fits. However,
- Nothing can double up with the first j items in one of the first j bins; otherwise first fit would have already done so.
- None of the boxes can hold 3 items, since every item is larger than 1/3.
- Thus, this is a contradiction. Either M is not optimal, or s_i must have size <= 1/3.
- Lemma: There can be at most M-1 objects in “extra” bins.
- Note that the sum of all package sizes, sum(s_i), is at most M. (Otherwise we couldn’t fit them all in M boxes.)
- Suppose to the contrary that at least M objects end up in extra bins. Now consider
- the total size of the packages in the first M boxes: sum(W_i), where W_i is the total size in box i
- the total size of the first M packages in extra bins: sum(x_i)
- These two totals together are at most sum(s_i), and therefore at most M, because they may not include all the packages.
- However, consider what happens when we pair each W_i with an x_i: each W_i + x_i must be larger than 1, otherwise we would have put package x_i in bin i. Summing the M pairs gives a total larger than M. This is a contradiction, so we know there can be at most M-1 objects in extra bins.
- Since there are at most M-1 “extra” objects, and the extra objects have size at most 1/3, we can pack them 3 to a bin, for a total of at most ceil((M-1)/3) extra bins (about 1/3 extra).
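The two forms of the worst-case bound quoted earlier agree. A short check, using only the fact that $\lceil a/3 \rceil \le (a+2)/3$ for any integer $a$:

$$
M + \left\lceil \frac{M-1}{3} \right\rceil \;\le\; M + \frac{(M-1)+2}{3} \;=\; \frac{3M + M + 1}{3} \;=\; \frac{4M+1}{3}
$$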
- How close to this 4/3 bound can you come?
- Not very. This bound is not “tight”.
- A much more complicated analysis shows an upper bound of (11/9)M + 6/9
- That’s 22% “extra” instead of 33%
- That bound is “tight”: we can generate a set of packages that will require 11/9 of the optimal number of bins when first fit is used.
- In practice, if package sizes are uniformly distributed over the interval 0..1, then the number of extra bins is about sqrt(M). For large values of M, this is a relatively small percentage:
- 10% for M=100
- 3% for M=1000
- Think of “diminishing returns” — i.e., how expensive the last few percent of accuracy is compared to the first
9x%.
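The "biggest boxes first" strategy can be sketched as follows. This is a minimal illustration, not code from the course: the function name `first_fit_decreasing` and the `capacity` parameter are assumptions, and integer sizes are used in the example to avoid floating-point round-off.

```python
def first_fit_decreasing(sizes, capacity=1):
    """Pack items largest first, putting each into the first bin with room.

    Returns a list of bins, each bin being a list of item sizes.
    """
    bins = []   # contents of each bin
    free = []   # remaining room in each bin, parallel to `bins`
    for s in sorted(sizes, reverse=True):
        for i, room in enumerate(free):
            if s <= room:            # first bin that still has room
                bins[i].append(s)
                free[i] -= s
                break
        else:                        # no existing bin fits: open a new one
            bins.append([s])
            free.append(capacity - s)
    return bins


print(first_fit_decreasing([2, 5, 4, 7, 1, 3, 8], capacity=10))
# → [[8, 2], [7, 3], [5, 4, 1]]
```

The items total 30, so three bins of capacity 10 is optimal here. The inner scan makes this sketch O(n^2); faster implementations exist but are not shown.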
Week 11 / Day 2
Divide and Conquer
- Really only applies when
- Both halves of the division require processing
- Binary search doesn’t count, since only one half is processed
- Each part of the division has size proportional to n (e.g., n/2), not a constant
- Selection sort doesn’t count, because you only take 1 element at a time.
- Closest Points
- Divide in half
- Find the closest pair of points in each half.
- Trick is to quickly find the closest pair when its two points are in different halves.
- Key observations:
- If D is the smallest distance between two points lying entirely in one half, then we need only consider points within D of the center line.
- Within any given DxD square on one side of the center line, at most 4 points can be found, because points on the same side are at least D units apart.
- Thus, for any point near the center line, need only measure the distance to up to 8 other points (within two DxD squares: one on each side of the center line).
- O(n log n) for the same reason that merge sort is O(n log n)
- But, how do you quickly find the points in the DxD square?
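The closest-points recursion above can be sketched as follows. This is a simplified illustration under stated assumptions: the name `closest_pair_distance` is hypothetical, points are `(x, y)` tuples, and the strip is re-sorted by y in every call, which gives O(n log^2 n) rather than O(n log n). Keeping a y-sorted list alongside the x-sorted one (merged on the way back up, as in merge sort) is one standard answer to the question of finding the nearby points quickly.

```python
import math


def closest_pair_distance(points):
    """Return the smallest distance between any two (x, y) points."""
    pts = sorted(points)                      # sort by x once

    def solve(lo, hi):
        if hi - lo <= 3:                      # small case: brute force
            return min(
                (math.dist(pts[i], pts[j])
                 for i in range(lo, hi) for j in range(i + 1, hi)),
                default=math.inf,
            )
        mid = (lo + hi) // 2
        mid_x = pts[mid][0]                   # the center line
        d = min(solve(lo, mid), solve(mid, hi))
        # Only points within d of the center line can beat d.
        strip = sorted(
            (p for p in pts[lo:hi] if abs(p[0] - mid_x) < d),
            key=lambda p: p[1],
        )
        for i in range(len(strip)):
            j = i + 1
            # Stop once the vertical gap alone reaches d; the packing
            # argument bounds this loop by a constant number of steps.
            while j < len(strip) and strip[j][1] - strip[i][1] < d:
                d = min(d, math.dist(strip[i], strip[j]))
                j += 1
        return d

    return solve(0, len(pts))
```

For example, `closest_pair_distance([(0, 0), (5, 5), (1, 1), (9, 9), (5, 6)])` returns `1.0`, the distance between `(5, 5)` and `(5, 6)`.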
- Selection problem (e.g., finding the median)
- How quickly can we find the median of n numbers?
- Review quicksort
- Remember, worst case is
O(n^2)
- How to modify to find the
kth smallest element without sorting the entire array?
- Pick a pivot.
- Move elements to either side of the pivot. (Call the left side S_1 and the right side S_2.)
- If $k \le |S_1|$, recurse on S_1.
- If $k = |S_1| + 1$, return the pivot.
- Else recurse on S_2: quickselect(S_2, k - |S_1| - 1)
- Note the worst-case running time is still O(n^2).
- Average case is O(n)
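A minimal sketch of the quickselect steps above, assuming a random pivot and list-based partitioning (both are illustrative choices, not specified in the notes). Elements equal to the pivot are collected separately so duplicates are handled; with distinct elements that group has size 1, which matches the `k - |S_1| - 1` in the outline.

```python
import random


def quickselect(items, k):
    """Return the k-th smallest element of items (k = 1 is the minimum)."""
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    pivot = random.choice(items)
    s1 = [x for x in items if x < pivot]      # left side
    equal = [x for x in items if x == pivot]  # the pivot and its duplicates
    s2 = [x for x in items if x > pivot]      # right side
    if k <= len(s1):
        return quickselect(s1, k)
    if k <= len(s1) + len(equal):
        return pivot
    return quickselect(s2, k - len(s1) - len(equal))
```

For example, `quickselect([7, 2, 9, 4, 1, 8, 3], 4)` returns `4`, the median of the seven values.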
- Can we get
O(n) worst-case?
- What do we need to do to guarantee
O(n)?
- Guarantee that the pivot is not too close to the edge.
- The algorithm
- Arrange
N elements into groups of 5.
- Find the median of each group.
- Use median of medians as pivot.
- There are N/5 columns. In each column, there are 2 values above the column's median and 2 values below it. Also, half of the medians are less than the pivot.
- Thus there are approximately 2*(N/5)/2 + (N/5)/2 = N/5 + N/10 = (3/10)N elements guaranteed to be below the median of medians, and likewise (3/10)N guaranteed to be above it.
- Thus, the pivot is guaranteed to exclude a constant fraction (about 3/10) of the numbers on every call, avoiding the O(n^2) worst case.
- What’s the catch?
- Circular reasoning: Our find-the-median algorithm itself needs to find the median of
N/5 elements.
- Solution: Just call the algorithm recursively.
- To be precise:
- Assume $N = 5(2k+1) = 10k + 5$. (This means N is a multiple of 5 and N/5 is odd.)
- When we find the median of medians (call it v), there will be k + 1 pairs of nodes less than v and k + 1 pairs of nodes greater than v. Also, k of the medians will be < v and k will be > v. Thus, you know that the pivot will exclude at least 2(k+1) + k = 3k+2 nodes, where k = (N-5)/10.
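The groups-of-5 pivot rule can be sketched like this. The function name `median_of_medians_select` is hypothetical, and the sketch does not require N to be a multiple of 5: a short final group simply contributes its own middle element.

```python
def median_of_medians_select(items, k):
    """Return the k-th smallest element (k = 1 is the minimum)."""
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    if len(items) <= 5:                       # small case: just sort
        return sorted(items)[k - 1]
    # Arrange the elements into groups of 5 and take each group's median.
    groups = [sorted(items[i:i + 5]) for i in range(0, len(items), 5)]
    medians = [g[len(g) // 2] for g in groups]
    # The "catch": finding the median of the medians is a recursive call.
    pivot = median_of_medians_select(medians, (len(medians) + 1) // 2)
    s1 = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    s2 = [x for x in items if x > pivot]
    if k <= len(s1):
        return median_of_medians_select(s1, k)
    if k <= len(s1) + len(equal):
        return pivot
    return median_of_medians_select(s2, k - len(s1) - len(equal))
```

Compared with a random pivot, the only change is how the pivot is chosen; the partition and recursion are the same as in quickselect.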
- Multiplication
- What is big-O of multiplying two N-digit numbers by hand? (The way you learned in 3rd grade)
- Can we do better?
- Let’s try divide and conquer.
- Suppose X and Y have 8 digits each.
- Rewrite X and Y as X_L * 10^4 + X_R and Y_L * 10^4 + Y_R.
- What is XY? X_L*Y_L*10^8 + (X_L*Y_R + X_R*Y_L)*10^4 + X_R*Y_R.
- Doesn’t help. Each multiplication is 1/2 the size; but, there are 4 of them, so we are back where we started.
- The trick is to rework so the product has fewer than 4 multiplications.
- Where might the middle term show up?
- Inside (X_L - X_R)(Y_R - Y_L) = X_L*Y_R - X_L*Y_L - X_R*Y_R + X_R*Y_L
- Specifically, to get the desired term X_L*Y_R + X_R*Y_L, we add X_L*Y_L and X_R*Y_R, which we already computed.
- When we do this, we get T(N) = 3*T(N/2) + O(N), which simplifies to O(N^(log_2 3)) = O(N^1.59): not linear, but better than quadratic.
- Not commonly used
- For small N, overhead isn’t worth it.
- For large N, there are even better divide and conquer algorithms.
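The three-multiplication trick described above (usually credited to Karatsuba) can be sketched as follows. Splitting on decimal digits via `len(str(x))` is an illustrative assumption that mirrors the base-10 example in the notes; the function name is also an assumption.

```python
def karatsuba(x, y):
    """Multiply two integers using three half-size products per level."""
    if x < 0 or y < 0:                        # the cross term can go negative
        sign = -1 if (x < 0) != (y < 0) else 1
        return sign * karatsuba(abs(x), abs(y))
    if x < 10 or y < 10:                      # single digit: multiply directly
        return x * y
    half = max(len(str(x)), len(str(y))) // 2
    base = 10 ** half
    x_l, x_r = divmod(x, base)                # x = x_l * base + x_r
    y_l, y_r = divmod(y, base)                # y = y_l * base + y_r
    ll = karatsuba(x_l, y_l)
    rr = karatsuba(x_r, y_r)
    # Middle term x_l*y_r + x_r*y_l from one product plus two reused ones.
    cross = karatsuba(x_l - x_r, y_r - y_l) + ll + rr
    return ll * base * base + cross * base + rr
```

For the 8-digit case in the notes, `karatsuba(12345678, 87654321)` gives the same result as `12345678 * 87654321`.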
- Matrix Multiplication
- What is the conventional running time for multiplying two NxN matrices? O(N^3)
- How can we do better?
- Similar trick.
- Divide each matrix into quarters.
- Gives a straightforward result requiring 8 smaller multiplies.
- However, we can cleverly combine the 8 quarters so that we need only 7 multiplies.
- Resulting analysis:
T(N) = 7*T(N/2) + O(N^2)
- Simplifies to
O(N^(log_2 7)) = O(N^2.81)
- Mostly of theoretical interest. Not practical.
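The seven-multiply scheme is commonly known as Strassen's algorithm. A minimal sketch, assuming square matrices stored as lists of lists whose size N is a power of 2 (the helper names `add`, `sub`, and `split` are hypothetical):

```python
def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(m):
    """Return the four quarters (top-left, top-right, bottom-left, bottom-right)."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen(a, b):
    """Multiply two N x N matrices (N a power of 2) with 7 recursive products."""
    if len(a) == 1:
        return [[a[0][0] * b[0][0]]]
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    m1 = strassen(add(a11, a22), add(b11, b22))
    m2 = strassen(add(a21, a22), b11)
    m3 = strassen(a11, sub(b12, b22))
    m4 = strassen(a22, sub(b21, b11))
    m5 = strassen(add(a11, a12), b22)
    m6 = strassen(sub(a21, a11), add(b11, b12))
    m7 = strassen(sub(a12, a22), add(b21, b22))
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    # Stitch the four result quarters back into one matrix.
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom
```

For example, `strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`. The extra additions and list copying are the overhead that makes the method unattractive for small N.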