GVSU CIS 263
Week 11 / Day 1
- Offline bin packing
- Obviously, if we can see all the items first, we can do better.
- (Trivially, we can find the optimum by exhaustive search: O(n!).)
- What approach would you take to approximate? Why?
- Place the biggest boxes first — they seem to create the most problems in the online version.
- First fit (largest items first, i.e., first fit decreasing):
- Worst case: (4M + 1)/3 bins, where M is the optimal number of bins.
- Easier to think of as M + ceil((M-1)/3): about 1/3 “extra”.
- Lemma: All the items placed in “extra” bins have size of at most 1/3.
- Suppose to the contrary that the first item in box M+1 (call it s_i) has size greater than 1/3.
- All previous items must also have size greater than 1/3, since items are placed in decreasing size order.
- Boxes 1..M must therefore have at most two items each.
- In fact, the first j boxes must have one item each, and the rest must have two.
- Why must all the one item boxes come before two item boxes?
- Remember, boxes placed in size order.
- Now, if M is the optimal number of bins, there must be a way to rearrange the first i-1 packages within M bins so that package i will also fit. However,
- Nothing can double up with the first j items in one of the first j bins, otherwise first fit would have already done so.
- None of the boxes can hold 3 items, since the items are all larger than 1/3.
- Thus, this is a contradiction: either M is not optimal, or s_i must have size at most 1/3.
- Lemma: There can be at most M-1 objects in “extra” bins.
- Note that sum of all package sizes is at most M. (Otherwise we couldn’t fit them all in M boxes.)
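- Now suppose there are at least M objects in extra bins, and consider
- the total size of the packages in the first M boxes: sum(W_i), where W_i is the total size packed in box i
- the total size of the first M packages placed in extra bins: sum(x_i)
- These two totals together are at most sum(s_i), the total size of all packages, because they may not include all the packages. So sum(W_i) + sum(x_i) is at most M.
- However, consider what happens when we pair each W_i with an x_i: each sum W_i + x_i must be larger than 1, otherwise we would have put package x_i in bin W_i. Adding up the M pairs gives a total larger than M. This is another contradiction, so we know there can be at most M-1 objects in extra bins.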
- Since there are at most M-1 “extra” objects, and the extra objects have size at most 1/3, we can pack them 3 to a bin, for a total of at most ceil((M-1)/3) extra bins (about 1/3 extra).
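The two forms of the worst-case bound quoted earlier agree. A short derivation, using only the fact that for an integer a, ceil(a/3) is at most (a+2)/3:

```latex
\begin{align*}
\text{bins used} &\le M + \left\lceil \frac{M-1}{3} \right\rceil \\
                 &\le M + \frac{(M-1)+2}{3} \\
                 &= \frac{4M+1}{3}.
\end{align*}
```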
- How close to this 4/3 bound can you come?
- Not very. This bound is not “tight”.
- A much more complicated analysis shows an upper bound of (11/9)M + 6/9
- That’s 22% “extra” instead of 33%
- That bound is “tight”: We can generate a set of packages that will require 11/9 of the optimal number of bins when first fit is used.
- In practice, if package sizes are uniformly distributed over the interval 0..1, then the number of extra bins is about sqrt(M). For large values of M, this is a relatively small percentage:
- 10% for M=100
- 3% for M=1000
- Think of “diminishing returns”, i.e., how expensive the last few percent of accuracy is compared to the first 9x%.
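A minimal Python sketch of the largest-first strategy discussed above (first fit decreasing). The function name `first_fit_decreasing`, the `capacity` parameter, and the use of integer sizes (to avoid floating-point rounding) are assumptions made for illustration; they are not part of the notes.

```python
import random


def first_fit_decreasing(sizes, capacity):
    """Pack items largest-first, each into the first bin that still has room.

    Sizes and capacity are integers here (an assumption of this sketch) so
    that all comparisons are exact.
    """
    bins = []  # bins[i] is the list of item sizes placed in bin i
    free = []  # free[i] is the remaining capacity of bin i
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"item of size {size} cannot fit in any bin")
        for i, room in enumerate(free):
            if size <= room:
                bins[i].append(size)
                free[i] -= size
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([size])
            free.append(capacity - size)
    return bins


if __name__ == "__main__":
    random.seed(263)
    capacity = 1000
    sizes = [random.randint(1, capacity) for _ in range(2000)]
    used = len(first_fit_decreasing(sizes, capacity))
    # The total size divided by the capacity (rounded up) is a lower bound
    # on the optimum, so "used - lower_bound" overestimates the extra bins.
    lower_bound = (sum(sizes) + capacity - 1) // capacity
    print(f"bins used: {used}, lower bound on optimum: {lower_bound}")
```

Running the script with different seeds and item counts is one way to see how small the gap between the two numbers tends to be for uniformly distributed sizes.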
Week 11 / Day 2
Divide and Conquer
- Really only applies when
- Both halves of the division require processing
- Binary search doesn’t count, since only one half is processed
- Each part of the division has size proportional to n (a constant fraction of the input)
- Selection sort doesn’t count, because you only take 1 element at a time.
- Closest Points
- Divide in half
- Find the closest pair of points in each half.
- The trick is to quickly find the closest pair when its two points are in different halves.
- Key observations:
- If D is the distance between the closest points entirely in one half, then we need only consider points within D of the center line.
- Within any given DxD square on one side of the center line, at most 4 points can be found, because points in the same half are at least D units apart.
- Thus, for any point near the center line, we need only measure the distance to up to 8 other points (within two DxD squares: one on each side of the center line).
- Overall running time is O(n log n), for the same reason that merge sort is O(n log n)
- But, how do you quickly find the points in the DxD square?
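One common answer to that last question (an assumption here, since the notes leave it open) is to keep the points sorted by y as the recursion returns, exactly as merge sort would. Each point in the strip is then compared only with the next few points in y order. The sketch below follows that idea; the names `closest_pair` and `_solve` are hypothetical.

```python
import heapq
import math


def closest_pair(points):
    """Return the smallest distance between any two of the given (x, y) points."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    distance, _ = _solve(sorted(points))  # sort once by x
    return distance


def _solve(px):
    """px is sorted by x. Return (closest distance, same points sorted by y)."""
    n = len(px)
    if n <= 3:
        # Small case: check every pair directly.
        d = min(
            (math.dist(px[i], px[j]) for i in range(n) for j in range(i + 1, n)),
            default=math.inf,
        )
        return d, sorted(px, key=lambda p: p[1])

    mid = n // 2
    mid_x = px[mid][0]  # x coordinate of the center line
    d_left, left_by_y = _solve(px[:mid])
    d_right, right_by_y = _solve(px[mid:])
    d = min(d_left, d_right)

    # Merge the two y-sorted halves, as in merge sort.
    by_y = list(heapq.merge(left_by_y, right_by_y, key=lambda p: p[1]))

    # Only points within d of the center line can form a closer pair.
    strip = [p for p in by_y if abs(p[0] - mid_x) < d]
    for i, p in enumerate(strip):
        # A constant number of following points is enough (see the notes).
        for j in range(i + 1, min(i + 8, len(strip))):
            q = strip[j]
            if q[1] - p[1] >= d:
                break
            d = min(d, math.dist(p, q))
    return d, by_y
```

For example, `closest_pair([(0, 0), (3, 4)])` returns the distance between the only two points, 5.0.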
- Selection problem (e.g., finding the median)
- How quickly can we find the median of n numbers?
- Review quicksort
- Remember, worst case is O(n^2)
- How to modify to find the kth smallest element without sorting the entire array?
- Pick a pivot.
- Move elements to either side of the pivot. (Call the left side S_1 and the right side S_2.)
- If $k \le |S_1|$, recurse on $S_1$.
- If $k = |S_1| + 1$, return the pivot.
- Else recurse on $S_2$: quickselect(S_2, k - |S_1| - 1)
- Note the worst-case running time is still O(n^2).
- Average case is O(n)
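The steps above can be sketched in Python as follows. This is an illustration, not a tuned implementation: it builds new lists instead of partitioning in place, picks the pivot at random, and adds an `equal` group (an assumption beyond the notes) so that repeated values are handled correctly.

```python
import random


def quickselect(items, k):
    """Return the k-th smallest element of items (k = 1 is the minimum)."""
    if not 1 <= k <= len(items):
        raise IndexError("k out of range")
    pivot = random.choice(items)
    s1 = [x for x in items if x < pivot]      # left side
    equal = [x for x in items if x == pivot]  # the pivot and its duplicates
    s2 = [x for x in items if x > pivot]      # right side
    if k <= len(s1):
        return quickselect(s1, k)
    if k <= len(s1) + len(equal):
        return pivot
    return quickselect(s2, k - len(s1) - len(equal))
```

With no duplicates, `len(equal)` is 1 and the three cases match the bullets above exactly.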
- Can we get O(n) worst-case?
- What do we need to do to guarantee O(n)?
- Guarantee that the pivot is not too close to the edge.
- The algorithm
- Arrange N elements into groups of 5.
- Find the median of each group.
- Use median of medians as pivot.
- There are N/5 columns. In each column, there are 2 values above the column’s median and 2 values below it. Also, half of the medians are less than the pivot, and half are greater.
- Thus there are approximately 2*(N/5)/2 + (N/5)/2 = N/5 + N/10 = 3N/10 elements guaranteed to be above the median of medians, and the same number guaranteed to be below it.
- Thus, the pivot is guaranteed to exclude a constant fraction of the numbers (about 3N/10) from each recursive call, avoiding the O(n^2) worst case.
- What’s the catch?
- Circular reasoning: Our find-the-median algorithm itself needs to find the median of N/5 elements.
- Solution: Just call the algorithm recursively.
- To be precise:
- Assume $N = 5(2k+1) = 10k + 5$. (This means N is a multiple of 5 and N/5 is odd.)
- When we find the median of medians (call it v), there will be k + 1 pairs of nodes less than v and k + 1 pairs of nodes greater than v. Also, k of the medians will be < v and k will be > v. Thus, you know that the pivot will exclude at least 2(k+1) + k = 3k+2 nodes, where k = (N-5)/10.
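A minimal sketch of the median-of-medians selection described above, with the recursive call used to find the pivot. The name `select` is hypothetical. The list-building partition, the `equal` group for duplicates, and the treatment of a short final group are simplifications assumed for illustration.

```python
def select(items, k):
    """Return the k-th smallest element of items (k = 1 is the minimum)."""
    if not 1 <= k <= len(items):
        raise IndexError("k out of range")
    if len(items) <= 5:
        return sorted(items)[k - 1]

    # Arrange the elements into groups of 5 and take each group's median.
    groups = [items[i:i + 5] for i in range(0, len(items), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]

    # Find the median of the medians by calling the algorithm recursively.
    pivot = select(medians, (len(medians) + 1) // 2)

    s1 = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    s2 = [x for x in items if x > pivot]
    if k <= len(s1):
        return select(s1, k)
    if k <= len(s1) + len(equal):
        return pivot
    return select(s2, k - len(s1) - len(equal))
```

The median of a list `data` would then be `select(data, (len(data) + 1) // 2)`.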
- Multiplication
- What is big-O of multiplying two N-digit numbers by hand? (The way you learned in 3rd grade)
- Can we do better?
- Let’s try divide and conquer.
- Suppose X and Y have 8 digits each.
- Rewrite X and Y as X_L * 10^4 + X_R and Y_L * 10^4 + Y_R.
- What is XY? X_L*Y_L*10^8 + (X_L*Y_R + X_R*Y_L)*10^4 + X_R*Y_R.
- Doesn’t help. Each multiplication is 1/2 the size; but, there are 4 of them, so we are back where we started.
- The trick is to rework so the product has fewer than 4 multiplications.
- Where might the middle term show up?
- Inside (X_L - X_R)(Y_R - Y_L) = X_L*Y_R + X_R*Y_L - X_L*Y_L - X_R*Y_R
- Specifically, to get the desired term, we add X_L*Y_L and X_R*Y_R, which we already computed.
- When we do this, we get T(N) = 3*T(N/2) + O(N), which simplifies to O(N^(log_2 3)) = O(N^1.59): not linear, but better than quadratic.
- Not commonly used
- For small N, overhead isn’t worth it.
- For large N, there are even better divide and conquer algorithms.
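The three-multiplication trick above can be sketched in Python as follows. The function name `karatsuba` (the name this trick is usually known by) and the choice to split on decimal digits are assumptions for illustration. Python integers already handle arbitrary precision, so this is only a demonstration of the recurrence.

```python
def karatsuba(x, y):
    """Multiply two non-negative integers using three half-size products."""
    if x < 10 or y < 10:
        return x * y  # single-digit operand: multiply directly

    half = max(len(str(x)), len(str(y))) // 2
    base = 10 ** half
    x_l, x_r = divmod(x, base)  # x = x_l * base + x_r
    y_l, y_r = divmod(y, base)  # y = y_l * base + y_r

    ll = karatsuba(x_l, y_l)
    rr = karatsuba(x_r, y_r)

    # Third product: (x_l - x_r)(y_r - y_l). Either factor may be negative,
    # so multiply the absolute values and restore the sign afterwards.
    a = x_l - x_r
    b = y_r - y_l
    sign = -1 if (a < 0) != (b < 0) else 1
    cross = sign * karatsuba(abs(a), abs(b))

    middle = cross + ll + rr  # equals x_l*y_r + x_r*y_l
    return ll * base * base + middle * base + rr
```

For two 8-digit inputs, `half` is 4, which matches the 10^4 split used in the notes.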
- Matrix Multiplication
- What is the conventional running time for multiplying two NxN matrices? O(N^3)
- How can we do better?
- Similar trick.
- Divide each matrix into quarters.
- Gives a straightforward result requiring 8 smaller multiplies.
- However, we cleverly manipulate the 8 quarters so that we need only 7 multiplies.
- Resulting analysis: T(N) = 7*T(N/2) + O(N^2)
- Simplifies to O(N^(log_2 7)) = O(N^2.81)
- Mostly of theoretical interest. Not practical.
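The seven-multiply scheme is commonly known as Strassen's algorithm. The sketch below uses the standard seven products and assumes square matrices stored as lists of lists, with a size that is a power of 2. The helper names `_add`, `_sub`, and `_split` are hypothetical.

```python
def _add(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def _sub(a, b):
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def _split(m):
    """Divide a matrix into quarters: top-left, top-right, bottom-left, bottom-right."""
    h = len(m) // 2
    return (
        [row[:h] for row in m[:h]],
        [row[h:] for row in m[:h]],
        [row[:h] for row in m[h:]],
        [row[h:] for row in m[h:]],
    )


def strassen(a, b):
    """Multiply two n-by-n matrices, n a power of 2, with 7 recursive multiplies."""
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes the size is a power of 2")
    if n == 1:
        return [[a[0][0] * b[0][0]]]

    a11, a12, a21, a22 = _split(a)
    b11, b12, b21, b22 = _split(b)

    # The 7 half-size products.
    m1 = strassen(_add(a11, a22), _add(b11, b22))
    m2 = strassen(_add(a21, a22), b11)
    m3 = strassen(a11, _sub(b12, b22))
    m4 = strassen(a22, _sub(b21, b11))
    m5 = strassen(_add(a11, a12), b22)
    m6 = strassen(_sub(a21, a11), _add(b11, b12))
    m7 = strassen(_sub(a12, a22), _add(b21, b22))

    # Combine them into the four quarters of the result.
    c11 = _add(_sub(_add(m1, m4), m5), m7)
    c12 = _add(m3, m5)
    c21 = _add(m2, m4)
    c22 = _add(_add(_sub(m1, m2), m3), m6)

    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

The extra additions and subtractions are the O(N^2) term in the recurrence; only the 7 recursive calls contribute to the exponent.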