# Computer Science 3719, Winter '08 Course Diary

## Thursday, February 7 (Lecture #1) [Class Notes]

• Algorithm Design Techniques
• Over the next few lectures, we'll look in depth at various algorithm design techniques for various problems on discrete structures.
• In many cases, we will be looking for a substructure of a given structure that is optimal relative to some function (combinatorial optimization problems). It would be worth talking a bit more about these before we go on.
• Combinatorial optimization problems
• Each solution associated with an instance of a combinatorial optimization problem has a cost, and we want the solutions of optimal (min/max) cost over the whole solution space.
• Any time you have a problem of the form "Find the best X" where X is some discrete structure, chances are you're dealing with a combinatorial optimization problem.
• Distinguish several types of solutions:
• Candidate solutions = all possible solutions to the instance, e.g., all possible edge-sequences of a given graph that start with vertex u and end with vertex v.
• Viable solutions = all potentially optimal candidate solutions, e.g., all edge-sequences of a given graph that are paths between vertices u and v.
• Optimal solutions = those viable solutions that have optimal value relative to the cost function over the set of viable solutions, e.g., all paths in a given graph between vertices u and v that have the smallest number of edges over the space of all such paths.
• Each such problem instance has a combinatorial, i.e., exponential in instance size, candidate (and often viable) solution space -- the job of an algorithm is to find the optimal solutions in this space (of which there will always be at least one).
• Example: The 0/1 Knapsack problem

0/1 Knapsack
Input: A set U of items, an item-size function s(), an item-value function v(), and a positive integer B.
Output: A subset U' of U such that the sum of the sizes of the items in U' is less than or equal to B and the sum of the values of the items in U' is the largest possible.

• Candidate solutions = subsets of U.
• Viable solutions = subsets of U whose summed size is <= B.
• Optimal solutions = subsets of U whose summed size is <=B and whose summed value is maximum over the space of viable solutions.
• Each subset U' can be modeled as a binary vector of length |U| (v_i = 1 (0) => item i is (not) in U'); hence, there are 2^{|U|} candidate solutions for any instance of 0/1 Knapsack.
• Sometimes, you can't generate viable solutions -- you have to generate all candidate solutions and check viability along the way.
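The candidate-solution enumeration described above can be sketched directly in Python; the item sizes, values, and bound B below are made-up illustration data, not from the lecture:

```python
from itertools import product

def knapsack_brute_force(sizes, values, B):
    """Enumerate all 2^|U| candidate bit-vectors, check viability
    (summed size <= B), and keep the best summed value seen so far."""
    n = len(sizes)
    best_value, best_subset = 0, ()
    for bits in product((0, 1), repeat=n):      # candidate solutions
        size = sum(s for s, b in zip(sizes, bits) if b)
        if size <= B:                           # viability check
            value = sum(v for v, b in zip(values, bits) if b)
            if value > best_value:
                best_value, best_subset = value, bits
    return best_value, best_subset

# Hypothetical instance: 4 items, knapsack capacity 10.
print(knapsack_brute_force([5, 4, 6, 3], [10, 40, 30, 50], 10))
```

Note that viability is only checked after a full candidate has been generated, which is exactly the inefficiency the CST approach below will attack.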
• Example: The Longest Common Subsequence problem

Longest common subsequence (LCS)
Input: Two strings s, s' over an alphabet Sigma.
Output: All strings s'' over Sigma such that s'' is a subsequence of both s and s' and the length of s'' is maximized over the solution space.

• Candidate solutions = all possible strings over alphabet Sigma of length <= min(|s|,|s'|).
• Viable solutions = all subsequences of the shorter given string.
• Optimal solutions = all subsequences of the shorter given string that are also subsequences of the longer given string and have the maximum length over the space of viable solutions.
• Each subsequence s'' can be modeled as a binary vector of length min(|s|,|s'|) (v_i = 1 (0) => symbol i is (not) included in the subsequence); hence, there are O(2^{min(|s|,|s'|)}) viable solutions for any instance of LCS (note that this worst case only occurs if each symbol in s and s' only occurs once).
• Sometimes, you can generate viable solutions up front -- this depends in part on your ingenuity.
• Example: The Minimum Spanning Tree problem

Minimum spanning tree (MST)
Input: An edge-weighted graph G = (V, E, w).
Output: All spanning trees T of G, i.e., trees that connect all vertices in G, such that the sum of the weights of the edges in T is minimized over the solution space.

• Candidate solutions = all (|V| - 1)-sized subsets of the edges of G.
• Viable solutions = all (|V| - 1)-sized subsets of the edges of G that are spanning trees of G.
• Optimal solutions = all spanning trees of G that have minimum summed edge-weight over the space of viable solutions.
• By Cayley's Theorem, a graph G = (V, E) has O(|V|^{|V| - 2}) spanning trees (the worst case here is when G is a complete graph); hence, there are O(|V|^{|V| - 2}) viable solutions for any instance of MST.
• Sometimes even the set of viable solutions is dauntingly large ...
• Can solve combinatorial optimization problems by looking at all (candidate/viable) solutions and saving those of optimal cost (which we can determine on the fly by saving those solutions with the best cost we've seen so far at any point in the enumeration of all solutions).
• This is inherently inefficient given the exponential sizes of the (candidate/viable) solution spaces involved. Can we do better?
• We most certainly can. Consider MST; though the set of viable solutions for each instance of MST is potentially exponentially large, we can solve all instances of MST in low-order polynomial time! (i.e., combinatorics is not destiny)
• How well we can do algorithm-wise depends on how much of the (candidate/viable) solution space our algorithm can ignore. Over the next few lectures, we'll examine four techniques for doing this whose applicability depends on some fairly simple properties of problem solution-spaces.
• Combinatorial Solution-Space Trees (Sections 9.1-9.7, B&B)
• The most basic approach to solving combinatorial optimization problems.
• In some cases, it is the only known way to solve the problem.
• It is also one of the few dependable ways to generate all solutions for an instance of a combinatorial optimization problem.
• As a first step, simplify generation of solutions for a problem instance by using a combinatorial solution-space tree (CST). In such a tree, internal nodes are partial solutions and the full solutions labeling the leaves comprise the solution space.
• Example: A CST for an instance of 0/1 Knapsack in which |U| = 3.
• Can traverse such a tree and examine all of its leaves using depth-first search (DFS) or breadth-first search (BFS). Only those leaves corresponding to viable solutions are important.
• Example: Viability conditions for 0/1 Knapsack, LCS, and MST.
• Do items in partial solution fit in knapsack? (0/1 Knapsack)
• Is partial solution string a subsequence of both given sequences? (LCS)
• Do edges in partial solution form a forest? (MST)
• As a second step, decrease space requirements by operating on an implicit CST, i.e., generate tree nodes as necessary during traversal.
• Example: Algorithm for DFS-based traversal of implicit CST (minimization version).
```
DFS-I(i, sol)
    if (i == n + 1)
        if (viable(sol))
            return(cost(sol))
        else
            return(INFINITY)
    else
        return(min(DFS-I(i + 1, sol),
                   DFS-I(i + 1, sol U item[i])))
```
• Initial call: DFS-I(1, empty set)
• Note that n is the number of items in the set in the instance; hence "i == n + 1" means that you have reached a leaf in the CST.
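A direct Python transcription of DFS-I, exercised on a small made-up minimization problem (choose a subset of (size, cost) items whose total size reaches a threshold T at minimum total cost; the items and T are illustrative assumptions):

```python
import math

# Hypothetical instance used only for illustration.
items = [(4, 3), (2, 1), (3, 2)]   # (size, cost) pairs
T = 5                               # viability: total size >= T
n = len(items)

def viable(sol):
    return sum(s for s, c in sol) >= T

def cost(sol):
    return sum(c for s, c in sol)

def dfs_i(i, sol):
    """Transcription of DFS-I; since Python is 0-based, the leaf test
    'i == n + 1' of the 1-based pseudocode becomes 'i == n' here."""
    if i == n:                      # leaf of the CST
        return cost(sol) if viable(sol) else math.inf
    return min(dfs_i(i + 1, sol),               # exclude item i
               dfs_i(i + 1, sol + [items[i]]))  # include item i

print(dfs_i(0, []))
```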
• Example: Algorithm for BFS-based traversal of implicit CST (minimization version).
```
BFS-I()
    Q = {(1, empty set)}
    best = INFINITY
    while not empty(Q)
        (i, sol) = pop(Q)
        if (i == n + 1)
            if (viable(sol))
                best = min(best, cost(sol))
        else
            push(Q, (i + 1, sol))
            push(Q, (i + 1, sol U item[i]))
    return(best)
```
• Modifications:
• To make the above work for maximization problems, change min to max and INFINITY to -INFINITY.
• To obtain optimal solutions, the simplest way is to run a second tree-traversal that takes the optimal cost produced previously as a parameter and stores / prints all viable solutions with that cost.
• This is all well and good; however, we are still looking at every node in the tree, and as this tree is of exponential size, our algorithm runs in exponential time. Can we do better? Unfortunately, we cannot in the worst case; however, we can sometimes do much better in practice. This will be the topic of our next lecture.

## Tuesday, February 12 (Lecture #2) [Section 15.3-15.4; Class Notes]

• Algorithm Design Techniques (Cont'd)
• Combinatorial Solution-Space Trees (Sections 9.1-9.7, B&B) (Cont'd)
• As a third step, where possible, use information about partial solution viability and potential optimality to terminate search of infeasible subtrees of the CST (dynamic pruning).
• Traditionally, DFS + dynamic pruning is known as backtracking and BFS + dynamic pruning is known as branch and bound.
• Backtracking is typically used; however, if the search tree does not have finite depth, branch and bound is preferable.
• Dynamic pruning works in a CST because all "full" candidate solutions that incorporate a particular partial solution are leaves in the subtree rooted at the node corresponding to that partial solution; hence, non-viability of a partial solution implies the non-viability of all solutions in the subtree rooted at that node!
• Example: Potential optimality conditions for 0/1 Knapsack, LCS, and MST.
• Does sum of values of items in partial solution plus sum of values of all remaining items that could be added exceed the value of the best solution seen so far? (0/1 Knapsack)
• Does the length of the partial solution string plus the length of the string of all remaining characters that could be added exceed the length of the best solution string seen so far? (LCS)
• Is the sum of the weights of the edges in the partial solution less than the cost of the best solution found so far? (MST)
• Example: Algorithm for DFS-based traversal of implicit CST + dynamic pruning (minimization version).
```
DFS-IP(i, sol, best)
    if (not viable(sol)) or (best < cost(sol))
        return(INFINITY)
    else if (i == n + 1)
        best = min(best, cost(sol))
        return(cost(sol))
    else
        return(min(DFS-IP(i + 1, sol, best),
                   DFS-IP(i + 1, sol U item[i], best)))
```
• Initial call: DFS-IP(1, empty set, BEST) where BEST is an object encoding the integer INFINITY.
• Note that variable BEST is passed by reference, e.g., it is a pointer to an integer; if it is passed by value, the value of BEST is not saved from any of the leaf-evaluations.
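In Python, the pass-by-reference trick can be imitated by wrapping the best cost in a one-element list; here is a sketch on the same kind of made-up minimization instance used earlier (items, threshold T, and all names are illustrative assumptions):

```python
import math

# Hypothetical instance used only for illustration.
items = [(4, 3), (2, 1), (3, 2)]   # (size, cost) pairs
T = 5                               # viability: total size >= T
n = len(items)

def cost(sol):
    return sum(c for _, c in sol)

def dfs_ip(i, sol, best):
    """DFS with dynamic pruning; `best` is a one-element list so that
    updates at the leaves survive across recursive calls.  Pruning here
    uses only the cost bound, since in this toy problem every partial
    solution can still be extended to a viable one."""
    if best[0] < cost(sol):            # bound: subtree cannot improve
        return math.inf
    if i == n:                         # leaf of the CST
        if sum(s for s, _ in sol) >= T:     # viability at the leaf
            best[0] = min(best[0], cost(sol))
            return cost(sol)
        return math.inf
    return min(dfs_ip(i + 1, sol, best),
               dfs_ip(i + 1, sol + [items[i]], best))

best = [math.inf]
print(dfs_ip(0, [], best))
```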
• Example: Algorithm for BFS-based traversal of implicit CST + dynamic pruning (minimization version).
```
BFS-IP()
    Q = {(1, empty set)}
    best = INFINITY
    while not empty(Q)
        (i, sol) = pop(Q)
        if (viable(sol) and (cost(sol) < best))
            if (i == n + 1)
                best = min(best, cost(sol))
            else
                push(Q, (i + 1, sol))
                push(Q, (i + 1, sol U item[i]))
    return(best)
```
• Example: A Java implementation of a DFS-based CST algorithm for solving the Vertex Cover problem [Problem Description, VCST.java]
• Such tricks can drastically reduce actual running time for CST traversal; however, all of these algorithms are still asymptotically exponential time in the worst case (you still might have to visit all the nodes in the tree).
• The CST approach to solving combinatorial optimization problems assumes nothing about the structure of the solution space for an instance. If we did have such knowledge, can we do better? That is the topic of our next lecture.
• There are other types of computation trees besides CST. There are, for example, the trees associated with recursive computations, in which nodes represent individual recursive algorithm calls on subproblems of smaller size. Such a recursive-call tree is also called a recursive decomposition.
• Characteristics of recursive decomposition:
• Can decompose instance of problem into smaller instance of the same problem, i.e., subproblems.
• At any point in the computation, may be several choices for decomposition.
• Problems that are solvable by recursive decomposition have the optimal substructure property, i.e., an optimal solution for an instance of the problem can be constructed using optimal solutions for instances of that problem that are of smaller size (subproblems).
• Our remaining algorithm design techniques will be applicable if a problem's recursive decomposition satisfies additional properties.
• Divide-and-Conquer (D&C)
• Applies if a problem has recursive decomposition in which each subproblem occurs only once, i.e., the subproblems do not repeat / overlap.
• Will typically only be efficient if the depth of recursion is logarithmically bounded, thus ensuring that the total number of recursive calls is polynomially bounded.
• Example: Merge sort
• Example: Binary search
• Example: Fast exponentiation (a^n)
• Example: Strassen's matrix multiplication algorithm (Section 28.2)
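Of the examples above, fast exponentiation admits a particularly short sketch; each call spawns a single subproblem of half the size, so no subproblem repeats and the recursion depth is O(log n):

```python
def fast_pow(a, n):
    """Divide-and-conquer exponentiation: one non-overlapping
    subproblem of half size per call, depth O(log n)."""
    if n == 0:
        return 1
    half = fast_pow(a, n // 2)   # the single subproblem
    return half * half if n % 2 == 0 else half * half * a

print(fast_pow(3, 13))  # → 1594323
```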
• This is all fine and good; however, can we do anything if we have a problem with a recursive decomposition in which each subproblem occurs more than once? It turns out that we can, and this will be the topic of our next lecture.
• Dynamic Programming
• Applies if a problem has a recursive decomposition in which the number of distinct subproblems is bounded and can thus be organized in an n-dimensional table.
• Example: Computing Fibonacci numbers
• Recurrence
• Recursive algorithm:
```
FibR(n)
    if ((n == 1) || (n == 2))
        return(1)
    else
        return(FibR(n - 1) + FibR(n - 2))
```
• Recursive-calls tree: 2^{n/2} - 1 <= Tree Size <= 2^{n} - 1
• The problem here is that subproblems are being repeated; perhaps the recursive algorithm can keep track of what subproblems have been solved by storing those values in a table and look these up as necessary? An algorithm that does this is called a memoized algorithm.
• Memoized (recursive) algorithm:
```
InitFibT(n, FibT)
    Allocate n-element table and assign to FibT
    FibT[1] = FibT[2] = 1
    for (i = 3; i <= n; i++)
        FibT[i] = INFINITY

FibM(n, FibT)
    if (FibT[n] == INFINITY)
        FibT[n] = FibM(n - 1, FibT) +
                  FibM(n - 2, FibT)
    return(FibT[n])
```
• This table idea was pretty neat. In fact, do we really need the recursion anymore? Let's just fill out the table!
• Dynamic programming (iterative) algorithm:
```
FibI(n)
    if ((n == 1) || (n == 2))
        return(1)
    else
        Allocate n-element table FibT
        FibT[1] = FibT[2] = 1
        for (i = 3; i <= n; i++)
            FibT[i] = FibT[i - 1] + FibT[i - 2]
        return(FibT[n])
```
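The three versions can be transcribed directly into Python (using a dictionary for the memo table rather than an INFINITY-initialized array; the behavior is the same):

```python
def fib_r(n):
    """Plain recursion: repeated subproblems, exponentially many calls."""
    return 1 if n <= 2 else fib_r(n - 1) + fib_r(n - 2)

def fib_m(n, table=None):
    """Memoized recursion: each distinct subproblem is solved once."""
    if table is None:
        table = {1: 1, 2: 1}
    if n not in table:
        table[n] = fib_m(n - 1, table) + fib_m(n - 2, table)
    return table[n]

def fib_i(n):
    """Dynamic programming: fill the table bottom-up, no recursion."""
    if n <= 2:
        return 1
    t = [0, 1, 1] + [0] * (n - 2)     # t[i] = i-th Fibonacci number
    for i in range(3, n + 1):
        t[i] = t[i - 1] + t[i - 2]
    return t[n]

print(fib_r(10), fib_m(50), fib_i(50))
```

Trying fib_r(50) directly would take far too long, while fib_m(50) and fib_i(50) return instantly, which is the whole point of the table.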
• If we are lucky, there are a polynomial number of distinct subproblems. If we are really lucky, we can use the recurrence to solve these subproblems in a "bottom up" rather than a "top down fashion", solving the smallest subproblems first and solving progressively larger subproblems until we solve the original problem instance.
• Dynamic programming (DP) => table-driven, bottom-up recursive decomposition algorithms!
• The Dynamic Programming Cookbook:
• Step 1: Find a recurrence / recursive decomposition for the problem of interest.
• This also (in the case of complex recurrences) involves proving that this recurrence is correct.
• This step is difficult, and very much an art -- one learns this by practice, and drawing on a standard list of example recurrences.
• Step 2: Lay out the distinct subproblems in a table.
• Step 3: Fill in the base-case values in the table.
• Step 4: Run the recurrence "in reverse" / in a bottom-up fashion, using smaller solved subproblems to solve larger subproblems, until the table is filled in.
• At the end of this fill-in process, the cost of the optimal solution is now in one of the table entries, e.g., FibT[n].
• Step 5: Use traceback from the optimal-cost table entry to reconstruct one or more optimal solutions for the problem.
• As noted in Step 1 above, the design of DP algorithms is made easier by having a standard set of example algorithms to draw on for inspiration. In the next several lectures, we will present several such examples.

## Thursday, February 14 (Lecture #3) [Section 15.4]

• Algorithm Design Techniques (Cont'd)
• Dynamic Programming (Cont'd)
• Example: DP algorithm for Longest Common Subsequence (LCS)
• Recursive decomposition: three cases.
• Recurrence (recursive + base cases)
• D(i,j) = length of longest common subsequence of first i characters of s and first j characters of s'.
• Example run: s = cbab and s' = abacab
• Matrix fill-in (including backpointers) for deriving length of longest common subsequence.
• Rectangular (m + 1) x (n + 1) matrix (where m and n are the lengths of s and s', respectively); fill-in proceeds from cell (1,1) row-wise, column-wise, or anti-diagonal-wise.
• Fill-in of an individual cell involves consulting the filled-in cells in the 3-cell "overhang" above and to the left of that cell.
• Traceback for deriving longest common subsequences.
• DP algorithm (table fill-in + traceback)
• Time complexity of matrix fill-in: (# cells) x (#choices per cell) x (max # subproblems per choice) = (mn) x (1) x O(1) = O(mn)
• Space complexity: # cells = ((m + 1) x (n + 1)) = O(mn)
• Deriving an arbitrary longest common subsequence given a filled-in matrix takes O(m + n) time; however, enumerating all longest common subsequences may require exponential time!
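A Python sketch of the fill-in and traceback, run on the lecture's example strings (the traceback recovers one longest common subsequence; which one depends on tie-breaking):

```python
def lcs(s, t):
    """Fill the (m+1) x (n+1) table D, with D[i][j] = length of the
    LCS of s[:i] and t[:j], then trace one optimal subsequence back."""
    m, n = len(s), len(t)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                D[i][j] = D[i - 1][j - 1] + 1     # match: diagonal
            else:
                D[i][j] = max(D[i - 1][j], D[i][j - 1])
    # Traceback: O(m + n) time, recovering one (of possibly many) LCSs.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if s[i - 1] == t[j - 1]:
            out.append(s[i - 1]); i -= 1; j -= 1
        elif D[i - 1][j] >= D[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return D[m][n], "".join(reversed(out))

print(lcs("cbab", "abacab"))   # the example run above
```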

## Tuesday, February 19

• Midterm break; no lecture

## Tuesday, February 19

• Midterm Exam Notes
I've finished making up the upcoming midterm exam. This exam will be closed-book and has a total of 60 marks (note that I have tried to make the number of marks for a question roughly equivalent to the number of minutes it should take you to do it). The current layout is as follows:

• Recurrences (10 marks)
• Asymptotic worst-case time complexity (22 marks)
• Algorithm design techniques (28 marks)

Things that will not be covered are solution of recurrences and detailed derivation of "raw" worst-case time complexity functions; however, all else is fair game. The questions on the first assignment are guides to what I have in mind; you should also find the questions on previous CS 3711 exams of some help:

• Midterm exam (Winter 2003) (7 pages: PDF)
• Answers to midterm exam (Winter 2003) (8 pages: PDF)
• Midterm exam (Winter 2004) (7 pages: PDF)
• Answers to midterm exam (Winter 2004) (8 pages: PDF)
• Midterm exam (Winter 2005) (7 pages: PDF)
• Answers to midterm exam (Winter 2005) (7 pages: PDF)

I hope the above helps, and I wish you all the best of luck with this exam.

## Thursday, February 21 (Lecture #4) [Section 15.2; Class Notes]

• Algorithm Design Techniques (Cont'd)
• Dynamic Programming (Cont'd)
• Example: DP algorithm for Matrix Chain Parenthesization (MCP)
• Review of 2-matrix multiplication: O(n^3) time (n = max matrix dimension)
• 3-matrix multiplication: Parenthesization of matrix "chain" for 2-matrix multiplication can drastically affect the number of individual element multiplications.
• Recurrence (recursive + base cases)
• m(i,j) = minimum number of element multiplications required to multiply matrices i through j in the given chain.
• Note that this is the first recurrence we've seen that involves multiple subproblem decomposition choices.
• DP algorithm
• Matrix fill-in (including backpointers) for deriving minimum number of multiplications relative to optimal matrix chain parenthesization.
• Ragged-bottom pyramid / B1 bomber shape matrix; fill-in proceeds from left to right in ascending rows.
• Fill-in of an individual cell involves consulting filled-in cells in the "boomerang" hanging off that cell.
• Backpointer matrix cells have value k at which matrix sequence i .. j split into matrix subsequences i .. k and (k + 1) .. j.
• Traceback for deriving optimal matrix chain parenthesization.
• Time complexity of matrix fill-in: (# cells) x (#choices per cell) x (max # subproblems per choice) = (n(n - 1)/2) x O(n) x O(1) = O(n^3).
• Space complexity: # cells = (n(n - 1)/2) = O(n^2)
• Example run: matrix chain is A_1 A_2 A_3 A_4, where A_1 = 10 x 5, A_2 = 5 x 20, A_3 = 20 x 5, and A_4 = 5 x 20. Thus, n = 4 and dimension list p = (10,5,20,5,20)
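A Python sketch of the fill-in and traceback for this recurrence, run on the dimension list from the example:

```python
import math

def matrix_chain(p):
    """m[i][j] = min element-multiplications to compute A_i .. A_j,
    where A_k is p[k-1] x p[k]; s[i][j] records the best split k."""
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):            # chain length
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = math.inf
            for k in range(i, j):             # the split choices
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j], s[i][j] = q, k
    return m, s

def paren(s, i, j):
    """Traceback: rebuild the optimal parenthesization from s."""
    if i == j:
        return f"A{i}"
    k = s[i][j]
    return f"({paren(s, i, k)} {paren(s, k + 1, j)})"

m, s = matrix_chain([10, 5, 20, 5, 20])       # the example run above
print(m[1][4], paren(s, 1, 4))
```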

## Tuesday, February 26 (Lecture #5) [Sections 15.3 and 16.1; Class Notes]

• Algorithm Design Techniques (Cont'd)
• Dynamic Programming (Cont'd)
• Dynamic programming is very powerful, but it is not always applicable or practical -- it's time now to rain a little on our DP Parade ...
• An important characteristic of problems whose optimal substructure can be exploited by dynamic programming (and indeed by divide-and-conquer and greedy algorithms as well) is subproblem independence, i.e., solutions to subproblems do not overlap. This is important because subproblem solutions can only be combined if they do not overlap.
• Example: The existence (impossibility) of dynamic programming algorithms for the Shortest (Longest) Path problems.
• Another important characteristic of the dynamic programming algorithms we have seen so far is that they have all had a polynomially-bounded number of distinct subproblems. However, this need not always be the case.
• Example: DP algorithm for 0/1 Knapsack
• Recurrence (recursive + base cases)
• V(i,j) = summed value of an optimal knapsack load with size <= j, 0 <= j <= B, composed of a selection from the first i, 0 <= i <= |U|, items of U.
• Recursive cases:
• V(i,j) = max{V(i - 1,j), V(i - 1, j - s(u_i)) + v(u_i)} if s(u_i) <= j
• First term: Cannot add item i to knapsack.
• Second term: Can add item i to knapsack.
• Assumes item i fits in a knapsack of size j, i.e., s(u_i) <= j; as this is not always so, we have the second recursive case below.
• V(i,j) = V(i - 1,j) if s(u_i) > j
• Intuition: If item i cannot fit in a knapsack of size j, then the best load we can create is relative to items 1 through (i - 1) inclusive.
• Base case:
• V(i,0) = V(0,j) = 0 (intuitively, both the knapsack of capacity 0 and the knapsack with 0 items to include in it have value 0); all other V(i,j) consulted by the recurrence are out of the bounds of the table and are assumed to have value -INFINITY.
• DP algorithm
• Example run
• Matrix fill-in (including backpointers) for deriving value of optimal knapsack.
• Rectangular (|U| + 1) x (B + 1) matrix; fill-in proceeds from cell (1,1) row-wise, column-wise, or anti-diagonal-wise.
• Fill-in of an individual cell involves consulting cell immediately above (V(i - 1,j)) and cell back one row to the left (V(i - 1, j - s(u_i))).
• Backpointer matrix cells have value 1 or 2, depending on which recursive case was invoked.
• Traceback for deriving composition of optimal knapsack.
• Time complexity of matrix fill-in: (# cells) x (#choices per cell) x (max # subproblems per choice) = ((|U| + 1) x (B + 1)) x 1 x O(1) = O(|U|B)
• Space complexity: # cells = ((|U| + 1) x (B + 1)) = O(|U|B)
• Wait a minute ... this isn't a polynomial-time algorithm, because |U|B is not a polynomial of the input size (|U| + log B). What we actually have here is an exponential time algorithm!
• The same problem actually arises as well in our DP Fibonacci number algorithm; to compute the nth Fibonacci number that algorithm needs O(n) time and space -- however, if the algorithm is only given n as input (which is stored in binary using log n bits, hence giving an input size of log n), the algorithm actually runs in exponential time and space!
• DP is not always better than CST!
• CST for Knapsack (DFS + pruning): O(2^{|U|}|U|) time, O(|U|) space.
• DP for Knapsack: O(|U|B) time and space.
• There may be no "best" algorithm for a problem! Rather there are only algorithms that are the best relative to particular ranges of input-parameter values.
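A Python sketch of the V(i,j) fill-in with traceback, on a small made-up instance (item sizes, values, and B are illustrative assumptions):

```python
def knapsack_dp(sizes, values, B):
    """V[i][j] = best summed value using the first i items with
    capacity j, via the two recursive cases above; O(|U|B) time/space."""
    n = len(sizes)
    V = [[0] * (B + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(B + 1):
            V[i][j] = V[i - 1][j]                   # case: skip item i
            if sizes[i - 1] <= j:                   # case: take item i
                V[i][j] = max(V[i][j],
                              V[i - 1][j - sizes[i - 1]] + values[i - 1])
    # Traceback: which recursive case produced each entry?
    load, j = [], B
    for i in range(n, 0, -1):
        if V[i][j] != V[i - 1][j]:      # item i must have been taken
            load.append(i)
            j -= sizes[i - 1]
    return V[n][B], sorted(load)

# Hypothetical instance: 4 items, knapsack capacity 10.
print(knapsack_dp([5, 4, 6, 3], [10, 40, 30, 50], 10))
```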
• The relationship between CST, DP, and Divide-and-Conquer algorithms:
```
         Optimal Substructure
                 AND
       Independent Subproblems?
              /        \
         YES /          \ NO
            /            \
       Repeated          CST
      Subproblems?
         /      \
     NO /        \ YES
       /          \
     D&C           DP
```
• Dynamic programming and divide-and-conquer algorithms have enabled us to make many apparently difficult problems solvable in low-order polynomial time. Can we do still better? In certain cases, it seems that we can.
• Consider binary search; it is a particularly interesting example of a D&C algorithm in which there is, at each point in the computation, at most 1 choice and 1 subproblem to be solved. In this case, the recursive computation collapses to a path of recursive calls from the "root" problem to a "leaf" subproblem.
• The subproblem structure for binary search is unusually simple, in that a choice can be made before the subproblem is solved, i.e., the choice can be made in a top-down, locally optimal (greedy) manner. Compare this with divide-and-conquer and dynamic programming algorithms, which solve subproblems in a bottom-up manner and make choices only after all necessary subproblems have been solved.

• Midterm Exam

## Tuesday, March 4 (Lecture #6) [Section 16.1; Class Notes]

• Algorithm Design Techniques (Cont'd)
• Greedy Algorithms
• Example: The Activity Selection problem
• Problem statement
• DP algorithm
• Can we do better? In particular, do we have to evaluate all subproblem-pairs induced by a choice a_k from S(i,j), or can we get away with a single a_k-choice and subproblem?
• It turns out that we can -- pick the activity a_m from S(i,j) with the earliest finishing time -- we are then left with the single subproblem S(m,j)! (Theorem 16.1).
• This gets even better -- note that instead of solving all S(i,j) in the bottom-up manner described above, we start with S(0,n+1) and solve directly in a top-down manner!
• Greedy algorithm (iterative)

## Thursday, March 6 (Lecture #7) [Sections 23.1-23.2; Class Notes]

• Algorithm Design Techniques (Cont'd)
• Greedy Algorithms (Cont'd)
• Example: The Minimum Spanning Tree problem (Cont'd)
• DP algorithm.
• Based on problems of the form P[i] = the cost of the minimum spanning forest for edge-weighted graph G that has i edges (note that what we ultimately want is P[|V| - 1]).
• We can imagine a recurrence for P[i] in which we add all possible edges to solution-forests for P[i - 1] that connect two formerly distinct components in the forest and select those edges that result in minimum-cost forests on i edges. To do this, we need to associate the solution spanning-forests explicitly with P[i], i.e., we cannot reconstruct them after matrix fill-in by traversing backpointers.
• The resulting recurrence has a double minimization over solution spanning-forests for P[i - 1] and edges that can be added to these forests, respectively.
• Can we do better? In particular, do we have to evaluate all of the subproblems induced by all edge choices at each point, or can we get away with a single edge-choice and subproblem?
• Consider the following very useful property of minimum spanning forests: Given such a forest for a graph G, each minimum-weight edge linking separate components in the forest must be in some minimum spanning tree for G (Theorem 23.1; see also Corollary 23.2). Such an edge is said to be safe.
• What this means is that all we have to do is choose a safe edge and evaluate the resulting subproblem! This gets rid of the inner minimization over edges in the recurrence; moreover, as the choice of a safe edge always leads to some minimum spanning tree, we can also get rid of the outer minimization over spanning-forests.
• Generic MST algorithm (page 563):
```
GENERIC-MST(G = (V,E,w))
    E' = {}
    for (i = 1; i <= |V| - 1; i++)
        Find an edge e in E that is safe for F = (V, E')
        E' = E' u {e}
    return(F = (V, E'))
```
• Under appropriate selections of safe edges, this generic algorithm underlies both of the classic minimum spanning tree algorithms considered below.
• Kruskal's algorithm
• Start with trivial forest of G that just includes the vertices in V; add edges greedily, minimum-weight first, growing a minimum spanning forest that eventually coalesces into a minimum spanning tree for G.
• Algorithm:
```
MST-KRUSKAL(G, w)
    A = {}
    for each vertex v in V do
        MAKE-SET(v)
    sort the edges in E into nondecreasing order by weight
    for each edge (u,v) in E, taken in nondecreasing order by weight, do
        if FIND-SET(u) <> FIND-SET(v) then
            A = A u {(u,v)}
            UNION(u,v)
    return(A)
```
• Note that this a direct implementation of the generic MST algorithm above.
• Running time
= |V|(T(MAKE-SET)) + 2|E|(T(FIND-SET)) + |E|(T(UNION)) + T(SORT(|E|))
= |V|(T(MAKE-SET)) + 2|E|(T(FIND-SET)) + |E|(T(UNION)) + O(|E| log |E|)
= |V|(T(MAKE-SET)) + 2|E|(T(FIND-SET)) + |E|(T(UNION)) + O(|E| log |V|)
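A Python sketch of Kruskal's algorithm, using a union-by-rank forest with path halving (a form of path compression) for the disjoint sets; the graph is made-up illustration data:

```python
def kruskal(n, edges):
    """MST-KRUSKAL with a union-by-rank, path-halving forest.
    Vertices are 0..n-1; edges are (weight, u, v) triples."""
    parent, rank = list(range(n)), [0] * n

    def find(x):                       # FIND-SET with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):                   # UNION by rank
        x, y = find(x), find(y)
        if rank[x] < rank[y]:
            x, y = y, x
        parent[y] = x
        if rank[x] == rank[y]:
            rank[x] += 1

    mst, total = [], 0
    for w, u, v in sorted(edges):      # nondecreasing weight
        if find(u) != find(v):         # edge joins two components: safe
            mst.append((u, v))
            total += w
            union(u, v)
    return total, mst

# Hypothetical 4-vertex graph.
edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
print(kruskal(4, edges))
```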
• Prim's algorithm
• Start with a single vertex in G; add edges greedily, minimum-weight first, growing minimum spanning tree for successively larger subsets of V until you have a minimum spanning tree for G.
• Algorithm:
```
MST-PRIM(G, w, r)
    for each u in V do
        key[u] = INFINITY
        pi[u] = nil
    key[r] = 0
    Q = BUILD(V)
    while Q is not empty do
        u = EXTRACT-MIN(Q)
        for each v in adj[u] do
            if v in Q and w(u,v) < key[v] then
                pi[v] = u
                key[v] = w(u,v)
```
• Note that in this algorithm (unlike Kruskal's), the spanning forest has exactly one non-trivial component with edges at any point during processing.
• Running time = T(BUILD) + |V|(T(EXTRACT-MIN)) + |E|(T(DECREASE-KEY))
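A Python sketch of Prim's algorithm with a binary heap; Python's heapq module has no DECREASE-KEY operation, so this sketch uses the standard lazy-deletion workaround of pushing duplicate entries and skipping stale ones when popped (the graph is made-up illustration data):

```python
import heapq

def prim(adj, r=0):
    """MST-PRIM with a binary heap; adj[u] is a list of (v, weight)
    pairs.  Stale heap entries are skipped instead of decreased."""
    n = len(adj)
    in_tree = [False] * n
    total, Q = 0, [(0, r)]             # (key, vertex) pairs
    while Q:
        key, u = heapq.heappop(Q)      # EXTRACT-MIN
        if in_tree[u]:
            continue                   # stale entry: skip
        in_tree[u] = True
        total += key
        for v, w in adj[u]:
            if not in_tree[v]:
                heapq.heappush(Q, (w, v))   # stands in for DECREASE-KEY
    return total

# A small hypothetical graph, as adjacency lists.
adj = [[(1, 1), (2, 4)],
       [(0, 1), (2, 3), (3, 2)],
       [(0, 4), (1, 3), (3, 5)],
       [(1, 2), (2, 5)]]
print(prim(adj))  # → 6
```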
• Note that Kruskal's and Prim's algorithms are, in a broad sense, converses of each other -- by adding minimum-weight edges, Kruskal's algorithm coalesces an (initially trivial) spanning forest for the given graph G into a spanning tree for G, while Prim's algorithm extends a spanning tree for successively larger subgraphs of G until it has a spanning tree for G.
• There is usually a quicker way of deriving a greedy algorithm than developing and subsequently "pruning" a dynamic programming algorithm (see Section 16.2); however, the examples given here are useful as they show the intimate relationship between dynamic programming and greedy algorithms.
• The relationship between CST, DP, Divide-and-Conquer, and Greedy algorithms:
```
         Optimal Substructure
                 AND
       Independent Subproblems?
              /        \
         YES /          \ NO
            /            \
         Greedy          CST
         Choice?
         /      \
     NO /        \ YES
       /          \
   Repeated      Greedy
  Subproblems?
     /      \
YES /        \ NO
   /          \
  DP           D&C
```
• The differences that make related versions of a problem amenable to greedy algorithms instead of just dynamic programming algorithms can be very subtle, and do not appear to be (at this point in time, anyway) either systematic or comprehensible.
• Example: Versions of the Knapsack problem that allow greedy and dynamic programming algorithms.
• Exploiting Data Structures
• The techniques we've seen so far use properties of solution spaces to organize efficient searches of these spaces; in many cases, we have lowered time complexities from exponential to low-order polynomial.
• Lower bound on time complexity of any algorithm for a problem is typically linear (must at least read all the input). To get as close as possible to this bound, need to choose data structures that provide best time complexities for operations required by an algorithm.
• Example: Exploiting data structures to solve the Minimum Spanning Tree problem
• Kruskal's algorithm
• Running time
= |V|(T(MAKE-SET)) + 2|E|(T(FIND-SET)) + |E|(T(UNION)) + O(|E| log |E|)
= |V|(T(MAKE-SET)) + 2|E|(T(FIND-SET)) + |E|(T(UNION)) + O(|E| log |V|^2)
= |V|(T(MAKE-SET)) + 2|E|(T(FIND-SET)) + |E|(T(UNION)) + O(|E| log |V|)
• Running time relative to various data structures for disjoint sets (see here for more details about cited data structures):

```
Data Structure      MAKE-SET    FIND-SET    UNION    Kruskal's Algorithm
                          (worst-case time complexities)
List (naive)        O(1)        O(n)        O(1)     O(|E||V|)
List (rep-ptr)      O(1)        O(1)        O(n)     O(|E||V|)
Forest (naive)      O(1)        O(n)        O(1)     O(|E||V|)
Forest (rep-ptr)    O(1)        O(1)        O(n)     O(|E||V|)
Forest (UR + PC)    O(1)        O(1)        O(1)     O(|E| log |V|)
```

• Prim's algorithm
• Running time = T(BUILD) + |V|(T(EXTRACT-MIN)) + |E|(T(DECREASE-KEY))
• Running time relative to various data structures for priority queues (see here for more details about cited data structures):

```
Data Structure           BUILD        EXTRACT-MIN    DECREASE-KEY    Prim's Algorithm
                                 (worst-case time complexities)
Array (Unsorted)         O(n)         O(n)           O(n)            O(|E||V|)
Linked list (Unsorted)   O(n)         O(n)           O(n)            O(|E||V|)
Array (Sorted)           O(n log n)   O(n)           O(n)            O(|E||V|)
Linked list (Sorted)     O(n log n)   O(1)           O(n)            O(|E||V|)
Search tree              O(n)         O(n)           O(n)            O(|E||V|)
Balanced search tree     O(n)         O(log n)       O(log n)        O(|E| log |V|)
Binary heap              O(n)         O(log n)       O(log n)        O(|E| log |V|)
```

• There are several lessons implicit in the tables above:
• To obtain the best overall set of running times for a set of operations relative to an abstract data type, a complex underlying data structure may be required.
• When one is dealing with an algorithm that only invokes a subset of the operations relative to an abstract data type, the best overall data structure implementing that data type may not be appropriate -- rather, one looks for a data structure that gives the best running times for the required subset of operations.
• An efficient algorithm is a collaboration between an appropriate algorithm design technique and one or more data structures whose operation-sets are optimally efficient for the subset of operations required by the algorithm. Hence, need good knowledge of both algorithm design techniques and data structures to create efficient algorithms.
• An algorithm designer also needs a good knowledge of the best-known algorithms for standard problems. After all, who wants to reinvent the wheel?
• Such knowledge tends to be domain-specific. Let's concentrate on the domain of graphs and standard graph algorithms, starting with our next lecture.

## Tuesday, March 11 (Lecture #8) [Sections 22.2, 24.1, and 24.3; Class Notes]

• Algorithm Design Techniques (Cont'd)
• Example: Shortest Paths in Graphs
• Has many applications, especially if you also have edge-weights, e.g., distances, costs.
• Two types of problems:
• Single-source shortest paths
• All-pairs shortest paths
• Can further break down algorithms on edge-weighted graphs based on whether or not they detect negative-weight cycles. Such cycles are problematic because they render shortest-path estimates meaningless, e.g., the length of a path can be made arbitrarily short by an appropriate number of traversals around such a cycle.
• Properties of shortest paths:
• Triangle inequality: d(s,v) <= d(s,u) + w(u,v)
• Shortest path has at most |V| - 1 edges: negative-weight cycles are assumed absent (they make shortest paths undefined), positive-weight cycles can never lie on a shortest path, and zero-weight cycles can be removed without changing a path's length.
• Shortest paths satisfy the optimal substructure property, i.e., given a shortest path from s to v and a vertex u on this path, the subpaths s->u and u->v are also shortest paths; this opens up many nice algorithmic possibilities, e.g., dynamic programming, divide-and-conquer, and greedy algorithms.
We will exploit each of these properties to develop different types of shortest-path algorithms over the next two lectures.
• Single-Source Shortest Paths
• Can do this in unweighted graphs using general breadth-first search (BFS).
• Intuition behind BFS: Exploration of shells of vertices at successively larger distances from a selected point.
• Basic BFS algorithm:
```
BFS(G, s)
    for each vertex u in G - {s} do
        color[u] = white
    color[s] = gray
    Q = {}
    ENQUEUE(Q, s)
    while Q is not empty do
        u = DEQUEUE(Q)
        for each v in adj[u] do
            if color[v] = white then
                color[v] = gray
                ENQUEUE(Q, v)
        color[u] = black
```
• Interpretation of colors:
• black = fully explored (node + children)
• gray = partially explored (node only)
• white = unexplored
• Visualize as ink spreading from a central point outwards along edges until it has reached all vertices in a graph.
• Analysis of running time
• Running time = |V|O(1) + |V|O(1) + |E|O(1) + T_tot(next-adj) = O(|V| + |E| + T_tot(next-adj)), where the second term counts the O(|V|) total ENQUEUE/DEQUEUE operations (each vertex enters the queue at most once).
• Adjacency list: O(|V| + |E| + |E|) = O(|V| + |E|) time
• Adjacency matrix: O(|V| + |E| + |V|^2) = O(|V|^2) time
• Though |E| <= |V|(|V| - 1)/2 = O(|V|^2), often keep |E|-term in expressions to show running time relative to sparse graphs.
• Did not analyze algorithm in terms of rigorously-derived T(n) function -- worked less formally and directly in terms of big-Oh behavior of individual algorithm components.
• Did not analyze algorithm by nested control structure -- had to reason about behavior of algorithm as a whole to derive certain quantities, e.g., total length of queue over execution of algorithm.
• Full BFS algorithm (page 532):
```
BFS(G, s)
    for each vertex u in G - {s} do
        color[u] = white
        d[u] = INFTY                    // new
        pi[u] = NIL                     // new
    color[s] = gray
    d[s] = 0                            // new
    pi[s] = NIL                         // new
    Q = {}
    ENQUEUE(Q, s)
    while Q is not empty do
        u = DEQUEUE(Q)
        for each v in adj[u] do
            if color[v] = white then
                color[v] = gray
                d[v] = d[u] + 1         // new
                pi[v] = u               // new
                ENQUEUE(Q, v)
        color[u] = black
```
• What are the pi and d arrays for?
• BFS defines a search tree based on all vertices of G that is rooted at the start vertex s; the pi-array defines the node-parent relations in this tree.
• This search tree is structured such that the length of the path from s to u in this tree is the length of the shortest path (in terms of # edges) between s and u in G; the d-array contains these shortest-path length values.
• Intuition behind proof of correctness for d-values: If there were a shorter path from s to v via some vertex u, during exploration of children of u, v would have previously been colored gray!
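The full BFS above translates almost line-for-line into Python; the dict-of-lists adjacency representation and returning the d and pi arrays as dicts are assumptions on my part:

```python
# Runnable Python version of the full BFS algorithm; returns the
# shortest-path distances d and BFS-tree parents pi from source s.
from collections import deque

def bfs(adj, s):
    # adj: {u: [v1, v2, ...]} adjacency lists of an unweighted graph
    d = {u: float('inf') for u in adj}
    pi = {u: None for u in adj}
    color = {u: 'white' for u in adj}
    color[s], d[s] = 'gray', 0
    q = deque([s])
    while q:
        u = q.popleft()                 # DEQUEUE
        for v in adj[u]:
            if color[v] == 'white':
                color[v] = 'gray'       # partially explored
                d[v] = d[u] + 1
                pi[v] = u
                q.append(v)             # ENQUEUE
        color[u] = 'black'              # fully explored
    return d, pi
```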
• Note that Prim's algorithm for computing minimum spanning trees is somewhat like BFS; both are expanding a frontier outwards from an initial vertex in the graph, but while BFS expands this frontier uniformly by distance from the start point, the expansion by Prim is relative to the smallest-weight edge extending into the frontier.
• Can solve on edge-weighted graphs by applying edge-expansion and BFS; however, this yields an exponential time algorithm. Can we do better?
• Single-source (and indeed all-source) shortest-path algorithms on weighted graphs use the Relax operation.
• Algorithm (pages 585-586):
```
Initialize-Single-Source(G, s)
    for each vertex v in V[G] do
        d[v] = INFTY
        pi[v] = NIL
    d[s] = 0

Relax(u, v, w)
    if d[v] > d[u] + w(u,v) then
        d[v] = d[u] + w(u,v)
        pi[v] = u
```
• The relax operation is trivial; however, sequences of relaxations can be quite powerful.
• Many of the algorithms we will consider for the next 1 1/2 lectures will essentially differ only in the order in which edges are relaxed and the number of times each edge is relaxed.
• Dijkstra's algorithm
• This algorithm essentially applies relaxations in a systematic manner to grow a set of vertices with known shortest-path distances outwards from the source vertex to encompass the whole graph. In particular, relaxations are applied to vertices (or rather, the adjacencies of those vertices) as they are added to this set and can be seen as extending out from the frontier of this region into the remainder of the graph.
• Algorithm:
```
Dijkstra(G, w, s)
    Initialize-Single-Source(G, s)
    S = {}
    Q = V[G]
    while Q is not empty do
        u = EXTRACT-MIN(Q)
        S = S u {u}
        for each vertex v in adj[u] do
            Relax(u, v, w)
```
• Analysis of running time
• Running time
= T(INITIALIZE) + |V|T(EXTRACT-MIN) + |E|T(RELAX)
= T(INITIALIZE) + |V|T(EXTRACT-MIN) + |E|T(DECREASE-KEY)
• Runs in O(|E| log |V|) time relative to a binary heap.
• Doesn't Dijkstra's algorithm look an awful lot like Prim's algorithm for minimum spanning trees? It should! Try writing the code for the initialization and relaxation procedures into Dijkstra's algorithm, eliminating the S-lines (which are not being used), and then comparing with Prim's algorithm -- oddly enough, you'll see that relaxation is a generalization of the key-value adjustment done by Prim on lines 9 and 11, e.g.,
```
Dijkstra-Rewrite(G, w, s)
    for each u in V do
        d[u] = INFTY
        pi[u] = NIL
    d[s] = 0
    Q = V
    while Q not empty do
        u = EXTRACT-MIN(Q)
        for each v in adj[u] do
            if d[u] + w(u,v) < d[v] then
                d[v] = d[u] + w(u, v)
                pi[v] = u
```
• Dijkstra's algorithm cannot detect negative cycles, and will produce erroneous results if one or more such cycles are present. Worse, this algorithm may fail in the presence of negative-weight edges, independent of whether these edges are part of a negative-weight cycle.
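A Python sketch of Dijkstra's algorithm using heapq as the priority queue; since heapq has no DECREASE-KEY, duplicate entries are pushed and stale ones skipped on extraction (the names and dict-based representation are illustrative, not from the course notes):

```python
# Hypothetical sketch of Dijkstra's algorithm with a binary heap.
import heapq

def dijkstra(adj, s):
    # adj: {u: [(v, w), ...]} with non-negative edge weights w
    d = {u: float('inf') for u in adj}
    pi = {u: None for u in adj}
    d[s] = 0
    pq = [(0, s)]
    done = set()
    while pq:
        du, u = heapq.heappop(pq)        # EXTRACT-MIN
        if u in done:
            continue                     # stale duplicate entry
        done.add(u)                      # S = S u {u}
        for v, w in adj[u]:
            if d[u] + w < d[v]:          # Relax(u, v, w)
                d[v] = d[u] + w
                pi[v] = u
                heapq.heappush(pq, (d[v], v))
    return d, pi
```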
• Dijkstra's algorithm is fast, but at the price of restricted applicability. Can we do better?
• The Bellman-Ford algorithm
• This algorithm essentially adopts the philosophy that if you invoke relaxation often enough, all will be well. Oddly enough, it turns out that if you relax each edge in the graph |V| - 1 times, the d-values associated with all vertices are guaranteed to settle to the actual shortest-distance values. This works because each round of relaxations relaxes one edge further along a shortest path and each shortest path has at most |V| - 1 edges.
• Algorithm:
```
Bellman-Ford(G, w, s)
    Initialize-Single-Source(G, s)
    for i = 1 to |V| - 1 do
        for each edge (u,v) in E do
            Relax(u, v, w)
    for each edge (u,v) in E do
        if d[v] > d[u] + w(u,v) then
            return FALSE
    return TRUE
```
• Running time = O(|V||E|)
• This algorithm does (|V| - 1)|E| relaxations, as compared to the |E| relaxations done by Dijkstra's algorithm, and hence runs much slower. However, it has the advantage of being able to detect negative-weight cycles.
• The intuition behind this is much the same as for the Bellman-Ford algorithm in general as given above: in addition to propagating correct shortest distances along paths, the repeated relaxations propagate the negative summed weight around each negative-weight cycle and dump it onto ordinary paths, where the resulting violation of the relation between d-values of adjacent vertices can be detected by a simple scan of all edges (lines 5-7 of the algorithm).
• Shouldn't Dijkstra's algorithm be able to detect negative-weight cycles by the same argument? I suspect not, as Dijkstra's algorithm probably does not invoke enough relaxations to guarantee that the influences of negative-weight cycles will necessarily be dumped onto all (or even any) paths -- however, I could be wrong in this.
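The Bellman-Ford algorithm above in runnable Python form; the (ok, d) return convention is my own choice, with ok = False signalling a reachable negative-weight cycle:

```python
# Runnable sketch of the Bellman-Ford algorithm.
def bellman_ford(n, edges, s):
    # n vertices 0 .. n-1, edges = [(u, v, w), ...], source s
    d = [float('inf')] * n
    d[s] = 0
    for _ in range(n - 1):               # |V| - 1 rounds of relaxations
        for u, v, w in edges:
            if d[u] + w < d[v]:          # Relax(u, v, w)
                d[v] = d[u] + w
    for u, v, w in edges:                # final scan: negative-cycle check
        if d[u] + w < d[v]:
            return False, d              # some d-value is still improvable
    return True, d
```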
• Note that the running times given for both Dijkstra's algorithm and the Bellman-Ford algorithm assume that the input graph is represented as an adjacency list -- if the graph is instead represented as an adjacency matrix, the running time of both algorithms becomes O(|V|^3).

## Thursday, March 13 (Lecture #9) [Sections 25.1-25.2; Class Notes]

• Algorithm Design Techniques (Cont'd)
• Example: Shortest Paths in Graphs
• All-Pairs Shortest Paths
• Can do this by applying either of the algorithms for single-source shortest paths |V| times (once for each vertex in the graph). Is there a faster way? Try dynamic programming!
• Some recurrences for shortest paths
• Path-length based
L(0)_ij = 0 if i = j, INFTY if i <> j
L(m)_ij = min over 1 <= k <= n of { L(m-1)_ik + W_kj }
• Intermediate-vertex-set based:

D(0)_ij = W_ij
D(k)_ij = min { D(k-1)_ij, D(k-1)_ik + D(k-1)_kj }

• These algorithms will now be creating matrices instead of vectors of shortest-path distances; moreover, they will operate on adjacency matrix representations of input graphs instead of adjacency lists.
• Matrix-multiplication-based algorithm
• Uses first (path-length based) recurrence above.
• Basic subroutine:
```
EXTEND-SHORTEST-PATHS(L, W)
    n = rows[L]
    Create n x n matrix L'
    for i = 1 to n do
        for j = 1 to n do
            L'_ij = INFTY
            for k = 1 to n do
                L'_ij = min(L'_ij, L_ik + W_kj)
    return L'
```
• Yes, this does look a lot like matrix multiplication. Odd, isn't it? There are some deep mathematical reasons for this, tied up with a mathematical structure common to both problems that is called a closed semiring (the interested are referred to Section 27.5 in the first edition of the textbook for more details).
• Slow algorithm (O(|V|^4) time):
```
SLOW-ALL-PAIRS-SHORTEST-PATHS(W)
    n = rows[W]
    L^(1) = W
    for m = 2 to n - 1 do
        L^(m) = EXTEND-SHORTEST-PATHS(L^(m - 1), W)
    return L^(n - 1)
```
• Why grow paths slowly one edge at a time? If we send L^(m - 1) twice to EXTEND-SHORTEST-PATHS instead of L^(m - 1) and W, the path-length will increase at an exponential rate and be >= |V| - 1 in only log |V| calls to EXTEND-SHORTEST-PATHS!
• Fast algorithm (O(|V|^3 log |V|) time):
```
FAST-ALL-PAIRS-SHORTEST-PATHS(W)
    n = rows[W]
    L^(1) = W
    m = 1
    while m < n - 1 do
        L^(2m) = EXTEND-SHORTEST-PATHS(L^(m), L^(m))
        m = 2 * m
    return L^(m)
```
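Both routines above in runnable Python, with float('inf') standing in for INFTY (the function names and list-of-lists matrix representation are mine):

```python
# Sketch of the matrix-multiplication-style all-pairs algorithm with
# repeated squaring; W is an n x n weight matrix with 0 on the diagonal
# and float('inf') where there is no edge.
def extend_shortest_paths(L, W):
    n = len(L)
    Lp = [[float('inf')] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # best path i -> j = best path i -> k plus edge/path k -> j
                Lp[i][j] = min(Lp[i][j], L[i][k] + W[k][j])
    return Lp

def fast_all_pairs(W):
    n = len(W)
    L, m = W, 1
    while m < n - 1:                     # square until path length >= n - 1
        L = extend_shortest_paths(L, L)
        m *= 2
    return L
```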
• The Floyd-Warshall algorithm
• Uses second (intermediate-vertex-set based) recurrence above.
• Algorithm:
```
FLOYD-WARSHALL(W)
    n = rows[W]
    D^(0) = W
    for k = 1 to n do
        for i = 1 to n do
            for j = 1 to n do
                D^(k)_ij = min(D^(k - 1)_ij, D^(k - 1)_ik + D^(k - 1)_kj)
    return D^(n)
```
• Running time = O(|V|^3)
• Note that both of these algorithms are technically operating on a 3-D matrix of size O(|V|^3); however, as we only need to save adjacent 2-D "slices" of this matrix, we actually use O(|V|^2) space.
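The slice-saving observation can be made concrete: a Python Floyd-Warshall sketch that keeps only one 2-D matrix and updates it in place, which is safe because row and column k do not change during round k when no negative-weight cycles are present:

```python
# Runnable Floyd-Warshall using O(|V|^2) space: each round k overwrites
# the single matrix D, turning D^(k-1) into D^(k) in place.
def floyd_warshall(W):
    n = len(W)
    D = [row[:] for row in W]            # D^(0) = W (copied, W untouched)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # allow k as an intermediate vertex on the i -> j path
                D[i][j] = min(D[i][j], D[i][k] + D[k][j])
    return D
```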
• Note that the running times given for the three algorithms above assume that the input graph is represented as an adjacency matrix -- if the graph is instead represented as an adjacency list, the running time of each algorithm increases by a multiplicative factor of |V|!

## Thursday, April 3

• Final Exam Notes #1
Though Dr. Wang and I haven't yet made up your final exam, at the request of some of you, I am posting some of my previous exams with answers so you can get an idea of the types of questions I ask:

When I have a better idea what I will be putting on your exam, I will post further notes.

## Monday, April 14

• Final Exam Notes #2
Dr. Wang and I have finished making up the final exam. This exam will be closed-book and has a total of 120 marks (note that we have tried to make the number of marks for a question roughly equivalent to the number of minutes it should take you to do it). The current layout is as follows:

• Time complexity (24 marks)
• Algorithm design techniques / Graph algorithms (26 marks)
• Algorithm design (35 marks)
• Proving polynomial-time intractability (35 marks)

All is fair game; this includes any algorithm we discussed in class, e.g., the various shortest path algorithms. The questions on the assignments are guides to what I have in mind; you should also find the questions on my previous CS 3711 exams (posted in Final Exam Notes #1 above) of some help.

We hope the above helps, and we wish you all the best of luck with this exam and all of your other final exams.

## References

Unless otherwise stated, all references above are to Cormen et al. (2001).

• Brassard, G. and Bratley, P. (1996) Fundamentals of Algorithmics. Prentice-Hall; Englewood Cliffs, NJ. [Abbreviated above as B&B]
• Cormen, T.H., Leiserson, C.E., Rivest, R.L., and Stein, C. (2001) Introduction to Algorithms (Second Edition). The MIT Press; Cambridge, MA.