Algorithms Final Review Questions and Solutions The Final is comprehensive, with a focus on the portion after the Midterm, esp. the connections among DP, priority queues, and graph problems (such as shortest-path). You need to understand the graph interpretation of DP (Viterbi and topological sort). You also need to understand BFS/Dijkstra as a best-first order of solving (some) DP problems. There will be two (2) fill-in-the-blanks coding questions: (a) easy: BFS, DFS, or topological sort (either BFS or DFS style). (b) hard: k-best Viterbi (see p2 below), similar to but much simpler than HW10 part 3. HW10 parts 1-2 will be covered in another similar problem (see p11 below). 1. What's the practical relevance of k-best solutions in DP? A: k-best optimization is used in everyday life because you always want alternatives! E.g., Google Map always presents alternative paths (shortest vs. fastest, etc.). A user sometimes prefers to avoid tolls, freeways, ferries, etc., and often his/her objective function is multi-dimensional (time, distance, traffic, etc.) so s/he would benefit from multiple paths. 2. For a DAG, how to get k-shortest paths using Viterbi? (cf. HW10 part 3: k-best) A: Extend the Viterbi algorithm as follows: each vertex stores k-shortest paths from the source. the update operation becomes the "team USA" problem: merging many sorted lists, but only need top k out of them. time: O(E + V k log d_max) where d_max is the max in-degree of any node (i.e., PQ size). You can speed this up by pruning the initial PQ down to size (at most) k. time: O(E + V k log k). Note: this is not the fastest, but very easy to implement. Note: HW10 part 3 is basically replacing the teams problem here by the midterm problem (teams+nbestc). 3. How to decide whether a graph has a *unique* topological order? Hint: a small change in the bottom-up (or top-down) topological sort. A: in bottom-up, in each iteration, before choosing one node from the queue (i.e., zero-in-degree list), check if the queue is a singleton (i.e., a unique class to take). Follow-up: what about top-down? 4. Give a DAG where Dijkstra fails. A: Example (negative edge): V = {s, u, t}; E = {s -> u: 3, s -> t: 2, u -> t: -2} 5. Dijkstra's algorithm doesn't work if there are negative weight edges. One proposal for how to deal with this case is (a) find the most negative edge, say with value -v, (b) add v to the cost of every edge, so that each edge is non-negative, and (c) solve the shortest path algorithm in the resulting graph. Does this work? If not, give a counterexample. A: No; it only works if *all* source-target paths have the same number of edges. See the example from the previous question. 6. (from KT slides) Suppose that you change the length of every edge of G as follows. For which of the following is every shortest path in G still a shortest path? A. Add 17. B. Multiply by 17. C. Either A or B. D. Neither A nor B. A: B only. 7. What's the time complexity of Dijkstra if you use the following implementations of priority queue? (a) binary heap (does not support decrease_key) (b) indexed heap (hash+heap) such as heapdict (supports decrease_key) (c) hash (d) unsorted array A: (a) (E+E) log E (b) (V+E) log V (c) V^2 + E (d) V^2 + EV Note: (c) is faster than (a-b) for dense graphs, but since there are no naturally-occuring big dense graphs, the default implementation is either (a) or (b). Note: (a) is (E+E)logE not (V+E)logE b/c in the worst case you pop E times and only V of them are unique pops. (see Course Notes for Details). 8. (from CLRS, not relevant to the final; you can skip) Dijkstra with integer weights: suppose all edge weights are in {1..W} where W is a positive integer but not a const. Modify priority queue data structure to achieve O(VW+E) time. (hint: no heap). A: Similar to bucket-sort, your priority queue is actually (V-1)W+1 buckets (0, 1, ..., (V-1)W). Vertex with key k is in the bucket k. Scan these buckets from left to right, and pop one vertex from the first non-empty bucket. 9. (from CLRS) Weird shortest path: find the path whose longest edge is the shortest. motivation: I'm driving a small-tank car in rural Oregon where gas stations are sparse. To be safe, I wanna avoid long distances between two gas stations. O(V+E) (just modify Viterbi) O((V+E)logV) (modify Dijkstra => similar to Prim) -- not required for final. Follow-up: How about the path whose shortest edge is the longest? A: O(V+E): from Viterbi solution for longest-path (hw8), change the two operators from (max, sum) to (min, max). (min, max): for each node u in the topological order: take the min among all incoming edges (v, u), but take the max when you extend d(v): c(v,u) v -------> u d(u) = min_{v: (v,u) in E} max(d(v), c(v,u)) compare it with HW8 (viterbi for longest) d(u) = max_{v: (v, u) in E} (d(v) + c(v,u)) O((V+E)logV): in Dijkstra's priority queue, change the key of node v from "(shortest) distance from source to node v" to "(shortest) longest edge from source to node v". i.e., when you pop node u and use it to update node v along edge (u,v): d(v) min= max(d(u), c(u, v)) compare it with HW9 (dijkstra for shortest): d(v) min= d(u) + c(u, v) Recall that "min=" is like "+=", i.e., a min= b means a = min(a, b). For "whose shortest edge is the longest": Viterbi: change to (max, min) Dijkstra: change the edge weight e_ij to max_e - e_ij, then the question is same as "longest edge is the shortest". -- not required for final. 10. Which DP problems in this course can be solved by *both* Viterbi and Dijkstra? A: the coins problems (either minimizing the number of coins of the number of types of coins); Another classical example would be edit-distance (see p14 below). Follow-up: which algorithm is faster for coins? A: For a target amount where the optimal solution is small (like 50, 100 or 101), Dijkstra is faster, because the target node pops early. Actually coins doesn't even need Dijkstra -- a simple BFS is the fastest. BFS can be viewed as simplified Dijkstra where the priority queue is just a normal queue. Dijkstra can be viewed as fancy BFS where the queue becomes priority queue. 11. Given a boolean expression, count the number of parenthesizations that return T. e.g., input: F + F * T output: 0. reason: impossible input: T + F * T output: 2. reason: (T+(F*T)) ((T+F)*T) O(n^3) or better. A: very similar to HW10 part 2 (count total # of structures). Use the gaps between two T/F as index, from 0 to n. true[i][i+1] = 1 if e[i] is T else 0 false[i][i+1] = 1 if e[i] is F else 0 true[i][j] = sum( true[i][k] * true[k][j], if o[k] is *; true[i][k] * true[k][j] + true[i][k] * false[k][j] + false[i][k] * true[k][j], if o[k] is +; ) for i (i-1,j) | \ | 1| \0 or 1 |1 | \ | v \ v (i, j-1) --1-> (i, j) final: opt[n, m] where n = |s| and m = |t| space: O(n*m) time: same graph or hypergraph: graph, find shortest-path from (0,0) to (n, m) Because it's a shortest-path problem with non-negative weights (0 or 1), and because the graph is a DAG, this problem can be solved by both Viterbi or Dijkstra and the latter could be faster if the final edit distance is small. (but not solvable by BFS because edge weights are NOT always 1!) Example: s = a-cdefh |-||+|* t = abcd-fg t > a b c d f g i/j| 0 1 2 3 4 5 6 ------------------------------- s 0 | 0 1 2 3 4 5 6 v | \ +1 a 1 | 1 0---1 2 3 4 5 | \ c 2 | 2 1 1 1 2 3 4 | \ d 3 | 3 2 2 2 1 2 3 | |+1 e 4 | 4 3 3 3 2 2 3 | \ f 5 | 5 4 4 4 3 2 3 | \+1 h 6 | 6 5 5 5 4 3 3 ------------------------------- When edit distance is small (mostly diagonal copying), Dijkstra is faster: (I've implemented both: distance1() is Viterbi, and distance2() is Dijkstra) In [40]: %time distance1("aaaaaaadaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaxaaaaaaaaaaaaaaaaaaaaaa") CPU times: user 12.7 ms, sys: 17 us, total: 12.7 ms Wall time: 12 ms Out[40]: 5 In [41]: %time distance2("aaaaaaadaaaaaaaaaaaaaaaaacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaxaaaaaaaaaaaaaaaaaaaaaa") CPU times: user 6.59 ms, sys: 1.96 ms, total: 8.55 ms Wall time: 7.55 ms Out[41]: 5 Otherwise Viterbi is faster. Historical Note: Edit distance is closely related to the famous Smith-Waterman algorithm for (biological) sequence alignment, named after Professor Michael Waterman (1942-), a local Oregonian from Southern Oregon who received his BS (1964) and MS (1966) in Math at OSU and went on to become one of the founders of the field of computational biology. The Human Genome Project was based on his sequencing algorithms. He will be awarded an honorary doctorate degree at OSU Commencement this year (2024). See President Murthy's email on May 2. https://en.wikipedia.org/wiki/Michael_Waterman Follow-up: Prof. Waterman has been a big supporter of OSU, College of Science, and recently also to my group. I congratulated him on May 3, and he was so pleased that he would have the complete cycle of all three degrees (BS '64, MS '66, Doctorate '24) from OSU.