CS 514, Algorithms, Spring 2024 HW10 - Challenge Problem - RNA Structure Prediction (4% + 2% = 6%) This problem combines dynamic programming and priority queues. Due Sat 6/8 @ 11:59pm. <--- special due date/time. No late submission will be accepted. HW10 will be covered in the Final (Mon 6/10 @ 6pm) in two places. Include in your submission: report.txt, rna.py. To submit: flip $ /nfs/farm/classes/eecs/spring2024/cs514/submit 514 hw10 rna.py flip $ /nfs/farm/classes/eecs/spring2024/cs514/submit 514 hw10 report.txt To see your best results so far: flip $ /nfs/farm/classes/eecs/spring2024/cs514/query hw10 Grading: * report.txt -- 1% * 1-best structure -- 2% * number of structures -- 1% * k-best structures -- extra credit 2% Textbooks for References: [1] KT Ch. 6.5 (DP over intervals -- RNA structure) [2] KT slides: DP I (RNA section) http://www.cs.princeton.edu/~wayne/kleinberg-tardos/ [3] slides and visualization: https://ad-publications.cs.uni-freiburg.de/student-projects/rna-google-2/nussinov.pdf [4] video (Prof. Bryce): https://youtu.be/1zBbqtNqNVc [5] my DP slides (starting from page 11): https://classes.engr.oregonstate.edu/eecs/spring2024/cs514/dp-all.pdf ***Please analyze time/space complexities for each problem in report.txt. 1. Given an RNA sequence, such as ACAGU, we can predict its secondary structure by tagging each nucleotide as (, ., or ). Each matching pair of () must be AU, GC, or GU (or their mirror symmetries: UA, CG, UG). We also assume pairs can _not_ cross each other. The following are valid structures for ACAGU: ACAGU ..... ...() ..(.) .(.). (...) ((.)) We want to find the structure with the maximum number of matching pairs. In the above example, the last structure is optimal (2 pairs). >>> best("ACAGU") (2, '((.))') Tie-breaking: arbitrary. Don't worry as long as your structure is one of the correct best structures. some other cases (more cases at the bottom): GCACG (2, '().()') UUCAGGA (3, '(((.)))') GUUAGAGUCU (4, '(.()((.)))') AUAACCUUAUAGGGCUCUG (8, '.(((..)()()((()))))') AACCGCUGUGUCAAGCCCAUCCUGCCUUGUU (11, '(((.(..(.((.)((...().))()))))))') GAUGCCGUGUAGUCCAAAGACUUCACCGUUGG (14, '.()()(()(()())(((.((.)(.))()))))') CAUCGGGGUCUGAGAUGGCCAUGAAGGGCACGUACUGUUU (18, '(()())(((((.)))()(((())(.(.().()()))))))') ACGGCCAGUAAAGGUCAUAUACGCGGAAUGACAGGUCUAUCUAC (19, '.()(((.)(..))(((.()()(())))(((.)((())))))())') AGGCAUCAAACCCUGCAUGGGAGCACCGCCACUGGCGAUUUUGGUA (20, '.(()())...((((()()))((()(.()(((.)))()())))))()') 2. Total number of all possible structures >>> total("ACAGU") 6 3. k-best structures: output the 1-best, 2nd-best, ... kth-best structures. >>> kbest("ACAGU", 3) [(2, '((.))'), (1, '(...)'), (1, '.(.).')] The list must be sorted. Tie-breaking: arbitrary. In case the input k is bigger than the number of possible structures, output all. Sanity check: kbest(s, 1)[0][0] == best(s)[0] for each RNA sequence s. All three functions should be in one file: rna.py. See more testcases at the end. Debriefing (required!): -------------------------- 0. What's your name? 1. Approximately how many hours did you spend on this assignment? 2. Would you rate it as easy, moderate, or difficult? 3. Did you work on it mostly alone, or mostly with other people? 4. How deeply do you feel you understand the material it covers (0%-100%)? 5. Any other comments? This section is intended to help us calibrate the homework assignments. Your answers to this section will *not* affect your grade; however, skipping it will certainly do. TESTCASES: for each sequence s, we list three lines: best(s) total(s) kbest(s, 10) ACAGU (2, '((.))') 6 [(2, '((.))'), (1, '.(.).'), (1, '..(.)'), (1, '...()'), (1, '(...)'), (0, '.....')] ------ AC (0, '..') 1 [(0, '..')] ------ GUAC (2, '(())') 5 [(2, '(())'), (1, '()..'), (1, '.().'), (1, '(..)'), (0, '....')] ------ GCACG (2, '().()') 6 [(2, '().()'), (1, '(..).'), (1, '()...'), (1, '.(..)'), (1, '...()'), (0, '.....')] ------ CCGG (2, '(())') 6 [(2, '(())'), (1, '(.).'), (1, '.().'), (1, '.(.)'), (1, '(..)'), (0, '....')] ------ CCCGGG (3, '((()))') 20 [(3, '((()))'), (2, '((.)).'), (2, '(.()).'), (2, '.(()).'), (2, '.(().)'), (2, '.((.))'), (2, '((.).)'), (2, '(.(.))'), (2, '(.().)'), (2, '((..))')] ------ UUCAGGA (3, '(((.)))') 24 [(3, '(((.)))'), (2, '((.).).'), (2, '((..)).'), (2, '(.(.)).'), (2, '((.))..'), (2, '.((.)).'), (2, '.((.).)'), (2, '.((..))'), (2, '((..).)'), (2, '((.)..)')] ------ AUAACCUA (2, '.((...))') 19 [(2, '((.)..).'), (2, '(()...).'), (2, '()(...).'), (2, '().(..).'), (2, '()....()'), (2, '.()(..).'), (2, '.()...()'), (2, '.(.)..()'), (2, '.((...))'), (2, '.(.(..))')] ------ UUGGACUUG (4, '(()((.)))') 129 [(4, '(())(.)()'), (4, '(()((.)))'), (3, '(().)..()'), (3, '(().).(.)'), (3, '(().)(..)'), (3, '((.))..()'), (3, '((.)).(.)'), (3, '((.))(..)'), (3, '(())(..).'), (3, '(())(.)..')] ------ UUUGGCACUA (4, '(.()()(.))') 179 [(4, '((()).).()'), (4, '((.)()).()'), (4, '(.()()).()'), (4, '.(()()).()'), (4, '.(()()(.))'), (4, '((()).(.))'), (4, '((.)()(.))'), (4, '((()())..)'), (4, '(.()()(.))'), (3, '((()).)...')] ------ GAUGCCGUGUAGUCCAAAGACUUC (11, '(((()()((()(.))))((.))))') 2977987 [(11, '(()())(((()().))(((.))))'), (11, '(()())(((()()).)(((.))))'), (11, '(()())(((()(.)))(((.))))'), (11, '(()()()((()(.)))(((.))))'), (11, '(((()()((()().)))((.))))'), (11, '(((()()((()(.))))((.))))'), (11, '(()()()((()()).)(((.))))'), (11, '(()()()((()().))(((.))))'), (11, '(((()()((()()).))((.))))'), (10, '(()()()((()().).)((.))).')] ------ AGGCAUCAAACCCUGCAUGGGAGCG (10, '.(()())...((((()()))).())') 560580 [(10, '.(()())...((((())())).)()'), (10, '.(()())...((((()()))).)()'), (10, '.(()())...(((()(()))).)()'), (10, '.(()())...(((()(()))).())'), (10, '.(()())...((((())())).())'), (10, '.(()())...((((()()))).())'), (9, '((.).)(...(.((()()))).)()'), (9, '((.).)(...(((.)(()))).)()'), (9, '((.).)(...(.(()(()))).)()'), (9, '((.).)(...((.(()()))).)()')] ------