# Assignment 3

Due: 3:30 PM on Tuesday, March 25, 2008

## Question 1

Consider the following problem from molecular biology: RNA molecules are polymers of adenine, cytisine, guanine, and uracil ribonucleotide molecules. Each such RNA molecule can be described by a string over the alphabet {A,C,G,U}. Each RNA molecule has a 2-dimensional secondary structure which is created when pairs of ribonucleotide "bases" at different positions in the RNA molecule join together (hybridize). Ribonucleotide base-pair hybridization obeys the following rules:

• No base can hybridize with itself.
• No base can hybridize with an adjacent base.
• Each base can hybridize with at most one other base.
• Base-pairs are nested and cannot cross, e.g., there cannot be base-pairings of the positions (i,j) and (k,l) such that i less than k less than j less than l.

For example, given the RNA molecule GCAGCAUU, the following are legal secondary structures

and the following are illegal secondary structures

Each possible ribonucleotide base-pair hybridization has an associated free energy. The free energy of a secondary structure for a given RNA molecule is the sum of the free energies of that secondary structure's associated set of base-pair hybridizations. For example, given that G-C pairs have free energy -2, A-U pairs have free energy -1, and all other base-pairs have free energy 1, the free energies of the legal secondary structures given above are -1 and -5, respectively. The secondary structure of an RNA molecule in its natural biochemical environment is hypothesized to be the secondary structure which has the lowest possible free energy; hence, given an RNA molecule, we want one of its associated secondary structures with the lowest possible free energy.

Given an RNA sequence S with n bases, define the subproblem-set F(i,j), 1 less than equal to i less than j less than equal to n, where F(i, j) is the minimum free energy of a secondary structure for substring S[i,j]. The recurrence for computing this quantity is

where BHS(x, y) is the free energy associated with a base-pair of RNA bases x and y.

Write and document a Java program that implements the dynamic programming algorithm sketched above to compute the minimum free energy secondary structure of a given RNA sequence. The name of the RNA sequence file and the name of the base-pair hybridization free-energy file should be command-line arguments to your program. The RNA sequence file consist of a single line giving the sequence. Each base-pair hybridization free-energy file consists of four lines, each of which has four values, describing a 4 x 4 matrix BHS such that BHS(i,j), 1 less than equal to i, j less than equal to 4, gives the free energy associated with the ribonucleotide-pair (i,j) where A = 1, G = 2, C = 3, and U = 4. You may assume that all given data files are formatted correctly. Examples of several program runs relative to the RNA sequence files rseq1.dat, rseq2.dat, and rseq3.dat and base-pair hybridization free-energy files bhs1.dat and bhs2.dat are given in script file R2SF.script. It is these results that your program should be able to replicate. Note that you are only required to given the minimum free energy and secondary-structure base-pair list in your program output; the extra output in the script file is given to clarify the operation of the algorithm.

## Hints

Given the similarity of the recurrence of the Matrix Chain Parenthesization (MCP) problem to that given above, you may find it of use to adapt the MCP backpointer and traceback schemes to recover the minimum-energy RNA secondary structure.

## Questions 2-4

The text of these questions is available in PDF format.

## Assignment Submission

Please hand in printed copies of all of your Java source code files for Question #1 along with your answers for Questions #2-4. You must also submit the files for Question #1 electronically using the submit command. Note that each such file must have the following comment block at the top, where the X's are replaced with the appropriate information. For instance, my code for this assignment would begin with the following comment block:
```/////////////////////////////////////////////////////////////////
//  CS 3719 (Winter 2008), Assignment #3, Question #1          //
//  Program File Name: R2SF.java                               //
//       Student Name: Todd Wareham                            //
//              MUN #: 8008765                                 //
/////////////////////////////////////////////////////////////////
```
You do not have to develop your code on our CS departmental systems. However, as your code will be compiled and tested on our CS departmental systems as part of the assignment marking process, you should ensure that your code compiles and runs correctly on at least one of these systems.