For example, given the RNA molecule GCAGCAUU, the following are legal secondary structures
and the following are illegal secondary structures
Each possible ribonucleotide base-pair hybridization has an associated free energy. The free energy of a secondary structure for a given RNA molecule is the sum of the free energies of that secondary structure's associated set of base-pair hybridizations. For example, given that G-C pairs have free energy -2, A-U pairs have free energy -1, and all other base-pairs have free energy 1, the free energies of the legal secondary structures given above are -1 and -5, respectively. The secondary structure of an RNA molecule in its natural biochemical environment is hypothesized to be the secondary structure which has the lowest possible free energy; hence, given an RNA molecule, we want one of its associated secondary structures with the lowest possible free energy.
Given an RNA sequence S with n bases, define the subproblem-set F(i,j), 1 less than equal to i less than j less than equal to n, where F(i, j) is the minimum free energy of a secondary structure for substring S[i,j]. The recurrence for computing this quantity is
where BHS(x, y) is the free energy associated with a base-pair of RNA bases x and y.
Write and document a Java program that implements the dynamic programming algorithm sketched above to compute the minimum free energy secondary structure of a given RNA sequence. The name of the RNA sequence file and the name of the base-pair hybridization free-energy file should be command-line arguments to your program. The RNA sequence file consist of a single line giving the sequence. Each base-pair hybridization free-energy file consists of four lines, each of which has four values, describing a 4 x 4 matrix BHS such that BHS(i,j), 1 less than equal to i, j less than equal to 4, gives the free energy associated with the ribonucleotide-pair (i,j) where A = 1, G = 2, C = 3, and U = 4. You may assume that all given data files are formatted correctly. Examples of several program runs relative to the RNA sequence files rseq1.dat, rseq2.dat, and rseq3.dat and base-pair hybridization free-energy files bhs1.dat and bhs2.dat are given in script file R2SF.script. It is these results that your program should be able to replicate. Note that you are only required to given the minimum free energy and secondary-structure base-pair list in your program output; the extra output in the script file is given to clarify the operation of the algorithm.
///////////////////////////////////////////////////////////////// // CS 3719 (Winter 2008), Assignment #3, Question #1 // // Program File Name: R2SF.java // // Student Name: Todd Wareham // // Login Name: harold // // MUN #: 8008765 // /////////////////////////////////////////////////////////////////You do not have to develop your code on our CS departmental systems. However, as your code will be compiled and tested on our CS departmental systems as part of the assignment marking process, you should ensure that your code compiles and runs correctly on at least one of these systems.
Created: February 22, 2008
Last Modified: March 20, 2008