Evolutionary Tree Reconstruction
Input: An m x n object by
character matrix M.
Output: The tree T with m
leaves that gives the evolutionary history
of speciation events that derives the
m taxa in M from their most
recent common ancestor.
Shortest DNA Sequence Assembly
Input: A set S of m
DNA fragments of maximum length n.
Output: The shortest
DNA sequence r such
that each s in S is a
substring of r.
Optimal Evolutionary Tree under Criterion X
Input: An m x n object-by-character matrix M.
Output: The tree T with m
leaves that has optimal score for criterion
X relative to M.
Note that in the context of computational problems in biology, the optimal solution need not be the right one -- rather, we invoke Occam's Razor / the Principle of Parsimony (albeit in a warped fashion) and consider the optimal solution most likely to be right and hence a good one to investigate first.
In Dr. Marshall's lectures next week, you will see several examples of the first type of problem relative to molecular-sequence data. We will spend most of our remaining lecture-time today looking at examples of the latter two types of problems.
A finer further distinction for pattern-problems is exact vs. approximate pattern matching, i.e., how do you interpret "occurrence" of a pattern in a text?
------------------------- |#####--###-##-####-##--| |--##-####-######-######| -------------------------
------------ ---------|##-######-|######## #########|####-##-##|-------- ------------ ------------ #########|##-######-|-------- ---------|####-##-##|######## ------------
-------------- ######|#-#####--##-|######## ------|####-####-##|-------- --------------
---------- ######|##-##-##|#### ##|#-#####-|####### ----------