ON THE COMPUTATIONAL COMPLEXITY OF INFERRING EVOLUTIONARY TREES Harold Todd Wareham ABSTRACT The process of reconstructing evolutionary trees can be viewed formally as an optimization problem. Recently, decision problems associated with the most commonly used approaches to reconstructing such trees have been shown to be NP-complete [Day87,DJS86,DS86,DS87,GF82,Kri88,KM86]. In this thesis, a framework is established that incorporates all such problems studied to date. Within this framework, the NP-completeness results for decision problems are extended by applying theorems from [CT91,Gas86,GKR92,JVV86,KST89,Kre88,Sel91] to derive bounds on the computational complexity of several functions associated with each of these problems, namely # evaluation functions, which return the cost of the optimal tree(s), # solution functions, which return an optimal tree, # spanning functions, which return the number of optimal trees, # enumeration functions, which systematically enumerate all optimal trees, and # random-selection functions, which return a randomly- selected member of the set of optimal trees. Where applicable, bounds are also presented for the versions of these functions that are restricted to trees of a given cost or of cost less than or greater than a given limit. Based in part on these results and theorems from [BH90,GJ79,KMB81,Kre88], bounds are derived on how closely polynomial-time algorithms can approximate optimal trees. In particular, it is shown using the recent results of [ALMSS92] that no phylogenetic inference optimal-cost solution problem examined in this thesis has a polynomial-time approximation scheme unless P = NP. REFERENCES [ALMSS92] Arora, S., Lund, C., Motwani, R., Sudan, M., and Szegedy, M. "Proof Verification and Intractability of Approximation Problems". Manuscript, 1992. [BH90] Boppana, R. and Halldorsson, M.M. "Approximating Maximum Independent Sets by Excluding Subgraphs". In J.R. Gilbert and R. Karlsson (eds.) SWAT 90. Lecture Notes in Computer Science no. 447, Springer-Verlag, Berlin, 1990. 13-25. [CT91] Chen, Z-.Z. and Toda, S. "On the Complexity of Computing Optimal Solutions". Manuscript, 1991. [Day87] Day, W.H.E. "Computational Complexity of Inferring Phylogenies from Dissimilarity Matrices". Bulletin of Mathematical Biology, 49(4), 461-467, 1987. [DJS86] Day, W.H.E., Johnson, D.S., and Sankoff, D. "The Computational Complexity of Inferring Rooted Phylogenies by Parsimony". Mathematical Biosciences, 81, 33-42, 1986. [DS86] Day, W.H.E. and Sankoff, D. "Computational Complexity of Inferring Phylogenies by Compatibility". Systematic Zoology, 35(2), 224-229, 1986. [DS87] Day, W.H.E. and Sankoff, D. "Computational Complexity of Inferring Phylogenies from Chromosome Inversion Data". Journal of Theoretical Biology, 124, 213-218, 1987. [GJ79] Garey, M.R. and Johnson, D.S. COMPUTERS AND INTRACTABILITY. W. H. Freeman, New York, 1979. [Gas86] Gasarch, W.I. "The Complexity of Optimization Functions". Technical Report no. 1652, University of Maryland, Department of Computer Science, 1986. [GKR92] Gasarch, W.I., Krentel, M.W., and Rappoport, K.J. "OptP as the Normal Behavior of NP-Complete Problems". Manuscript, 1992. To appear in Mathematical Systems Theory. [GF82] Graham, R.L. and Foulds, L.R. "Unlikelihood That Minimal Phylogenies for a Realistic Biological Study Can Be Constructed in Reasonable Computational Time". Mathematical Biosciences, 60, 133-142, 1982. [JVV86] Jerrum, M.R., Valiant, L.G., and Vazirani, V.V. "Random Generation of Combinatorial Structures from a Uniform Distribution". Theoretical Computer Science, 43, 169-188, 1986. [KST89] Kobler, J., Schoning, U., and Toran, J. "On Counting and Approximation". Acta Informatica, 26, 363-379, 1989. [KMB81] Kou, L., Markowsky, G., and Berman, L. "A Fast Algorithm for Steiner Trees". Acta Informatica, 15, 141-145, 1981. [Kre88] Krentel, M.W. "The Complexity of Optimization Problems". Journal of Computer and System Sciences, 36(3), 490-509, 1988. [Kri88] Krivanek, M. "The Complexity of Ultrametric Partitions on Graphs". Information Processing Letters, 27, 265-270, 1988. [KM86] Krivanek, M. and Moravek, J. "NP-Hard Problems in Hierarchical Tree Clustering". Acta Informatica, 23, 311-323, 1986. [Sel91] Selman, A.L. "A Taxonomy of Complexity Classes of Functions". Technical report, State University of New York at Buffalo, Department of Computer Science, 1991. To appear in Journal of Computer and System Sciences. An abbreviated version appeared in Bulletin of the European Association for Theoretical Computer Science, 45, 114-130, 1991.