Sunday, December 18, 2011

Model-based Phylogenetics and Morphology

Hello listeners,

(Sorry for the long hiatus; I'm not much for soapboxes.)

A while ago I found myself in discussions about how one should do phylogenetics. Now, one could argue for many lifetimes about all the details of how to make a tree, and on many of them I have no opinion. One point I do feel strongly about is that more people should at least consider using model-based phylogenetics in morphological systematics, in contrast to the more classic parsimony-based approach. For those of you unfamiliar with this distinction, all you need to know is that biologists use different methods to reconstruct phylogenies from data on the characters that are shared or differ between lineages. Maximum parsimony simply tries to find the tree(s) with the fewest character changes along the branches (the most parsimonious tree, which implies the least complicated scenario of evolution). Model-based methods take some model of how a trait changes over time and calculate the likelihood of the observed character states given a phylogenetic hypothesis (or, in a Bayesian analysis, the posterior probability of the hypothesis given the data). This is done in either a maximum likelihood or a Bayesian framework, and more commonly with molecular characters (e.g. DNA) than with morphological characters (distinct features of bones, shells, etc.). Of course, if you're a paleontologist, you probably deal with morphological characters.
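To make "fewest character changes" concrete, here is a minimal sketch of Fitch's (1971) algorithm, which counts the minimum number of changes a character needs on one fixed tree. The taxa, matrix, and trees are made up for illustration, and a real parsimony analysis would also search over many candidate trees, which this sketch does not do.

```python
# Minimal sketch of Fitch parsimony on a fixed rooted binary tree.
# Trees are nested tuples; leaves are taxon names. All data here are hypothetical.

def fitch(tree, states):
    """Return (candidate state set, minimum number of changes) for a subtree."""
    if isinstance(tree, str):  # leaf: its observed state, no changes yet
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:  # the two children can agree, so no extra change is needed here
        return shared, left_cost + right_cost
    # the children disagree, so one change is required on one of the two branches
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, matrix):
    """Sum the minimum number of changes over all characters (matrix columns)."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for c in range(n_chars):
        column = {taxon: row[c] for taxon, row in matrix.items()}
        total += fitch(tree, column)[1]
    return total


# Hypothetical 4-taxon, 3-character matrix with binary states.
matrix = {"A": "001", "B": "011", "C": "110", "D": "110"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(tree_length(tree_1, matrix), tree_length(tree_2, matrix))  # → 3 5
```

Parsimony would prefer tree_1 here because it needs fewer changes; a model-based method would instead score each tree by the probability of the data under an explicit model of change.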

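Now, I should point out that even the simplest model of evolution for morphological characters has only been in the published literature for about ten years. This is Lewis's (2001) Mkv model: a Markov model for characters with k states in which every change between states is equally likely, with a correction (the "v") for the fact that morphological datasets usually include only characters that vary among the sampled taxa. In effect, it describes the changes we expect to see across characters, given that each character shows at least some change.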
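As a rough illustration of what Mk and the "v" correction compute, here is a minimal sketch for the simplest possible tree: two taxa joined by one branch of length v, measured in expected changes per character. The values of k and v are arbitrary. On a real tree, the probability of a constant character would be summed over all k constant patterns using the full tree likelihood, not this two-taxon shortcut.

```python
import math
from itertools import product


def mk_prob(k, v, same):
    """Mk transition probability over a branch of length v (expected changes per character)."""
    decay = math.exp(-k * v / (k - 1))
    if same:
        return 1.0 / k + (k - 1) / k * decay  # ends in the state it started in
    return 1.0 / k - decay / k  # ends in one particular other state


def pattern_prob(k, v, i, j):
    """Probability of state i in taxon 1 and state j in taxon 2 (equal state frequencies 1/k)."""
    return (1.0 / k) * mk_prob(k, v, i == j)


def mkv_prob(k, v, i, j):
    """Mkv: condition on the character being variable by dividing out constant patterns."""
    if i == j:
        raise ValueError("Mkv only scores variable characters")
    p_constant = sum(pattern_prob(k, v, s, s) for s in range(k))
    return pattern_prob(k, v, i, j) / (1.0 - p_constant)


k, v = 3, 0.2
total = sum(mkv_prob(k, v, i, j) for i, j in product(range(k), repeat=2) if i != j)
print(round(total, 10))  # → 1.0: after conditioning, the variable patterns sum to 1
```

Dividing by the probability of being variable raises each variable pattern's probability; without it, a dataset that omits invariant characters would look like it contains more change than it really does, inflating the branch lengths.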

Now, I could give (and have given) long arguments about why we should use model-based approaches in morphological phylogenetics, even if the models are relatively simple. But I don't really care, and anyway it's mostly my opinion versus someone else's opinion. So who cares?

I'd rather talk about something with data. A particular point came up a month or two ago in a discussion, when a friend claimed that although models for morphological phylogenetics existed, no one used them. I thought he was mostly right at the time. Later, I decided to see just how many papers I could find where a morphological dataset had been analyzed with model-based phylogenetics. The answer? I found about 50 papers. That's way more than I expected. I was also happy to see that a number of them applied these models to paleontological datasets. Of course, this is insignificant compared to the number of parsimony-based morphological studies over the past 10 years, which surely number in the hundreds, if not a few thousand.

I sent this list of papers that use the Mkv model or a variant for morphological phylogenetics to a few people but recently decided that people might find it useful in general, so here it is below!

Cheers!
-Dave

Morphological Phylogenetic Analyses that Used the Mkv model or a variant:
Ayache, N. C., and T. J. Near. 2009. The Utility of Morphological Data in Resolving Phylogenetic Relationships of Darters as Exemplified with Etheostoma (Teleostei: Percidae). Bulletin of the Peabody Museum of Natural History 50(2):327-346.
Bergmann, P. J., and A. P. Russell. 2007. Systematics and biogeography of the widespread Neotropical gekkonid genus Thecadactylus (Squamata), with the description of a new cryptic species. Zoological Journal of the Linnean Society 149(3):339-370.
Bergsten, J., and K. B. Miller. 2007. Phylogeny of Diving Beetles Reveals a Coevolutionary Arms Race between the Sexes. PLoS ONE 2(6):e522.
Beutel, R. G., F. Friedrich, T. Hörnschemeyer, H. Pohl, F. Hünefeld, F. Beckmann, R. Meier, B. Misof, M. F. Whiting, and L. Vilhelmsen. 2011. Morphological and molecular evidence converge upon a robust phylogeny of the megadiverse Holometabola. Cladistics 27(4):341-355.
Brandley, M. C., and K. de Queiroz. 2004. Phylogeny, Ecomorphological Evolution, and Historical Biogeography of the Anolis cristatellus Series. Herpetological Monographs 18:90-126.
Bybee, S. M., T. H. Ogden, M. A. Branham, and M. F. Whiting. 2008. Molecules, morphology and fossils: a comprehensive approach to odonate phylogeny and the evolution of the odonate wing. Cladistics 24(4):477-514.
Cabrero-Sañudo, F. J. 2007. The phylogeny of Iberian Aphodiini species (Coleoptera, Scarabaeoidea, Scarabaeidae, Aphodiinae) based on morphology. Systematic Entomology 32(1):156-175.
Cabrero-Sañudo, F.-J., and R. Zardoya. 2004. Phylogenetic relationships of Iberian Aphodiini (Coleoptera: Scarabaeidae) based on morphological and molecular data. Molecular Phylogenetics and Evolution 31(3):1084-1100.
Ceotto, P., and T. Bourgoin. 2008. Insights into the phylogenetic relationships within Cixiidae (Hemiptera: Fulgoromorpha): cladistic analysis of a morphological dataset. Systematic Entomology 33(3):484-500.
Clarke, J. A., and K. M. Middleton. 2008. Mosaicism, Modules, and the Evolution of Birds: Results from a Bayesian Approach to the Study of Morphological Evolution Using Discrete Character Data. Systematic Biology 57(2):185-201.
Druckenmiller, P. S., and A. P. Russell. 2008. A phylogeny of Plesiosauria (Sauropterygia) and its bearing on the systematic status of Leptocleidus Andrews, 1922. Zootaxa 1863:1–120.
Egge, J. J. D., and A. M. Simons. 2009. Molecules, morphology, missing data and the phylogenetic position of a recently extinct madtom catfish (Actinopterygii: Ictaluridae). Zoological Journal of the Linnean Society 155(1):60-75.
Eklöf, J., F. Pleijel, and P. Sundberg. 2007. Phylogeny of benthic Phyllodocidae (Polychaeta) based on morphological and molecular data. Molecular Phylogenetics and Evolution 45(1):261-271.
Feng, C.-M., S. R. Manchester, and Q.-Y. Xiang. 2009. Phylogeny and biogeography of Alangiaceae (Cornales) inferred from DNA sequences, morphology, and fossils. Molecular Phylogenetics and Evolution 51(2):201-214.
Friedrich, F., B. D. Farrell, and R. G. Beutel. 2009. The thoracic morphology of Archostemata and the relationships of the extant suborders of Coleoptera (Hexapoda). Cladistics 25(1):1-37.
Fröbisch, N. B., and R. R. Schoch. 2009. Testing the Impact of Miniaturization on Phylogeny: Paleozoic Dissorophoid Amphibians. Systematic Biology 58(3):312-327.
Gernandt, D. S., S. Magallon, G. Geada Lopez, O. Zeron Flores, A. Willyard, and A. Liston. 2008. Use of Simultaneous Analyses to Guide Fossil-Based Calibrations of Pinaceae Phylogeny. International Journal of Plant Sciences 169(8):1086-1099.
Giusti, F., V. Fiorentino, A. Benocci, and G. Manganelli. 2011. A Survey of Vitrinid Land Snails (Gastropoda: Pulmonata: Limacoidea). Malacologia 53(2):279-363.
Glenner, H., A. J. Hansen, M. V. Sørensen, F. Ronquist, J. P. Huelsenbeck, and E. Willerslev. 2004. Bayesian Inference of the Metazoan Phylogeny: A Combined Molecular and Morphological Approach. Current Biology 14(18):1644-1649.
Heikkilä, M., L. Kaila, M. Mutanen, C. Peña, and N. Wahlberg. 2011. Cretaceous origin and repeated tertiary diversification of the redefined butterflies. Proceedings of the Royal Society B: Biological Sciences.
Hultgren, K. M., and J. E. Duffy. 2011. Multi-Locus Phylogeny of Sponge-Dwelling Snapping Shrimp (Caridea: Alpheidae: Synalpheus) Supports Morphology-Based Species Concepts. Journal of Crustacean Biology 31(2):352-360.
Jenner, R., C. Dhubhghaill, M. Ferla, and M. Wills. 2009. Eumalacostracan phylogeny and total evidence: limitations of the usual suspects. BMC Evolutionary Biology 9(1):21.
Keck, B. P., and T. J. Near. 2008. Assessing phylogenetic resolution among mitochondrial, nuclear, and morphological datasets in Nothonotus darters (Teleostei: Percidae). Molecular Phylogenetics and Evolution 46(2):708-720.
Lee, M. S. Y., and A. B. Camens. 2009. Strong morphological support for the molecular evolutionary tree of placental mammals. Journal of Evolutionary Biology 22(11):2243-2257.
Lee, M. S. Y., A. F. Hugall, R. Lawson, and J. D. Scanlon. 2007. Phylogeny of snakes (Serpentes): combining morphological and molecular data in likelihood, Bayesian and parsimony analyses. Systematics and Biodiversity 5(4):371-389.
Lee, M. S. Y., and T. H. Worthy. In Press. Likelihood reinstates Archaeopteryx as a primitive bird. Biology Letters.
Müller, J., and R. R. Reisz. 2006. The Phylogeny of Early Eureptiles: Comparing Parsimony and Bayesian Approaches in the Investigation of a Basal Fossil Clade. Systematic Biology 55(3):503-511.
Near, T. J. 2009. Conflict and resolution between phylogenies inferred from molecular and phenotypic data sets for hagfish, lampreys, and gnathostomes. Journal of Experimental Zoology Part B: Molecular and Developmental Evolution 312B(7):749-761.
Nylander, J. A. A., F. Ronquist, J. P. Huelsenbeck, and J. Nieves-Aldrey. 2004. Bayesian Phylogenetic Analysis of Combined Data. Systematic Biology 53(1):47-67.
Ogden, T. H., J. L. Gattolliat, M. Sartori, A. H. Staniczek, T. Soldán, and M. F. Whiting. 2009. Towards a new paradigm in mayfly phylogeny (Ephemeroptera): combined analysis of morphological and molecular data. Systematic Entomology 34(4):616-634.
Organ, C., C. L. Nunn, Z. Machanda, and R. W. Wrangham. 2011. Phylogenetic rate shifts in feeding time during the evolution of Homo. Proceedings of the National Academy of Sciences 108(35):14555-14559.
Pérez-Losada, M., M. Harp, J. T. Høeg, Y. Achituv, D. Jones, H. Watanabe, and K. A. Crandall. 2008. The tempo and mode of barnacle evolution. Molecular Phylogenetics and Evolution 46(1):328-346.
Pollitt, J. R., R. A. Fortey, and M. A. Wills. 2005. Systematics of the trilobite families Lichidae Hawle & Corda, 1847 and Lichakephalidae Tripp, 1957: The application of bayesian inference to morphological data. Journal of Systematic Palaeontology 3(3):225-241.
Pyron, R. A. 2011. Divergence Time Estimation Using Fossils as Terminal Taxa and the Origins of Lissamphibia. Systematic Biology 60(4):466-481.
Ravara, A., H. Wiklund, M. R. Cunha, and F. Pleijel. 2010. Phylogenetic relationships within Nephtyidae (Polychaeta, Annelida). Zoologica Scripta 39(4):394-405.
Robovský, J., V. Říčánková, and J. Zrzavý. 2008. Phylogeny of Arvicolinae (Mammalia, Cricetidae): utility of morphological and molecular data sets in a recently radiating clade. Zoologica Scripta 37(6):571-590.
Schneider, H., A. R. Smith, and K. M. Pryer. 2009. Is Morphology Really at Odds with Molecules in Estimating Fern Phylogeny? Systematic Botany 34(3):455-475.
Schneider, S. A., and J. S. LaPolla. 2011. Systematics of the mealybug tribe Xenococcini (Hemiptera: Coccoidea: Pseudococcidae), with a discussion of trophobiotic associations with Acropyga Roger ants. Systematic Entomology 36(1):57-82.
Shimizu, A., M. Wasbauer, and Y. Takami. 2010. Phylogeny and the evolution of nesting behaviour in the tribe Ageniellini (Insecta: Hymenoptera: Pompilidae). Zoological Journal of the Linnean Society 160(1):88-117.
Sikes, D. S., R. B. Madge, and S. T. Trumbo. 2006. Revision of Nicrophorus in part: new species and inferred phylogeny of the nepalensis-group based on evidence from morphology and mitochondrial DNA (Coleoptera: Silphidae: Nicrophorinae). Invertebrate Systematics 20(3):305-365.
Sikes, D. S., S. M. Vamosi, S. T. Trumbo, M. Ricketts, and C. Venables. 2008. Molecular systematics and biogeography of Nicrophorus in part—The investigator species group (Coleoptera: Silphidae) using mixture model MCMC. Molecular Phylogenetics and Evolution 48(2):646-666.
Snively, E., A. P. Russell, and G. L. Powell. 2004. Evolutionary morphology of the coelurosaurian arctometatarsus: descriptive, morphometric and phylogenetic approaches. Zoological Journal of the Linnean Society 142(4):525-553.
Straka, J., and P. Bogusch. 2007. Phylogeny of the bees of the family Apidae based on larval characters with focus on the origin of cleptoparasitism (Hymenoptera: Apiformes). Systematic Entomology 32(4):700-711.
Tippery, N. P., C. T. Philbrick, C. P. Bove, and D. H. Les. 2011. Systematics and Phylogeny of Neotropical Riverweeds (Podostemaceae: Podostemoideae). Systematic Botany 36(1):105-118.
Torres-Carvajal, O. 2007. Phylogeny and biogeography of a large radiation of Andean lizards (Iguania, Stenocercus). Zoologica Scripta 36(4):311-326.
Voss, R. S., and S. A. Jansa. 2009. Phylogenetic Relationships and Classification of Didelphid Marsupials, an Extant Radiation of New World Metatherian Mammals. Bulletin of the American Museum of Natural History:1-177.
Wahlberg, N., M. F. Braby, A. V. Z. Brower, R. de Jong, M.-M. Lee, S. Nylin, N. E. Pierce, F. A. H. Sperling, R. Vila, A. D. Warren, and E. Zakharov. 2005. Synergistic effects of combining morphological and molecular data in resolving the phylogeny of butterflies and skippers. Proceedings of the Royal Society B: Biological Sciences 272(1572):1577-1586.
Wiens, J. J., C. A. Kuczynski, T. Townsend, T. W. Reeder, D. G. Mulcahy, and J. W. Sites. 2010. Combining Phylogenomics and Fossils in Higher-Level Squamate Reptile Phylogeny: Molecular Data Change the Placement of Fossil Taxa. Systematic Biology 59(6):674-688.
Winterton, S. L., N. B. Hardy, and B. M. Wiegmann. 2010. On wings of lace: phylogeny and Bayesian divergence time estimates of Neuropterida (Insecta) based on morphological and molecular data. Systematic Entomology 35(3):349-378.
Zaldivar-Riverón, A., M. Mori, and D. L. J. Quicke. 2006. Systematics of the cyclostome subfamilies of braconid parasitic wasps (Hymenoptera: Ichneumonoidea): A simultaneous molecular and morphological Bayesian approach. Molecular Phylogenetics and Evolution 38(1):130-145.

Introducing or examining aspects of Mkv:
Lewis, P. O. 2001. A Likelihood Approach to Estimating Phylogeny from Discrete Morphological Character Data. Systematic Biology 50(6):913-925.
Allman, E. S., M. T. Holder, and J. A. Rhodes. 2010. Estimating trees from filtered data: Identifiability of models for morphological phylogenetics. Journal of Theoretical Biology 263(1):108-119.
Springer, M. S., A. Burk-Herrick, R. Meredith, E. Eizirik, E. Teeling, S. J. O'Brien, and W. J. Murphy. 2007. The Adequacy of Morphology for Reconstructing the Early History of Placental Mammals. Systematic Biology 56(4):673-684.

9 comments:

  1. Looks good. Thanks for putting this list together. I would add only that the reason morphological models are overly simplistic at the moment is the arbitrariness of state labels across characters: in DNA, an "A" is always an "A", but in morphology a "1" in one character has no correspondence to a "1" in another. This makes it more difficult (impossible at the moment) to use more general transition matrices (unless one is willing to model each character separately).
    Joseph.
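Joseph's point about arbitrary labels can be made concrete with a toy two-state example, using hypothetical rates and branch lengths on a three-taxon star tree. Under equal rates (the Mk assumption), swapping the labels 0 and 1 leaves the likelihood unchanged. Under an asymmetric model it does not, so a single asymmetric matrix shared across characters only makes sense if the labels correspond across characters.

```python
import math


def two_state_P(a, b, t):
    """Transition matrix for a 2-state Markov chain with rates a (0->1) and b (1->0)."""
    s = a + b
    e = math.exp(-s * t)
    return [[b / s + a / s * e, a / s * (1 - e)],
            [b / s * (1 - e), a / s + b / s * e]]


def star_likelihood(pattern, a, b, branch_lengths):
    """Likelihood of one character on a 3-taxon star tree, summing over the central state."""
    s = a + b
    pi = [b / s, a / s]  # stationary state frequencies used at the central node
    mats = [two_state_P(a, b, t) for t in branch_lengths]
    total = 0.0
    for center in (0, 1):
        term = pi[center]
        for P, x in zip(mats, pattern):
            term *= P[center][x]
        total += term
    return total


def relabel(pattern):
    """Swap the arbitrary state labels 0 and 1."""
    return tuple(1 - x for x in pattern)


lengths = (0.1, 0.2, 0.3)  # hypothetical branch lengths
pattern = (0, 0, 1)
for name, (a, b) in {"equal rates": (1.0, 1.0), "asymmetric": (1.0, 3.0)}.items():
    before = star_likelihood(pattern, a, b, lengths)
    after = star_likelihood(relabel(pattern), a, b, lengths)
    print(name, math.isclose(before, after))  # True for equal rates, False for asymmetric
```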

  2. Frank Burbrink just pointed out on Facebook that I missed his 2005 paper that used the Mkv model:

    Burbrink, F. T. 2005. Inferring the phylogenetic position of Boa constrictor among the Boinae. Molecular Phylogenetics and Evolution 34(1):167-180.

  3. Hi Dave,

    Bit late to the party, but I just found your blog.

    I can see the merits in model-based approaches, but have a query for you. As far as I understand them (which is still very poorly!) the ML/Bayesian approaches essentially work by optimising with respect to rate of change. Does this mean that for the model to work correctly the input to it should include not just the character states of the terminals, but their ages as well? If so, for fossil taxa they wouldn't all be the same (i.e. the tree is non-ultrametric) and so ultimately the use of model-based approaches also raises the whole should-we-use-stratigraphy-to-infer-phylogeny? debate. Is this the case? And if so are the model assumptions violated if we "pretend" the tree is ultrametric when really it isn't?

  4. Hmm, I could be very wrong here, but I believe programs like MrBayes don't try to force the rates of evolution to be as expected on an ultrametric tree; instead they simply try to find the trees with the best-supported combination of branch lengths and character changes across characters (i.e. is a long branch here likely under a Markov model, given the current topology and the observed number of character changes?).

    Of course, this is different if we're talking about programs that time-scale the tree at the same time; that raises even more issues, but see Pyron (2011) for an interesting study.

    I think Joseph Brown would know better than me here, so I'll ask him if he could spare a moment to weigh in.

  5. Hey Guys.

    No, there are no assumptions whatsoever that taxa are contemporaneous (i.e. that the tree is ultrametric). Indeed, all trees inferred are unrooted, so there is no time axis (relative or otherwise) to identify "up" or "down". Put another way, the trees that come out of such an analysis are never ultrametric. [Using Mk in comparative analyses, however, does assume an ultrametric tree, but this goes for all other models as well.] Mk-flavoured models also do not optimize any sort of rate parameter. The Mk model (a generalization of the Jukes-Cantor 1969 model, if you know nucleotide models) assumes that all character transitions occur at the same rate; the actual rate of transitions does not factor in, as all relative character transition rates are simply set to 1. So "rate" is not even an inferred parameter. The only parameters involved are 1) the topology, and 2) the branch lengths (of which there are 2N - 3 for N taxa), which are in units of expected (mean) number of substitutions (changes) per character. If you are optimizing anything (i.e. ML), it is the likelihood of the combination of topology and vector of branch lengths. Of course, the assumption of equal transition rates is a valid point to rail against, but that is a little outside the discussion here.
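A minimal sketch of the point that, under Mk, the only thing optimized is branch lengths in expected changes per character, using the simplest possible case: two taxa, one branch, and hypothetical counts of characters that match or differ. In this case the maximum-likelihood branch length has a closed form, which a brute-force grid search should reproduce. With k = 4 the formula reduces to the familiar Jukes-Cantor distance.

```python
import math


def mk_distance(k, n_chars, n_diff):
    """Closed-form ML branch length between two taxa under Mk (no Mkv correction)."""
    p = n_diff / n_chars  # observed proportion of characters that differ
    if p >= (k - 1) / k:
        raise ValueError("too many differences: the ML branch length is infinite")
    return -(k - 1) / k * math.log(1 - k * p / (k - 1))


def mk_loglik(k, v, n_chars, n_diff):
    """Log-likelihood of the two-taxon data as a function of branch length v (constants dropped)."""
    decay = math.exp(-k * v / (k - 1))
    p_same = 1 / k + (k - 1) / k * decay
    p_each_diff = (1 - decay) / k  # probability of ending in one particular other state
    return (n_chars - n_diff) * math.log(p_same) + n_diff * math.log(p_each_diff)


def grid_search(k, n_chars, n_diff, step=1e-4, v_max=3.0):
    """Brute-force check: the grid value of v with the highest log-likelihood."""
    grid = [step * i for i in range(1, int(v_max / step))]
    return max(grid, key=lambda v: mk_loglik(k, v, n_chars, n_diff))


def n_branches(n_taxa):
    """Number of branch lengths on an unrooted binary tree with N taxa."""
    return 2 * n_taxa - 3


k, n_chars, n_diff = 2, 50, 10  # hypothetical binary characters: 10 of 50 differ
print(mk_distance(k, n_chars, n_diff), grid_search(k, n_chars, n_diff))
print(n_branches(10))  # → 17
```

There is no separate rate to estimate: a single number per branch absorbs both how fast and for how long the characters evolved.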

    As David mentions, it is a different story altogether if you are trying to *infer* a time-calibrated tree; say, inferring a chronogram in BEAST from a mix of fossils and extant taxa. These methods (all Bayesian, that I'm aware of anyway) allow one to input actual fossil ages (or distributions, if the ages are uncertain) into the model, so that part is taken care of. A much larger concern is using an Mk-flavoured model to estimate time. As mentioned above, because rate and time are confounded, branch lengths are typically given in expected number of changes per character. In order to extract temporal information, assumptions have to be made about rate. I'll refrain from waxing "molecular clock" here, but in these models "time" is constrained through a function of 1) temporal priors/constraints (such that some ages are inaccessible), and 2) assumptions regarding the nature of rate heterogeneity (e.g. rates may be considered autocorrelated, or the breadth of branch-specific rates can be thought to be described by some distribution). [I'll stop there. I have a book chapter describing relaxed clock methods if either of you are interested.] ANYway, I don't think anyone would argue with the first point (constraining "temporal space"). All of the unease regarding clock models is the second point of making assumptions about the nature of rate heterogeneity. I would argue that we have a good sense of the nature of rate heterogeneity when it comes to genetic sequences (or at least we know where the problems are), but morphology is another story. Assuming that morphological evolution follows any sort of clock, however it may be "relaxed" (especially across a set of characters evolving in potentially wildly different trajectories!), is not something that I think anyone would seriously subscribe to. That said, people are indeed starting to use such "morphological clocks" in their work.

    I would very much like to see some serious simulation work in this area to show that this is even possible.
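Why rate and time are confounded can be shown in a few lines: if a branch is split into a hypothetical rate and time, the Mk likelihood depends only on their product. This sketch reuses the two-taxon setup with made-up counts. Any combination with the same rate × time gives the same likelihood, which is why time-calibrated analyses need extra assumptions (age priors and some clock or relaxed-clock model) to pull the two apart.

```python
import math


def mk_same(k, rate, time):
    """Mk probability of no net change; only the product rate * time enters."""
    v = rate * time  # branch length in expected changes per character
    return 1 / k + (k - 1) / k * math.exp(-k * v / (k - 1))


def two_taxon_loglik(k, rate, time, n_chars, n_diff):
    """Two-taxon log-likelihood with a (hypothetical) separate rate and time."""
    p_same = mk_same(k, rate, time)
    p_each_diff = (1 - p_same) / (k - 1)
    return (n_chars - n_diff) * math.log(p_same) + n_diff * math.log(p_each_diff)


# Three hypothetical rate/time combinations, all with rate * time = 0.3.
combos = [(0.3, 1.0), (0.1, 3.0), (0.03, 10.0)]
logliks = [two_taxon_loglik(2, r, t, 50, 10) for r, t in combos]
print(all(math.isclose(x, logliks[0]) for x in logliks))  # → True
```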

  6. Thanks Joseph. This makes me happier about using ML/Bayesian methods. Although I still don't know how to actually do this with a morphological dataset. Can you recommend any good primers?

    Graeme

    Replies
    1. Sorry Graeme, no primers that I can think of. The original Lewis paper from 2001 is very well written (I regard Paul as one of the best communicators in our field). Nylander's 2004 paper in Systematic Biology is one of the first to use Mk to combine genetic and morphological characters. I've been kicking around the idea of writing a "10-year anniversary" review paper summarizing how things have developed since 2001, but it will be hard to find the time to undertake such a task on top of other responsibilities.

    2. No worries. Thanks for your comments - they've been useful!
