Thursday, March 8, 2012

Simulating Budding CladogeneticTrait Change

So, an important realization I just had. I think some of you who read this will find it interesting, so here you go:

Imagine you want to simulate cladogenetic trait evolution, where traits change only when a branching event occurs on a tree. (This is what we would expect if Puncuated Equilibrium is valid). For simplicity, let's say the changes act as under a Brownian motion diffusion model. Let's also assume we're looking at a species-level tree, so the branching events are also speciation events, making this also a 'speciational' model of trait evolution.

Most of the time, this is modeled in the comparative methods literature by taking a phylogeny, setting the branches all equal to one and simulating trait evolution across the branches. However, this implicitly assumes that the trait change is symmetric. What if its asymmetric? And, you might be wondering, what the heck do I mean by that?

Well, check out this figure.So what is going on here? Let's start with the first diagram. The vertical axis is time and the horizontal axis is trait space. If you're a paleontologist, you probably see diagrams like this all the time as stratophenetic hypotheses of the relationships among taxa in a group. This describes a general pattern observed in some groups in the fossil record, where we have morphotaxa that are (more or less) morphologically static, with some separation between them. The model of speciation inferred is referred to as budding cladogenesis (e.g. Foote, 1996). When speciation/branching occurs, the 'ancestral' taxon doesn't change at all, while there is considerable change leading to the daughter taxon.

Of course, we don't really know that the (generally unobserved) changes were actually that sudden and happened exactly at the same time that lineage branching (cladogenesis/speciation), occurred, because the fossil record is under-sampled. The important thing is that budding cladogenesis is a possible interpretation of how morphotaxa are related in the fossil record.

In this example, which is meant to represent a simulated dataset of evolution under budding cladogenesis, we have a dataset of four taxa, where A is ancestral to taxa B, C and D. A remained static through the cladogenesis events that led to those four taxa. The second diagram is a time-scaled phylogeny describing the relationships among the populations present at the first occurrence of these four taxa. (If the taxa are static in a model of budding-cladogenesis, then simulating things to their first appearance datum should be okay.)

The problem with the budding cladogenesis model and simulating trait change in this scenario is that its asymmetric: only one of the lineages sees any change at speciation events. I realized this earlier this week when I was using the function simFossilTaxa and taxa2phylo from my paleotree library. taxa2phylo converts taxon ranges into a phylogeny, as depicted by the second diagram above. I was simulating clades under budding cladogenesis (as above), converting the datasets to phylogenies, then simulating trait evolution using standard Brownian Motion (BM) simulations on phylogenies (like rtraitcont or fastBM in the ape and phytools packages, respectively). To simulate cladogenetic trait change, I set all the branches in the second diagram to 1 and simulated BM.

But this was wrong! If I did this for the example above, the trait value of taxa B would be just a function of the trait value of A above and a single speciation event. That's what's we want (look at the first diagram). But for C and D, their trait values also become a function of trait evolution which occurred at previous nodes in the tree, when their trait values should actually be independent of those events: i.e. the values of C and D should simulated just like B, only being a function of their ancestor (A) and single trait value change.

I have another function in paleotree which translates a 'taxa' matrix from simFossilTaxa into a cladogram of nested relationships (assuming that systematic characters don't vary over morphotaxon ranges). However, this is just as useless in this situation, as there are no nested relationships in the four taxa above, so they just form one big polytomy. If you try simulating BM on that using standard methods, the trait values of all the taxa will equally depend on some ancestor (which is not inferred to be A). Yeesh!

Note that if the changes were symmetric in the model (as in a bifurcating cladogenesis model) than all would be fine and dandy: species traits always shift after speciation; there is none of this static ancestor stuff. But for the asymmetric case, you need to have information about which branch experienced the change. One could always just randomly pick one, if all a user had was a topology, but that isn't the case here. I'd like the changes to match to the budding pattern in the simulation I've already done! (Who knows how sampling of morphotaxa in time could affect these things...)

Instea,d we'll need a specialized function that simulates trait evolution using the taxa matrix itself, since that's the only place where we have the data on the ancestor-descendant relationships and the stasis of particular lineages. Using Liam Revell's excellent description (also here and here) of how he constructed fastBM for his phytools package, I was able to cobble together a little function for simulating cladogenetic trait evolution on the output from simFossilTaxa. Here it is (this will be included in the next release of paleotree):

#simulate speciational trait evolution for datasets from simFossilTaxa
#idiot proofing
if(any(taxa[-1,2]>=taxa[-1,1])){stop("Ancestors have higher IDs than Descendants?")}
anctaxa<-sapply(taxa[-1,2],function(x) which(x==taxa[,1]))
for(i in 2:nrow(taxa)){

We can use plotTraitgram to see what the evolutionary pattern would like, under a model where we infer ancestral traits via time-dependent BM (so not cladogenetic/speciational). Obviously, that ain't true for our simulated data, but we wouldn't know that a priori for most datasets so let's do it anyway.


And we get: