Thursday, March 8, 2012

Simulating Budding CladogeneticTrait Change

So, an important realization I just had. I think some of you who read this will find it interesting, so here you go:

Imagine you want to simulate cladogenetic trait evolution, where traits change only when a branching event occurs on a tree. (This is what we would expect if Puncuated Equilibrium is valid). For simplicity, let's say the changes act as under a Brownian motion diffusion model. Let's also assume we're looking at a species-level tree, so the branching events are also speciation events, making this also a 'speciational' model of trait evolution.

Most of the time, this is modeled in the comparative methods literature by taking a phylogeny, setting the branches all equal to one and simulating trait evolution across the branches. However, this implicitly assumes that the trait change is symmetric. What if its asymmetric? And, you might be wondering, what the heck do I mean by that?

Well, check out this figure.So what is going on here? Let's start with the first diagram. The vertical axis is time and the horizontal axis is trait space. If you're a paleontologist, you probably see diagrams like this all the time as stratophenetic hypotheses of the relationships among taxa in a group. This describes a general pattern observed in some groups in the fossil record, where we have morphotaxa that are (more or less) morphologically static, with some separation between them. The model of speciation inferred is referred to as budding cladogenesis (e.g. Foote, 1996). When speciation/branching occurs, the 'ancestral' taxon doesn't change at all, while there is considerable change leading to the daughter taxon.

Of course, we don't really know that the (generally unobserved) changes were actually that sudden and happened exactly at the same time that lineage branching (cladogenesis/speciation), occurred, because the fossil record is under-sampled. The important thing is that budding cladogenesis is a possible interpretation of how morphotaxa are related in the fossil record.

In this example, which is meant to represent a simulated dataset of evolution under budding cladogenesis, we have a dataset of four taxa, where A is ancestral to taxa B, C and D. A remained static through the cladogenesis events that led to those four taxa. The second diagram is a time-scaled phylogeny describing the relationships among the populations present at the first occurrence of these four taxa. (If the taxa are static in a model of budding-cladogenesis, then simulating things to their first appearance datum should be okay.)

The problem with the budding cladogenesis model and simulating trait change in this scenario is that its asymmetric: only one of the lineages sees any change at speciation events. I realized this earlier this week when I was using the function simFossilTaxa and taxa2phylo from my paleotree library. taxa2phylo converts taxon ranges into a phylogeny, as depicted by the second diagram above. I was simulating clades under budding cladogenesis (as above), converting the datasets to phylogenies, then simulating trait evolution using standard Brownian Motion (BM) simulations on phylogenies (like rtraitcont or fastBM in the ape and phytools packages, respectively). To simulate cladogenetic trait change, I set all the branches in the second diagram to 1 and simulated BM.

But this was wrong! If I did this for the example above, the trait value of taxa B would be just a function of the trait value of A above and a single speciation event. That's what's we want (look at the first diagram). But for C and D, their trait values also become a function of trait evolution which occurred at previous nodes in the tree, when their trait values should actually be independent of those events: i.e. the values of C and D should simulated just like B, only being a function of their ancestor (A) and single trait value change.

I have another function in paleotree which translates a 'taxa' matrix from simFossilTaxa into a cladogram of nested relationships (assuming that systematic characters don't vary over morphotaxon ranges). However, this is just as useless in this situation, as there are no nested relationships in the four taxa above, so they just form one big polytomy. If you try simulating BM on that using standard methods, the trait values of all the taxa will equally depend on some ancestor (which is not inferred to be A). Yeesh!

Note that if the changes were symmetric in the model (as in a bifurcating cladogenesis model) than all would be fine and dandy: species traits always shift after speciation; there is none of this static ancestor stuff. But for the asymmetric case, you need to have information about which branch experienced the change. One could always just randomly pick one, if all a user had was a topology, but that isn't the case here. I'd like the changes to match to the budding pattern in the simulation I've already done! (Who knows how sampling of morphotaxa in time could affect these things...)

Instea,d we'll need a specialized function that simulates trait evolution using the taxa matrix itself, since that's the only place where we have the data on the ancestor-descendant relationships and the stasis of particular lineages. Using Liam Revell's excellent description (also here and here) of how he constructed fastBM for his phytools package, I was able to cobble together a little function for simulating cladogenetic trait evolution on the output from simFossilTaxa. Here it is (this will be included in the next release of paleotree):

cladogeneticTraitCont<-function(taxa,rate=1,meanChange=0,rootTrait=0){
#simulate speciational trait evolution for datasets from simFossilTaxa
#idiot proofing
taxa<-taxa[order(taxa[,1]),]
if(any(taxa[-1,2]>=taxa[-1,1])){stop("Ancestors have higher IDs than Descendants?")}
anctaxa<-sapply(taxa[-1,2],function(x) which(x==taxa[,1]))
traits<-rootTrait
for(i in 2:nrow(taxa)){
traits[i]<-traits[anctaxa[i-1]]+rnorm(1,mean=meanChange,sd=sqrt(rate))
}
names(traits)<-paste("t",taxa[,1],sep="")
return(traits)
}

We can use plotTraitgram to see what the evolutionary pattern would like, under a model where we infer ancestral traits via time-dependent BM (so not cladogenetic/speciational). Obviously, that ain't true for our simulated data, but we wouldn't know that a priori for most datasets so let's do it anyway.

library(paleotree)
set.seed(444)
taxa<-simFossilTaxa(0.1,0.1,mintaxa=100,plot=T)
trait<-cladogeneticTraitCont(taxa)
tree<-taxa2phylo(taxa)
plotTraitgram(trait,tree,conf.int=F)

And we get:

Tuesday, March 6, 2012

Presenting: paleotree v1.2

Hello all,
After more than month of having my library up on CRAN and finally feeling satisfied that I've worked out enough of the bugs, I'd like to introduce you to my R library, paleotree! It does many things, but if you like phylogenetics, fossils and particularly the overlap of those two, you may find something in interest! With paleotree, you can time-scale cladograms of fossil taxa, estimate sampling rates from the ranges of fossil taxa, plot diversity curves and simulate diversification and sampling in the fossil record. So, go download version 1.2 and check it out! The entire library will be described in a paper I submitted a week and a half ago to Methods in Ecology and Evolution and apparently is now in review. I'll add a citation file to paleotree when that gets more definite. Until then, cite Bapst, D. (in review)!

http://cran.r-project.org/web/packages/paleotree/index.html

Now, I would like to note that some of the included functions, such as the sampling-rate conditioned time-scaling methods, are not described yet in the primary literature. The methods are included in the current release and you can use them if you wish, but Buyer Beware and all that. The algorithm behind these methods will be described in more detail in a future. For the moment, I would actually suggest you use the more typical time-scaling methods, applicable with timePaleoPhy and bin_timePaleoPhy.

Let me know if you run into any bugs! You can find my email address in the help files for R if you don't know it already. Also, I have tried to be very meticulous with regards to the help files and tried to make them as clear as possible, but I am sure my general muddled-ness will show through. So, if you don't understand anything, let me know so I can clarify the help files in the next release!

So far, in terms of potential errors and bugs in version 1.2, I have discovered that simFossilTaxa can enter an infinite loop under some parameter choices if you try a pure-birth model. I don't know why you'd want to do a pure-birth simulation of the fossil record, but this will be fixed in the next release. Then you can simulate your extinction-less fossil record to your heart's content!

Finally, let me know if you get any good ideas of what I could add that would make analyses in phylogenetic paleobiology easier for you!

Cheers,
-Dave B.