Wednesday, July 30, 2014

Creating pretty plots of trees timescaled with paleotree via strap's geoscalePhylo

For anyone who cares, paleotree was not written for pretty plotting; other things were just more important to me. I don't have a particularly well-honed sense of aesthetics anyway; have you seen my recent Paleobiology paper, A.K.A. the grande parade of boxplots?

Over the years, I've gotten people asking me how they plot their fossil-taxon trees against, for example, a geological time-scale and I haven't been really able to tell these people what to do. But that changed recently, thanks to Mark Bell's and Graeme Lloyd's new package strap.

http://cran.r-project.org/web/packages/strap/

In addition to its own timescaling function and functions for assessing stratigraphic consistency, their package includes the extremely useful function geoscalePhylo which makes use of Mark Bell's geoscale package. This function takes a time-scaled tree and plots it against a geologic scale and, optionally, will display taxon ranges as thickened bars on the terminal branches.

Recently, I got one or two questions about using this function with trees time-scaled using paleotree, and so I decided to post an example using a dataset with a discrete interval range data; i.e. one of those datasets written for bin_timePaleoPhy and associated functions. Its mostly straightforward to do this, particularly as the 'bin_' time-scaling functions additionally outputs the taxon ranges used for a particular tree. we do need to do a minor step where we rename some columns in the range matrix for strap to accept these ranges.

So, here's a short R script that uses the example graptolite dataset from paleotree, as used in the cal3 tutorial from last year (i.e. Bates et al., 2005). See the old blog post for details about constructing the data files.

Now, you can just install strap and paleotree and give this script a whirl...
___________________________________________________________________

library(paleotree)
library(strap)

data(retiolitinae)

#need time-scaled tree
    #randomly draw FADs/LADs from within grapt biozones
timetree<-bin_timePaleoPhy(tree=retioTree,timeList=retioRanges,
    type="basic",add.term=FALSE)

#get the ranges used by the bin_ function
rangesUsed<-timetree$ranges.used
colnames(rangesUsed)<-c("FAD","LAD")

#now plot it
geoscalePhylo(tree=timetree,ages=rangesUsed,ranges=TRUE)




___________________________________________________________________________

Now, this is *much* prettier than anything paleotree makes, but we can do *much* better still.

First, this tree is a single random draw, an insignificant what-if produced by drawing the first and last appearance dates of taxa from the discrete intervals they are known to first and last occur in (here, Silurian graptolite biozones). For plotting purposes this is undesirable because you could remake this plot ten times and get ten slightly different looking plots. A better alternative, one matching plots in the literature, is to display taxa as ranging from the very start of the first interval they are found in to the very end of the last interval they occur in. This can be done easily with bin_timePaleoPhy's argument nonstoch.bin.

Secondly, the simple minimum-node-ages "basic" timescaling method almost always introduces artificial zero-length branches that, when plotted, make dichotomous nodes look confusingly like polytomies. Furthermore, this can obscure any polytomies your phylogeny contained to begin with.  So, for purposes of plotting, we can use one of the simple, arbitrary, post-hoc timescaling methods available in paleotree or strap. These methods rescale branches so there aren't any zero-length branches, but how realistic these various alternatives to "basic" are is another matter (you might want to read my most recent Paleobiology paper). Nevertheless, they are useful for showing relationships, as long as any one looking at it knows the apparent timing of divergences on the figure isn't even a best guess, just a random stab in the dark that's most probably dead wrong.

(Anyone who regularly reads paleontological papers will probably have that understanding.)

Finally, we can make the plot prettier just by playing with the graphical parameters of geoscalePhylo: make the text bigger, widen the branches, adjust the time-scale so we can see the Silurian diversification of the Retiolitinae with respect to the Ordovician and Devonian, etc.

And so...
____________________________________________________________________________

#now let's make it prettier 
    #non-random FADs and LADs
    #also use a 'prettier' time-scaling method

timetree<-bin_timePaleoPhy(tree=retioTree,timeList=retioRanges,
    type="mbl",vartime=1,nonstoch.bin=TRUE,add.term=FALSE)


rangesUsed<-timetree$ranges.used
colnames(rangesUsed)<-c("FAD","LAD")


geoscalePhylo(tree=timetree,ages=rangesUsed,ranges=TRUE,
    cex.tip=0.8,cex.ts=0.55,cex.age=0.5,x.lim=c(413,445),width=2)



_________________________________________________________________

Ta-da! Very pretty!

Enjoy!
-Dave