Wednesday, July 30, 2014

Creating pretty plots of trees timescaled with paleotree via strap's geoscalePhylo

For anyone who cares, paleotree was not written for pretty plotting; other things were just more important to me. I don't have a particularly well-honed sense of aesthetics anyway; have you seen my recent Paleobiology paper, A.K.A. the grande parade of boxplots?

Over the years, I've gotten people asking me how they plot their fossil-taxon trees against, for example, a geological time-scale and I haven't been really able to tell these people what to do. But that changed recently, thanks to Mark Bell's and Graeme Lloyd's new package strap.

http://cran.r-project.org/web/packages/strap/

In addition to its own timescaling function and functions for assessing stratigraphic consistency, their package includes the extremely useful function geoscalePhylo which makes use of Mark Bell's geoscale package. This function takes a time-scaled tree and plots it against a geologic scale and, optionally, will display taxon ranges as thickened bars on the terminal branches.

Recently, I got one or two questions about using this function with trees time-scaled using paleotree, and so I decided to post an example using a dataset with a discrete interval range data; i.e. one of those datasets written for bin_timePaleoPhy and associated functions. Its mostly straightforward to do this, particularly as the 'bin_' time-scaling functions additionally outputs the taxon ranges used for a particular tree. we do need to do a minor step where we rename some columns in the range matrix for strap to accept these ranges.

So, here's a short R script that uses the example graptolite dataset from paleotree, as used in the cal3 tutorial from last year (i.e. Bates et al., 2005). See the old blog post for details about constructing the data files.

Now, you can just install strap and paleotree and give this script a whirl...
___________________________________________________________________

library(paleotree)
library(strap)

data(retiolitinae)

#need time-scaled tree
    #randomly draw FADs/LADs from within grapt biozones
timetree<-bin_timePaleoPhy(tree=retioTree,timeList=retioRanges,
    type="basic",add.term=FALSE)

#get the ranges used by the bin_ function
rangesUsed<-timetree$ranges.used
colnames(rangesUsed)<-c("FAD","LAD")

#now plot it
geoscalePhylo(tree=timetree,ages=rangesUsed,ranges=TRUE)




___________________________________________________________________________

Now, this is *much* prettier than anything paleotree makes, but we can do *much* better still.

First, this tree is a single random draw, an insignificant what-if produced by drawing the first and last appearance dates of taxa from the discrete intervals they are known to first and last occur in (here, Silurian graptolite biozones). For plotting purposes this is undesirable because you could remake this plot ten times and get ten slightly different looking plots. A better alternative, one matching plots in the literature, is to display taxa as ranging from the very start of the first interval they are found in to the very end of the last interval they occur in. This can be done easily with bin_timePaleoPhy's argument nonstoch.bin.

Secondly, the simple minimum-node-ages "basic" timescaling method almost always introduces artificial zero-length branches that, when plotted, make dichotomous nodes look confusingly like polytomies. Furthermore, this can obscure any polytomies your phylogeny contained to begin with.  So, for purposes of plotting, we can use one of the simple, arbitrary, post-hoc timescaling methods available in paleotree or strap. These methods rescale branches so there aren't any zero-length branches, but how realistic these various alternatives to "basic" are is another matter (you might want to read my most recent Paleobiology paper). Nevertheless, they are useful for showing relationships, as long as any one looking at it knows the apparent timing of divergences on the figure isn't even a best guess, just a random stab in the dark that's most probably dead wrong.

(Anyone who regularly reads paleontological papers will probably have that understanding.)

Finally, we can make the plot prettier just by playing with the graphical parameters of geoscalePhylo: make the text bigger, widen the branches, adjust the time-scale so we can see the Silurian diversification of the Retiolitinae with respect to the Ordovician and Devonian, etc.

And so...
____________________________________________________________________________

#now let's make it prettier 
    #non-random FADs and LADs
    #also use a 'prettier' time-scaling method

timetree<-bin_timePaleoPhy(tree=retioTree,timeList=retioRanges,
    type="mbl",vartime=1,nonstoch.bin=TRUE,add.term=FALSE)


rangesUsed<-timetree$ranges.used
colnames(rangesUsed)<-c("FAD","LAD")


geoscalePhylo(tree=timetree,ages=rangesUsed,ranges=TRUE,
    cex.tip=0.8,cex.ts=0.55,cex.age=0.5,x.lim=c(413,445),width=2)



_________________________________________________________________

Ta-da! Very pretty!

Enjoy!
-Dave

Wednesday, February 5, 2014

Raising A White Flag, paleotree v2.0 and the Reciprocal Monophyly of Various Hemichordate Groups

Hello all!

We have a lot to talk about today.

Last time, I bemoaned users who were misunderstanding my functions. Well, I have surrendered. timePaleoPhy is now happy to take taxon dates as min-max bounds. In the newest version of paleotree, version 2.0, you can find this functionality in the new argument dateTreatment for timePaleoPhy and other time-scaling functions.

In fact, a lot has changed in paleotree lately. Last night, I uploaded paleotree 2.0 to CRAN, and it should be making its way to your favorite repository as a binary of your choice soon. This, and the previous update, paleotree 1.9, have changed a lot about paleotree!

All these changes are kinda fitting, given that paleotree is now version 2.0, and the turning over of a new version number usually a rapid change in the structure of a package.  However don't really try to read too much into that, its really just a happy accident. There's a lot of very extreme opinions some people have about version numbering and I was a little naive when I started programming. So, way back in January of 2012, I just decided to go with a simple system where every public release except the most minor would be given a new increment of 0.1, and the first public release would be 1.0. I'm sure some people think that's a hideous way to do version numbering, but whatever. It ain't your package, dudes.

As always, you can look at paleotree's CHANGELOG to see what's new, but here's the last two entries for your reading pleasure.

__________________________________________________

Version - 1.9 - 01-02-14
-Changed all functions to individual R script files rather than single master script. Why didn't I do this years ago??!

-All help files converted to an roxygen2 format within function scripts

-At suggestion of Fabricio Villalobos, added option to expandTaxonTree that allows branch lengths to be retained and the added lower-taxa are connected with zero-length branches. Only useful for very specific purposes, please use with caution.

-Changed all lines which checked for class "phylo" of input objects to use is(obj,"phylo") instead of class(obj)=="phylo" per recommendation of Carl Boettiger
-Added new warning line to taxicDivDisc so people who try to pass it matrices with character strings in the matrix will get more helpful warning messages

-Upgraded probAnc and qSProb2comp based on equations in Foote (1996) for all three modes of origination, also fixed some errors in previous versions

-Added new function pqr2Ps which uses Emily King's exact derivation for the joint probability of a clade being (a) going extinct but sampled on an infinite time scale and (b) never going extinct on an infinite time scale

-Removed internal Ps function from cal3, now pqr2Ps which is exported directly to namespace

-Converted to new way to handle likelihood functions in paleotree, moving to a function-as-an-object system like diversitree.

-Following on last point, added new functions for fitting models of duration frequency data, replacing getSampProbDisc and getSampRateCont

-More models to follow in future versions of paleotree: added new function footeValues which will (eventually) support a release of a function that implements Foote's (2001, 2003, 2005) inverse survivorship models

-Modified diversitree's 'constrain' function to make a paleotree version named constrainParPaleo which is both entirely separate and fulfills needs such as constraining many similarly named parameters to a single value

-See new functions listed in ?modelMethods for manipulating functions in the new model format

-Added some not-exported hidden functions for use by the various model-handling functions

-Added a 'terrible idea' function optimPaleo which simplifies using optim with new parameter bounds functions. This function is entirely for pedagogical reasons and may be removed later.

-Added new function horizonSampRate which uses ML estimator from Solow and Smith (1997) for estimating sampling rate from precise durations in continuous time and number of sampled horizons

-Added new function perCapitaRates which estimates the per-capita origination and extinction rates from discrete interval data, following the methods from Foote (2000)


Version - 2.0 - 02-03-14
-Changed parInit to use uniform distribution to randomly draw initial parameter values between bounds, rather than take mid value between bounds

-Changed how time-scaling functions dealt with node.mins argument; can no longer use node.mins with a dataset that has unshared taxa that are to be dropped

-Altered example for use of node.mins in help files for time-scaling functions; thanks John Clarke of Oxford for the heads up! Also other modifications were made to the help descriptions, clarifying that node.mins can be used to constrain the minimum age for the root node.

-Also on a different issue brought to my attention by John Clarke, added an error message to paleotree when 'equal' is attempted by the edges leading to the root are zero-length (because 'equal' cannot run under this situation!)

-Added new function collapseNodes that collapses specific user-defined nodes, either forward or backward

-Made all lines checking for dichotomous trees check both with is.binary.tree() *and* is.rooted()

-Added new function dateNodes which returns the dates of the internal and tip nodes of a phylogeny on an absolute time-scale, with respect to the $root.time element if one exists

-On a trial basis, I have added new function inverseSurv which attempts to replicate the forward and inverse survivorship modeling applied by Foote (2001, 2003, 2005) and is useable with the newly implemented constrainParPaleo framework implemented in the previous version of paleotree. I am not yet convinced this function is a 100% faithful replicate of the original method.
-fixed error in plotTraitgram where if trait data was entered in same order as tree$tip.label, trait data was not resorted prior to running ace

-'equal' method in timePaleoPhy wasn't returning same result as 'equal' method in DatePhylo from Graeme Lloyd's original code. This turned out to be a result of differences in how we ordered nodes: Graeme ordered them by time or distance-from-the-root (using dist.nodes) and I was using node.depth, which counts number of branching events. This choice shouldn't make a considerable difference on the performance of the algorithm, but does produce some differences in the resulting time-scaled trees. For consistency, I have change timePaleoPhy to match Graeme's algorithm.

-altered timePaleoPhy and cal3timePaleoPhy to allow the point date occurrences with the first and second column of timeData interpreted as bounds instead, using the argument dateTreatment="minMax"

-related to above change, the argument rand.obs was removed from timePaleoPhy and cal3TimePaleoPhy as no longer necessary, this functionality is now available via dateTreatment="randObs"

-Although it may be strange to not report a lack of a change, but still have not added the finite time window approach for durationFreq

________________________________________________________

Hahah, and as you might guess from that last one, I've still got some more new things and changes for paleotree in store in the next few months. I included it here because I was actually partway through adding this option and had to undo those additions through commenting, since I wanted to push this version to CRAM (the undoing worked, I think, but I can't be certain, so best to add it to the CHANGELOG). 

As always, let me know what you think of the new paleotree functionality!

So what else?

Well, hemichordate phylogeny has been shaken back and forth a little lately. You might have missed them, so here's a short list:

http://onlinelibrary.wiley.com/doi/10.1002/jez.b.22510/full
http://link.springer.com/article/10.1007/s00114-013-1117-3
http://www.biolbull.org/content/225/3/194.abstract

The Stach article in particular is covered by Cambrian Mammal's blog:
http://cambrianmammal.wordpress.com/2014/01/27/the-use-of-a-larva

The Cannon et al. is really neat: with greater gene and taxon coverage than a few years ago, it looks like the different hemichordate groups really are reciprocally monophyletic and not nested, and maybe even the Rhabdopleura and Cepholdiscus groups are even reciprocally monophyletic. Big implications for what the stem deuterostome looked like... and even bigger implications for the stem graptolite.... if you're into that sort of thing. ;)

Oh, and finally, I'm also a brachiopod worker now! I've begun a remote post-doc with Sandy Carlson at UC Davis. I'm looking forward to doing some neat things with her and the rest of her lab!

-Dave