Thursday, July 26, 2012

Resolving Polytomies According to Temporal Order

Hello everyone!

A user of paleotree recently asked about resolving non-bifurcating nodes according to the order of stratigraphic appearance. If you're familiar with the library ape, you know there's a function called multi2di, which resolves polytomies randomly into bifurcating nodes. But what if we want to include information we have about WHEN taxa show up in the fossil record, assuming that taxa which show up later are more likely to be closely related?

My library paleotree has the Sampling Rate Calibrated time-scaling methods (SRC timescaling), which will resolve polytomies according to a probabilistic approach about gaps in the fossil record. This function can allow for ancestor-descendant relationships, although that aspect can be shut off or minimized to the user's discretion. However, how SRC methods work and whether it works well isn't something that's known to anyone but me, (maybe) my committee and whoever has asked me about it. Also, it needs an estimate of sampling rate, which isn't always obtainable.

So, as another option for resolving polytomies, I've made a new function called timeLadderTree, which takes polytomies and turns them into little pectinate (ladder-like) sub-trees. Each lineage in the pectinate sub-tree is ordered according to the time of first-appearance datum (FAD) for each lineage, for both clades and single taxa. This method of resolving polytomies assumes that the order of stratigraphic appearance perfectly depicts the order of branching. This may not be a good assumption for poorly sampled fossil records, but hey, maybe it's still better than assuming a completely random solution for a polytomy.

I'd paste the code here, but it's really long. If you really want it, email me or you can get this function either from the public paleotree library source file on github, which I'm pushing to as we speak, or just wait until the next paleotree release (sometime in the next month). Also in the newest release, I'll be including this function as an additional option for timePaleoPhy and bin_timePaleoPhy to resolve the tree with. The new argument will be called 'timeres' and setting it to TRUE will cause trees output with those functions to have polytomies resolved according to time of first appearance. (Obviously, this is incompatible with also setting the argument 'randres' for random resolution via multi2di to TRUE and doing such will return a warning).

Okay, so how does it work? Well, let's say we have data and some stratigraphic ranges that look like so...

...And we get a nice pectinate tree.

What that poorly-scrawled M$Paint image means is we need a tree in R (in 'phylo' format) with polytomies and a set of temporal data, in timeData format as is standard for taxon ranges in continuous time in paleotree. timeData matrices look like the following, where row-names are the taxon names.

            FAD         LAD
t1   170.391157 147.8839782
t2   158.519694 152.6571314
t3   149.415686 128.5232837
t4    ...                  ...

Note that this function is for resolving tree when a continuous time-scale is known. For discrete time-scales, where appearance dates are known from intervals rather than from specific points in time, users should use the function bin_timePaleoPhy, which stochastically produces arrangements of dates in discrete intervals in the course of time-scaling. This would be the preferred way of doing things with datasets where taxa are known on discrete time-scales (most data in paleontology, probably...).

You might wonder how this function handles ties. Well, taxa with the same identical first appearance date will be ordered randomly. Thus, the output could be slightly stochastic (it'll differ each time you run it), but this only occurs when taxa descended from the first node have the same first appearance datum in continuous time exist. This is probably uncommon with real data on continuous time-scales. Thus, resolving polytomies based on time-order will probably produce a single same tree each time you use it, unlike multi2di which is guaranteed to not produce the same tree each time. It's a pretty strong assumption though, to make, that order of appearance perfectly predicts branching order, though!

Because simulating clades in paleotree often produces partially unresolved trees (for reasons I explained last time) we can test this function pretty easily.

library(paleotree)
set.seed(444)
taxa<-simFossilTaxa(0.1,0.1,mintaxa=100)
tree<-taxa2cladogram(taxa)
ranges<-sampleRanges(taxa,r=0.5)
tree1<-timeLadderTree(tree,ranges)

And we can see the difference in our two trees, with the second have some very pectinate-looking regions...

layout(1:2)
plot(ladderize(tree),show.tip.label=FALSE)
plot(ladderize(tree1),show.tip.label=FALSE)
(Apologies for all the white-space, I didn't crop this one before pasting it...)

As always, let me know if you have any new ideas for paleotree that you would find useful in your own work!
-Dave