Okay, so let's pick up where we left off with a small tangent.
First things first: I released a new version of paleotree, version 1.3! It has a number of new elements, particularly one item I will talk to you about today: cryptic cladogenesis.
In the last missive, I argued that any simulator of "diversification in the fossil record" had to be enmeshed with some assumptions (i.e. a model) of how morphologically distinguishable taxonomic units arise, as these are the basic units that we (paleontologists) can identify, relate and measure. These morphologically delimited, sometimes temporally extensive units can represent something very different from equivalent taxa in evolutionary biology (Forey et al., 2004; Ezard et al., 2012).
To clarify some of my explanations last time, it may be helpful to think of two general classes of events. First, one could have 'anagenesis', where a lineage experiences a morphological change that is geologically sudden, producing a new 'descendant' morphotaxon distinguishable from the previous 'ancestor', which no longer appears. As we generally define and distinguish taxa based on discrete or meristic characters, like the presence or absence of spines, one can think about these events as the change of one or more such characters. If they ever change again, then that would be another 'anagenetic' event and another new morphotaxon would originate, and so on.
(Note that this is different from how most people define anagenesis! Most of the literature using this term is referring to changes in continuous traits, particularly traits which don't (generally) get used to distinguish taxa. I'm only interested in shifts between recognizable morphotaxa, so I'm limiting my usage of anagenesis to describe that and being totally agnostic to how non-systematically-informative traits vary within lineages.)
Now let's think about how branching events (cladogenesis) come into this. Let's limit ourselves to bifurcating events, which produce two daughter lineages. We can contextualize the 'bifurcating and budding cladogenesis' of the last post by treating them as parts of one system, defined by whether morphological change happens in both daughter lineages (bifurcating), in one (budding) or in neither (cryptic). Like so:
The function simFossilTaxa can simulate all of these and any mixture of these processes, within the (generally assumed) constraint that diversification and the morphological shifts occur as Poisson processes. The major change I had to make to allow for cryptic cladogenesis in paleotree 1.3 was adding a new column that records which morphotaxon each lineage would be assigned to (lineages that are functionally identical share one morphotaxon).
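To make the "Poisson processes" bit concrete, here is a minimal Python sketch of how competing Poisson processes pick the next event on a single lineage. This is not paleotree's code (paleotree is written in R); the function name `next_event`, the event names and the rate values are all made up for illustration.

```python
import random

def next_event(rates, rng):
    """Draw the waiting time and type of the next event on one lineage.

    `rates` maps an event name to its per-lineage rate. Independent
    Poisson processes compete: the wait is exponential with the summed
    rate, and each event type wins with probability proportional to
    its own rate.
    """
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("at least one rate must be positive")
    wait = rng.expovariate(total)
    kinds = list(rates)
    kind = rng.choices(kinds, weights=[rates[k] for k in kinds])[0]
    return wait, kind

# Hypothetical per-lineage rates (illustrative values, not defaults of
# any real function).
example_rates = {
    "budding": 0.05,
    "bifurcation": 0.03,
    "cryptic": 0.02,
    "anagenesis": 0.05,
    "extinction": 0.10,
}
rng = random.Random(1)
wait, kind = next_event(example_rates, rng)
print(f"next event: {kind} after {wait:.2f} time units")
```

Setting some of the rates to zero gives the special cases, such as a simulation with only cryptic cladogenesis and anagenesis.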
With this new feature, we can do fun things like simulating only under cryptic cladogenesis and anagenesis. This gives us patterns like these, using a particularly relevant example.
Okay, now, this post is supposed to be about how we can turn simulated data from simFossilTaxa into cladograms and phylogenies, using the functions taxa2cladogram and taxa2phylo. Just how does paleotree do that, really?
Well, check out this figure that I totally wish I had room for putting in my MEE submission on paleotree.
So, let me walk you through this. In (a), we have three morphotaxa,
related to each other by budding cladogenesis and (b), (c) and (d) are
various phylogenetic interpretations of that data.
In particular, (b) is the result of transforming such a dataset into an
unscaled cladogram with taxa2cladogram. This is an unscaled set of
nesting relationships (i.e. clades), containing all the clades that
could be resolved with morphological data, assuming that shifts in
systematic characters can only occur when new morpho-taxa originate.
(This is a pretty good assumption: if we see shifts in systematic
characters within a lineage, we generally start calling the critters a
new name in the fossil record!) The distinctions between morphotaxa are
captured in all that information output by simFossilTaxa.
Note that in this case, you get a polytomy. For cases where there is a
single ancestor, static in systematic characters and multiple
descendants via budding cladogenesis, you get a polytomy, which was
originally shown by Smith (1994) and Wagner and Erwin (1995). This is
true if you sample two descendants and an ancestor or just three
descendants. You can also get it if you have bifurcating cladogenesis
and sample ancestors. You will end up with more than two taxa that
contain no actual synapomorphies, although in practice this would
actually look like either some poorly-supported relationships (on a set
of most parsimonious trees) or a polytomy (on a consensus tree).
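The nesting logic described above can be sketched in a few lines of Python. This is my own illustration under the stated assumption that characters only shift when a new morphotaxon originates, so each taxon defines one clade: itself plus all of its descendants. It is not the taxa2cladogram implementation, it ignores cryptic lineages, and the name `resolvable_clades` and the ancestor table are hypothetical.

```python
def resolvable_clades(ancestor_of):
    """Return the clades implied by a table of morphotaxon ancestors.

    `ancestor_of` maps each morphotaxon to its ancestor (None for the
    first taxon). Characters gained at a taxon's origin are assumed to
    be inherited by all of its descendants, so each taxon defines one
    clade: itself plus everything descended from it.
    """
    clades = {taxon: {taxon} for taxon in ancestor_of}
    for taxon in ancestor_of:
        anc = ancestor_of[taxon]
        while anc is not None:
            clades[anc].add(taxon)
            anc = ancestor_of[anc]
    # Single-taxon "clades" carry no grouping information, so drop them.
    return {frozenset(c) for c in clades.values() if len(c) > 1}

# One static ancestor (A) with two budded descendants (B and C).
budding = {"A": None, "B": "A", "C": "A"}
print(resolvable_clades(budding))

# A chain of ancestors instead: A gives rise to B, and B to C.
chain = {"A": None, "B": "A", "C": "B"}
print(resolvable_clades(chain))
```

In the budding case the only clade is the one containing A, B and C together, with nothing to group any two of them: a polytomy. In the chain, the extra clade of B and C is resolved.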
I've been looking at this issue in great detail lately, with respect to
varying how shifts occur under the various models of morphological
differentiation we've discussed and with varying rates of sampling in
the fossil record. I've decided to write this up as a chapter for my
dissertation, so I can't say much at the moment, but the short answer is
that it could be a very serious issue: some simulations have very
few resolvable clades at realistic sampling parameters.
Now, (c) and (d) are a little more complicated,
representing different ways of translating (a) into time-scaled
phylogenies using taxa2phylo. The first thing to understand is that
there is no such thing as a single time-scaled tree that will describe
the relationships for lineages that span intervals of time. None!
All we can do is talk about the relationships among populations at
particular points in time. We might want to do this, for example, if we
want to simulate continuous traits evolving on the tree using any of the
typically used trait simulators in ape or geiger. We would need to pick
a particular 'time' of observation of our simulated morphotaxa in order
to even have a time-scaled description of relationships at those dates.
That's what taxa2phylo does. It constructs the time-scaled tree which
perfectly describes the set of relationships among particular points
in time within the simulated ranges of taxa. The taxonomic identity of
branches is lost, leaving only the historical patterns of branching that
get us to our points of interest. 'Ancestral' taxa which have multiple
descendants (like taxon A) get chopped up into segments which become separate branches on the resulting output tree.
The figure above shows how different the result can be for
different choices of 'observation times'. For (c), the times of interest
are the first appearance times of the taxa (I call these the observation
times or 'obs times' in the arguments for the function taxa2phylo). For
(d), these are the mid-points of the taxon ranges. By default, the
observation times used in taxa2phylo are the last appearance times,
which are not directly figured above but would essentially produce a
tree with the branch lengths and branching events equivalent to the
simulated dataset, except that taxa which went pseudo-extinct (such as
in a bifurcation or anagenesis event) would be attached to the tree as a
tip with a zero-length terminal branch.
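To see why the choice of observation time matters, here is a rough Python sketch of the 'chopping up' of an ancestral taxon into branch segments. It is only an illustration of the idea, not what taxa2phylo does internally; the function name `chop_lineage` and the dates are hypothetical, and times are measured before present, so larger numbers are older.

```python
def chop_lineage(fad, obs_time, descendant_origins):
    """Split one ancestral lineage into branch segments.

    Times are before present (larger = older). The lineage is cut at
    its first appearance (`fad`), at every time a descendant branches
    off, and at the chosen observation time. Returns (older, younger)
    pairs, ordered from oldest to youngest.
    """
    if obs_time > fad:
        raise ValueError("observation time cannot predate first appearance")
    cuts = {fad, obs_time}
    cuts.update(t for t in descendant_origins if t <= fad)
    ordered = sorted(cuts, reverse=True)
    return list(zip(ordered[:-1], ordered[1:]))

# Hypothetical taxon A: first appears at 10, last appears at 2, and
# descendants bud off at 8 and at 6.
print(chop_lineage(10, 10, [8, 6]))  # observed at first appearance
print(chop_lineage(10, 6, [8, 6]))   # observed at the range mid-point
print(chop_lineage(10, 2, [8, 6]))   # observed at last appearance
```

Observed at its first appearance, A contributes no branch length of its own beyond the segments leading to its descendants. Observed at its last appearance, it gains a terminal segment running from the final branching event down to time 2.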
taxa2phylo should not be used for any purpose but simulation: the tree it returns is nothing but a perfect
representation of the phylogenetic and temporal relationships. In
particular, this is good for simulating datasets in simulators that
require a tree (like rTraitCont) but not for testing whether a
tree-based analysis works. Using the unscaled partially-unresolved
cladograms (from taxa2cladogram) and sampled fossil occurrences, in
particular on discrete interval time-scales, will be a more accurate
description of the type of data recoverable in the fossil record.
Okay, so that's how I can turn simulated fossil records into trees with paleotree! Next post will be about the aforementioned sampling of the fossil record and how paleotree simulates it!