Nemagraptus gracilis

Ah, thank you Graeme! I've updated the blog po...

2013-02-22T10:47:59.594-08:00

Ah, thank you Graeme! I've updated the blog post above. I haven't been following your website over the last few years and last time I checked it was all dinosaurs, so I didn't think to mention it. It's really great you expanded to cover as much diversity as possible.

So, why is it many invert datasets are listed without links to data? Does that mean you have it in some complicated unfriendly form and users should contact you? Or does it just mean you know that that paper contains a cladistic analysis?

Can I also suggest my list!: http://www.graemetllo...

2013-02-22T10:30:14.718-08:00

Can I also suggest my list!: http://www.graemetlloyd.com/matr.html

I had seen that paper, but I hadn't read it as...

2012-08-05T05:30:26.976-07:00

I had seen that paper, but I hadn't read it as I am kind of blasé about stratigraphic consistency (...I'd rather deal with likelihoods). But I notice their point about polytomies and strict consensus trees: I'll have to print this out and bring it to South Dakota with me.

Thanks Graeme!
-Dave

Hi Dave, Boy, the ancestry thing really affects e...

2012-08-05T00:41:25.361-07:00

Hi Dave,

Boy, the ancestry thing really affects everything...

BTW, have you seen this paper:

http://onlinelibrary.wiley.com/doi/10.1111/j.1096-0031.2010.00320.x/abstract

Graeme

Graeme- As for the first thing, you're correc...

2012-07-27T09:45:21.512-07:00

Graeme-

As for the first thing, you're correct: a small number of polytomies is all it takes to make the total number of possible bifurcating solutions unmanageably large. If we just want a bunch of possible bifurcating solutions (without being complete), that's basically what multi2di does: it randomly resolves polytomies and repeated use will produce a large sample of possible resolutions. Also check out allTrees in the phangorn package.

The number of possible bifurcations has a known equation. I think it's in Felsenstein's book.
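For reference, the closed form usually quoted is (2n-3)!! rooted and (2n-5)!! unrooted bifurcating trees for n labelled tips. Here is a minimal Python sketch of that formula (not R, and not code from ape or phangorn); the function names are made up for illustration.

```python
def double_factorial_odd(m):
    """Return m * (m - 2) * ... * 3 * 1 for odd m, and 1 when m < 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def n_rooted_bifurcating(n_tips):
    """Number of rooted, fully bifurcating trees on n_tips labelled tips: (2n - 3)!!"""
    if n_tips < 1:
        raise ValueError("need at least one tip")
    return double_factorial_odd(2 * n_tips - 3)


def n_unrooted_bifurcating(n_tips):
    """Number of unrooted, fully bifurcating trees on n_tips labelled tips: (2n - 5)!!"""
    if n_tips < 1:
        raise ValueError("need at least one tip")
    return double_factorial_odd(2 * n_tips - 5)


if __name__ == "__main__":
    for n in (3, 4, 5, 10):
        print(n, n_rooted_bifurcating(n), n_unrooted_bifurcating(n))
```

Ten tips already give tens of millions of rooted trees, which is why enumerating everything stops being practical very quickly.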

Actually, I'm not certain using MPTs really is the best idea. I have much more to say on this subject than I can say here, but consider a simple case where you really do have several taxa which are descended from a single relatively static ancestor. The result of this in a cladistic analysis would be a polytomy with very poor support for any particular solution, unless there is considerable homoplasy. Using only the MPTs and not the consensus would get rid of the information that the relationships among these taxa are contentious. My SRC methods can actually consider the possibility of this multi-budding scenario if the taxa are input as a polytomy, while using the MPTs wouldn't allow this possibility. Of course, there are also cases where using the MPTs is the best idea. It just isn't clear to me that cases where polytomies are informative are really that uncommon.

The optimal solution would be to consider support from shared characters and from time-scaling under a model that accounts for sampling intensity, but we're pretty far from that solution at the moment...

Hi Dave, I can see how this is useful. Although d...

2012-07-26T15:51:44.614-07:00

Hi Dave,

I can see how this is useful. Although depending on what you are doing, "biasing" your tree towards something that best fits stratigraphy might not be ideal (e.g. it artificially increases "fit" to stratigraphy, and favours shorter branch lengths for PCM).

What I would actually like (although not enough to spend the time coding it!) is a function that produces all possible random resolutions of a tree with polytom(ies), as a means of constraining an analysis to include all possible bifurcating trees. Of course, in practical terms this may be a vast number, but in theory a short function that simply works out what this number is might be a nice preliminary.
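The "short function that simply works out what this number is" can be sketched in a few lines. This is Python rather than R, and it assumes a rooted tree written as nested tuples with tip names as strings (a representation invented here for illustration). Each node with k children can be resolved in (2k-3)!! ways, independently of every other node, so the total is the product over all internal nodes.

```python
def double_factorial_odd(m):
    """Return m * (m - 2) * ... * 3 * 1 for odd m, and 1 when m < 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def count_resolutions(node):
    """Count the fully bifurcating rooted trees compatible with a rooted tree.

    A tip is a string; an internal node is a tuple of child nodes.
    A node with k children contributes (2k - 3)!! possible resolutions.
    """
    if isinstance(node, str):
        return 1
    k = len(node)
    total = double_factorial_odd(2 * k - 3)
    for child in node:
        total *= count_resolutions(child)
    return total


if __name__ == "__main__":
    # Hypothetical consensus tree: a trichotomy at the root, another inside it.
    consensus = (("A", "B", "C"), "D", ("E", "F"))
    print(count_resolutions(consensus))   # 3 * 3 * 1 = 9

    # A completely unresolved "star" of ten taxa.
    star = tuple("ABCDEFGHIJ")
    print(count_resolutions(star))        # 17!! = 34459425
```

The count is for rooted resolutions only; an unrooted consensus would need the root node handled differently.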

Another thought whilst reading this is that in practical terms polytomies tend to be the result of using a consensus tree. In such cases I wouldn't want to use all random resolutions, nor a temporally scaled resolution, as these would likely contain suboptimal trees (i.e. those with a longer length than the shortest tree under maximum parsimony). In other words, the most parsimonious trees that form a strict consensus are often a smaller set than the number of all possible bifurcating solutions for their consensus. (If the number of possible bifurcations could be calculated - see above - I suspect I could show this with a nice graph using my set of dinosaur analyses.) Of course, when multiple trees are known there is no need for a function to deal with this: just use the most parsimonious trees and not the strict consensus. (This seems obvious to me, but I have seen cases of people randomly resolving a consensus tree to get bifurcating solutions instead of just using the MPTs that ultimately created it!)

Good stuff though!

Graeme

By the way, Schenck, I don't know if you have ...

2012-05-01T08:20:00.680-07:00

By the way, Schenck, I don't know if you have Google+, but I have a 'science' circle I constantly upload new paleo-tree-diversification-plankton-etc stuff to, if you're interested. I understand that some find it useful.

Thanks for the explanation and references!

2012-04-30T15:25:21.750-07:00

Thanks for the explanation and references!

Schenck, glad to hear you've been enjoying pal...

2012-04-30T08:19:52.266-07:00

Schenck, glad to hear you've been enjoying paleotree.

The short answer is no; taxa2phylo and its argument 'obs_time' are how we control the overall phylogenetic structure we pull out of a given set of branching hypotheses. The results can differ quite a bit, as figures (c) and (d) show above. The important question is: at what instants in time is each morphologically distinguishable 'chronospecies' being observed?

Give taxa2phylo the answer to that question and it will build the tree. Because in general we're interested in making non-ultrametric trees for simulated fossil data, those 'instants' will all be different. For example, by default, taxa2phylo places the 'obs_times' at the last appearance times.

Think of it like an orchard: each tree can be shaped very differently, despite representing relationships among the same set of taxa.

Time-slicing doesn't really change the structure of the tree. It's as if we take a chainsaw to the tree and slice off all the branches above a given height. (Nick Matzke has written a similar function which is actually named 'chainsaw', I think.) This can be really useful, for example, for taking a phylogenetic dataset of fossil taxa and asking 'if I went back to 70 MY ago and made a molecular phylogeny of the ceratopsians alive at that moment, what sort of speciation and extinction rates would I estimate with methods borrowed from evolutionary biology?'
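To make the chainsaw picture concrete, here is a toy Python sketch. It is not how timeSliceTree or 'chainsaw' are implemented; the Branch record, the function names, and the example ages are all invented for illustration. Each branch is stored with the ages (in Myr before present) of its older and younger ends. Anything younger than the slice is cut away, and branches that cross the slice are truncated at it.

```python
from typing import NamedTuple


class Branch(NamedTuple):
    name: str
    start: float  # age of the older end, Myr before present
    end: float    # age of the younger end, Myr before present


def slice_branches(branches, slice_time):
    """Keep only the part of each branch that is older than slice_time."""
    kept = []
    for b in branches:
        if b.start <= slice_time:
            continue  # the whole branch is younger than the slice: cut away
        # Truncate branches that cross the slice; older ones are unchanged.
        kept.append(b._replace(end=max(b.end, slice_time)))
    return kept


def lineages_alive_at(branches, slice_time):
    """Names of branches that exist at slice_time (they cross the slice)."""
    return [b.name for b in branches if b.start > slice_time >= b.end]


# Made-up example: a stem leading to A and B, a sister lineage C, and C's descendant C1.
example = [
    Branch("stem", 100.0, 80.0),
    Branch("A", 80.0, 60.0),
    Branch("B", 80.0, 75.0),
    Branch("C", 100.0, 66.0),
    Branch("C1", 66.0, 50.0),
]

if __name__ == "__main__":
    print(slice_branches(example, 70.0))
    print(lineages_alive_at(example, 70.0))  # ['A', 'C']
```

The branching order of what remains is untouched, which is the sense in which slicing doesn't change the structure of the tree.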

Note that there are many ways of "time-slicing" a tree, by the way. For reasons too complex to go into here, I prefer the way timeSliceTree works, where a time-scaled tree is taken and sliced at a particular instant in time. But there are other ways: check out Ruta et al. (2008? I think) and Tarver and Donoghue (2011).

Great post! I've been using paleotree a bit an...

2012-04-29T15:28:19.774-07:00

Great post! I've been using paleotree a bit and find it fascinating, thanks for putting this package out there!
When you talk about there not being any 1 tree that represents lineages that span intervals of time, is that why you have timeSliceTree, so we can look at the tree at various 'instant' points in time?

Nice. Every R developer should do this. Keep '...

2012-04-09T08:46:24.912-07:00

Nice. Every R developer should do this. Keep 'em coming.

No worries. Thanks for your comments - they've...

2012-03-26T02:35:02.517-07:00

No worries. Thanks for your comments - they've been useful!

Sorry Graeme, no primers that I can think of. The ...

2012-03-12T11:34:18.623-07:00

Sorry Graeme, no primers that I can think of. The original Lewis paper from 2001 is very well written (I regard Paul as one of the best communicators in our field). Nylander's 2004 paper in Systematic Biology is one of the first to use Mk to combine genetic and morphological characters. I've been kicking around the idea of writing a "10-year anniversary" review paper summarizing how things have developed since 2001, but it will be hard to find the time to undertake such a task on top of other responsibilities.

Thanks Joseph. This makes me happier about using M...

2012-03-08T15:19:22.706-08:00

Thanks Joseph. This makes me happier about using ML/Bayesian methods. Although I still don't know how to actually do this with a morphological dataset. Can you recommend any good primers?

Graeme

Hey Guys. No, there are no assumptions whatsoever...

2012-03-07T09:57:02.373-08:00

Hey Guys.

No, there are no assumptions whatsoever that taxa are considered contemporaneous (i.e. that the tree is ultrametric). Indeed, all trees inferred are unrooted, so there is no time axis (relative or otherwise) to identify "up" or "down". Put another way, the trees that come out of such an analysis are never ultrametric. [Using Mk in comparative analyses, however, does assume an ultrametric tree, but this goes for all other models as well]. Mk-flavoured models also do not optimize any sort of rate parameter. These models (which are a generalization of the Jukes-Cantor 1969 model, if you know nucleotide models) assume that all character transitions occur at the same rate; the actual rate of transitions does not factor in, as all relative character transition rates are just set to 1. So "rate" is not even an inferred parameter. The only parameters involved are 1) the topology, and 2) branch lengths (of which there are 2*N - 3 for N taxa), which are in units of expected (mean) number of substitutions (changes) per character. If you are optimizing anything (i.e. ML), it is the likelihood of the combination of topology and vector of branch lengths. Of course, the assumption of equal transition rates is a valid point to rail against, but that is a little outside of the discussion here.
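For anyone who wants to see what "all rates set to 1" means in practice, here is a small Python sketch of the closed-form transition probabilities for a k-state equal-rates model, with the branch length in expected changes per character. It is a sketch under those assumptions, not code from MrBayes or any other program, and the function names are made up. With k = 4 it reduces to Jukes-Cantor.

```python
import math


def mk_transition_prob(k, branch_length, same_state):
    """Probability of ending in the same (or one particular different) state
    after a branch of the given length, under an equal-rates k-state model.

    branch_length is in expected number of changes per character, so the only
    inputs are the number of states and the branch length; no separate rate.
    """
    if k < 2:
        raise ValueError("need at least two states")
    decay = math.exp(-k * branch_length / (k - 1))
    if same_state:
        return 1.0 / k + (k - 1) / k * decay
    return 1.0 / k - decay / k


def n_branches_unrooted(n_tips):
    """Number of branches in an unrooted, fully bifurcating tree: 2N - 3."""
    return 2 * n_tips - 3


if __name__ == "__main__":
    k, t = 2, 0.1
    stay = mk_transition_prob(k, t, same_state=True)
    change = mk_transition_prob(k, t, same_state=False)
    print(stay, change, stay + (k - 1) * change)  # the last value is 1 up to rounding
    print(n_branches_unrooted(10))                # 17 branch lengths for 10 taxa
```

Note that the state labels never enter the calculation, only whether the start and end states match, so relabelling the states of a character changes nothing.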

As David mentions, it is a different story altogether if you are trying to *infer* a time-calibrated tree; say, inferring a chronogram in BEAST from a mix of fossils and extant taxa. These methods (all Bayesian, that I'm aware of anyway) allow one to input actual fossil ages (or distribution, if ages involve uncertainty) into the model, so that part is taken care of. A much larger concern is using an Mk-flavoured model to estimate time. As mentioned above, because rate and time are confounded, branch lengths are typically given in expected number of changes per character. In order to extract temporal information, assumptions have to be made about rate. I'll refrain from waxing "molecular clock" here, but in these models "time" is constrained through a function of 1) temporal priors/constraints (such that some ages are inaccessible), and 2) assumptions regarding the nature of rate heterogeneity (e.g. rates may be considered autocorrelated, or the breadth of branch-specific rates can be thought to be described by some distribution). [I'll stop there. I have a book chapter describing relaxed clock methods if either of you are interested]. ANYway, I don't think anyone would argue with the first point (constraining "temporal space"). All of the unease regarding clock models is the second point of making assumptions about the nature of rate heterogeneity. I would argue that we have a good sense of the nature of rate heterogeneity when it comes to genetic sequences (or at least we know where the problems are), but morphology is another story. Assuming that morphological evolution follows any sort of clock, however it may be "relaxed" (especially across a set of characters evolving in potentially wildly different trajectories!), is not something that I think anyone would seriously subscribe to. That said, people are indeed starting to use such "morphological clocks" in their work. 
I would very much like to see some serious simulation work in this area to show that this is even possible.

Hmm, I could be very wrong here, but I believe pro...

2012-03-06T15:22:15.398-08:00

Hmm, I could be very wrong here, but I believe programs like MrBayes don't try to force the rates of evolution to be as expected on an ultrametric tree, instead simply trying to find the trees with the best supported combination of branch-lengths and character changes across characters (i.e. is it likely that the branch length here is high under a Markov model, given the current topology and observed number of character changes?).

Of course, this is different if we're talking about programs that time-scale at the same time; this raises even more issues, of course, but see Pyron, 2011, for an interesting study.

I think Joseph Brown would know better than me here, so I'll ask him if he could spare a moment to weigh in.

Hi Dave, Bit late to the party, but I just found ...

2012-03-06T15:05:39.946-08:00

Hi Dave,

Bit late to the party, but I just found your blog.

I can see the merits in model-based approaches, but have a query for you. As far as I understand them (which is still very poorly!) the ML/Bayesian approaches essentially work by optimising with respect to rate of change. Does this mean that for the model to work correctly the input to it should include not just the character states of the terminals, but their ages as well? If so, for fossil taxa they wouldn't all be the same (i.e. the tree is non-ultrametric) and so ultimately the use of model-based approaches also raises the whole should-we-use-stratigraphy-to-infer-phylogeny? debate. Is this the case? And if so are the model assumptions violated if we "pretend" the tree is ultrametric when really it isn't?

this is great, Dave.

2011-12-18T22:40:24.423-08:00

this is great, Dave.

Frank Burbrink just pointed out on Facebook that I...

2011-12-18T16:30:47.414-08:00

Frank Burbrink just pointed out on Facebook that I missed his 2005 paper that used the Mkv model:

Burbrink, F. T. 2005. Inferring the phylogenetic position of Boa constrictor among the Boinae. Molecular Phylogenetics and Evolution 34(1):167-180.

Looks good. Thanks for putting this list together....

2011-12-18T15:22:58.867-08:00

Looks good. Thanks for putting this list together. I would add only that the reason morphological models are overly simplistic at the moment is because of the arbitrariness of labelled states across characters: in DNA, an "A" is always an "A", but in morphology "1"s have no correspondence across characters. This makes it more difficult (impossible at the moment) to use more general transition matrices (unless one is willing to model each character separately).
Joseph.

Too cool that you presented! I'm looking forw...

2011-06-23T16:45:28.949-07:00

Too cool that you presented! I'm looking forward to hearing about your new radical thoughts!

I really like the idea of an "environment"...

2011-04-19T09:18:49.125-07:00

I really like the idea of an "environment" being composed of the physical elements AND the "other" elements (biotic, intangibles, etc.) of a place. I guess I never really realized that before.
