A dithyramb to molecular phylogeny. Is the world structured like dill or like rowan?

Nadezhda Markina's series of interviews about molecular methods in modern biology concludes with a conversation with Mikhail Gelfand, Doctor of Biological Sciences, Vice President for Biomedical Research at Skoltech, member of Academia Europaea, and honorary member of the International Society for Computational Biology. Mikhail Sergeevich had to answer several questions “for the molecular geneticists” raised by the earlier interviewees.

— All the zoologists and botanists I spoke with acknowledge that molecular methods have revolutionized phylogeny. It has been said that the job of morphologists is not to try to construct their own separate phylogeny from morphology, but to try to understand how this morphology could have come about. Although some “buts” were also expressed. First: in molecular trees, some things change quickly, some groups “jump” back and forth across the tree, and shouldn’t we wait until everything settles down before trusting these trees? And the second “but”: molecular trees are built on a probabilistic approach, and probability is not 100% certainty. What can you say to these “buts”? 

— I can answer the first “but” like the foreigner in lilac at Torgsin in “The Master and Margarita”: “Good one I like, bad one, no.” Bad molecular phylogenies get rearranged quickly; good ones do not. Modern tree-building methods report, among other things, a measure of confidence for each internal node. The fact that we can evaluate the reliability of a reconstruction is the great advantage of molecular trees. With morphological trees an argument can come to blows, yet there is no quantitative assessment.

Second. Any new taxonomy based on molecular trees is actively debated for some time. Take the hypothesis that nematodes and arthropods belong to the same group. For several years people argued and drew different trees; then it settled down and, in my opinion, it is now a commonplace. But it became a commonplace before our eyes.

There are various rules of hygiene for constructing molecular trees. The trees need to be checked for stability. You need to check that they do not depend on which representatives of the taxa you took. Bootstrap support is computed to evaluate this. If you follow the rules of hygiene, everything is fine. And here is the problem with morphological trees. Surprisingly, whatever molecular tree is constructed, classical zoologists and botanists will find morphological characters that match it remarkably well. So it is clear that morphology-based trees are not terribly good.
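The bootstrap mentioned above can be sketched in a few lines. This is a deliberately simplified illustration, not the procedure of any particular phylogenetics package: instead of rebuilding a full tree for every replicate, it only asks how often a given pair of taxa remains the closest pair when alignment columns are resampled with replacement. The function names and the toy sequences are invented for the example.

```python
import random

def p_distance(a, b):
    """Fraction of alignment columns at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def closest_pair(alignment):
    """Return the (alphabetically ordered) pair of taxa with the smallest p-distance."""
    names = sorted(alignment)
    pairs = [(p, q) for i, p in enumerate(names) for q in names[i + 1:]]
    return min(pairs, key=lambda pq: p_distance(alignment[pq[0]], alignment[pq[1]]))

def bootstrap_support(alignment, pair, replicates=1000, seed=0):
    """Share of column-resampled alignments in which `pair` is still the closest pair."""
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    hits = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement (the bootstrap step).
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {name: "".join(seq[c] for c in cols)
                     for name, seq in alignment.items()}
        if closest_pair(resampled) == pair:
            hits += 1
    return hits / replicates

if __name__ == "__main__":
    # Invented toy alignment; the sequences carry no biological meaning.
    toy = {
        "whale": "ACGTACGTAC",
        "hippo": "ACGTACGTAT",
        "cow":   "ACGAACTTAT",
        "pig":   "TCGAACTTGT",
    }
    print(bootstrap_support(toy, ("hippo", "whale")))
```

A support value close to 1 means the grouping does not depend on which columns happened to be sampled; a low value is the quantitative warning that morphological trees lack.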

Further: it turned out that molecular trees really do force a revision of some very deep concepts, at the level of phyla. And, if I understand correctly, this has more or less settled down. On the other hand, they let you do some things that morphologists could not.

I have two favorite stories. First: whales turned out to be the closest relatives of hippopotamuses. By now this is entirely commonplace, but the story is still wonderful. Second: snakes turned out to be lizards. And I suspect there are in fact many other similar stories that we simply do not know about because they are less publicized. Yet these are vertebrates, and whales are mammals at that. That is, we are talking about very recent events. On the other hand, some taxa emerged that people simply had not singled out. There was a strange shrub, Amborella; on the basis of external characters it was shoved here and there, but it turned out to be the earliest branch of the flowering plants.

Amborella trichopoda. Photo by Scott Zona (Wikipedia)

So far we have been talking about multicellular organisms, which have at least some morphology. But the most wonderful things happened not to them but to single-celled organisms. First, the protozoa, which were really the same kind of dumping ground as Linnaeus's “worms”: everything that was incomprehensible was simply thrown in there. It turned out that there are many very different protozoa; many new phyla have been and are still being discovered. The relationships between them are not fully understood, but the depth of the divisions is obvious.

Then another favorite story of mine: fungi ceased to be lower plants, and ceased to be plants at all; they turned out to be closer to animals. It became clear that multicellularity arose many times, which apparently no morphologist had in mind: the classical picture was that multicellular organisms appeared first and then split into plants and animals. But people did the calculations and showed that this was all wrong. This is, in general, something at the level of worldview. And even that is nothing: where everything has become completely different is in prokaryotes.

— Yes, some taxa there are described only molecularly.

— There, all the major taxa and the relationships between them are described only molecularly. And when people started doing metagenomes, it turned out that we had simply never suspected that most of this diversity could exist. If you look at the taxonomy of bacteria, the deep nodes are poorly resolved; a very deep reconstruction will be unreliable. I have a student who wants to understand what is ancestral in bacteria, one membrane or two, and cannot, because the tree keeps being rebuilt at the deep nodes, and that changes the whole scenario.

—  It’s not very clear why. 

— For purely mathematical reasons, the signal simply disappears there. Evolutionary scenarios depend strongly on the topology of the tree, and the tree is unreliable because too many changes have accumulated. In the limit, if an infinite amount of time has passed, everyone turns out to be equally related to everyone else.
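The disappearance of the signal can be illustrated with the simplest textbook substitution model, Jukes-Cantor. The model is an assumption chosen for illustration; the interview names no model. If d is the true number of substitutions per site separating two sequences, the expected fraction p of sites at which they differ, and the distance estimate obtained by inverting that relation, are:

```latex
p(d) = \frac{3}{4}\left(1 - e^{-4d/3}\right),
\qquad
\hat{d} = -\frac{3}{4}\,\ln\!\left(1 - \frac{4}{3}\,p\right).
```

As d grows, p approaches 3/4, the value expected for two unrelated random sequences. Very different divergence times then produce almost the same observed p, and the estimate becomes extremely sensitive to small sampling errors in p, which is why deep nodes are hard to resolve.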

But nevertheless, a lot of interesting things are reconstructed reliably. The archaea hypothesis: didn't it start from sequence analysis? People thought these were just strange, unusual bacteria, and then Woese, on the basis of molecular data, declared them an independent major group. Then there is Lynn Margulis's hypothesis of symbiogenesis, that mitochondria are descendants of certain bacteria. It was proven by the same molecular trees, which also made clear which bacteria were involved. And now, apparently, it has been conclusively shown that we are a chimera of Lokiarchaea, which were only recently recovered computationally from metagenomes (their genomes were assembled first, and only later were they cultured), and alphaproteobacteria (although Koonin now also wants to mix in a virus as a third component; in five years we will see how that turned out). All in all, it is a wonderful new world.

Yes, and one last thing: sibling species, which morphologists cannot tell apart.

—  Of course, everyone talks about this and admits that molecular methods are simply invaluable here. 

— And then they find morphological features.

— Yes, they find them if they dig hard enough, that is all true. But they also say that some groups “move” back and forth along the tree, and that for large taxa things are more or less clear, in some groups they are clear, and in others not so much.

— All right. First, it simply means that not enough data has been collected and the situation has not yet settled. It was the same with whales: the first trees were not terribly reliable either. Then, when enough data appeared, everything fell into place. When some groups wander back and forth, it simply means that either the trees were built badly or there really is not enough data and you just need to collect a little more.

What do a giraffe, a camel, a wild boar and a killer whale have in common? Molecular genetic methods provide the answer. These animals are now united in a single order, the cetartiodactyls (Cetartiodactyla). (Wikipedia)

There may be another problem that can only be solved by molecular methods. If we are talking about flowering plants, say, there is also hybridization, and then evolution turns out to be not very tree-like. A favorite example, which Maria Logacheva and Alexey Penin are studying with yet another generation of graduate students, is shepherd's purse (Capsella bursa-pastoris). It is wildly successful and grows everywhere. But it is a hybrid of two species that are very isolated, at least for now. Each of them is far less versatile: it is endemic, sits in its own habitat and does not go anywhere. And shepherd's purse roams everywhere. But how can you tell that it is a hybrid? You need to look at each genome, the paternal and the maternal. I don't think I know of any cheap way to draw molecular trees that takes hybridization into account. Although perhaps such ways exist.

And then come my favorite curiosities. The Denisovans are a purely molecular object. How much is actually left of the Denisovans? One and a half teeth and three phalanges...

— Not only that. A jaw was also found in Tibet.

— Well, yes, half a jaw as well. But from the very beginning the Denisovans have been a purely molecular story.

Half of a Denisovan mandible found on the Tibetan Plateau (Dongju Zhang, Lanzhou University)

And besides, there are all sorts of wonderful stories, such as the introgression of mitochondria from the brown bear into the polar bear, which means that all polar bears descend on the maternal side from a single brown bear female, and it is known where and when she lived. In fact there are many such stories.

There are also stories about infectious cancers, which are likewise purely molecular. Here the cancer is caused not by viruses, like human papillomavirus or Rous sarcoma virus in birds, but by the cancer cells themselves, which have broken away from the original host organism and act as infectious agents. The venereal sarcoma of dogs is the cancer of one single dog, and it is known when it arose: a tree was drawn and, from the rate of divergence, people worked out when that unfortunate first dog lived. The facial cancer of the Tasmanian devil, by contrast, is young, and there even seem to be two different ones, which is surprising in itself.

Another wonderful story of the same kind. Bivalves in the Atlantic Ocean have infectious cancers. Moreover, one of these mollusks carries a cancer that came from a mollusk of another species: the cancer cells of one species have become an infectious cancer of another. I would like to see the morphologists who could have found this out by any means other than sequence analysis.

So that is my dithyramb to molecular phylogeny, which is done by drawing trees. But I understand why classical zoologists feel uneasy when the word “probability” is uttered in their presence: probability theory is taught very poorly in the natural science faculties of universities, and not in the way that is needed. In fact it is a great advantage that these trees have a built-in mechanism for assessing their own reliability.

What can molecular trees do? 

— Another “but”. Morphologists understand how everything works in the body, what is responsible for what, and how it all works. And at the molecular level, they ask, do we know the path from molecules to the functioning of a specific organ, and especially an organism? And when will we know? 

— First, an interesting question is whether this understanding is an illusion. And second, more importantly: for a very long time various people have said that we need to study evo-devo, the evolution of development. There is still the notion that ontogeny recapitulates phylogeny, though we know this is not true. Haeckel, as you know, considerably embellished his drawings. But the original idea was correct: the differences between adult creatures are laid down during embryogenesis, and from an evolutionary point of view these differences are purely molecular, at the level of regulation of gene expression. And now this can be studied meaningfully, because we can compare the transcriptomes of single cells.

— We see at what stage certain genes begin to work in them? 

— Yes, we see at what moment which genes switch on. For one particular gene this could be done before, each time in a separate experiment. That is how, at the cost of a great deal of graduate-student effort, the gene networks for the development of Drosophila, the sea urchin, and the Arabidopsis flower were established. Now we can obtain transcriptomes of single cells, so we can see the pathways of cell differentiation.

When did the interesting story with phylogeny begin? When the density with which known genomes cover the existing diversity reached a certain critical level. Then all sorts of non-trivial things became visible.

— Roughly when?

— It happened differently in different groups, and very unevenly. But by and large the qualitative change came, apparently, in the 2000s. Although the archaea, too, were a qualitative breakthrough obtained by molecular methods, and that was the end of the 1970s. Good developmental series of transcriptomes, first bulk, then for individual tissues, and now single-cell: that is happening right now. Recently there was a paper comparing the early development of the mouse and the rabbit, right down to how some cell types replace others. But to do this really well you need twenty rodents, and then you will understand how rodents develop. And twenty primates as well, to compare rodents with primates.

— This is, of course, incredibly cool. But an amateurish question: what does this give us? 

— Well, this will explain to us why a mouse is a mouse and a rabbit is a rabbit. The classical biologists wanted to understand how everything works, didn't they? There is a fairly old experiment that I really love. They took a gene that regulates the development of the forelimb in mammals. As far as I remember, its function is to determine at what point the cartilage ossifies; it is a master regulator of that development. They took this gene from a bat and transplanted it into the genome of an ordinary mouse. And nothing happened: an ordinary mouse grew up. Then they took the regulatory region in front of this gene and transplanted that from a bat into the genome of an ordinary mouse, leaving the gene itself as it was. And immediately the limb became 15% longer. That is, what mattered was not the gene itself but its regulation.

Phylogeny of mammals. Ill. Vrata / Wikipedia

Mammals, by and large, differ from each other not in their set of genes, or even in gene variants, but in when those genes switch on and off. In fact, people understood this in words quite a long time ago, at the level of hand-waving. But for half a century it was just theoretical chatter, whereas for the last ten years you have been able to touch these things with your hands, see them with your eyes, load them into a computer and start comparing. Unfortunately, doing this well is very expensive, but it is getting easier. The first paper on single-cell transcriptomes had, I think, about three hundred cells, and it was a paper in Nature. Now an ordinary paper has thirty thousand of them, or already three hundred thousand.

You can sequence the genomes of single cells and, for example, look at the phylogeny of neurons. What would the naive picture be? There is a brain, and in each part of the brain the neurons are relatives of one another, because some cell divided and its descendants formed that part. It turns out to be nothing of the kind: neurons located in the same place are genetically very distant relatives of each other. So distant that their common ancestor may not have been a neuronal precursor at all, but some earlier cell. And their functional identity is determined by the place they ended up in.

In hindsight, it is clear that this was a very correct engineering decision. Why? Let's imagine that a region of the brain is the descendants of one cell. This means that if during the process of ontogenesis this cell accidentally died, then an entire area has disappeared, because the cells that should have developed from it did not appear. And the situation when the identity of a neuron is determined by where it ends up is much more stable. Because even if some cells randomly die during ontogenesis, and this is inevitable, others take the place of their descendants. This is much more stable than a rigid hierarchy. 

Another useful application of cellular phylogenetics is reconstructing the history of cancerous tumors. These are the same tree methods. You can see in what order the mutations appeared, how the clones differ, and where the metastases originated.

Thus, the scope of application of molecular trees is actually much wider than it seems at first glance. Let's say embryology at the genome level - that is, which cells come from which. This can be complemented by transcriptomics to describe the differentiation of cell types and tissues. Cancer phylogeny is a reversed embryology, dedifferentiation, regression to early cell types. 

— Is it possible for anyone in Russia to make transcriptomes of single cells? 

— In Russia people do work with single cells, although in only a few places. Colleagues from the Institute of Gene Biology determined the chromatin structure in single Drosophila cells; we then processed the data together with them, and it made a good paper.

Then, and this is my favorite story, it turns out that in insects with complete metamorphosis the transcriptional program of the embryo is replayed in the pupa. Not as clearly as we would like, but to some extent it is. So evo-devo, something people have wanted for a long time, can now be done with arithmetic.

“The question turns from a scholastic one into a purely computational one” 

— If we return to general biology. I asked everyone: have the criteria for the species changed? In general, what are considered species? And there was a general consensus that species are certainly a reality, but the criteria for species are being blurred. 

— Well, are Neanderthals a separate species or not? 

— Well, opinions differ. But interbreeding did occur.

— Interbreeding took place, but the boys were duds. It was successful, because we are its descendants, but not entirely successful, because the male hybrids were apparently poorly fertile, and we see this in the genomes too.

— And how can we answer the question of whether this is a separate species? 

— This problem has always been there, only it was solved at the level of conversations. And now some quantitative estimates are possible for it. 

— I tried to bring everyone to quantitative assessments. 

— There is a classic definition that I heard from Alexey Simonovich Kondrashov: a species is a set of individuals between which there is a free exchange of genetic material. 

— It happens that groups in nature are isolated, and free exchange does not occur, but if they are “put in one bucket” (not my expression), then individuals can reproduce. 

— There are many different experimental designs. For example, you put a male of one species together with a female of another, and they start to reproduce because they have nothing else to do. But if the female had a choice, she would take a male of her own species and pay no attention to the other one. This is a classic point: the experiment must include competition. A second case. There are parasitoid wasps, two different species, that do not interbreed at all: they produce no offspring. But they turn into one species if they are fed tetracycline, because it is Wolbachia that prevents them from interbreeding, and the antibiotic removes it. In general, there are always gray areas in biology. There are clear-cut extremes and there are more or less broad transitional situations. A normal biologist approaches this without neurosis.

— Zoologists are looking for a quantitative level of molecular differences that could serve as a criterion for a species. And it is different in different groups. 

— Absolutely right. It differs between groups: a human differs from a chimpanzee at roughly one letter in 100, and two individual Drosophila also differ from each other at roughly one letter in 100. So, from a Drosophila's point of view, humans and chimpanzees are one species. Yet they produce no offspring, and that has been checked: enthusiasts tried it a hundred years ago.

In principle, for each family you could gather the people who work on it, and they would decide that, say, in beetles we delimit a species at such-and-such similarity of such-and-such a gene, and in butterflies at such-and-such similarity of another gene... But, first, I don't really understand how this would be useful: it is still a technical definition, not a substantive one. Second, there will still be plenty of exceptions, because in biology there always are, and taxonomists will go on arguing. Third, this really would have to be done separately for each taxon, which in itself strikes me as rather ridiculous. There would be a thick reference book: in this order a species means so many percent similarity in this gene, and in that order a different percentage in another gene. And on every page, the seal of the relevant department.

The correct approach, though orders of magnitude more expensive, is to sequence the genomes of a substantial number of individuals of each species and then see whether alleles flow in one direction or the other. If there has been no flow of alleles, these are different species. If you see that they have hybridized many times and continue to do so, they are one species, and you can then divide them into subspecies as you wish. That is what was just done with the Far Eastern tigers: they were divided into subspecies according to their genomes. Clearly, in the overwhelming majority of cases there will be no money for this, but for some important and endangered species it is necessary to understand the genetic structure of the population, in particular to plan conservation measures correctly: to preserve all the subspecies but not mix them.
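One widely used way to look for allele flow between populations is the ABBA-BABA test (Patterson's D statistic). The interview does not say which method is meant, so the sketch below is an illustrative stand-in: it uses one aligned sequence per population, whereas real analyses work with allele frequencies over many individuals and assess significance with a block jackknife. The function name and the argument names are invented for the example.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Patterson's D for four aligned sequences with assumed tree ((P1, P2), P3), outgroup.

    ABBA sites: P2 and P3 share an allele that P1 and the outgroup lack.
    BABA sites: P1 and P3 share an allele that P2 and the outgroup lack.
    D > 0 suggests excess allele sharing between P2 and P3, D < 0 between P1 and P3.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if a == o and b == c and a != b:
            abba += 1
        elif a == c and b == o and a != b:
            baba += 1
    total = abba + baba
    # With no informative sites there is no evidence either way.
    return (abba - baba) / total if total else 0.0
```

Without gene flow, ABBA and BABA sites arise equally often by chance, so D stays near zero; a consistent excess of one pattern is the quantitative sign that alleles have moved between the populations.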

Returning to evo-devo. We understand that changes in regulation that alter morphology and physiology are essential for speciation. They also occur as a result of mutations, but the proportion of such mutations is small. Therefore, defining a species based simply on sequence similarity is not even very good conceptually, because it measures the time of divergence of populations, but not the substantive differences. 

There are the classic cichlids in African lakes; they undergo explosive speciation, and at the same time they can interbreed. More precisely, they do not interbreed freely, because they do not want to, but if they are given no choice they will, and the result is an ancestral form, a gray one. Since the species there are very young, they are genetically very similar in sequence, but morphologically they are quite different. Again, this means that some specific genes work a little differently in them. So, for example, the shape of the mouth turns out different, and in the end one is a scraper, another a predator, and another picks food from the bottom or from the surface. And the colors differ: it is important that boys and girls recognize their own and not others. But at the same time they hybridize quite heavily, and this is visible.

—  But this does not prevent them from being considered species. 

— That's fine by me... I asked the zoologists a different question. Species are all right, we will define a species somehow. But what is a genus? And all the other taxonomic levels? The most honest zoologists say that yes, of course, the rest are just constructs that we make for convenience. On the one hand, yes. On the other hand... Take mammals: what an order is there is quite clear (apart from a few truly special cases, like the whales). And it is clear why. Apparently because at some point they all diverged very quickly. The mammalian orders coalesce at around 70 million years ago. This is like a dill inflorescence: a single division into many branches at once, then another one, a clear hierarchy. The other option is rowan: the branching in the inflorescence is chaotic, but the result is still an umbrella. So if the world is structured like a dill umbrella, then we have genera, families, and everything is fine. If the world is structured like a rowan umbrella, then there are no genera or families, and we can only say formally that 50% of something is an order. Why 50%? Because it is convenient for us.

Languages seem to be structured like dill. I asked linguists: the concept of a language family really exists and is reasonable, and families sit at roughly the same level of divergence. And there are no full-fledged hybrid languages (pidgins don't count).

Generally speaking, this could be checked, and I even know how. What is the difference between dill and rowan? If we project the branching nodes of dill onto the axis that runs along the inflorescence (that is, onto the continuation of the stem), we get points onto which many branching nodes are projected. If we project the rowan onto the axis that runs along its inflorescence, we see nothing of the kind, because the branchings are scattered chaotically along the axis and there are no condensations.

Why are mammalian orders good? If we take the tree of mammals and project it onto the time axis, then we see a very large condensation at the moment when, in fact, the current orders were formed. And this means that the order of mammals is a reasonable unit, completely objective, with which one can operate. 

That is, the question of whether a high-level taxon exists in a meaningful sense or is a purely technical construct without any content is resolved like this: we take a tree, project it onto the time axis and see whether there is a condensation.
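The projection test described above can be sketched as a simple histogram of node ages. This is a minimal illustration under stated assumptions: the ages of the internal branching nodes are taken as already estimated (for example from a dated tree), the function names are invented, and the threshold `factor` is an arbitrary choice rather than an established criterion.

```python
from collections import Counter

def node_density(node_ages, bin_width):
    """Count branching nodes per time bin; keys are the bin start ages."""
    counts = Counter(int(age // bin_width) for age in node_ages)
    return {b * bin_width: n for b, n in sorted(counts.items())}

def condensations(node_ages, bin_width, factor=3.0):
    """Return start ages of bins holding at least `factor` times the mean count per bin.

    The mean is taken over all bins between the youngest and the oldest node,
    including empty ones. `factor` is an arbitrary, hypothetical cutoff.
    """
    density = node_density(node_ages, bin_width)
    first_bin = int(min(node_ages) // bin_width)
    last_bin = int(max(node_ages) // bin_width)
    mean = len(node_ages) / (last_bin - first_bin + 1)
    return [start for start, n in density.items() if n >= factor * mean]

if __name__ == "__main__":
    # Invented node ages in millions of years, for illustration only.
    dill_like = [70, 71, 69.5, 70.2, 70.8, 69.9, 30, 10]   # many splits near 70
    rowan_like = [5, 15, 25, 35, 45, 55, 65, 75]           # splits spread evenly
    print(condensations(dill_like, bin_width=10))
    print(condensations(rowan_like, bin_width=10))
```

A dill-like tree yields one or a few bins that stand far above the average, which is the objective signal of a rank such as the mammalian orders; a rowan-like tree yields none.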

— With Elena Temereva, professor at the Department of Invertebrate Zoology of the Faculty of Biology at Moscow State University, you talked in “Conversations for Life” about the Cambrian: if many, many organisms arose in the Cambrian all at once, they can be regarded as the ancestors of phyla.

— This means that phyla also exist. The question turns from a scholastic one into a purely computational one. We may not be able to do this for some reason, but at least we can perform a thought experiment.

— I have another unexpected question. Why do you think plants need such complex genomes? It seems that they do not have to solve problems in life as complex as those animals face: movement, say, or learning...

— Who said that these genomes are complex? 

— Well, first of all, they are very big. 

— The amoeba's is even bigger, so what? A large genome is nothing good; a large genome means that selection is inefficient. Besides, we need to look at which functional classes of genes plants have. They have, say, a more complex metabolism and produce many secondary metabolites, because everyone gnaws on them and they cannot run away, so they have to defend themselves with some kind of chemistry. And in general they need genes for every occasion, because they cannot eat anyone, which means they have to synthesize everything themselves, and they have to endure stress, because they cannot hide.

On the other hand, complexity in our sense means behavioral reactions, and that does not require very many genes. It requires a flexible regulatory system. Take, again, the outgrowth of neurons: you need to follow a few general engineering principles, implemented by a not very large number of genes. And then it works out on its own.

This leads to an interesting question to which I do not know the answer: what happens if we compare the complexity of regulatory networks? It may turn out that our regulation is largely combinatorial, so that not many regulatory genes are needed, only different combinations of their products. In plants it may instead be flat, with each condition requiring its own separate regulator. One simply has to look.