Big Data ran the numbers on the avian tree of life, in the largest study of its kind to date.
Bird in flight. Image: Bengt Nyman
A massive international effort to map the avian genome has finally come to fruition today, with the publication of dozens of interdisciplinary papers. Hundreds of researchers were involved in the project, which spanned four years, 20 nations, and 80 institutions, and made use of nine supercomputers for a total of 400 years of CPU time.
"It is a big challenge to organize 200 people from various fields and to combine their results together to understand bird evolution," evolutionary geneticist Guojie Zhang told me over email. Zhang was one of the three major leaders of the project, along with neurobiologist Erich Jarvis and evolutionary biologist Thomas Gilbert.
The exhaustive effort succeeded in producing the most comprehensive and accurate avian tree of life ever compiled, as well as the largest whole genome study of any class of animals to date. It has huge implications for future studies in evolutionary genetics.
"Because we singled out the most important questions to be addressed by using all these genomes, we figured out a way to let people working on the same directions," Zhang continued. "We still encouraged individual groups [to work] on their interested topics and produce their own papers."
Indeed, the wealth of new papers addresses everything from the evolution of bird sex chromosomes, to avian vocal ability, to the "big bang" of speciation the clade underwent after the Cretaceous-Paleogene extinction event. Zhang led one of the project's flagship papers—a comparative genome analysis of 48 species of birds, with every major avian lineage represented. It will be among the eight papers from this project to be featured in the December 12 issue of Science.
Among the questions Zhang's team addressed was the mechanism limiting the size of the avian genome. Although birds are the most biodiverse tetrapods on the planet, the average avian genome is only about a third the size of the mammalian genome.
"We tried to get an answer about what happened on the genome of bird ancestors by comparing the genomes of birds and mammals," he continued. "We found that [...] most mammals have at least 20 percent of their genomes consisting of repeat sequences, while most birds have less than five percent of their genomes consisting of repeat DNA. We also found that birds have lost thousands of genes during their ancestral stage."
There are many hypotheses for why birds evolved such compact genomes. "The most famous one is that the small genome may be associated with reducing metabolism cost, thus can provide advantage in adaptation of flight," Zhang said. "It is interesting that a smaller genome size also appears in bats, the only mammalian lineage with flight."
The comparative analysis exposed interesting relationships within the avian family tree as well. Conventional wisdom would dictate that falcons are most closely related to other birds of prey like eagles and vultures, but they are actually more closely related to parrots and songbirds. In a similarly counterintuitive familial bond, flamingos were found to be more closely related to pigeons than to pelicans, despite the minimal resemblance between the species.
Indeed, the supercomputer analysis revealed that the relationships between species vary from one gene to the next. "As you go across each chromosome, the phylogeny that best describes the evolution of that region can change," co-author Siavash Mirarab, a computational biology graduate student, told me over email.
"So, for example, in one gene, you can have owl being more closely related to falcon than to eagle," he said, "but in another region, owl might be more closely related to eagle than to falcon."
"These genomic regions can have evolutionary histories, represented by trees, that are different from the species tree," added bioengineer and computer scientist Tandy Warnow, who spearheaded the computational side of the project. Warnow and Mirarab developed a technique called "statistical binning" to boost the accuracy of these gene trees, which in turn led to more precise species trees.
"The breakthrough was in the method, which is a combination of computer science and statistics," she told me. "Fortunately this will help many other species tree projects for other groups—not just birds!"
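To picture the gene-by-gene disagreement Mirarab describes, here is a minimal Python sketch that tallies which pairing of owl, falcon, and eagle each gene supports. The gene names and pairings are invented for illustration, and the simple majority count is an assumption made for this toy. It is not statistical binning, nor the method the project used.

```python
from collections import Counter

# Hypothetical toy data: for each gene, the pair of birds that the gene's
# tree places as closest relatives. These values are invented, not results.
gene_sister_pairs = {
    "gene_A": ("owl", "falcon"),
    "gene_B": ("owl", "eagle"),
    "gene_C": ("owl", "eagle"),
    "gene_D": ("eagle", "falcon"),
    "gene_E": ("owl", "eagle"),
}


def tally_topologies(pairs_by_gene):
    """Count how many genes support each sister pairing, ignoring order."""
    return Counter(frozenset(pair) for pair in pairs_by_gene.values())


def most_supported(pairs_by_gene):
    """Return the pairing backed by the most genes, plus its gene count."""
    pairing, count = tally_topologies(pairs_by_gene).most_common(1)[0]
    return tuple(sorted(pairing)), count


if __name__ == "__main__":
    for pairing, count in tally_topologies(gene_sister_pairs).items():
        print(sorted(pairing), count)
    print("Most supported:", most_supported(gene_sister_pairs))
```

In this made-up data set, three of five genes pair owl with eagle while the other two tell different stories, which is the kind of conflict a species tree method has to reconcile.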
And that might be the truly exceptional upshot of this glut of studies: the potential to apply these new big data techniques to all kinds of difficult evolutionary questions. "These types of studies are going to become more common and our new methods are going to be an important component in accurate analyses of genome-scale phylogenetic data in future," said Mirarab.
"Collaborating with people where each person brings special skills and knowledge to the effort is great," added Warnow. "Everyone is needed, because no one else can do your job! And the people in this group are a lot of fun."
Aside from the eight papers slated for publication in Science, research from the project will also be published in GigaScience, Genome Biology, BMC Genomics, and several other journals.
It's worth browsing through them to get a sense of just how much information was gleaned from this marriage of supercomputing and genetics, because I was barely able to scratch the surface in these 800-odd words. Needless to say, Christmas has come early for the ornithologically inclined.