Mini Symposium - June 16, 2011

New Ideas in Evolutionary Analysis

June 16, 2011
10am - 3pm
room: U10-146

Schedule

10.00 - 10.45	Mike Steel	“What can probability theory tell us about life's past - and future?”
10.45 - 11.15	Daniel Dörr	“Stochastic Errors vs. Modeling Errors in Distance Based Phylogenetic Reconstructions.”
11.15 - 11.30	break
11.30 - 12.00	Pina Krell	“Reconstruction of tumor cell lines from NGS data.”
12.00 - 12.30	Sebastian Jünemann	“Targeting the 'Rare Biosphere': Microbial Community Profiling by Massively Parallel 16S rDNA Tag Sequencing.”
12.30 - 13.30	lunch
13.30 - 14.15	Xu Shuhua	“Population Genomics: Mapping History and Genes.”
14.15 - 14.45	Andreas Dress	“The Dynamics of Pandemics: Can the quasispecies concept explain why they peter out?”
14.45 - …	open discussion

Abstracts

Mike Steel: "What can probability theory tell us about life's past - and future?"

In a landmark 1925 paper [1], George Udny Yule FRS described a simple neutral mathematical process for explaining the observed distribution of species into genera. In this model, each species can give rise to a new species at a constant rate by a random process. Eighty-five years later, this 'Yule process' provides a basis for studying the 'shape' of macroevolution, as well as for modeling a wide array of related phenomena in other fields of science. In this talk, I highlight two somewhat surprising results concerning the Yule process in biology. One concerns the distribution of times between speciation events, and its implications for how much 'evolutionary heritage' might be lost over the next century due to extinction. The other is a theorem which demonstrates that the reliable estimation of ancestral information in the distant past depends on whether or not the ratio of the speciation rate to the mutation rate exceeds a critical value. For a simple symmetric mutation model, this critical ratio turns out to be the number 6.

[1] G. U. Yule, A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S., Phil. Trans. Roy. Soc. 213 (1925), 21-87.
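
As a rough illustration of the Yule process described in this abstract, the following Python sketch simulates a pure-birth process in which every species speciates independently at the same constant rate. The function names, the `rate` parameter and the use of Python's `random` module are assumptions made for this illustration; they are not taken from the talk.

```python
import random


def yule_speciation_times(n_species, rate=1.0, seed=None):
    """Simulate a Yule (pure-birth) process that starts with one species.

    Returns the times at which the number of species grows to
    2, 3, ..., n_species. All names here are illustrative assumptions.
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    for k in range(1, n_species):
        # With k species, each speciating at `rate`, the waiting time to
        # the next speciation event is exponential with total rate k * rate.
        t += rng.expovariate(k * rate)
        times.append(t)
    return times


def expected_time_to_reach(n_species, rate=1.0):
    """Expected time for one species to grow to n_species species."""
    # Sum of the expected waiting times 1 / (k * rate) for k = 1..n-1.
    return sum(1.0 / (k * rate) for k in range(1, n_species))


if __name__ == "__main__":
    print(yule_speciation_times(5, rate=1.0, seed=42))
    print(expected_time_to_reach(5, rate=1.0))
```

Because the total speciation rate grows with the number of species, the gaps between successive speciation events shrink on average as the tree grows.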

Daniel Dörr: "Stochastic Errors vs. Modeling Errors in Distance Based Phylogenetic Reconstructions."

Distance-based phylogenetic reconstruction methods rely heavily on accurate pairwise distance estimates. There are two separate sources of error in this estimation process:

1. the relatively short sequence alignments used to obtain distance estimates induce a “stochastic error” corresponding to estimation of model parameters from finite data;

2. model misspecification leads to a “fixed error” which does not depend on sequence length.

It is common practice to assume some substitution model over the sequence data and use an additive substitution rate function for that model when computing pairwise distances. In the providential case when the assumed model coincides with the true model, which is typically unknown, the distance estimates will not be afflicted with fixed error. But even then, there is no reason to enforce a zero fixed error a priori when doing so causes elevated stochastic error, especially in the case of short sequence alignments.

This work challenges the paradigm of “using the most additive distance function at any cost”. We do this by studying the contribution and effect of both fixed and stochastic error in distance estimation. We present a formal framework for quantifying the fixed error associated with a specific distance function and a given phylogenetic tree in a homogeneous substitution model. As an example, we study the behavior of the Jukes-Cantor distance formula in homogeneous instances of Kimura's two-parameter substitution model. The effects of fixed error are observed through analytic results and experiments on simulated data. In addition, we compare the performance of various distance functions on biological sequences. We evaluate reconstruction accuracy by comparing the reconstructed trees to an independently validated species tree. Our study indicates that simple distance functions often outperform more sophisticated ones, even though the given sequence data appear to fit the substitution model assumed by the simple functions poorly.
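
To make the notion of fixed error concrete, here is a minimal Python sketch for a single pair of sequences. It applies the Jukes-Cantor distance formula to the expected site differences under Kimura's two-parameter model. The parameterization (`kappa` as the ratio of the transition rate to the rate of each transversion type), the function names, and the definition of fixed error as "Jukes-Cantor estimate on infinite data minus true distance" are assumptions made for this illustration, not the formal framework of the talk.

```python
import math


def k2p_expected_differences(d, kappa):
    """Expected proportions of transition (P) and transversion (Q) differences
    between two sequences at true distance d (substitutions per site) under
    Kimura's two-parameter model, with kappa = alpha / beta (assumed notation).
    """
    beta_t = d / (2.0 * (kappa + 2.0))   # d = 2 * (alpha + 2 * beta) * t
    alpha_t = kappa * beta_t
    P = 0.25 - 0.5 * math.exp(-4.0 * (alpha_t + beta_t)) + 0.25 * math.exp(-8.0 * beta_t)
    Q = 0.5 - 0.5 * math.exp(-8.0 * beta_t)
    return P, Q


def jukes_cantor_distance(p):
    """Jukes-Cantor distance from the proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k2p_distance(P, Q):
    """Kimura two-parameter distance from transition/transversion proportions."""
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)


def jc_fixed_error(d, kappa):
    """Error of the Jukes-Cantor formula on infinitely long K2P sequences."""
    P, Q = k2p_expected_differences(d, kappa)
    return jukes_cantor_distance(P + Q) - d


if __name__ == "__main__":
    for kappa in (1.0, 2.0, 5.0, 10.0):
        print(kappa, jc_fixed_error(0.5, kappa))
```

With `kappa = 1` the two models coincide and the fixed error vanishes; for other values the Jukes-Cantor formula underestimates the true distance no matter how long the alignment is. The stochastic error, by contrast, would come from estimating `P` and `Q` on a finite alignment, which this sketch does not simulate.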

Pina Krell: "Reconstruction of tumor cell lines from NGS data."

Estimating the lineage relations among the cells of higher, multicellular organisms is an unresolved task in many areas of biology. Fundamental questions, from embryogenesis to specific medical topics in cancer development such as tumor growth and metastasis formation, can be answered with knowledge of the developmental pattern of the organism's cells.

Through a repertoire of events, namely cell division, cell migration and cell death, the cells of a multicellular organism develop from a single cell, the zygote. Each cell thus lies on a path of descent from the fertilized egg and has undergone a specific number of cell divisions; this number defines the depth of the cell. The lineage relation of an organism's cells represents the pattern of cell division the organism underwent during its development and the relatedness of its cells. Since cell division is usually, though not always, a binary process, lineage relations can easily be represented as a labeled rooted binary tree.

In the twentieth century, lineage trees such as that of Caenorhabditis elegans were determined by direct observation of cell division. Because this simple method requires small, transparent and rapidly developing embryos, it is not applicable to higher organisms like humans. The fundamental discovery that the genomic content of each cell undergoes minor changes during each cell division, so-called somatic mutations, has made depth estimation of a single cell theoretically possible. Assuming that such somatic mutations accumulate in proportion to cell depth during the normal development of a higher organism, the entire cell lineage is implicitly encoded in the cells' DNA. Cell depth and the distance of the cell from the zygote are then strongly correlated, and cells that share a common developmental path should share the mutations that occurred along this path.

Somatic mutations, especially in microsatellites, are coupled to DNA replication during cell division. With the rise of high-throughput sequencing, this valuable information can be captured in high abundance. Examining a few hundred microsatellites in each cell may then suffice to reconstruct portions of a cell lineage tree with known and adapted phylogenetic algorithms.

Reconstruction of lineage trees enables specific lineage analysis and also makes it possible to find common cell division patterns among different lineage trees. Especially in the field of cancer, which is thought to arise from a single founder cell, lineages can help to assemble knowledge about cancer initiation and metastasis formation. Finding new cancer-initiating cells that do not fit into a tumor-specific sublineage, or migrated tumor cells (metastases) that fit into the downstream tumor lineage, can thus yield new conclusions for cancer therapy.

Comparative lineage tree analysis will further enable comparisons, for example of tissue-specific lineage trees, to find cell division patterns common to different lineage trees.

Sebastian Jünemann: "Targeting the 'Rare Biosphere': Microbial Community Profiling by Massively Parallel 16S rDNA Tag Sequencing."

The central questions in the field of metagenomics are which organisms are present in a microbial community and how they are related to each other. With the rise of next-generation sequencing platforms, community profiling based on conserved regions of the 16S rDNA experienced a renaissance, particularly in metagenomic studies, and still constitutes the gold standard in taxonomic classification. Recent studies investigating rDNA tag sequencing data support the hypothesis that the species richness and diversity of microbial communities are greatly underestimated, especially regarding non-culturable, low-abundance organisms, i.e. the 'rare biosphere'. However, 16S rDNA based analyses are subject to biases, errors and side effects that are as yet insufficiently analyzed, e.g. the artificial formation of chimeric sequences. Along with the natural limitations of the 16S rDNA, this renders the rare biosphere difficult to access. In this talk I give a short introduction to microbial community profiling by 16S rDNA tag sequencing in connection with recent metamicrobiomic surveys. In doing so, I will focus on the limitations as well as the major issues in analyzing and interpreting 16S rDNA data. In the second part I present an overview of my current project, in which we use an artificial metagenome to detect and validate errors related to tag sequencing.

Xu Shuhua: "Population Genomics: Mapping History and Genes."

Population genomics, in its most general form, refers to the inference of population genetic and evolutionary parameters from genome-wide data sets. In the context of identifying substrates of positive selection, population genomics offers a potential solution to some key limitations of candidate gene studies. Here, I will provide an overview of our recent studies on human genetic history and gene mapping based on a population genomic approach. My focus will be on
(i) the population structure & genetic history of the Uyghurs and the Kazakhs in Xinjiang which is geographically located in Central Asia;
(ii) high altitude adaptation of the Tibetans, an ethnic group with a long-lasting presence on the Tibetan Plateau which is known as the highest plateau in the world.
I am also trying to explore the possibility of collaborations in case anyone is interested in the field of population genomics which, to my understanding, needs great contributions from people in mathematics, statistics and computer science.

Andreas Dress: "The Dynamics of Pandemics: Can the quasispecies concept explain why they peter out?"

All of the great pandemics (see e.g. Wikipedia/List_of_epidemics for a historical account), except perhaps very few of those that were restricted to small isolated islands, have come to a sometimes quite unexpected end before all potential victims were infected. In central Europe, they may have killed 30% to sometimes even 70% of the population, but never the entire population. In the lecture, I will discuss the hypothesis that this is due to the fact that infectious agents, when copied within (or even by) the host, give rise to a “quasispecies”, that is, a large and rather heterogeneous variety of mutants, sufficiently many of which will eventually, when infecting people, act as a protective vaccine rather than as a dangerous pathogen.

Genome Informatics