The primary material analyzed in genome informatics is genomic sequence data. Beyond the acquisition and basic analysis of these data, the next challenge is to extract the higher-level information encoded in them, which calls for sound mathematical models, efficient algorithms, and user-friendly software.
Research in the Genome Informatics group spans a broad spectrum of this exciting field, from the low level of DNA sequence comparison up to the higher levels of comparative genomics, as well as the development of better infrastructure.
Comparative genomics often involves the reconstruction of phylogenies. The ever-increasing number of available genomes, many of which are published in an unfinished state or lack sufficient annotation, poses challenges to traditional phylogenetic inference methods that rely on the comparison of marker sequences.
Whole-genome approaches have emerged as a solution to these challenges, but as these approaches are based on pairwise comparisons between genomes, their run time increases quadratically with the number of input sequences, making them unsuitable in large-scale scenarios.
SANS (Rempel and Wittler, 2021; Wittler, 2020) is a whole-genome-based, alignment- and reference-free approach that does not rely on pairwise comparisons of genomes. In a pangenomic approach, evolutionary relationships are determined based on the similarity of the whole sequences. Sequence segments (k-mers) shared by a subset of genomes are interpreted as a phylogenetic split, indicating the closeness of these genomes and their separation from the other genomes.
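The idea of reading shared k-mers as splits can be illustrated with a minimal Python sketch. It is not the SANS implementation: the function names (`kmers`, `kmer_splits`), the toy sequences, and the plain k-mer count used as a split weight are assumptions made for illustration. The sketch also ignores reverse complements and does not merge a genome subset with its complement into a single bipartition.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield all overlapping substrings of length k (hypothetical helper)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_splits(genomes, k):
    """Map each genome subset sharing a k-mer to the number of such k-mers.

    genomes: dict of genome name -> sequence string.
    The count is a simplified stand-in for a split weight.
    """
    # Record, for every k-mer, the set of genomes that contain it.
    occurrences = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            occurrences[kmer].add(name)

    all_names = frozenset(genomes)
    weights = defaultdict(int)
    for names in occurrences.values():
        side = frozenset(names)
        # A k-mer present in every genome separates nothing, so skip it.
        if side != all_names:
            weights[side] += 1
    return dict(weights)


# Toy input, made up for this example.
toy = {"A": "ACGTAC", "B": "ACGTTT", "C": "GGGTTT"}
splits = kmer_splits(toy, 3)
for side, weight in sorted(splits.items(), key=lambda item: sorted(item[0])):
    print(sorted(side), weight)
```

Each genome is scanned once and no pair of genomes is ever compared directly, which is the property that distinguishes this style of approach from pairwise whole-genome methods.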
We integrate the concept of conserved gene clusters into the framework of phylogenetics. Here, the focus is no longer on the discovery of new gene clusters, but on their evolution. Given the topology of a phylogenetic tree and the gene orders of the leaf nodes, our methods reconstruct ancestral gene orders at the internal nodes under different evolutionary (rearrangement) models (see Rococo, RINGO, PhySca).
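To make the input and output of such a reconstruction concrete, the following Python sketch takes a rooted binary tree and the gene orders at its leaves and labels each internal node with a set of gene adjacencies. It is not the method of Rococo, RINGO, or PhySca: it swaps in plain Fitch parsimony applied to each adjacency independently, treats genes as unsigned, and uses hypothetical names throughout (`adjacencies`, `fitch`, `ancestral_adjacencies`, the toy tree and gene orders).

```python
def adjacencies(order):
    """Return the unordered neighbouring gene pairs of a linear gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def fitch(tree, root, leaf_state):
    """Fitch parsimony for one presence/absence character.

    tree: dict of internal node -> (left child, right child).
    leaf_state: dict of leaf -> bool.
    Returns a dict of node -> bool for all nodes.
    """
    sets = {}

    def up(node):
        # Bottom-up pass: intersect child sets if possible, else unite them.
        if node not in tree:
            sets[node] = {leaf_state[node]}
            return
        left, right = tree[node]
        up(left)
        up(right)
        common = sets[left] & sets[right]
        sets[node] = common if common else sets[left] | sets[right]

    state = {}

    def down(node, parent_state):
        # Top-down pass: keep the parent's state when it is allowed here.
        if parent_state in sets[node]:
            state[node] = parent_state
        else:
            state[node] = min(sets[node])  # tie-break: prefer absence
        for child in tree.get(node, ()):
            down(child, state[node])

    up(root)
    down(root, None)
    return state


def ancestral_adjacencies(tree, root, leaf_orders):
    """Assign to each internal node the adjacencies inferred as present."""
    leaf_adj = {leaf: adjacencies(o) for leaf, o in leaf_orders.items()}
    result = {node: set() for node in tree}
    for adj in set().union(*leaf_adj.values()):
        presence = {leaf: adj in s for leaf, s in leaf_adj.items()}
        state = fitch(tree, root, presence)
        for node in tree:
            if state[node]:
                result[node].add(adj)
    return result


# Toy input, made up for this example.
tree = {"root": ("anc", "C"), "anc": ("A", "B")}
leaf_orders = {
    "A": ["g1", "g2", "g3", "g4"],
    "B": ["g1", "g2", "g4", "g3"],
    "C": ["g1", "g3", "g2", "g4"],
}
result = ancestral_adjacencies(tree, "root", leaf_orders)
for node, adjs in result.items():
    print(node, sorted(sorted(a) for a in adjs))
```

Because every adjacency is decided on its own, the set inferred at a node need not form a valid gene order: in this toy input, gene `g2` ends up with three neighbours at node `anc`. Resolving such conflicts is one reason for reconstructing under an explicit rearrangement model rather than character by character.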
In addition, the development of ancient DNA (aDNA) sequencing led us to the problem of integrating these additional data into the reconstruction of ancestral genomes, aiming to scaffold fragmented aDNA assemblies and to improve the global reconstruction of all ancestors in the phylogeny.