
Research

The primary data analyzed in genome informatics are genomic sequences. Beyond the acquisition and basic analysis of these data, the next challenge is to extract the higher-level information encoded in them, which calls for sound mathematical models, efficient algorithms, and user-friendly software.

Research in the Genome Informatics group spans a broad spectrum in this exciting field, from the low level of DNA sequence comparison up to the higher levels of comparative genomics, as well as the development of better software infrastructure.

Efficient Comparison of DNA Sequences (aka team kmers)

Luca

Lucas

Tizian

Computational Comparative Genomics

Leonard

Marilia

Sequence-based Phylogenomics

Comparative genomics often involves the reconstruction of phylogenies. The ever-increasing number of available genomes, many of which are published in an unfinished state or lack sufficient annotation, poses challenges to traditional phylogenetic inference methods that rely on the comparison of marker sequences. Whole-genome approaches have emerged as a solution to these challenges, but as these approaches are based on pairwise comparisons between genomes, their runtime increases quadratically with the number of input sequences, making them unsuitable in large-scale scenarios.

SANS (tool-website; Rempel and Wittler, 2021; Wittler, 2020) is a whole-genome based, alignment- and reference-free approach that does not rely on a pairwise comparison of genomes. In a pangenomic approach, evolutionary relationships are determined based on the similarity of the whole sequences. Sequence segments (k-mers) shared by a subset of genomes are interpreted as a phylogenetic split indicating the closeness of these genomes and their separation from the other genomes.
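The idea of reading splits off shared k-mers can be illustrated with a minimal Python sketch. This is not the SANS implementation: the function names, the toy sequences, and the use of a plain k-mer count as split weight are assumptions made for illustration, and the sketch ignores reverse-complement strands.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every substring of length k of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def split_weights(genomes, k):
    """Map each subset of genome names to the number of distinct k-mers
    that occur in exactly that subset (hypothetical, simplified weighting)."""
    occurrence = defaultdict(set)  # k-mer -> names of the genomes containing it
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            occurrence[kmer].add(name)

    all_names = frozenset(genomes)
    weights = defaultdict(int)
    for names in occurrence.values():
        subset = frozenset(names)
        if subset != all_names:  # k-mers present in all genomes separate nothing
            weights[subset] += 1
    return dict(weights)

# Toy input: three short made-up "genomes".
genomes = {
    "A": "ACGTACGA",
    "B": "ACGTACGT",
    "C": "TTGTACCA",
}
weights = split_weights(genomes, 4)
for subset, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(sorted(subset), weight)
```

In this toy example, the k-mers shared only by A and B support the split {A, B} | {C}. Note that each k-mer is looked up once per genome, so no pair of genomes is ever compared directly.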

Evolution of Gene Clusters

We integrate the concept of conserved gene clusters into the framework of phylogenetics. Here, the focus is no longer on the discovery of new gene clusters, but on their evolution. Given the topology of a phylogenetic tree and the gene orders of the leaf nodes, our methods reconstruct ancestral gene orders at the internal nodes under different evolutionary (rearrangement) models (see Rococo, RINGO, PhySca).
In addition, the development of ancient DNA (aDNA) sequencing led us to the problem of integrating this additional data in the reconstruction of ancestral genomes, aiming to scaffold fragmented aDNA assemblies and to improve the global reconstruction of all ancestors in the phylogeny.

Infrastructure

SANS-ambages

We maintain a software tool for alignment-free, whole-genome based phylogenomics implementing the SANS approach (see above: “Sequence-based Phylogenomics”). The current version “SANS ambages” (abundance-filter, multi-threading and bootstrapping on amino-acid or genomic sequences) provides several new features: besides processing DNA sequences (whole genomes or assemblies), SANS ambages can also work at the amino acid level, taking protein sequences (translated or untranslated) as input. Further, the ability to process read data has been enhanced by the option to filter out low-abundance sequence segments. Multiple input sequences can be processed in parallel, and bootstrapping makes it possible to augment the output with confidence values. SANS is hosted as a de.NBI service and can easily be obtained from our GitLab repository.
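As an illustration of the abundance filter for read data, the following Python sketch keeps only k-mers that occur at least a given number of times across all reads, since k-mers seen very rarely in read data are often caused by sequencing errors. The function name, the threshold parameter, and the toy reads are hypothetical; this is a sketch of the general idea, not the filter as implemented in SANS ambages.

```python
from collections import Counter

def abundant_kmers(reads, k, min_count):
    """Return the set of k-mers occurring at least min_count times in the reads.

    Hypothetical helper for illustration; the threshold semantics are an assumption.
    """
    counts = Counter(
        read[i:i + k]
        for read in reads
        for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, count in counts.items() if count >= min_count}

# Toy input: the third read carries a made-up error (T instead of A).
reads = ["ACGTAC", "CGTACG", "ACGTTC"]
kept = abundant_kmers(reads, k=4, min_count=2)
print(sorted(kept))
```

Only the surviving k-mers would then be passed on to the downstream split computation.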

Genome Informatics