This shows you the differences between two versions of the page.
thesistopics [2023/10/19 14:17] leonard |
thesistopics [2025/03/31 17:32] (current) arempel [<TITLE> (Bachelor/Master)] |
||
---|---|---|---|
Line 6: | Line 6: | ||
- | ===== Improvement of Sequence-to-Graph Alignment (Bachelor) ===== | + | ===== SAT solutions for rearrangement problems (Bachelor/Master) ===== |
- | [[https://ekvv.uni-bielefeld.de/pers_publ/publ/PersonDetail.jsp?personId=70863414|Tizian Schulz]] //(Please also refer to the project page: [[https://gitlab.ub.uni-bielefeld.de/gi/plast|PLAST]])// | + | Genomic rearrangements play a critical role in evolution and adaptation, altering genome structure and organization, thereby influencing phenotypic traits. Understanding these rearrangements helps identify the mechanisms underlying many genetic diseases and evolutionary processes. |
+ | However, even quantifying rearrangements for most genomes that occur in practice is NP-hard under realistic models. | ||
+ | While Integer Linear Programming (ILP) is commonly used to address this challenge, it has recently been observed that SAT formulations of the same problems may offer a faster alternative. The project involves converting an existing ILP solution for genomic rearrangement quantification into a SAT-based solution that takes advantage of the computational efficiency of SAT solvers. | ||
+ | Contact [[https://ekvv.uni-bielefeld.de/pers_publ/publ/PersonDetail.jsp?personId=129443261 |Leonard]] for details. | ||
+ | ===== Plasmid recovery/assembly from long reads (Bachelor/Master) ===== | ||
+ | Despite advances in long-read sequencing technologies, plasmid assembly remains a significant challenge. Current state-of-the-art assemblers often fail to recover plasmids, especially those smaller than 10 kb. This limitation is concerning because small plasmids often harbor important virulence and antimicrobial resistance genes. Popular assemblers such as Flye, Miniasm, Raven, and Canu show variable success, with recovery rates dropping dramatically for smaller plasmids. Furthermore, when plasmids are recovered, they often appear as multiple copies or are misassembled into the chromosome, highlighting fundamental limitations of current assembly algorithms. We are developing a novel pangenomic approach that uses gene identification within long reads and //k//-mers over the gene alphabet to improve the quality of the assembly graph. The main tasks for students will be to analyze the performance of this method and, potentially, to improve it. Contact [[https://ekvv.uni-bielefeld.de/pers_publ/publ/PersonDetail.jsp?personId=108064501|Andreas]] for details. | ||
+ | ===== <TITLE> (Bachelor/Master) ===== | ||
+ | <Project description> | ||
- | PLAST is a new heuristic method to find maximum scoring local alignments of a DNA query sequence to a pangenome represented as a compacted colored de Bruijn graph. The first version of the method has been published [[https://doi.org/10.1093/bioinformatics/btab077|here]], but there are various ideas on how to improve it. Some are well suited for a Bachelor thesis. Contact Tizian for details. | ||
- | |||
- | |||
- | ===== (Runtime) Heuristic for the Fast Comparison of Genomes (Bachelor) ===== | ||
- | [[ https://ekvv.uni-bielefeld.de/pers_publ/publ/PersonDetail.jsp?personId=129443261 | Leonard Bohnenkämper]] (Also refer to the [[ https://gitlab.ub.uni-bielefeld.de/gi/ding | gitlab1]]/ [[ https://gitlab.ub.uni-bielefeld.de/lbohnenkaemper/dingiiofficial | 2]]) | ||
- | |||
- | DING ([[ https://doi.org/10.1089/cmb.2020.0434 |publication]]) is an exact ILP solution for an NP-hard problem: comparing arbitrary genomes on a high level under the DCJ-Indel model. It is already very fast for small to medium-sized genomes. However, for some large or very complex genomes that occur in practice, DING is not able to calculate solutions. There are some ideas on how to circumvent this problem using approximate or heuristic methods, which could be developed as a Bachelor thesis or as a Master project module. Contact [[ https://ekvv.uni-bielefeld.de/pers_publ/publ/PersonDetail.jsp?personId=129443261 | Leonard ]] for details. | ||
- | |||
- | |||
- | ===== DCJ-Indels of Natural Genes (Bachelor) ===== | ||
- | [[http://ekvv.uni-bielefeld.de/pers_publ/publ/PersonDetail.jsp?personId=65864 | Jens Stoye]] | ||
- | |||
- | We have developed the tool DING (DCJ-Indels of Natural Genomes), which could also be applied to protein sequences, resulting in a new tool DING (DCJ-Indels of Natural Genes). | ||
- | |||
- | Basic knowledge of algorithms and sequence analysis is required, ideally also of algorithms in comparative genomics. | ||
- | |||
- | |||
- | ===== Visualizing Phylogenetic Splits (Bachelor) ===== | ||
- | [[https://ekvv.uni-bielefeld.de/pers_publ/publ/PersonDetail.jsp?personId=3721521 | Roland Wittler]] //(Please also refer to the project page: [[https://gitlab.ub.uni-bielefeld.de/gi/sans|SANS]])// | ||
- | |||
- | {{ :sans.png?nolink|}}SANS is an efficient method for alignment-free, whole-genome based phylogeny estimation that follows a pangenomic approach to calculate a set of splits in a phylogenetic tree or network. SplitsTree is a tool to visualize such split networks. In this project, the output of SANS should be simplified by replacing individual, textual genome labels with colored bullets that indicate phylogenetic subgroups (or other properties); see the example. | ||
- | |||
- | ===== Horizontal Gene Transfer Detection (Master) ===== | ||
- | [[https://ekvv.uni-bielefeld.de/pers_publ/publ/PersonDetail.jsp?personId=70863414|Tizian Schulz]] or | ||
- | [[https://ekvv.uni-bielefeld.de/pers_publ/publ/PersonDetail.jsp?personId=3721521 | Roland Wittler]] or | ||
- | [[ https://ekvv.uni-bielefeld.de/pers_publ/publ/PersonDetail.jsp?personId=129443261 | Leonard Bohnenkämper]] | ||
- | |||
- | Horizontal Gene Transfers (HGTs) are events that transfer genetic material from one lineage to another. HGTs are especially common in bacteria and particularly relevant for the spreading of (antibiotic) resistance factors among microbes. | ||
- | SANS is an efficient method for the construction of phylogenies and, as a byproduct, allows finding candidate sequences that might have been part of an HGT (see also [[https://doi.org/10.1186/s13015-020-00164-3 | Section "Drosophila"]]). There are some ideas on how to automate the process of finding and verifying such HGT candidates, which can be developed into a Master thesis, or you can even bring your own idea! |