teaching:2017winter:svseminar [2017/10/14 14:07]
teaching:2017winter:svseminar [2020/02/14 09:07] (current)
 +====== Detection of Genomic Structural Variation ======
 +Dr. Roland Wittler\\
 +Seminar: Wednesday, 10.15-11.45 in M3-115\\
 +Office hours: by arrangement \\
 +Office: U10-145\\
  
 +
 +===== Content =====
 +
 +In addition to small mutations in the genome, such as the deletion, insertion or substitution of single bases, larger so-called //structural variations//, such as the deletion, insertion, rearrangement, inversion or duplication of whole segments of the genome sequence, play an important role, e.g., in the development of cancer. High-throughput whole-genome sequencing enables the detection of structural variations in several ways.
 +
 +This is a classical literature seminar, i.e., on the first day, topics are introduced and selected by the students. In the following sessions, students give a presentation on their topic and afterwards write an essay ("Hausarbeit"). Aspects of scientific writing and presenting will be covered as well.
 +
 +Talks and essays can be done in German or English.
 +
 +===== Literature =====
 +
 +A collection of publications discussed in the seminar, including some review articles on structural variation detection, is provided in the "Lernraum" in the [[https://ekvv.uni-bielefeld.de/kvv_publ/publ/vd?id=103587990|eKVV]].
 +
 +https://bis.uni-bielefeld.de/sites/8358/Start.aspx (You have to register for this seminar in the eKVV by adding it to your eKVV schedule.)
 +
 +
 +===== Requirements =====
 +
 +  * Recommended prior knowledge: Sequence Analysis
 +  * Oral presentation (20-45 minutes)
 +  * Essay (8-15 pages)
 +
 +===== Topics =====
 +
 +  * Array, Array CGH
 +  * Read alignment (BWA, BWA-SW, Bowtie2, mrFAST)
 +  * Representation and handling of mappings and call sets (samtools, VCF, IGV)
 +  * Genome Analysis Toolkit (GATK)
 +  * Copy number variation approaches (CNVnator, SegSeq)
 +  * Split-read methods (Pindel, LASER)
 +  * Paired-end mapping approaches, probabilistic (BreakDancer, MoDIL)
 +  * Paired-end mapping approaches, combinatorial (CLEVER, GASV)
 +  * Assembly-based approaches (SOAPdenovo(2))
 +  * Phasing (WhatsHap, review)
 +  * Long-read mapping (Chaisson, Pendleton)
 +  * Big genome projects (1000 Genomes Project, Genome of the Netherlands)
 +
 +===== Timeline =====
 + 
 +^ Date ^ Topic ^ Who ^
 +| 11.10.2017  | Administrative matters, overview of topics and topic selection | |
 +| 18.10.2017 ​ | | |
 +| 25.10.2017 ​ | | |
 +| 01.11.2017 ​ | -- national holiday --||
 +| 08.11.2017 ​ | | |
 +| 15.11.2017 ​ | | |
 +| 22.11.2017 ​ | Scientific Writing / Read alignment | Roland / Dennis |
 +| 29.11.2017 ​ | Practical session: Mappings and handling of BAM files | Roland |
 +| 06.12.2017  | Split-read methods / Genome Analysis Toolkit | Lena / Paul B. |
 +| 13.12.2017 ​ | |  |
 +| 20.12.2017 ​ | Paired-end mapping approaches | Timo / Fabienne |
 +| -- X-Mas break -- |||
 +| 10.01.2018 ​ | Assembly-based approaches / Phasing | Manuel / Ilja |
 +| 17.01.2018 ​ | Long-read mapping | Pia |
 +| 24.01.2018 ​ | Big genome projects / Copy number variation tools | Matthias / Dennis |
 +| 31.01.2018 ​ | | |
 +
 +
 +===== Hands-on =====
 +
 +Once you have been added to the CeBiTec user group "seqan", you have access to the volume:
 +
 +  /vol/seqan/svseminar
 +  ​
 +In the subfolder ''HG00514'', you will find Illumina paired-end sequencing data: one file for each mate (suffixes ''_1.fastq.gz'' and ''_2.fastq.gz'') as well as a short extract of each (suffix ''head.fastq.gz''), which is easier to handle for test purposes. The subfolder ''hg38'' contains a reference genome that has already been indexed for use with BWA. There is also a folder ''TEST'', which you should use to play with the data. It contains an example script ''runBWA.sh'' that runs BWA on the head version of the read data and also performs some SAM/BAM conversion. Please make your own copy of this script before you modify it. **Do not run any heavy computations on a standard terminal!** Instead, submit the job to the compute cluster:
 +
 +  qsub -cwd -P seqan -l idle=1 -pe multislot 4 runBWA.sh
 +
 +You can check the status of your job with ''qstat'' and kill it with ''qdel <job id>''. The output of the job is written to files called ''<scriptname>.o<job id>'' and ''<scriptname>.e<job id>''; the first should be empty, and the second contains output and/or error messages of the tools used.
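 +
 +To illustrate the naming scheme of the job output files, here is a minimal shell sketch. The function name ''job_output_files'' and the job id ''12345'' are made up for illustration; they are not part of the cluster software.

```shell
#!/bin/sh
# Hypothetical helper (not provided by the cluster): print the names of the
# two files a job is expected to leave behind, following the
# <scriptname>.o<job id> / <scriptname>.e<job id> scheme.
job_output_files() {
    script_name=$1
    job_id=$2
    printf '%s.o%s\n' "$script_name" "$job_id"
    printf '%s.e%s\n' "$script_name" "$job_id"
}

# Example with a made-up job id for a job submitted from runBWA.sh
job_output_files runBWA.sh 12345
```

 +With the real job id reported by ''qsub'', the second file name is the one to inspect first if the job fails.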
 +
 +If you want to do, say, medium-weight computations interactively, log in to a compute host with ''qlogin -P seqan''.
 +
 +Once you have your BAM file, you could, e.g., do the following:
 +
 +  * Have a first look at the mappings: ''samtools view roland.bam | head''
 +  * Extract all mappings on a certain chromosome (this requires a coordinate-sorted BAM file that has been indexed with ''samtools index''; ''-b'' requests BAM output): ''samtools view -b -o roland.chr1.bam roland.bam chr1''
 +  * Count how many times each chromosome has been hit: ''samtools view roland.bam | cut -f 3 | sort | uniq -c | sort -n''
 +  * Extract the fragment lengths: ''samtools view roland.bam | cut -f 9 > roland.fragmentlengths.tsv''
 +  * Use ''R'' to plot the fragment length distribution:
 +
 +  # read data
 +  fl <- read.table(file="roland.fragmentlengths.tsv", header=FALSE)
 +  # take absolute values from first (and only) column
 +  fla <- abs(fl$V1)
 +  # filter for outliers by using quantiles
 +  flaf <- subset(fla, fla > quantile(fla, 0.01) & fla < quantile(fla, 0.99))
 +  # plot histogram (fragment length on the x-axis)
 +  hist(flaf, breaks=50, xlab="fragment length")
 +  # quit R
 +  quit()
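 +
 +To see what the ''cut'' pipelines above operate on without touching the real data, the following sketch runs them on a few made-up SAM records (column 3 is the reference name, column 9 the template length). The functions ''sam_records'', ''hits_per_reference'' and ''fragment_lengths'' are hypothetical names, and the records are toy data, not taken from ''HG00514''. As a variation on the ''cut -f 9'' approach, the last function keeps only positive template lengths, so that each read pair is counted once.

```shell
#!/bin/sh
# Toy stand-in for the output of `samtools view` (made-up records, no header).
# Columns follow the SAM format: 3 = reference name, 9 = template length.
sam_records() {
    printf 'r1\t99\tchr1\t100\t60\t5M\t=\t300\t205\tACGTA\tIIIII\n'
    printf 'r1\t147\tchr1\t300\t60\t5M\t=\t100\t-205\tACGTA\tIIIII\n'
    printf 'r2\t99\tchr2\t500\t60\t5M\t=\t720\t225\tACGTA\tIIIII\n'
    printf 'r2\t147\tchr2\t720\t60\t5M\t=\t500\t-225\tACGTA\tIIIII\n'
    printf 'r3\t99\tchr1\t900\t60\t5M\t=\t1100\t205\tACGTA\tIIIII\n'
    printf 'r3\t147\tchr1\t1100\t60\t5M\t=\t900\t-205\tACGTA\tIIIII\n'
}

# Number of records per reference sequence, least frequent first.
hits_per_reference() {
    cut -f 3 | sort | uniq -c | sort -n
}

# Fragment lengths, one per pair: keep only records with positive
# template length (the mate carries the same value with a negative sign).
fragment_lengths() {
    awk -F '\t' '$9 > 0 { print $9 }'
}

sam_records | hits_per_reference
sam_records | fragment_lengths
```

 +On real data, ''sam_records'' would be replaced by ''samtools view roland.bam''.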
 +  ​
 +
 +If you want to apply one of "your" tools, create an individual subfolder ''MYTOOL'' (e.g., ''LASER'', ''GATK'', etc.) and make it group writable (''chmod g+w <folder>'').
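 +
 +Creating such a folder could look like the following sketch. The helper name ''make_tool_folder'' is hypothetical, and the temporary directory only stands in for the seminar folder so that the example can be tried anywhere.

```shell
#!/bin/sh
# Hypothetical helper: create a tool folder and make it group writable.
make_tool_folder() {
    mkdir -p "$1" && chmod g+w "$1"
}

# Demonstrated in a temporary directory; on the cluster you would run it
# inside the seminar folder instead. LASER is just an example tool name.
workdir=$(mktemp -d)
make_tool_folder "$workdir/LASER"

# The sixth character of the permission string is the group write bit.
ls -ld "$workdir/LASER"
```

 +A ''w'' in the group part of the ''ls -ld'' output confirms that other group members can write to the folder.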
 +
 +Back to [[:teaching|Teaching]]