Page teaching:2017winter:svseminar, current revision of 2020/02/14 09:07 (earlier revision: 2017/10/05 13:36)
====== Detection of Genomic Structural Variation ======
Dr. Roland Wittler\\
Seminar: Wednesday, 10.15-11.45 in M3-115\\
Office hours: by arrangement\\
Office: U10-145

===== Content =====

In addition to small mutations in the genome, such as the deletion, insertion or substitution of single bases, larger so-called //structural variations//, such as the deletion, insertion, rearrangement, inversion or duplication of whole segments of the genome sequence, play an important role, e.g., in the development of cancer. High-throughput whole-genome sequencing enables the detection of structural variations in several ways, for example from read depth, split reads, discordantly mapped read pairs or assembly.

This is a classical literature seminar: on the first day, the topics are introduced and the students choose their topics. In the following sessions, the students present their topics and afterwards write an essay ("Hausarbeit"). Aspects of scientific writing and presenting are covered as well.

Talks can be given and essays written in German or English.

===== Literature =====

A collection of the publications discussed in the seminar, including some review articles on structural variation detection, is provided in the "Lernraum" in the [[https://ekvv.uni-bielefeld.de/kvv_publ/publ/vd?id=103587990|eKVV]].

https://bis.uni-bielefeld.de/sites/8358/Start.aspx (You have to register for this seminar in the eKVV by adding it to your eKVV schedule.)


===== Requirements =====

  * Recommended prior knowledge: Sequence Analysis
  * Oral presentation (20-45 minutes)
  * Essay (8-15 pages)

===== Topics =====

  * Arrays, array CGH
  * Read alignment (BWA, BWA-SW, Bowtie2, mrFAST)
  * Representation and handling of mappings and call sets (samtools, VCF, IGV)
  * Genome Analysis Toolkit (GATK)
  * Copy number variation approaches (CNVnator, SegSeq)
  * Split-read methods (Pindel, LASER)
  * Paired-end mapping approaches, probabilistic (BreakDancer, MoDIL)
  * Paired-end mapping approaches, combinatorial (CLEVER, GASV)
  * Assembly-based approaches (SOAPdenovo, SOAPdenovo2)
  * Phasing (WhatsHap, review)
  * Long-read mapping (Chaisson, Pendleton)
  * Big genome projects (1000 Genomes Project, Genome of the Netherlands)
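
The paired-end mapping approaches in the list above build on a common basic signal: a read pair is //discordant// if its mates map much farther apart than the expected fragment length (a possible deletion), much closer together (a possible insertion) or to different chromosomes (a possible translocation). The following bash/awk function is a deliberately naive sketch of that signal only; it is not the method of BreakDancer, MoDIL, CLEVER or GASV. It assumes SAM records on standard input (e.g. from ''samtools view'') and takes two thresholds, ''MIN'' and ''MAX'', that you would have to choose yourself, e.g. from the fragment length distribution of your library. The function name and the output labels are made up for this example.

```shell
# Hypothetical helper: classify_discordant_pairs MIN MAX
# Reads SAM records on stdin and prints "<read name><TAB><reason>" for
# discordant pairs. Naive sketch: fixed thresholds, no orientation check.
classify_discordant_pairs() {
  local min_len="$1" max_len="$2"
  awk -F '\t' -v min="$min_len" -v max="$max_len" '
    BEGIN { min += 0; max += 0 }             # force numeric thresholds
    /^@/ { next }                            # skip SAM header lines
    {
      flag = $2 + 0
      # awk has no portable bit operators, so test flag bits arithmetically
      if (int(flag / 64) % 2 == 0) next      # 0x40 unset: not the first mate
      if (int(flag / 4) % 2 == 1) next       # 0x4: read unmapped
      if (int(flag / 8) % 2 == 1) next       # 0x8: mate unmapped
      if (int(flag / 256) % 2 == 1) next     # 0x100: secondary alignment
      if (int(flag / 2048) % 2 == 1) next    # 0x800: supplementary alignment
      # column 7 (RNEXT): "=" or the name in column 3 means same chromosome
      if ($7 != "=" && $7 != $3) { print $1 "\tinterchromosomal"; next }
      t = $9 + 0                             # column 9: signed template length
      tlen = (t < 0) ? -t : t
      if (tlen > max) print $1 "\ttoo_far"
      else if (tlen < min) print $1 "\ttoo_close"
    }'
}
```

For example, ''samtools view roland.bam | classify_discordant_pairs 200 600'' would list candidate pairs, where 200 and 600 are placeholder values rather than recommendations. Tools such as those listed above typically go further, e.g. by also checking the orientation of the mates, by requiring several pairs that support the same event, or by modelling the fragment length distribution statistically.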

===== Timeline =====

^ Date ^ Topic ^ Who ^
| 11.10.2017 | Organizational matters, overview of the topics, topic selection | |
| 18.10.2017 | | |
| 25.10.2017 | | |
| 01.11.2017 | -- national holiday -- ||
| 08.11.2017 | | |
| 15.11.2017 | | |
| 22.11.2017 | Scientific writing / Read alignment | Roland / Dennis |
| 29.11.2017 | Practical session: Mappings and handling of BAM files | Roland |
| 06.12.2017 | Split-read methods / Genome Analysis Toolkit | Lena / Paul B. |
| 13.12.2017 | | |
| 20.12.2017 | Paired-end mapping approaches | Timo / Fabienne |
| -- Christmas break -- |||
| 10.01.2018 | Assembly-based approaches / Phasing | Manuel / Ilja |
| 17.01.2018 | Long-read mapping | Pia |
| 24.01.2018 | Big genome projects / Copy number variation tools | Matthias / Dennis |
| 31.01.2018 | | |


===== Hands on =====

Once you have been added to the CeBiTec user group "seqan", you have access to the volume:

  /vol/seqan/svseminar

In the subfolder ''HG00514'', you will find Illumina paired-end sequencing data: one file for each mate (suffixes ''_1.fastq.gz'' and ''_2.fastq.gz''), as well as a short extract of each (suffix ''head.fastq.gz''), which is easier to handle for testing. In the subfolder ''hg38'', you will find a reference genome that has already been indexed for use with BWA.

Use the folder ''TEST'' to experiment with the data. It contains an example script ''runBWA.sh'' that runs BWA on the head versions of the read data and also performs some SAM/BAM conversion. Please make your own copy of this script before you modify it. **Do not run any heavy computations on a standard terminal!** Instead, submit the job to the compute cluster:

  qsub -cwd -P seqan -l idle=1 -pe multislot 4 runBWA.sh

You can check the status of your job with ''qstat'' and kill it with ''qdel <job id>''. The output of the job is written to files named ''<scriptname>.o<job id>'' and ''<scriptname>.e<job id>'': the first (standard output) should be empty, and the second (standard error) contains the output and error messages of the tools used.

For medium-weight computations that you want to run interactively, log in to a compute host with ''qlogin -P seqan''.

Once you have your BAM file, you could, for example, try the following:

  * Have a first look at the mappings: ''samtools view roland.bam | head''
  * Extract all mappings on a certain chromosome (region queries need a coordinate-sorted BAM file that has been indexed with ''samtools index''): ''samtools view -b -o roland.chr1.bam roland.bam chr1''
  * Count how many mappings hit each chromosome: ''samtools view roland.bam | cut -f 3 | sort | uniq -c | sort -n''
  * Extract the fragment lengths (TLEN, column 9 of the SAM format): ''samtools view roland.bam | cut -f 9 > roland.fragmentlengths.tsv''
  * Use ''R'' to plot the fragment length distribution:

  # read the data: one (signed) fragment length per line
  fl <- read.table(file="roland.fragmentlengths.tsv", header=FALSE)
  # take absolute values of the first (and only) column; TLEN is negative for the rightmost mate
  fla <- abs(fl$V1)
  # drop zeros (TLEN is 0 for unmapped reads, unmapped mates and mates on different chromosomes)
  fla <- fla[fla > 0]
  # remove outliers: keep only values between the 1% and the 99% quantile
  flaf <- subset(fla, fla > quantile(fla, 0.01) & fla < quantile(fla, 0.99))
  # plot a histogram of the fragment lengths
  hist(flaf, breaks=50, xlab="fragment length", main="Fragment length distribution")
  # quit R without saving the workspace
  quit(save="no")
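
Note that ''cut -f 9'' reports the template length of most read pairs twice (once per mate, with opposite signs) and yields 0 for reads without a usable template length. If you prefer exactly one value per pair, a filter like the following sketch could replace the ''cut'' step. It encodes one possible reading of "one value per pair", chosen for this example: it keeps only primary alignments of the first mate (flag 0x40) where both mates are mapped. The function name is hypothetical.

```shell
# Hypothetical helper: pair_fragment_lengths
# Reads SAM records on stdin and prints one absolute template length
# (column 9) per read pair, counting only the first mate of each pair.
pair_fragment_lengths() {
  awk -F '\t' '
    /^@/ { next }                            # skip SAM header lines
    {
      flag = $2 + 0
      # awk has no portable bit operators, so test flag bits arithmetically
      if (int(flag / 64) % 2 == 0) next      # 0x40 unset: not the first mate
      if (int(flag / 4) % 2 == 1) next       # 0x4: read unmapped
      if (int(flag / 8) % 2 == 1) next       # 0x8: mate unmapped
      if (int(flag / 256) % 2 == 1) next     # 0x100: secondary alignment
      if (int(flag / 2048) % 2 == 1) next    # 0x800: supplementary alignment
      t = $9 + 0                             # signed template length
      if (t < 0) t = -t                      # rightmost mate has a negative TLEN
      if (t != 0) print t                    # 0 means no template length available
    }'
}
```

Used as ''samtools view roland.bam | pair_fragment_lengths > roland.fragmentlengths.tsv'', its output can be read by the R script above unchanged. Alternatively, ''samtools view -f 64 -F 2316 roland.bam'' selects the same records directly (2316 = 4 + 8 + 256 + 2048, i.e. read unmapped, mate unmapped, secondary and supplementary), leaving only the zero check and the absolute value to do.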

If you want to apply one of "your" tools, create a separate subfolder named after it (e.g. ''LASER'' or ''GATK'') and make it group-writable (''chmod g+w <folder>'').

Back to [[:teaching|Teaching]]