teaching:2017winter:svseminar [2017/11/29 14:11] |
teaching:2017winter:svseminar [2020/02/14 09:07] (current) |
||
---|---|---|---|
+ | ====== Detection of Genomic Structural Variation ====== | ||
+ | Dr. Roland Wittler\\ | ||
+ | Seminar: Wednesday, 10.15-11.45 in M3-115\\ | ||
+ | Office hours: by arrangement \\ | ||
+ | Office: U10-145\\ | ||
+ | |||
+ | ===== Content ===== | ||
+ | |||
+ | In addition to small mutations in the genome, such as the deletion, insertion, or substitution of single bases, larger, so-called //structural variations// play an important role, e.g., in the development of cancer. These include the deletion, insertion, rearrangement, inversion, or duplication of whole segments of the genome sequence. High-throughput whole-genome sequencing makes it possible to detect structural variations in several ways. | ||
+ | |||
+ | This is a classical literature seminar: in the first session, topics are introduced and selected by the students. In the following sessions, each student gives a presentation on their topic and afterwards writes an essay ("Hausarbeit"). Aspects of scientific writing and presenting will be covered as well. | ||
+ | |||
+ | Talks may be given and essays written in German or English. | ||
+ | |||
+ | ===== Literature ===== | ||
+ | |||
+ | A collection of publications discussed in the seminar is provided in the "Lernraum" in the [[https://ekvv.uni-bielefeld.de/kvv_publ/publ/vd?id=103587990|eKVV]], including some review articles on structural variation detection. | ||
+ | |||
+ | https://bis.uni-bielefeld.de/sites/8358/Start.aspx (To get access, register for this seminar in the eKVV by adding it to your eKVV schedule.) | ||
+ | |||
+ | |||
+ | ===== Requirements ===== | ||
+ | |||
+ | * Recommended prior knowledge: Sequence Analysis | ||
+ | * Oral presentation (20-45 minutes) | ||
+ | * Essay (8-15 pages) | ||
+ | |||
+ | ===== Topics ===== | ||
+ | |||
+ | * Array, Array CGH | ||
+ | * Read alignment (BWA, BWA-sw, Bowtie2, MrFast) | ||
+ | * Representation and handling of mappings and call sets (samtools, VCF, IGV) | ||
+ | * Genome Analysis Tool Kit (GATK) | ||
+ | * Copy number variation approaches (CNVnator, SegSeq) | ||
+ | * Split-read methods (Pindel, LASER) | ||
+ | * Paired-end mapping approaches, probabilistic (Breakdancer, MoDil) | ||
+ | * Paired-end mapping approaches, combinatorial (CLEVER, GASV) | ||
+ | * Assembly-based approaches (SOAP denovo(2)) | ||
+ | * Phasing (WhatsHAP, review) | ||
+ | * Long-read mapping (Chaisson, Pendleton) | ||
+ | * Big genome projects (1000 Genomes Project, Genome of the Netherlands) | ||
+ | |||
+ | ===== Timeline ===== | ||
+ | |||
+ | ^ Date ^ Topic ^ Who ^ | ||
+ | | 11.10.2017 | Administrative matters, overview of topics, and topic selection | | | ||
+ | | 18.10.2017 | | | | ||
+ | | 25.10.2017 | | | | ||
+ | | 01.11.2017 | -- national holiday --|| | ||
+ | | 08.11.2017 | | | | ||
+ | | 15.11.2017 | | | | ||
+ | | 22.11.2017 | Scientific Writing / Read alignment | Roland / Dennis | | ||
+ | | 29.11.2017 | Practical session: Mappings and handling of BAM files | Roland | | ||
+ | | 06.12.2017 | Split-read methods / Genome Analysis Tool Kit | Lena / Paul B. | | ||
+ | | 13.12.2017 | | | | ||
+ | | 20.12.2017 | Paired-end mapping approaches | Timo / Fabienne | | ||
+ | | -- X-Mas break -- ||| | ||
+ | | 10.01.2018 | Assembly-based approaches / Phasing | Manuel / Ilja | | ||
+ | | 17.01.2018 | Long-read mapping | Pia | | ||
+ | | 24.01.2018 | Big genome projects / Copy number variation tools | Matthias / Dennis | | ||
+ | | 31.01.2018 | | | | ||
+ | |||
+ | |||
+ | ===== Hands on ===== | ||
+ | |||
+ | Once you have been added to the CeBiTec user group "seqan", you have access to the volume: | ||
+ | |||
+ | /vol/seqan/svseminar | ||
+ | | ||
+ | In the subfolder ''HG00514'', you will find Illumina paired-end sequencing data: one file for each mate (suffixes ''_1.fastq.gz'' and ''_2.fastq.gz'') as well as a short extract of each (suffix ''head.fastq.gz''), which is easier to handle for test purposes. The subfolder ''hg38'' contains a reference genome that has already been indexed for use with BWA. There is also a folder ''TEST'', which you should use to experiment with the data. It contains an example script ''runBWA.sh'' that runs BWA on the head version of the read data and also does some SAM/BAM conversion. Please make your own copy of this script before you modify it. **Do not run any heavy computations on a standard terminal!** Instead, submit the job to the compute cluster: | ||
+ | |||
+ | qsub -cwd -P seqan -l idle=1 -pe multislot 4 runBWA.sh | ||
+ | |||
+ | You can check the status of your job with ''qstat'' and kill it with ''qdel <job id>''. The output of the job can be found in files called ''<scriptname>.o<job id>'' and ''<scriptname>.e<job id>'', where the first should be empty and the second contains output and/or error messages of the tools used. | ||
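The file naming scheme described above can be wrapped in a small helper. This is only a sketch: the function name ''show_job_output'' is made up for illustration and is not part of the cluster software. It assumes you call it from the directory in which the job was submitted.

```shell
# Hypothetical helper: inspect the output files of a finished job.
# Arguments: script name and job id (both supplied by the caller).
show_job_output() {
    script=$1
    job_id=$2
    stdout_file="${script}.o${job_id}"
    stderr_file="${script}.e${job_id}"
    # The .o file is expected to be empty; report it if it is not.
    if [ -s "$stdout_file" ]; then
        echo "note: $stdout_file is not empty"
    fi
    # The .e file holds the output and/or error messages of the tools.
    if [ -f "$stderr_file" ]; then
        cat "$stderr_file"
    else
        echo "no file $stderr_file found (job still running?)"
    fi
}
```

For example, ''show_job_output runBWA.sh <job id>'' prints the messages of that job, using the job id reported by ''qsub''.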
+ | |||
+ | If you want to run, say, medium-weight computations interactively, log in on a compute host with ''qlogin -P seqan''. | ||
+ | |||
+ | Once you have your BAM file, you could, for example, do the following: | ||
+ | |||
+ | * Have a first look at the mappings: ''samtools view roland.bam | head'' | ||
+ | * Extract all mappings on a certain chromosome: ''samtools view -o roland.chr1.bam roland.bam chr1'' | ||
+ | * Which chromosome has been hit how many times? ''samtools view roland.bam | cut -f 3 | sort | uniq -c | sort -n'' | ||
+ | * Extract the fragment lengths: ''samtools view roland.bam | cut -f 9 > roland.fragmentlengths.tsv'' | ||
+ | * Use ''R'' to plot the fragment length statistic: | ||
+ | |||
+ | # read data | ||
+ | fl<-read.table(file="roland.fragmentlengths.tsv", header=FALSE) | ||
+ | # take absolute values from first (and only) column | ||
+ | fla<-abs(fl$V1) | ||
+ | # filter for outliers by using quantiles | ||
+ | flaf<-subset(fla, fla>quantile(fla,0.01) & fla<quantile(fla,0.99)) | ||
+ | # plot histogram | ||
+ | hist(flaf,breaks=50,xlab="fragment length") | ||
+ | # quit R | ||
+ | quit() | ||
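If you only need a quick number rather than a plot, the same column can be summarised directly on the command line. The following is a minimal sketch, not part of the course material: the function name ''mean_fragment_length'' is made up, and it assumes tab-separated SAM records on standard input, as printed by ''samtools view''. Unlike the R code above, it does not remove outliers; it only skips records whose template length (column 9) is 0.

```shell
# Hypothetical helper: mean absolute template length (SAM column 9, TLEN)
# of the SAM records read from standard input.
mean_fragment_length() {
    awk -F '\t' '
        /^@/ { next }                  # skip SAM header lines, if present
        $9 != 0 {                      # TLEN is 0 when no length is available
            len = ($9 < 0) ? -$9 : $9  # absolute value
            sum += len
            n++
        }
        END {
            if (n > 0) printf "%.1f\n", sum / n
            else print "NA"
        }
    '
}
```

A possible call is ''samtools view roland.bam | mean_fragment_length''. Both mates of a pair carry the same absolute value, so every pair contributes twice, which leaves the mean unchanged as long as both mates are present in the input.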
+ | | ||
+ | |||
+ | If you want to apply one of "your" tools, create an individual subfolder ''MYTOOL'' (e.g. ''LASER'', ''GATK'', etc.) and make it group-writable (''chmod g+w <folder>''). | ||
+ | |||
+ | Back to [[:teaching|Teaching]] |