Dr. Roland Wittler
Seminar: Wednesday, 10.15-11.45 in M3-115
Office hours: by arrangement
In addition to small mutations in the genome, such as the deletion, insertion or substitution of single bases, larger so-called structural variations, such as the deletion, insertion, rearrangement, inversion or duplication of whole segments of the genome sequence, play an important role, e.g., in the development of cancer. High-throughput whole-genome sequencing makes it possible to detect structural variations in several ways.
This is a classical literature seminar, i.e., on the first day, topics are introduced and selected by the students. In the following sessions, students give a presentation on their topic and afterwards write an essay (“Hausarbeit”). Aspects of scientific writing and presenting will be covered as well.
Talks can be given and essays written in German or English.
A collection of publications discussed in the seminar is provided in the “Lernraum” in the eKVV, including some review articles on structural variation detection.
https://bis.uni-bielefeld.de/sites/8358/Start.aspx (You have to register for this seminar in the eKVV by adding it to your eKVV schedule.)
|11.10.2017||Administrative matters, overview of topics and topic selection|
|01.11.2017||– national holiday –|
|22.11.2017||Scientific Writing / Read alignment||Roland / Dennis|
|29.11.2017||Practical session: Mappings and handling of BAM files||Roland|
|06.12.2017||Split-read methods / Genome Analysis Tool Kit||Lena / Paul B.|
|20.12.2017||Paired-end mapping approaches||Timo / Fabienne|
|– X-Mas break –|
|10.01.2018||Assembly-based approaches / Phasing||Manuel / Ilja|
|24.01.2018||Big genome projects / Copy number variation tools||Matthias / Dennis|
Once you are added to the CeBiTec user group “seqan” you have access to the volume:
In the subfolder HG00514, you find Illumina paired-end sequencing data. To be precise, you will find one file for each mate (suffix _2.fastq.gz) as well as a short extract of each (suffixes head.fastq.gz) which is easier to handle for test purposes. In the subfolder hg38, you find a reference genome (that has already been indexed to be used by BWA). There is also a folder TEST which you should use to play with the data. Here you find an example script runBWA.sh that runs BWA on the head-version of the read data and also does some SAM/BAM conversion. Please make your own copy of this script before you modify it. Do not do any heavy computations on a standard terminal! Instead submit the job to the compute cluster:
qsub -cwd -P seqan -l idle=1 -pe multislot 4 runBWA.sh
You can check the status of your job with qstat and kill it with qdel <job id>. The output of the job can be found in files called <scriptname>.o<job id> and <scriptname>.e<job id>, where the first should be empty and the second contains output and/or error messages of the tools used.
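Once a job has finished, a quick look at the two files could be scripted as in the following sketch. The function name check_job_output is hypothetical (it is not part of the cluster software); it only relies on the <scriptname>.o<job id> and <scriptname>.e<job id> naming described above.

```shell
#!/bin/sh
# Sketch: inspect the two output files of a finished cluster job.
# check_job_output is a hypothetical helper, not a cluster command.
check_job_output() {
    script="$1"   # name of the submitted script, e.g. runBWA.sh
    jobid="$2"    # job id as shown by qstat
    out="${script}.o${jobid}"
    err="${script}.e${jobid}"

    # The .o file is expected to be empty.
    if [ -s "$out" ]; then
        echo "unexpected content in $out"
    else
        echo "$out is empty or missing"
    fi

    # The .e file holds the messages of the tools; show its last lines.
    if [ -f "$err" ]; then
        tail -n 5 "$err"
    fi
}
```

It would be called in the directory from which the job was submitted, e.g. as check_job_output runBWA.sh <job id>.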
If you want to do, say, medium-weight computations interactively, log in on a compute host with qlogin -P seqan.
Once you have your BAM file, you could, e.g. do the following things.
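# show the first alignments
samtools view roland.bam | head
# extract the alignments on chr1 as BAM (-b); a region query needs a coordinate-sorted, indexed BAM file
samtools view -b -o roland.chr1.bam roland.bam chr1
# count the alignments per reference sequence (column 3)
samtools view roland.bam | cut -f 3 | sort | uniq -c | sort -n
# write the observed fragment lengths (column 9) to a file
samtools view roland.bam | cut -f 9 > roland.fragmentlengths.tsv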
Use R to plot the fragment length statistics:
# read data
fl <- read.table(file="roland.fragmentlengths.tsv", header=FALSE)
# take absolute values from first (and only) column
fla <- abs(fl$V1)
# filter for outliers by using quantiles
flaf <- subset(fla, fla > quantile(fla, 0.01) & fla < quantile(fla, 0.99))
# plot histogram (fragment length on the x-axis)
hist(flaf, breaks=50, xlab="fragment length")
# quit R
quit()
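For a quick look without starting R, the fragment lengths can also be summarised on the command line. The following is a minimal sketch: summarise_lengths is a hypothetical helper name, and skipping values of 0 (which the SAM format uses when no template length is available) is an assumption that differs from the quantile filter used in the R script.

```shell
#!/bin/sh
# Sketch: summarise absolute fragment lengths from a one-column file.
# summarise_lengths is a hypothetical helper name.
summarise_lengths() {
    awk '
        BEGIN { n = 0; sum = 0; max = 0 }
        {
            v = $1 + 0                # force numeric interpretation
            if (v < 0) v = -v         # absolute value
            if (v > 0) {              # assumption: skip 0 (no length recorded)
                n++
                sum += v
                if (v > max) max = v
            }
        }
        END {
            if (n > 0) printf "n=%d mean=%.1f max=%d\n", n, sum / n, max
            else print "n=0"
        }
    ' "$1"
}
```

With the file created above, the call would be summarise_lengths roland.fragmentlengths.tsv.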
If you want to apply one of “your” tools, create an individual subfolder (GATK etc.) and make it group writable (chmod g+w <folder>).