Data Analysis
Nowadays, generating data is the easy part; analyzing and interpreting the resulting data sets is the real challenge. We have adapted existing pipelines to analyze the output of different assays optimally. We also have access to carefully maintained in-house databases that, for example, enable advanced variant interpretation for targeted sequencing.
During any NGS library preparation, unique barcode sequences are added to each sample, allowing multiple libraries to be pooled and sequenced together. After sequencing, this barcode information is used to unequivocally assign the sequenced reads to the individual samples (demultiplexing), automatically generating sample-specific FASTQ files.
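To make the demultiplexing idea concrete, here is a minimal sketch of barcode-based read assignment. It is not the software used in our pipeline. The input format (pairs of observed barcode and read sequence), the sample sheet dictionary and the tolerance of one mismatch are all illustrative assumptions. Reads whose barcode matches no sample, or matches more than one, are set aside as "undetermined".

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_barcodes, max_mismatches=1):
    """Assign reads to samples by their barcode.

    reads: iterable of (observed_barcode, read_sequence) tuples (hypothetical input format)
    sample_barcodes: dict mapping sample name -> expected barcode (hypothetical sample sheet)
    Returns a dict mapping sample name -> list of read sequences.
    """
    bins = defaultdict(list)
    for barcode, seq in reads:
        matches = [
            sample
            for sample, expected in sample_barcodes.items()
            if len(expected) == len(barcode)
            and hamming(barcode, expected) <= max_mismatches
        ]
        # Assign only unambiguous reads; unknown or ambiguous barcodes are set aside
        key = matches[0] if len(matches) == 1 else "undetermined"
        bins[key].append(seq)
    return dict(bins)


samples = {"sample_A": "ACGTAC", "sample_B": "TTGCAA"}
reads = [
    ("ACGTAC", "GATTACA"),  # exact barcode match
    ("ACGTAA", "CCGGTT"),   # one sequencing error in the barcode
    ("TTGCAA", "AAATTT"),
    ("GGGGGG", "NNNNNN"),   # barcode not in the sample sheet
]
per_sample = demultiplex(reads, samples)
print(per_sample)
# {'sample_A': ['GATTACA', 'CCGGTT'], 'sample_B': ['AAATTT'], 'undetermined': ['NNNNNN']}
```

Tolerating a mismatch only works safely when the barcodes in a pool differ from each other at several positions; otherwise a single sequencing error could make a read look like it belongs to another sample, which is why ambiguous matches are rejected here rather than guessed.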
The FASTQ files are the input for the subsequent read alignment to the reference genome or transcriptome. Alignment assigns each sequenced DNA fragment to its matching region in the human genome or transcriptome based on its base sequence. The read positions are stored in a Sequence Alignment/Map (SAM) file or in its binary equivalent, a BAM file.
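The following toy example illustrates what alignment produces. It is only a sketch under simplifying assumptions: it scans a tiny made-up reference by brute force, allows mismatches but no gaps, ignores the reverse strand, and writes the 11 mandatory SAM columns. Production aligners use genome indexes and much richer scoring; none of the names below come from our pipeline.

```python
def align_read(read, reference, max_mismatches=2):
    """Return the 0-based offset of the best ungapped forward-strand match, or None."""
    best_pos, best_mm = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(a != b for a, b in zip(read, window))
        if mm < best_mm:  # keep the first position with the fewest mismatches
            best_pos, best_mm = pos, mm
    return best_pos


def to_sam_line(qname, read, ref_name, pos):
    """Format the 11 mandatory SAM columns as one tab-separated line."""
    if pos is None:
        # FLAG 4 = unmapped; RNAME and CIGAR '*', POS 0
        fields = [qname, "4", "*", "0", "0", "*", "*", "0", "0", read, "*"]
    else:
        # SAM POS is 1-based; '<n>M' = ungapped alignment of n bases; MAPQ 255 = not available
        fields = [qname, "0", ref_name, str(pos + 1), "255",
                  f"{len(read)}M", "*", "0", "0", read, "*"]
    return "\t".join(fields)


reference = "ACGTTGCAAGGCTTAACG"  # made-up reference sequence
for name, read in [("read1", "GCAAGG"), ("read2", "TTTTTT")]:
    print(to_sam_line(name, read, "chr_test", align_read(read, reference)))
```

Here `read1` matches the reference exactly at 1-based position 6, while `read2` has at least four mismatches everywhere and is therefore reported as unmapped.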
Variant calling
The alignment result is used to identify positions that deviate from the reference genome, producing a list of variant calls stored in a Variant Call Format (VCF) file. Single nucleotide variants (SNVs) as well as small insertions and deletions can be detected. For larger assays, such as whole-genome sequencing (WGS) and whole-exome sequencing (WES), copy number variants (CNVs) and structural variants (SVs) can also be assessed.
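As a rough illustration of how aligned reads become VCF records, the sketch below builds a pileup (base counts per reference position) and reports an SNV where a non-reference base reaches a minimum depth and allele fraction. The thresholds, the input format and the chromosome name are hypothetical, and only SNVs are handled; real variant callers rely on statistical models and gapped alignments to also detect insertions and deletions.

```python
from collections import Counter, defaultdict


def call_snvs(reference, aligned_reads, min_depth=4, min_vaf=0.2):
    """Toy SNV caller on ungapped alignments.

    aligned_reads: list of (0-based start, read sequence) pairs
    min_depth, min_vaf: illustrative thresholds, not validated values
    Returns (1-based POS, REF, ALT, depth, variant allele fraction) tuples.
    """
    pileup = defaultdict(Counter)  # position -> base counts across overlapping reads
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pileup[start + offset][base] += 1
    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        ref_base = reference[pos]
        alts = [(base, n) for base, n in counts.items() if base != ref_base]
        if depth < min_depth or not alts:
            continue
        # Most frequent non-reference base at this position
        alt_base, alt_n = max(alts, key=lambda item: item[1])
        vaf = alt_n / depth
        if vaf >= min_vaf:
            calls.append((pos + 1, ref_base, alt_base, depth, round(vaf, 2)))
    return calls


def to_vcf(chrom, calls):
    """Render calls as minimal VCF 4.2 text: header lines plus one record per call."""
    lines = [
        "##fileformat=VCFv4.2",
        '##INFO=<ID=DP,Number=1,Type=Integer,Description="Read depth">',
        '##INFO=<ID=AF,Number=A,Type=Float,Description="Variant allele fraction">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for pos, ref, alt, depth, vaf in calls:
        lines.append(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\tPASS\tDP={depth};AF={vaf}")
    return "\n".join(lines)


reference = "ACGTACGTAC"  # made-up reference sequence
reads = [(0, "ACGTAC"), (0, "ACTTAC"), (0, "ACTTAC"), (0, "ACGTAC"), (1, "CTTACG")]
calls = call_snvs(reference, reads)
print(to_vcf("chr_test", calls))
```

Three of the five reads covering position 3 carry a T instead of the reference G, so a single G>T call with depth 5 and allele fraction 0.6 is written.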
Raw counts - txt file
For transcriptome data, we provide either raw gene counts based on the alignment results or transcript counts based on pseudo-alignment algorithms.
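A raw count table is essentially "number of reads per gene" written as a tab-separated text file. The sketch below shows the idea under deliberately simple assumptions: each read is represented only by its start position, the gene annotation is a small hypothetical dictionary, and reads falling in no gene or in overlapping genes are not counted. Dedicated counting tools (for example featureCounts or HTSeq-count) apply more detailed overlap rules.

```python
def count_reads_per_gene(read_starts, genes):
    """Count reads per gene.

    read_starts: iterable of (chrom, 1-based start) for each aligned read
    genes: dict gene_id -> (chrom, start, end), 1-based inclusive (hypothetical annotation)
    A read is counted only if its start falls inside exactly one gene.
    """
    counts = {gene: 0 for gene in genes}
    for chrom, pos in read_starts:
        hits = [g for g, (c, s, e) in genes.items() if c == chrom and s <= pos <= e]
        if len(hits) == 1:  # skip intergenic and ambiguous reads
            counts[hits[0]] += 1
    return counts


def to_counts_txt(counts):
    """Render a tab-separated 'gene_id<TAB>count' table with a header line."""
    rows = ["gene_id\tcount"] + [f"{gene}\t{n}" for gene, n in sorted(counts.items())]
    return "\n".join(rows) + "\n"


genes = {
    "GENE_A": ("chr1", 100, 200),
    "GENE_B": ("chr1", 150, 300),  # overlaps GENE_A
    "GENE_C": ("chr2", 1, 500),
}
read_starts = [("chr1", 120), ("chr1", 160), ("chr1", 250),
               ("chr2", 10), ("chr2", 400), ("chr3", 5)]
counts = count_reads_per_gene(read_starts, genes)
print(to_counts_txt(counts), end="")
```

The read starting at chr1:160 lies in both GENE_A and GENE_B and is therefore left out, which is one common, conservative way of handling ambiguous reads.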
Fusion calling results
Three different fusion callers are used to identify potential fusion transcripts from transcriptome data. Identified fusion transcripts can be annotated with public databases to provide additional information about the transcript.
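How the output of several fusion callers is combined is not specified above. One common approach, shown here purely as an assumption, is to keep fusions reported by at least two callers and then flag those already listed in a reference collection. The caller names, the `min_callers` threshold and the small `known` set (a stand-in for a public database) are hypothetical.

```python
from collections import Counter


def merge_fusion_calls(calls_by_caller, min_callers=2):
    """Keep fusions supported by at least `min_callers` different callers.

    calls_by_caller: dict caller name -> iterable of (5' gene, 3' gene) pairs
    Returns a dict (5' gene, 3' gene) -> number of supporting callers.
    """
    support = Counter()
    for calls in calls_by_caller.values():
        for fusion in set(calls):  # count each caller at most once per fusion
            support[fusion] += 1
    return {fusion: n for fusion, n in support.items() if n >= min_callers}


def annotate(fusions, known_fusions):
    """Attach a 'known' flag from a reference set (stand-in for a public database)."""
    return [
        {"fusion": f"{five}::{three}", "callers": n, "known": (five, three) in known_fusions}
        for (five, three), n in sorted(fusions.items())
    ]


calls_by_caller = {
    "caller_1": [("BCR", "ABL1"), ("KMT2A", "MLLT3")],
    "caller_2": [("BCR", "ABL1"), ("GENE_X", "GENE_Y")],
    "caller_3": [("BCR", "ABL1"), ("GENE_X", "GENE_Y")],
}
known = {("BCR", "ABL1"), ("KMT2A", "MLLT3")}
for row in annotate(merge_fusion_calls(calls_by_caller), known):
    print(row)
```

Requiring agreement between callers trades some sensitivity for specificity: in this example the KMT2A::MLLT3 call is dropped because only one caller reported it, even though it is a well-known fusion.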
To facilitate variant interpretation, additional information about the detected variants can be provided. Because the MLL routinely documents its evaluation of discovered sequence variants, the in-house database can be consulted in addition to clinical databases to estimate a variant's clinical relevance.
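Conceptually, this step is a lookup of each called variant in several knowledge sources. The sketch below shows such a lookup keyed on chromosome, position, reference and alternative allele. Both databases are plain dictionaries standing in for real systems, and all coordinates and classification labels are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A called variant; frozen so it can serve as a dictionary key."""
    chrom: str
    pos: int  # 1-based, as in VCF
    ref: str
    alt: str


def interpret(variant, inhouse_db, clinical_db):
    """Collect evidence for one variant from two lookup tables (hypothetical stand-ins)."""
    return {
        "variant": f"{variant.chrom}:{variant.pos}{variant.ref}>{variant.alt}",
        "in_house": inhouse_db.get(variant, "not previously evaluated"),
        "clinical": clinical_db.get(variant, "no entry"),
    }


# Entirely made-up coordinates and classifications, for illustration only
v1 = Variant("chr_test", 3, "G", "T")
v2 = Variant("chr_test", 8, "T", "C")
inhouse = {v1: "previously evaluated: likely pathogenic"}
clinical = {v1: "pathogenic"}
for v in (v1, v2):
    print(interpret(v, inhouse, clinical))
```

Keying on the full (chromosome, position, REF, ALT) tuple rather than on position alone keeps different alleles at the same site from being confused.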
Raw sequencing data from the NovaSeq system is streamed directly into a private, access-restricted AWS instance of the Amazon cloud in Frankfurt. The data is completely anonymized, and no personal or clinical data is stored in the cloud. Our data security measures comply with the strictest requirements of the EU General Data Protection Regulation (GDPR), as verified by external auditors. Raw sequencing data from the MiSeq systems is stored locally without external access.