A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.
Report generated on 2022-08-18, 12:56 based on data in:
/sc/arion/projects/bigbrain/data/ROSMAP/RAPiD_Runs
General Statistics
Showing 5118 samples.
featureCounts
Subread featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoters, gene bodies, genomic bins and chromosomal locations.
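As a rough illustration of what "counting mapped reads for genomic features" means, the sketch below assigns each read to the gene it overlaps and tallies reads it cannot assign. It is a toy model, not featureCounts itself: it assumes one chromosome, half-open (start, end) coordinates, no strand information and no multi-mapping reads. The function name and inputs are hypothetical. The status labels mirror the Assigned / Unassigned_Ambiguity / Unassigned_NoFeatures categories featureCounts reports in its summary, where by default a read overlapping more than one gene is left unassigned.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical feature records: (gene_id, start, end), half-open intervals on one chromosome.
Feature = Tuple[str, int, int]


def count_reads(features: List[Feature], reads: List[Tuple[int, int]]) -> Tuple[Dict[str, int], Counter]:
    """Toy read summarization: count reads overlapping exactly one gene."""
    counts = {gene: 0 for gene, _, _ in features}
    status: Counter = Counter()
    for r_start, r_end in reads:
        # Collect gene IDs (not individual features) so several exons of one gene count once.
        hits = {gene for gene, f_start, f_end in features if r_start < f_end and f_start < r_end}
        if len(hits) == 1:
            counts[hits.pop()] += 1
            status["Assigned"] += 1
        elif hits:
            status["Unassigned_Ambiguity"] += 1  # overlaps more than one gene
        else:
            status["Unassigned_NoFeatures"] += 1  # overlaps no annotated feature
    return counts, status
```

For example, with geneA at 100-200 and geneB at 180-300, a read at 190-195 overlaps both genes and is counted as ambiguous rather than being credited to either.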
Picard
Picard is a set of Java command line tools for manipulating high-throughput sequencing data.
Alignment Summary
Please note that Picard's read counts are divided by two for paired-end data.
GC Coverage Bias
This plot shows bias in coverage across regions of the genome with varying GC content. A perfect library would be a flat line at y = 1.
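Why a perfect library is a flat line at y = 1: the plotted value is coverage in each GC bin divided by the genome-wide average, so an unbiased library gives 1 everywhere. The sketch below computes that ratio. It is modelled on the idea behind the NORMALIZED_COVERAGE column of Picard's GC bias metrics, but the exact Picard calculation may differ; the function name and the list-indexed-by-GC-percent inputs are assumptions made for illustration.

```python
from typing import Dict, List


def normalized_gc_coverage(windows: List[int], reads: List[int]) -> Dict[int, float]:
    """Coverage per GC bin relative to the genome-wide mean (1.0 = no bias).

    windows[g]: number of genome windows whose GC content falls in bin g (hypothetical input).
    reads[g]:   number of reads starting in those windows (hypothetical input).
    """
    mean_per_window = sum(reads) / sum(windows)
    result: Dict[int, float] = {}
    for gc, (n_windows, n_reads) in enumerate(zip(windows, reads)):
        if n_windows == 0:
            continue  # no windows with this GC content, so coverage is undefined
        result[gc] = (n_reads / n_windows) / mean_per_window
    return result
```

With windows [10, 20, 10] and reads [50, 200, 150], the bins come out as 0.5, 1.0 and 1.5: coverage rises with GC content instead of staying flat at 1.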
Insert Size
Plot shows the number of reads at a given insert size. Reads with different orientations are summed.
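Picard reports a separate insert size histogram for each pair orientation (FR, RF, TANDEM). Summing them into one curve, as this plot does, amounts to adding the per-size counts. A minimal sketch, assuming each histogram has already been parsed into a mapping from insert size to read count (the function name and input shape are hypothetical):

```python
from collections import Counter
from typing import Dict, Mapping


def sum_orientations(histograms: Mapping[str, Mapping[int, int]]) -> Dict[int, int]:
    """Merge per-orientation insert size histograms into one, sorted by insert size."""
    total: Counter = Counter()
    for hist in histograms.values():
        total.update(hist)  # Counter.update adds counts rather than replacing them
    return dict(sorted(total.items()))
```

For instance, {"FR": {150: 10, 200: 5}, "RF": {150: 2, 300: 1}} merges to {150: 12, 200: 5, 300: 1}.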
Mark Duplicates
RnaSeqMetrics Assignment
Number of bases in primary alignments that align to regions in the reference genome.
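The assignment plot breaks aligned bases down by the region they fall in. The sketch below turns raw base counts into fractions of PF_ALIGNED_BASES, using category names from Picard's RnaSeqMetrics output (CODING_BASES, UTR_BASES, INTRONIC_BASES, INTERGENIC_BASES, RIBOSOMAL_BASES). It assumes the metrics row has already been parsed into a dictionary, and that RIBOSOMAL_BASES may be missing when no ribosomal intervals were supplied. The function name and the example numbers are hypothetical.

```python
from typing import Dict, Mapping, Optional

# Region categories as named in Picard's RnaSeqMetrics output.
CATEGORIES = ("CODING_BASES", "UTR_BASES", "INTRONIC_BASES", "INTERGENIC_BASES", "RIBOSOMAL_BASES")


def assignment_fractions(metrics: Mapping[str, Optional[int]]) -> Dict[str, float]:
    """Express each region category as a fraction of PF_ALIGNED_BASES."""
    aligned = metrics["PF_ALIGNED_BASES"]
    fractions: Dict[str, float] = {}
    for name in CATEGORIES:
        value = metrics.get(name)
        if value is None:  # assumed: category not reported (e.g. no rRNA intervals given)
            continue
        fractions[name] = value / aligned
    return fractions
```

With 1,000 aligned bases of which 500 are coding and 150 intronic, the coding and intronic fractions are 0.5 and 0.15.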
Gene Coverage
STAR
STAR is an ultrafast universal RNA-seq aligner.
Alignment Scores
Gene Counts
Statistics from results generated using --quantMode GeneCounts. The three tabs show counts for unstranded RNA-seq, counts for the 1st read strand aligned with RNA and counts for the 2nd read strand aligned with RNA.
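With --quantMode GeneCounts, STAR writes a ReadsPerGene.out.tab file whose first four rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) are summaries, followed by one row per gene with the three count columns described above. The sketch below totals gene-assigned reads per column; the function name and the example values are hypothetical.

```python
from typing import Dict, Iterable

# Column order after the gene ID in ReadsPerGene.out.tab.
COLUMNS = ("unstranded", "first_strand", "second_strand")
# Summary rows STAR writes at the top of the file instead of gene IDs.
SPECIAL_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def summarise_gene_counts(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Total gene-assigned reads and keep the N_* summary rows, per count column."""
    summary: Dict[str, Dict[str, int]] = {col: {"assigned": 0} for col in COLUMNS}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        name, *values = fields
        for col, value in zip(COLUMNS, values):
            if name in SPECIAL_ROWS:
                summary[col][name] = int(value)
            else:
                summary[col]["assigned"] += int(value)
    return summary
```

Comparing the stranded columns is a common strandedness check. If the 2nd-strand column collects far more assigned reads than the 1st-strand column (with most 1st-strand reads landing in N_noFeature), the library is typically treated as reverse-stranded, corresponding to htseq-count's -s reverse setting.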
Trimmomatic
Trimmomatic is a flexible read trimming tool for Illumina NGS data.
FastQC
FastQC is a quality control tool for high throughput sequence data, written by Simon Andrews at the Babraham Institute in Cambridge.
Sequence Counts
Sequence counts for each sample. Duplicate read counts are an estimate only.
This plot shows the total number of reads, broken down into unique and duplicate if possible (only more recent versions of FastQC give duplicate info).
You can read more about duplicate calculation in the FastQC documentation. A small part has been copied here for convenience:
Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression for the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.
The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75bp in length are truncated to 50bp for this analysis.
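The tracking rules quoted above can be sketched as follows. Only sequences first seen among the first track_limit reads are tracked, every later occurrence of a tracked sequence is still counted, matching is exact, and reads longer than 75bp are keyed on their first 50bp. This is a simplified illustration, not FastQC's code: it does not reproduce FastQC's extrapolation to the whole file, and the function names and the simple duplicate fraction are assumptions.

```python
from collections import Counter
from typing import Iterable


def tracked_duplicate_counts(sequences: Iterable[str], track_limit: int = 100_000) -> Counter:
    """Count occurrences of sequences that first appear within the first track_limit reads."""
    counts: Counter = Counter()
    for i, seq in enumerate(sequences):
        key = seq[:50] if len(seq) > 75 else seq  # long reads are truncated to 50bp
        if key in counts:
            counts[key] += 1  # already tracked: keep counting to the end of the file
        elif i < track_limit:
            counts[key] = 1  # start tracking only within the first track_limit reads
    return counts


def duplicate_fraction(counts: Counter) -> float:
    """Fraction of tracked reads that repeat an earlier read (simplified estimate)."""
    total = sum(counts.values())
    return 1 - len(counts) / total if total else 0.0
```

For example, with track_limit=3 the reads ACGT, ACGT, TTTT, ACGT, GGGG give ACGT x3 and TTTT x1; GGGG first appears after the limit and is ignored, so half of the tracked reads are duplicates.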
Sequence Quality Histograms
The mean quality value across each base position in the read.
To enable multiple samples to be plotted on the same graph, only the mean quality scores are plotted (unlike the box plots seen in FastQC reports).
Taken from the FastQC help:
The y-axis on the graph shows the quality scores. The higher the score, the better the base call. The background of the graph divides the y axis into very good quality calls (green), calls of reasonable quality (orange), and calls of poor quality (red). The quality of calls on most platforms will degrade as the run progresses, so it is common to see base calls falling into the orange area towards the end of a read.
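The mean-quality line for a sample can be sketched by decoding each FASTQ quality character to a Phred score and averaging per position. This assumes Phred+33 encoding (standard for current Illumina data). It also keeps every position separate, whereas FastQC groups positions in longer reads into bins. The function name is hypothetical.

```python
from typing import List, Sequence


def mean_quality_per_position(qualities: Sequence[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position, from FASTQ quality strings."""
    sums: List[int] = []
    counts: List[int] = []
    for qual in qualities:
        for pos, char in enumerate(qual):
            if pos == len(sums):  # first read long enough to reach this position
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(char) - offset  # decode ASCII to Phred score
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]
```

With quality strings "II#" and "5I" ('I' = 40, '5' = 20, '#' = 2), the means are 30.0, 40.0 and 2.0. The drop at the last position mirrors the end-of-read degradation described above.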
Per Sequence Quality Scores
The number of reads at each average quality score. Shows whether a subset of reads has poor quality.
From the FastQC help:
The per sequence quality score report allows you to see if a subset of your sequences have universally low quality values. It is often the case that a subset of sequences will have universally poor quality, however these should represent only a small percentage of the total sequences.
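This plot is a histogram of per-read mean quality, so a poor-quality subset shows up as a separate bump at low scores. A minimal sketch, assuming Phred+33 quality strings and binning by truncating each read's mean to an integer (FastQC's exact binning may differ); the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def per_sequence_quality_histogram(qualities: Iterable[str], offset: int = 33) -> Counter:
    """Histogram of mean Phred quality per read (mean truncated to an integer)."""
    histogram: Counter = Counter()
    for qual in qualities:
        if not qual:
            continue  # skip empty quality strings
        mean_q = sum(ord(c) - offset for c in qual) / len(qual)
        histogram[int(mean_q)] += 1
    return histogram
```

For example, quality strings "II", "##" and "I#" have mean scores 40, 2 and 21, giving one read in each of those bins.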
Per Base Sequence Content
The proportion of each base position for which each of the four normal DNA bases has been called.
To enable multiple samples to be shown in a single plot, the base composition data is shown as a heatmap. The colours represent the balance between the four bases: an even distribution should give an even muddy brown colour. Hover over the plot to see the percentage of the four bases under the cursor.
To see the data as a line plot, as in the original FastQC graph, click on a sample track.
From the FastQC help:
Per Base Sequence Content plots out the proportion of each base position in a file for which each of the four normal DNA bases has been called.
In a random library you would expect that there would be little to no difference between the different bases of a sequence run, so the lines in this plot should run parallel with each other. The relative amount of each base should reflect the overall amount of these bases in your genome, but in any case they should not be hugely imbalanced from each other.
It's worth noting that some types of library will always produce biased sequence composition, normally at the start of the read. Libraries produced by priming using random hexamers (including nearly all RNA-Seq libraries) and those which were fragmented using transposases inherit an intrinsic bias in the positions at which reads start. This bias does not concern an absolute sequence, but instead provides enrichment of a number of different K-mers at the 5' end of the reads. Whilst this is a true technical bias, it isn't something which can be corrected by trimming and in most cases doesn't seem to adversely affect the downstream analysis.
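The per-position proportions behind this plot (and behind each row of the heatmap) can be sketched as below. It treats N calls as excluded from the denominator, which is an assumption about how to handle them, and the function name is hypothetical. In a random-hexamer RNA-seq library, the first few positions of the output would be expected to look uneven, while later positions settle into roughly parallel proportions.

```python
from typing import Dict, List, Sequence

BASES = "ACGT"


def per_base_content(sequences: Sequence[str]) -> List[Dict[str, float]]:
    """Percentage of A, C, G and T called at each read position (N calls excluded)."""
    max_len = max((len(s) for s in sequences), default=0)
    result: List[Dict[str, float]] = []
    for pos in range(max_len):
        counts = {b: 0 for b in BASES}
        for seq in sequences:
            if pos < len(seq) and seq[pos] in counts:
                counts[seq[pos]] += 1
        called = sum(counts.values())
        result.append({b: (100.0 * n / called if called else 0.0) for b, n in counts.items()})
    return result
```

For the reads ACGT, AGGT and ANGA, position 2 is 50% C and 50% G because the N is not counted.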