Epistem are pleased to announce their acquisition of the Illumina NextSeq 550 next-generation sequencer. With tunable output and high data quality, it provides the flexibility needed for whole-genome, transcriptome, and targeted resequencing.
Assessment of Linearity and Reproducibility
To establish the linearity and precision of the instrument, Epistem have replicated the work of the third phase of the MAQC project (MAQC-III), also called Sequencing Quality Control (SEQC) [1]. That work was aimed at assessing the technical performance of next-generation sequencing platforms by generating benchmark datasets with reference samples (A: Universal Human Reference RNA; B: Human Brain Reference RNA).
We used expression measurements from 4 replicates each of total RNA samples A and B, and of mixtures of these two samples at defined A:B ratios of 3:1 (sample C) and 1:3 (sample D).
Samples were indexed and libraries prepared using the Illumina TruSeq Stranded mRNA kit. Samples were normalized for multiplexing and sequenced using the Illumina high output kit V2.5.
Data quality was high, with 97.89% of bases achieving a sequence quality score of 30 or above. Multiplexing of samples worked well, with an average of 17.44M +/- 1.4M reads generated per sample. Alignment of those reads using the STAR alignment algorithm [2] was also impressive, with 99.5% of reads mapping to the hg19 genome. Counts were extracted using the featureCounts package [3] and scaled using the DESeq2 package [4].
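To illustrate the scaling step, the sketch below assumes that "scaled" refers to the median-of-ratios size-factor normalization that DESeq2 applies by default. DESeq2 itself is an R package; this Python version is only an illustration of the calculation, and the function names and toy counts are hypothetical.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Per-sample size factors from a genes x samples matrix of raw counts."""
    counts = np.asarray(counts, dtype=float)
    # Only genes with a non-zero count in every sample contribute,
    # because the geometric mean is zero otherwise.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log geometric mean of each gene across samples (the pseudo-reference).
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Size factor = median ratio of a sample to the pseudo-reference.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

def scale_counts(counts):
    """Divide each sample's counts by its size factor."""
    counts = np.asarray(counts, dtype=float)
    return counts / median_of_ratios_size_factors(counts)

# Toy data: sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
print(median_of_ratios_size_factors(toy_counts))
print(scale_counts(toy_counts))
```

After scaling, genes whose counts differ only because of sequencing depth take the same value in both samples, which is what makes the per-sample values comparable.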
Samples C and D are 3:1 and 1:3 mixes, respectively, of the A and B samples. If the signal we are producing is linear, it should be possible to predict the values measured for samples C and D from the values produced by samples A and B:
Signal C = 0.75*A + 0.25*B
Signal D = 0.25*A + 0.75*B
These in silico values were calculated for samples C and D and compared to the average of the observed values for the 4 replicates of those 2 samples. As can be seen from the figures below, the correlation between the observed and predicted values is excellent, with an adjusted R² of 0.992 for both samples.
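The prediction and comparison described here can be sketched as follows. This is a minimal illustration, assuming the mixing is computed on linear-scale normalized values and that the adjusted R² comes from a simple one-predictor regression of observed on predicted values; the function names and toy numbers are hypothetical.

```python
import numpy as np

def predict_mixtures(a, b):
    """In silico values for samples C (3:1 A:B) and D (1:3 A:B)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    c_pred = 0.75 * a + 0.25 * b
    d_pred = 0.25 * a + 0.75 * b
    return c_pred, d_pred

def adjusted_r2(observed, predicted, n_predictors=1):
    """Adjusted R-squared for a linear fit of observed on predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # For a single predictor, R-squared is the squared Pearson correlation.
    r2 = np.corrcoef(observed, predicted)[0, 1] ** 2
    n = observed.size
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

# Toy per-gene means for samples A and B (hypothetical values).
a_mean = [100.0, 10.0, 40.0, 0.0]
b_mean = [20.0, 50.0, 40.0, 8.0]
c_pred, d_pred = predict_mixtures(a_mean, b_mean)
print(c_pred)  # [80. 20. 40.  2.]
print(d_pred)  # [40. 40. 40.  6.]
```

In practice `adjusted_r2` would be called with the mean of the 4 observed replicates of C (or D) and the corresponding predicted vector.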
The signal we are detecting from the NextSeq 550 is linear and reproducible.
Assessment of Accuracy
To further assess the reliability of the data we are producing, we were able to leverage a panel of ~20,000 confirmatory qPCR reactions performed by the SEQC project. Filtering for genes detected in our experiment, we were able to assess 13,468 genes. Because the absolute values differ between platforms, the log ratio of the A and B samples was used for the comparison, as this should be stable across platforms. As can be seen in the plot below, there is excellent correlation between the NextSeq results and the qPCR of the SEQC experiment, with an adjusted R² of 0.81.
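The reason for comparing log ratios rather than absolute values can be shown with a short sketch. It assumes both platforms supply linear-scale abundance estimates per gene (qPCR data supplied as Ct values would need converting first); the function names and toy values are hypothetical.

```python
import numpy as np

def log2_ratio(a, b):
    """log2(A/B) per gene; NaN where the gene is undetected in A or B."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ratio = np.full(a.shape, np.nan)
    detected = (a > 0) & (b > 0)
    ratio[detected] = np.log2(a[detected] / b[detected])
    return ratio

def cross_platform_r2(seq_a, seq_b, qpcr_a, qpcr_b):
    """Squared correlation of A/B log ratios over genes seen on both platforms."""
    seq_lr = log2_ratio(seq_a, seq_b)
    qpcr_lr = log2_ratio(qpcr_a, qpcr_b)
    shared = np.isfinite(seq_lr) & np.isfinite(qpcr_lr)
    r = np.corrcoef(seq_lr[shared], qpcr_lr[shared])[0, 1]
    return r ** 2, int(shared.sum())

# Toy example: the second platform reports values 1000x larger,
# yet the A/B log ratios agree exactly.
seq_a, seq_b = [8, 4, 0, 16], [2, 4, 1, 4]
qpcr_a, qpcr_b = [8000, 4000, 0, 16000], [2000, 4000, 1000, 4000]
print(cross_platform_r2(seq_a, seq_b, qpcr_a, qpcr_b))
```

The sketch returns the plain squared correlation; with thousands of genes and a single predictor, the adjustment to R² is negligible.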
These data show that the NextSeq platform at Epistem is working with a high degree of consistent accuracy.
RNA-seq and Microarray Comparison
To compare our new platform to our existing technologies, Epistem have assessed a hair ex vivo experiment using both the Illumina TruSeq Stranded mRNA kit and the Affymetrix 3' IVT PLUS kit. This work was aimed at comparing the technical performance of next-generation sequencing with that of microarrays.
We exposed anagen hairs from 4 donors ex vivo to a single dose of vehicle or of 0.01, 0.1 or 1 µM BEZ235 (a dual PI3K and mTOR inhibitor) for 24 hours.
Samples for sequencing were normalized for multiplexing and sequenced using the Illumina high output kit V2.5. Samples prepared for microarray were hybridized to the GeneChip™ Human Genome U133 Plus 2.0 Array.
Data quality was high, with 96.63% of bases achieving a sequence quality score of 30 or above. Multiplexing of samples worked well, with an average of 18.3M +/- 1M reads generated per sample. Alignment of those reads using the STAR alignment algorithm [2] was also impressive, with 98.4% of reads mapping to the hg19 genome. Counts were extracted using the featureCounts package [3] and scaled using the DESeq2 package [4].
The microarray data also showed good, consistent signal across the cohort.
Both the RNA-Seq and microarray cohorts showed strong separation by treatment dose, as can be seen in the principal component analysis (PCA) below.
ANOVA of the cohorts revealed that a much greater number of differentially expressed genes survived multiple testing correction (false discovery rate, FDR) in the RNA-Seq data, as can be seen in the table below. In both cohorts the number of differentially expressed genes increased with dose. At the high dose (1 µM), the microarray cohort produced only 45% (329) of the differential expression observed in the RNA-Seq cohort.
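The multiple testing step can be illustrated with a short sketch. The text does not say which FDR procedure was used; the example below assumes the widely used Benjamini-Hochberg adjustment, and the function name and p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

# Hypothetical per-gene ANOVA p-values.
p_vals = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(p_vals)
print(adj)  # [0.02 0.04 0.04 0.02]
print(int((adj < 0.05).sum()), "genes pass at FDR < 0.05")
```

Genes are then counted as differentially expressed when their adjusted p-value falls below the chosen FDR threshold; the 0.05 used above is an illustrative value, not one taken from the study.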
Examining the overlap between the platforms, the effect is even more dramatic. The Venn diagram above intersects the 2 datasets at the gene level, which reduces the microarray list from 329 probes to 233 unique genes.
Of those 233 genes, 182 (78%) were also observed in the RNA-Seq dataset. However, a further 547 genes in the RNA-Seq data were not seen in the microarray cohort. Subtracting the 51 microarray genes not observed in the RNA-Seq data leaves a net gain of 496 genes in the RNA-Seq data.
That represents a 2.7-fold increase over the differential expression achieved using the microarray platform.
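The overlap arithmetic can be checked with a few lines of Python. The counts are those quoted in the text; the RNA-Seq total of 729 genes is inferred here as 182 + 547 and is not stated directly.

```python
# Counts quoted in the text.
microarray_probes = 329   # significant microarray probes at 1 µM
microarray_genes = 233    # unique genes behind those probes
shared_genes = 182        # microarray genes also found by RNA-Seq
rnaseq_only_genes = 547   # RNA-Seq genes not seen on the microarray

# Derived values.
microarray_only_genes = microarray_genes - shared_genes   # 51
net_gain = rnaseq_only_genes - microarray_only_genes      # 496
rnaseq_genes = shared_genes + rnaseq_only_genes           # 729 (inferred)

print(f"Microarray genes confirmed by RNA-Seq: {shared_genes / microarray_genes:.0%}")
print(f"Microarray probes as a share of RNA-Seq genes: {microarray_probes / rnaseq_genes:.0%}")
print(f"Net genes gained with RNA-Seq: {net_gain}")
```

The inferred total agrees with the 45% figure quoted for the high dose, since 329 / 729 is approximately 0.45.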
Assessment of Target Engagement by RNA-seq and Microarrays
Querying both datasets with Epistem's internally derived PI3K signature shows, as can be seen from the graphs below, that the reduction of the signature increases with the concentration of BEZ235. However, despite the signature having been derived from data on the Affymetrix Plus 2 chip, the RNA-Seq data show greater engagement of the signature, with an average score of -0.44 compared to -0.22 in the microarray data.
The 1 µM differentially expressed gene list was examined by interrogating the Connectivity Map [5], a library of gene expression responses to drugs against which expression profiles can be compared. The table below shows that the top 4 agents with a positive enrichment all target the PI3K pathway.
Overlay of the RNA-Seq differential expression onto the PI3K/AKT signaling pathway shows good engagement, as seen in the diagram below. The number of engagements of the pathway indicates that significant deregulation of the pathway was achieved.
These data show that, even when multiplexing a high number of samples, the increased sensitivity of RNA-Seq produces robust, biologically meaningful transcriptional data in excess of that generated from a whole-transcriptome microarray.
1. SEQC/MAQC-III Consortium (2014). A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat Biotechnol. 32(9):903-14.
2. Dobin A et al (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29(1):15-21.
3. Liao Y, Smyth GK and Shi W (2014). featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 30(7):923-30.
4. Love MI, Huber W and Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15:550.
5. Lamb J et al (2006). The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science. 313(5795):1929-35.