Next Generation Sequencing versus Microarray – a Comparison Using a PI-3K Inhibitor

Epistem are pleased to announce their acquisition of the Illumina NextSeq 550 next generation sequencer. With tunable output and high data quality, it provides the flexible power you need for whole-genome, transcriptome, and targeted resequencing.

Assessment of Linearity and Reproducibility

To establish the linearity and precision of the instrument, Epistem have replicated the work of the third phase of the MAQC project (MAQC-III), also called Sequencing Quality Control (SEQC) [1]. That project assessed the technical performance of next-generation sequencing platforms by generating benchmark datasets from reference samples (A: Universal Human Reference RNA; B: Human Brain Reference RNA).

[Figures 1 and 2]

We used expression measurements from four replicates each of total RNA samples A and B, and of mixtures of the two at defined ratios of 3:1 A:B (sample C) and 1:3 A:B (sample D).

Samples were indexed and libraries prepared using the Illumina TruSeq Stranded mRNA kit. Samples were normalized for multiplexing and sequenced using the Illumina High Output Kit v2.5.

Data quality was high, with 97.89% of bases achieving a sequence quality score of 30 or above (Q30). Multiplexing worked well, with an average of 17.44M ± 1.4M reads generated per sample. Alignment of those reads with the STAR aligner [2] was also impressive, with 99.5% of reads mapping to the hg19 genome. Read counts were extracted with featureCounts [3] and normalized with DESeq2 [4].
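DESeq2's default scaling step estimates one size factor per sample using the median-of-ratios method. Each gene's counts are divided by that gene's geometric mean across samples, and the sample's size factor is the median of those ratios. The minimal sketch that follows illustrates the idea with the Python standard library on hypothetical counts. It is not DESeq2 itself (in R the equivalent step is `estimateSizeFactors`), and the gene names and values are made up.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts maps gene -> list of raw counts (one entry per sample).
    Genes with a zero in any sample are skipped because their
    geometric mean, the pseudo-reference, would be zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        # Log of the geometric mean across samples (the pseudo-reference).
        log_ref = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_ref)
    # Each sample's size factor is its median ratio to the pseudo-reference.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide every count by its sample's size factor."""
    return {g: [c / s for c, s in zip(row, factors)] for g, row in counts.items()}


# Hypothetical counts: sample 2 was sequenced about twice as deeply as sample 1.
counts = {
    "GENE1": [100, 200],
    "GENE2": [50, 100],
    "GENE3": [10, 20],
    "GENE4": [0, 5],  # contains a zero, so it is ignored when estimating factors
}
factors = size_factors(counts)
print([round(f, 3) for f in factors])  # [0.707, 1.414]
normalized = normalize(counts, factors)
```

After normalization, GENE1 has the same value in both samples, which is what depth correction should achieve when a gene is not truly changing.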

Samples C and D are 3:1 and 1:3 mixtures of samples A and B, respectively. If the signal we are producing is linear, it should be possible to predict the values measured in samples C and D from the values produced by samples A and B:

Signal C = 0.75*A + 0.25*B

Signal D = 0.25*A + 0.75*B

These in silico values were calculated for samples C and D and compared with the average observed values across the four replicates of each sample. As can be seen from the figures, the correlation between observed and predicted values is excellent, with an adjusted R² of 0.992 for both samples.
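The linearity check can be sketched as follows. The sketch assumes the per-sample values are replicate means of normalized counts on a linear (not log) scale, and that observed values are regressed on predicted values with a single predictor. The text does not say which scale or regression set-up was used, and the gene values are hypothetical. The adjusted R² applies the standard one-predictor correction 1 − (1 − R²)(n − 1)/(n − 2).

```python
from statistics import mean


def predict_mixture(a, b, frac_a):
    """Predicted signal for a mixture containing fraction frac_a of sample A."""
    return [frac_a * x + (1 - frac_a) * y for x, y in zip(a, b)]


def adjusted_r2(x, y):
    """Adjusted R^2 for a one-predictor least-squares fit of y on x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    r2 = sxy ** 2 / (sxx * syy)
    return 1 - (1 - r2) * (n - 1) / (n - 2)


# Hypothetical replicate-mean values for five genes.
a = [10.0, 200.0, 35.0, 5.0, 80.0]
b = [30.0, 100.0, 5.0, 45.0, 80.0]
observed_c = [15.5, 172.0, 27.0, 15.0, 81.0]

predicted_c = predict_mixture(a, b, 0.75)  # Signal C = 0.75*A + 0.25*B
predicted_d = predict_mixture(a, b, 0.25)  # Signal D = 0.25*A + 0.75*B
print(predicted_c)  # [15.0, 175.0, 27.5, 15.0, 80.0]
print(round(adjusted_r2(predicted_c, observed_c), 3))
```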

The signal we are detecting from the NextSeq 550 is linear and reproducible.

Assessment of Accuracy

[Figure 3]

To further assess the reliability of our data, we leveraged a panel of ~20,000 confirmatory qPCR reactions performed by the SEQC project. After filtering for genes detected in our experiment, 13,468 genes could be assessed. Because absolute values are not comparable between platforms, we compared the log ratio of sample A to sample B, which should be stable across platforms. As can be seen in the plot, the NextSeq results show excellent correlation with the SEQC qPCR data, with an adjusted R² of 0.81.
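A sketch of the log-ratio comparison follows. It assumes that "detected" means a non-zero total across samples A and B, that a pseudocount of 1 avoids taking the log of zero, and that a simple linear fit relates the qPCR and RNA-Seq log2(A/B) values. All gene names and values are hypothetical placeholders, not SEQC data.

```python
import math
from statistics import mean


def log2_ratios(a, b, pseudocount=1.0):
    """log2(A/B) for genes detected (non-zero total) in the RNA-Seq data."""
    return {
        g: math.log2((a[g] + pseudocount) / (b[g] + pseudocount))
        for g in a
        if g in b and a[g] + b[g] > 0
    }


def adjusted_r2(x, y):
    """Adjusted R^2 for a one-predictor least-squares fit of y on x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    r2 = sxy ** 2 / (sxx * syy)
    return 1 - (1 - r2) * (n - 1) / (n - 2)


# Hypothetical normalized RNA-Seq values and qPCR log2(A/B) ratios.
rnaseq_a = {"G1": 400, "G2": 100, "G3": 50, "G4": 0, "G5": 800}
rnaseq_b = {"G1": 100, "G2": 100, "G3": 200, "G4": 0, "G5": 100}
qpcr_log2_ratio = {"G1": 1.9, "G2": 0.1, "G3": -1.8, "G5": 3.1, "G6": 0.5}

seq_ratio = log2_ratios(rnaseq_a, rnaseq_b)
# Only genes measured on both platforms can be compared.
shared = sorted(set(seq_ratio) & set(qpcr_log2_ratio))
x = [qpcr_log2_ratio[g] for g in shared]
y = [seq_ratio[g] for g in shared]
print(shared, round(adjusted_r2(x, y), 3))
```

Working with ratios rather than raw values cancels out platform-specific scale differences, which is why the comparison is made this way.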

These data show that the NextSeq platform at Epistem is producing data with a consistently high degree of accuracy.

RNA-seq and Microarray Comparison

To compare our new platform with our existing technology, Epistem have profiled an ex vivo hair experiment using both the Illumina TruSeq Stranded mRNA kit and the Affymetrix 3' IVT PLUS kit. This work was aimed at comparing the technical performance of next-generation sequencing with that of microarrays.

We exposed anagen hairs from four donors ex vivo to vehicle or to a single dose of 0.01, 0.1 or 1 µM BEZ235 (a dual PI3K and mTOR inhibitor) for 24 hours.

Libraries were normalized for multiplexing and sequenced using the Illumina High Output Kit v2.5. Samples prepared for microarray were hybridized to the GeneChip™ Human Genome U133 Plus 2.0 Array.

Data quality was high, with 96.63% of bases achieving a sequence quality score of 30 or above (Q30). Multiplexing worked well, with an average of 18.3M ± 1M reads generated per sample. Alignment of those reads with the STAR aligner [2] was also impressive, with 98.4% of reads mapping to the hg19 genome. Read counts were extracted with featureCounts [3] and normalized with DESeq2 [4].
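The Q30 figure refers to Phred quality scores, defined as Q = −10·log10(p), where p is the probability that a base call is wrong; Q30 therefore corresponds to an expected error rate of 1 in 1,000. The sequencer's run metrics report this percentage directly. The sketch that follows only illustrates the definition, assuming FASTQ quality strings in the Phred+33 encoding used by Illumina 1.8+ output; the reads are hypothetical.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string in Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Phred definition Q = -10*log10(p), rearranged to p = 10**(-Q/10)."""
    return 10 ** (-q / 10)


def fraction_at_or_above(qualities, threshold=30):
    """Fraction of all base calls whose Phred score is >= threshold."""
    total = passing = 0
    for quality in qualities:
        for q in phred_scores(quality):
            total += 1
            passing += q >= threshold
    return passing / total if total else 0.0


# Hypothetical quality lines from two reads ('I' = Q40, 'F' = Q37, ':' = Q25).
reads = ["IIIII#", "FF:FF,"]
print(fraction_at_or_above(reads))  # 0.75
print(error_probability(30))
```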

The microarray data also showed a good, consistent signal across the cohort.

Both the RNA-Seq and microarray cohorts showed strong separation by treatment dose, as can be seen in the principal component analysis (PCA) below.

[Figure 4]
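A PCA plot places each sample according to its scores on the leading principal components. The analysis software used for the PCA is not stated, so the following is a dependency-free sketch of a swapped-in approach. PC1 scores are obtained by power iteration on the sample-by-sample Gram matrix of gene-centred data, which matches the first component of an SVD up to sign. The input values stand in for log-transformed expression and are hypothetical.

```python
import math


def pc1_scores(samples, iterations=500):
    """PC1 scores and variance fraction for a samples-by-genes matrix.

    Genes are centred across samples; the leading eigenvector of the
    sample-by-sample Gram matrix (found by power iteration), scaled by
    the square root of its eigenvalue, equals the PC1 scores from an SVD
    of the centred data. The sign of a principal component is arbitrary.
    """
    n, p = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(p)]
    centred = [[s[j] - means[j] for j in range(p)] for s in samples]
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centred] for ri in centred]

    # Non-uniform start: the all-ones vector lies in the Gram matrix's null space.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return [0.0] * n, 0.0
        v = [x / norm for x in w]

    eigenvalue = sum(v[i] * sum(gram[i][k] * v[k] for k in range(n)) for i in range(n))
    total = sum(gram[i][i] for i in range(n))
    scores = [math.sqrt(max(eigenvalue, 0.0)) * x for x in v]
    return scores, eigenvalue / total


# Hypothetical log2 expression for 3 genes: two vehicle and two 1 uM samples.
samples = [
    [5.0, 8.0, 2.0],  # vehicle, donor 1
    [5.2, 7.9, 2.1],  # vehicle, donor 2
    [7.0, 5.0, 2.0],  # 1 uM, donor 1
    [7.1, 5.2, 1.9],  # 1 uM, donor 2
]
scores, explained = pc1_scores(samples)
print([round(s, 2) for s in scores], round(explained, 3))
```

Vehicle and treated samples end up on opposite sides of zero on PC1, which is the kind of dose separation a PCA plot makes visible.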

ANOVA of the two cohorts identified far more differentially expressed genes surviving false discovery rate (FDR) multiple-testing correction in the RNA-Seq data than in the microarray data, as can be seen in the table below. In both cohorts, the number of differentially expressed genes increased with dose. At the highest dose (1 µM), the microarray cohort yielded only 329 differentially expressed probes, 45% of the differential expression observed in the RNA-Seq cohort.

[Table 1]
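The text does not name the FDR procedure. Benjamini-Hochberg is the most common choice and is assumed in this sketch, which converts per-gene ANOVA p-values (hypothetical here) into adjusted q-values. Genes below a chosen cut-off, such as 0.05 (also an assumption), would then be counted as differentially expressed.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene ANOVA p-values.
pvals = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(pvals)
print([round(x, 4) for x in q])  # [0.04, 0.0533, 0.0533, 0.2]
print(sum(x < 0.05 for x in q))  # 1
```

With only four tests, the raw p-values of 0.03 and 0.04 do not survive correction, which illustrates why a more sensitive platform yields more genes passing an FDR threshold.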


[Figure 5]

The effect is even more dramatic when the overlap between the platforms is examined. The Venn diagram above intersects the two datasets at the gene level, which reduces the microarray list from 329 probes to 233 unique genes.

Of those 233 genes, 182 (78%) were also observed in the RNA-Seq dataset. The RNA-Seq data contain a further 547 genes not seen in the microarray cohort. Subtracting the 51 microarray genes not observed in the RNA-Seq data leaves a net gain of 496 genes in the RNA-Seq data.

That represents a 2.7-fold increase over the differential expression achieved using the microarray platform.
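The overlap arithmetic can be reproduced with Python sets. The probe-to-gene mapping and gene identifiers are hypothetical placeholders, sized only to match the counts reported above. The text does not say which quantities the 2.7-fold figure divides. With these counts, the net gain relative to the shared genes (496/182) gives about 2.7, whereas total RNA-Seq genes over total microarray genes (729/233) gives about 3.1.

```python
def collapse_to_genes(probes, probe_to_gene):
    """Map microarray probe IDs to unique gene symbols, dropping unannotated probes."""
    return {probe_to_gene[p] for p in probes if p in probe_to_gene}


def overlap_summary(array_genes, rnaseq_genes):
    """Counts for a two-set Venn diagram plus the net gain of the RNA-Seq list."""
    shared = array_genes & rnaseq_genes
    array_only = array_genes - rnaseq_genes
    rnaseq_only = rnaseq_genes - array_genes
    return {
        "shared": len(shared),
        "array_only": len(array_only),
        "rnaseq_only": len(rnaseq_only),
        "net_gain": len(rnaseq_only) - len(array_only),
    }


# Several probes can target one gene, so the gene list is shorter than the probe list.
probe_map = {"probe_1": "GENE_X", "probe_2": "GENE_X", "probe_3": "GENE_Y"}  # hypothetical
genes = collapse_to_genes(["probe_1", "probe_2", "probe_3"], probe_map)

# Placeholder gene sets sized to reproduce the counts reported in the text.
array_genes = {f"g{i}" for i in range(233)}             # 233 microarray genes
rnaseq_genes = {f"g{i}" for i in range(51, 233 + 547)}  # 182 shared + 547 RNA-Seq only
summary = overlap_summary(array_genes, rnaseq_genes)
print(summary)
print(round(summary["net_gain"] / summary["shared"], 2))  # 2.73
print(round(len(rnaseq_genes) / len(array_genes), 2))     # 3.13
```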

Assessment of Target Engagement by RNA-seq and Microarrays

When both datasets were queried with Epistem's internally derived PI3K signature, the reduction in the signature increased with the concentration of BEZ235, as can be seen from the graphs below. Although the signature was derived from data generated on the Affymetrix Human Genome U133 Plus 2.0 array, the RNA-Seq data showed greater engagement of the signature, with an average score of -0.44 compared with -0.22 in the microarray data.

[Figure 6]
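Epistem's PI3K signature and its scoring method are internal and not described here, so the following is a hypothetical stand-in. Each signature gene carries an expected direction (+1 or −1 with pathway activity), and the score is the mean of direction × log2 fold change over the signature genes present in the dataset. A more negative score then indicates stronger pathway suppression. Applying it across platforms assumes microarray probes have already been collapsed to gene symbols. Gene names and fold changes are invented.

```python
from statistics import mean


def signature_score(log2fc, signature):
    """Mean signed log2 fold change over the signature genes present in the data.

    signature maps gene -> +1 (expected to rise with pathway activity)
    or -1 (expected to fall). Negative scores indicate pathway suppression.
    """
    present = [g for g in signature if g in log2fc]
    if not present:
        raise ValueError("no signature genes found in the dataset")
    return mean(signature[g] * log2fc[g] for g in present)


# Hypothetical signature and log2 fold changes versus vehicle at each dose (uM).
signature = {"GENE_A": 1, "GENE_B": 1, "GENE_C": -1}
by_dose = {
    0.01: {"GENE_A": -0.1, "GENE_B": 0.0, "GENE_C": 0.1},
    0.1: {"GENE_A": -0.4, "GENE_B": -0.2, "GENE_C": 0.3},
    1.0: {"GENE_A": -0.9, "GENE_B": -0.6, "GENE_C": 0.6},
}
scores = {dose: signature_score(fc, signature) for dose, fc in by_dose.items()}
print({dose: round(s, 2) for dose, s in scores.items()})
```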

The 1 µM differentially expressed gene list was also used to query the Connectivity Map [5], a library of gene expression responses to drugs against which expression profiles can be compared. The table below shows that the top four agents with positive enrichment all target the PI3K pathway.

[Table 2]
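The Connectivity Map compares a query signature (up- and down-regulated genes) with the ranked expression profile of each drug treatment using a Kolmogorov-Smirnov-style statistic [5]. A minimal sketch of that statistic and the unscaled connectivity score follows. The scaling across instances and the permutation-based significance used by the real resource are omitted, and the ranked list and tag genes are hypothetical.

```python
def ks_score(ranked_genes, tags):
    """Kolmogorov-Smirnov-style enrichment of a tag list in a ranked gene list.

    ranked_genes: all genes ordered from most up- to most down-regulated
    in a drug's reference profile. Returns a value in [-1, 1]; positive
    when the tags sit near the top of the list.
    """
    n = len(ranked_genes)
    position = {g: i + 1 for i, g in enumerate(ranked_genes)}
    v = sorted(position[g] for g in tags if g in position)
    t = len(v)
    if t == 0:
        raise ValueError("none of the tags are in the ranked list")
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def connectivity(ranked_genes, up_tags, down_tags):
    """Unscaled connectivity: zero when the up and down scores share a sign."""
    ks_up = ks_score(ranked_genes, up_tags)
    ks_down = ks_score(ranked_genes, down_tags)
    if (ks_up > 0) == (ks_down > 0):
        return 0.0
    return ks_up - ks_down


# Hypothetical drug profile (g0 most up-regulated) and query signature.
ranked = [f"g{i}" for i in range(10)]
up, down = {"g0", "g1"}, {"g8", "g9"}
print(connectivity(ranked, up, down))
```

A positive score means the drug shifts the query's up genes up and its down genes down, i.e. it mimics the query profile, which is how PI3K-targeting agents rise to the top of such a ranking.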

Overlaying the RNA-Seq differential expression onto the PI3K/AKT signaling pathway shows good engagement, as seen in the diagram below. The number of pathway members affected indicates that significant deregulation of the pathway had been achieved.

[Figure 7]
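The text does not say how the pathway overlay was judged significant. One common way to test whether a pathway contains more differentially expressed genes than expected by chance is a one-sided hypergeometric (over-representation) test, sketched here as a swapped-in technique. The pathway size, list sizes and overlap in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, pathway_size, de_genes, background):
    """P(X >= k): chance of at least k pathway genes among de_genes
    differentially expressed genes drawn without replacement from
    `background` genes, of which pathway_size belong to the pathway."""
    total = comb(background, de_genes)
    upper = min(pathway_size, de_genes)
    hits = sum(
        comb(pathway_size, i) * comb(background - pathway_size, de_genes - i)
        for i in range(k, upper + 1)
    )
    return hits / total


# Hypothetical: 25 of 700 DE genes fall in a 300-gene pathway, from 15,000 genes.
# About 14 overlapping genes would be expected by chance (700 * 300 / 15000).
p = hypergeom_upper_tail(25, 300, 700, 15000)
print(f"{p:.2e}")
```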

These data show that, even when a large number of samples is multiplexed, the increased sensitivity of RNA-Seq produces robust, biologically meaningful transcriptional data in excess of that generated from a whole-transcriptome microarray.

References

1.  SEQC/MAQC-III Consortium (2014). A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat Biotechnol. 32(9):903-14.

2.  Dobin A et al (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29(1):15-21.

3.  Liao Y, Smyth GK, Shi W (2014). featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 30(7):923-30.

4.  Love MI, Huber W, Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 15(12):550.

5.  Lamb J et al (2006). The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science. 313(5795):1929-35.

Want to ask a question about gene expression or plucked hair analysis?

Epistem's Gene Expression Services

Epistem is your ideal partner for biomarker discovery and validation for use in clinical and pre-clinical studies. We have extensive experience in gene expression analysis with qPCR and microarrays in our GCLP-accredited labs and have recently expanded our next generation sequencing capabilities with the purchase of a NextSeq 550 machine. We have considerable experience in analysis of tissues with low RNA input, such as liquid biopsies, single cells and hair bulbs, and can provide full in-house bioinformatics support for all of our studies.

Epistem provides a unique plucked hair biomarker platform for targeting intracellular signaling pathways in oncology, inflammation, fibrosis and other therapeutic areas. Plucked hair provides a minimally invasive surrogate tissue for assessing drug-induced changes in epithelial tissue. Effects on mRNA and protein expression levels can be analyzed.

We have leveraged over 15 years of histology and IHC expertise to develop RNA-friendly stains for specific cell types, making them amenable to gene expression studies using Laser Capture Microdissection (LCM). All of our histology, gene expression, LCM and hair IHC applications are GCLP-compliant, and we have participated in many clinical studies.

About Epistem

Epistem's contract research service is committed to providing reliable, innovative and transferable pre-clinical models and services to support decision making throughout the drug discovery and development pipeline.

Tel: +44 (0)161 850 7600  Email: info@epistem.co.uk