Patel et al.
High-throughput short-read sequencing is one of the latest sequencing technologies to be released to the genomics community.

Initially, the use of short-read sequencing was typically confined to matching data from genomes that were nearly identical to the reference genome. Transcriptome analysis at the level of global gene expression is an ideal application of short-read sequencing. Next-generation sequencing has become a feasible method for increasing sequencing depth and coverage while reducing time and cost compared to the traditional Sanger method (L J Collins et al. 2008). This study presents a gene expression analysis of Arachis hypogaea L. under three different conditions, each treated with a different method. The analysis was carried out with various bioinformatics tools to obtain the detailed gene expression information reported in the current study.

Materials and Methods

Sequence Retrieval

As part of this research, three SRA sequences, SRR1212866, SRR1212867 and SRR1212868, were downloaded from BioProject ID 243319 in the NCBI database for the gene expression study. These data files were submitted on 2 April 2014. The SRA files were converted into .fastq files with the NCBI SRA Toolkit.

NGS QC Toolkit

NGS QC Toolkit is an application for quality checking and filtering of high-quality data. The toolkit comprises user-friendly tools for QC of sequencing data generated on Roche 454 and Illumina platforms, and additional tools to aid QC (sequence format converter and trimming tools) and analysis (statistics tools) (Patel RK, et al. 2012). The NGS QC Toolkit package was used for sequence filtering, and the filtered sequences were then uploaded to the Galaxy server for processing with FASTQ Groomer.

TopHat2

TopHat is a program that aligns RNA-Seq reads to a genome in order to identify exon-exon splice junctions. TopHat can find splice junctions without a reference annotation. By first mapping RNA-Seq reads to the genome, TopHat identifies potential exons, since many RNA-Seq reads will align contiguously to the genome. In this step, the genome of Arabidopsis thaliana was used as the reference genome for the TopHat2 analysis. The step was run three times, once for each RNA-Seq sample, and all TopHat2 results in .bam file format were passed on to the Cufflinks analysis.

Cufflinks

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks assembles individual transcripts from RNA-Seq reads that have been aligned to the genome (Cole Trapnell et al. 2012). In this step, the genes of Arabidopsis thaliana (.gtf file) were used as the reference annotation in Cufflinks. The step was run three times, once for each RNA-Seq sample, on the .bam files output by TopHat2, and all Cufflinks results (.gtf file format) were passed on to the Cuffmerge analysis.

Cuffmerge

Cuffmerge is used to merge together several Cufflinks assemblies. In this step, the three .gtf files from the three Cufflinks runs were merged, with the reference genes file of Arabidopsis thaliana used as the reference annotation. The result was one merged transcript file combining the three transcript assemblies.

Cuffdiff

Cuffdiff reports numerous output files containing the results of its differential analysis of the samples. Gene and transcript expression level changes are reported in simple tabular output files. Cuffdiff also reports additional differential analysis results beyond simple changes in gene expression. The program can identify genes that are differentially spliced or differentially regulated via promoter switching (Cole Trapnell et al. 2012). In this step, one merged file of