Published by the Students of Johns Hopkins since 1896

JHSPH develops RNA analysis software

By Ian Yu | September 23, 2010

Analyzing RNA sequence datasets that contain billions of RNA bases usually takes a great deal of time and money. A newly developed program and analysis method may substantially reduce both.

Researchers at Hopkins’ Bloomberg School of Public Health have developed Myrna, a program available online that uses cloud computing to process RNA sequences. Myrna first runs another program, Bowtie, a read aligner developed by one of the study’s coauthors, Ben Langmead.

Bowtie maps millions to billions of “reads,” short sequences of bases, to their specific locations in the genome. Myrna then counts how many reads are assigned to each gene and conducts statistical analyses on those counts.
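The counting step described above can be illustrated with a minimal Python sketch. This is not Myrna's code: the function name, the input format (pairs of read ID and gene name) and the toy gene names are all assumptions made for illustration.

```python
from collections import Counter


def count_reads_per_gene(assignments):
    """Count how many mapped reads were assigned to each gene.

    `assignments` is an iterable of (read_id, gene_name) pairs.
    gene_name is None for a read that was not assigned to any gene.
    (Hypothetical input format, chosen only for this sketch.)
    """
    counts = Counter()
    for _read_id, gene in assignments:
        if gene is not None:
            counts[gene] += 1
    return counts


# Hypothetical toy input: five reads, one of them unassigned.
toy_assignments = [
    ("read1", "GENE_A"),
    ("read2", "GENE_B"),
    ("read3", "GENE_A"),
    ("read4", None),
    ("read5", "GENE_A"),
]
gene_counts = count_reads_per_gene(toy_assignments)
print(gene_counts)
```

A real dataset would have billions of such pairs, which is why the mapping and counting are spread across many machines.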

“Myrna performs downstream statistical analyses to identify the genes that show the biggest difference in read count across biological groups, say the genes most differentially expressed between cancer samples and normal samples,” Jeffrey Leek, assistant professor at the School of Public Health, wrote in an e-mail to The News-Letter.
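As a rough illustration of what "biggest difference in read count across biological groups" can mean, the sketch below ranks genes by the log2 fold change of their average read counts between two groups. This is a deliberately simple stand-in, not the statistical method Myrna uses; the function names, the pseudocount and the toy counts are assumptions.

```python
import math


def mean(values):
    return sum(values) / len(values)


def rank_by_log2_fold_change(group_a, group_b, pseudocount=1.0):
    """Rank genes by the size of the log2 fold change between two groups.

    group_a and group_b map each gene name to a list of read counts,
    one per sample. The pseudocount avoids division by zero.
    (Simplified stand-in for a real differential expression test.)
    """
    ranked = []
    for gene in group_a:
        ratio = (mean(group_b[gene]) + pseudocount) / (mean(group_a[gene]) + pseudocount)
        ranked.append((gene, math.log2(ratio)))
    # Largest absolute change first, whether up or down.
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked


# Hypothetical read counts for three genes in three samples per group.
normal = {"GENE_A": [10, 12, 8], "GENE_B": [100, 90, 110], "GENE_C": [5, 7, 6]}
cancer = {"GENE_A": [80, 95, 75], "GENE_B": [105, 95, 100], "GENE_C": [1, 0, 2]}

ranking = rank_by_log2_fold_change(normal, cancer)
for gene, change in ranking:
    print(gene, round(change, 2))
```

A real analysis would also account for sample-to-sample variability and differences in sequencing depth, which a simple ratio of averages ignores.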

The cloud computing model on which Myrna is based allows users to process their sequence data on a desktop computer or on a cluster of computers, such as the Amazon cloud computing cluster that the researchers referenced for a cost analysis. Leek explained that clusters help reduce the time needed to process the large amounts of data involved in RNA sequencing.

“Several of the steps in Myrna — particularly the read mapping — are very computationally intensive. For big data sets (think billions of reads), they would take a long time to analyze on your desktop computer. Myrna lets you run the software on your local desktop, but also is designed to work on local clusters of computers,” he wrote.

Leek’s experiment relied on datasets of RNA sequences that had already been collected. Using Myrna, Leek and his colleagues were able to analyze 1.1 billion reads in a little over an hour and a half using 40 eight-core computers. According to Leek, the cost of this analysis was $73.

“If you ran that same analysis on your desktop computer it would take over a week to run,” Leek wrote. “So the time savings is pretty substantial. Since the experiments costs tens of thousands of dollars to perform, the extra $73 is a relatively minor cost for completing the whole analysis.”
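The figures quoted above can be put side by side with some back-of-the-envelope arithmetic. The sketch below takes the read count, machine count and cost from the article; the 1.5-hour cluster time and the 168-hour (one week) desktop time are rounded assumptions, since the article gives only "a little over an hour and a half" and "over a week," so the results are rough estimates.

```python
# Figures reported in the article.
TOTAL_READS = 1.1e9
MACHINES = 40
CORES_PER_MACHINE = 8
CLUSTER_COST_USD = 73.0

# Assumptions: rounded stand-ins for "a little over an hour and a half"
# and "over a week".
CLUSTER_HOURS = 1.5
DESKTOP_HOURS = 7 * 24

total_cores = MACHINES * CORES_PER_MACHINE
approx_speedup = DESKTOP_HOURS / CLUSTER_HOURS
cost_per_million_reads = CLUSTER_COST_USD / (TOTAL_READS / 1e6)
reads_per_core_hour = TOTAL_READS / (total_cores * CLUSTER_HOURS)

print(f"Cores used: {total_cores}")
print(f"Approximate speedup over a desktop: {approx_speedup:.0f}x")
print(f"Cost per million reads: ${cost_per_million_reads:.3f}")
print(f"Reads per core-hour: {reads_per_core_hour:,.0f}")
```

Under these assumptions the cluster run is on the order of a hundred times faster than the desktop run, at a cost of well under a dime per million reads.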

As more sequences are identified and more DNA and RNA datasets are assembled, the need for computing power has grown as well.

“Next generation sequencing technology, like the technology used for RNA-sequencing, is producing larger and larger data sets in less and less time. To keep up with this explosion of data, more and more computing power is needed,” Leek wrote.

The cost of RNA sequencing remains high, though. According to Leek, improvements in sequence analysis help minimize the total cost of these projects while providing more powerful means of analysis.

“The computing time is relatively cheap on either our local clusters like HPSCC or on Amazon for people outside Hopkins,” Leek wrote. “Tools like Myrna take advantage of these resources to build software that can be cheaply and effectively used by researchers to analyze the huge data sets they are producing.”

RNA sequencing has become an essential tool for measuring the expression levels of genes in the human genome, which spans three billion base pairs. Multiple RNA sequences can arise from a single gene, and the abundance of particular RNA strands gives insight into how strongly that gene is expressed.

“If you figure out which genes these sequences come from and then count how many there are in a sample, you can figure out how much each gene in the body is expressed — that is how turned on or turned off it is,” Leek wrote.

RNA sequencing can also shed light on which genes are involved in many human diseases.

“You can compare this information between samples from healthy people and people with complex diseases like cancer, to try to figure out how the expression of different genes are associated with the disease,” Leek wrote.

