Published by the Students of Johns Hopkins since 1896
August 14, 2022

Smart software identifies gene promoters

By MICHAEL YAMAKAWA | November 15, 2012

Like Skynet from the movie Terminator, today's computers are actually capable of analyzing input data and learning to respond differently to it. In the field of genetics, two groups of researchers have designed software that can analyze genomes and learn from new data to ultimately identify which sequence variations can become a health hazard. The two publications focused on different tissues: one on the brain, the other on melanocytes.

The new machine learning approach isolates and compares sequences while creating new sets of rules for itself to categorize certain sequence motifs, based on the gene’s regulatory machinery, such as transcription factors (TFs).
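To make the idea of a motif concrete, here is a minimal Python sketch that scans a DNA sequence for a short binding motif on both strands. The motif `TGACAG`, the example sequence and the function names are illustrative assumptions, not details taken from either study.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def reverse_complement(seq):
    """Return the opposite DNA strand, read in its own 5'-to-3' direction."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq, motif):
    """Return sorted (position, strand) pairs for every motif match.

    Positions are 0-based and always refer to the forward strand.
    """
    # The lookahead lets overlapping matches be reported as well.
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
    seq = seq.upper()
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        # Convert a reverse-strand offset back to forward coordinates.
        hits.append((len(seq) - m.start() - len(motif), "-"))
    return sorted(hits)


# Hypothetical motif and sequence, for illustration only.
print(find_sites("AATGACAGCCCTGTCATT", "TGACAG"))
```

A real analysis would use experimentally derived motifs and allow imperfect matches; exact matching is used here only to keep the example short.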

The “classifier,” as the software was called in the brain publication, was able to reveal a wide array of new hindbrain gene enhancers, with as much as 88 percent validation against biological data. The second study focused on identifying the regulatory control of genes in epidermal melanocytes, the pigment-producing cells responsible for the color of our skin.

Genes are expressed through processes known as transcription and translation; transcription produces an intermediate chain of nucleic acids encoding the sequence of a particular protein, one of many that make up a great portion of our cell machinery. In complex organisms, such as eukaryotes, transcription depends not only on transcription factors but also on a vast range of other assisting proteins and DNA regions, including regions called enhancers.

TFs can bind to enhancers, which allow the TFs to communicate optimally with the gene of interest, such as a gene active in the hindbrain or in melanocytes. The identification of enhancers has proved very difficult in the past, as enhancers can reside far from the genes they regulate and traditional methods restrict researchers to observing sequences that are close to the gene of interest.

As new in silico, or computational, technologies advanced, effective means for studying enhancers were introduced. It was not until recently, however, that computing power grew enough to allow studying them at such an unprecedented scale.

In the brain study, researchers questioned whether their new technology could be used to uncover complex cellular machinery in the central nervous system. They decided to study the regulatory control of the hindbrain, the most primitive part of our brain, whose degeneration can be responsible for disorders like schizophrenia, ADHD and autism. By determining the regulatory control of hindbrain genes, researchers can help us better understand the genetic components of these disorders.

The software relies on its ability to detect transcription factor binding sites, since TFs bind to enhancers to regulate transcription. After a list of 211 known hindbrain enhancers was “taught” to the computer, the computer ultimately achieved an 88 percent success rate in predicting other possible enhancers. This was confirmed by in vivo experiments with zebrafish, a species frequently used for genomic studies. Solely through in silico experiments, these researchers were able to identify about 40,000 putative hindbrain enhancers.
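The general idea of “teaching” a computer with known enhancers can be sketched in a few lines of Python. This is a toy illustration, not the researchers' software: it assumes that sequences are described by how often each short DNA word (a k-mer) occurs, and it uses a simple difference-of-averages score in place of whatever learning method the studies actually used. All sequences below are made up.

```python
from collections import Counter


def kmer_frequencies(seq, k=3):
    """Return the relative frequency of every overlapping k-mer in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values()) or 1  # avoid dividing by zero on short input
    return {kmer: n / total for kmer, n in counts.items()}


def train(enhancers, background, k=3):
    """Learn one weight per k-mer.

    Each weight is the k-mer's mean frequency in the known enhancers
    minus its mean frequency in the background sequences.
    """
    weights = Counter()
    for seqs, sign in ((enhancers, 1), (background, -1)):
        for seq in seqs:
            for kmer, freq in kmer_frequencies(seq, k).items():
                weights[kmer] += sign * freq / len(seqs)
    return weights


def score(seq, weights, k=3):
    """Positive scores look enhancer-like, negative scores background-like."""
    freqs = kmer_frequencies(seq, k)
    return sum(weights.get(kmer, 0.0) * f for kmer, f in freqs.items())


# Made-up training data: the "enhancers" share the hypothetical motif TGACAG.
known_enhancers = ["ACTGACAGTTTGACAGC", "GGTGACAGATTGACAGA"]
background = ["ACGTACGTACGTACGTA", "GGCCGGCCAATTAATTG"]
weights = train(known_enhancers, background)

print(score("CCTGACAGTGACAGCC", weights) > 0)  # candidate carrying the motif
print(score("ACGTACGTGGCCAATT", weights) > 0)  # background-like candidate
```

In this sketch, the candidate that carries the shared motif scores above zero and the background-like one scores below it. The real task involves hundreds of training sequences and tens of thousands of candidates, with predictions then checked in the lab.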

The melanocyte researchers, using similar techniques, were able to identify about 7,500 possible melanocyte enhancers. Gradually, as the computer sets rules that distinguish certain gene enhancers from others, researchers may find an easy way to study the regulation of genes that make up the rest of our body.

Experiments with the software were conducted by researchers from multiple affiliations, including Hopkins, the National Center for Biotechnology Information and the National Human Genome Research Institute.
