By Daniel Dunaief
The real and virtual worlds are filled with so-called “black boxes,” which are often impenetrable to light and contain mysteries, secrets, and information that is not available to the outside world.
Sometimes, people design these black boxes to keep concepts, ideas or tools outside the public realm. Other times, they are a part of a process, such as the thinking behind why we do certain things even when they cause us harm, that would benefit from an opening or a better understanding.
In the world of artificial intelligence, programs learn from a collection of information, often compiling and comparing enormous collections of data, to make a host of predictions.
Companies and programmers have written numerous types of code to analyze genetic data, trying to determine which specific mutations or genetic alterations might lead to conditions or diseases.
Left on their own, these programs develop associations and correlations in the data, without providing any insights into what they may have learned.
That’s where Peter Koo, Assistant Professor at Cold Spring Harbor Laboratory, and his former graduate student Shushan Toneyan come in.
The duo recently published a paper in Nature Genetics describing CREME, a new AI-powered tool they designed and used to explore the genetic analysis tool Enformer.
A collaboration between DeepMind and Calico, which is a unit of Google owner Alphabet, Enformer takes DNA sequences and predicts gene expression, without explaining what it has learned or how.
CREME is “a tool that performs like large-scale experiments in silico [through computer modeling] on a neural network model that’s already been trained,” said Koo.
“There are a lot of these models already in existence, but it’s a mystery why they are making their predictions. CREME is bridging that gap.”
Award-winning research
Indeed, for her work in Koo’s lab, including developing CREME, Toneyan recently was named a recipient of the International Birnstiel Award for Doctoral Research in Molecular Life Sciences.
“I was genuinely surprised and happy that they selected my thesis and I would get to represent CSHL and the Koo lab at the ceremony in Vienna,” Toneyan, who graduated from the School of Biological Sciences, explained.
Toneyan, who grew up in Yerevan, Armenia, is currently a researcher in The Roche Postdoctoral Fellowship Programme in Zurich, Switzerland.
She said that the most challenging part of designing this tool was to focus on the “interesting and impactful experiments and not getting sidetracked by more minor points more likely to lead to a dead end.”
She credits Koo with providing insights into the bigger picture.
New knowledge
Without taking DNA, running samples in a wet lab, or looking at the combination of base pairs that make up a genetic code from a live sample, CREME can serve as a way to uncover new biological knowledge.
CREME interrogates AI models that predict gene expression levels from DNA sequences.
“It essentially replicates biological or genetic experiments in silico through the lens of the model to answer targeted questions about genetic mechanisms,” Toneyan explained. “We mainly focused on analyzing the changes in models outputs depending on various perturbations to the input.”
By using computers, scientists can save considerable time and effort in the lab, enabling those who conduct these experiments to focus on the areas of the genome that are involved in various processes and, when corrupted, diseases.
If scientists conducted these experiments one mutation at a time, even a short sequence would require many experiments to analyze.
The tool Koo and Toneyan created can support precise claims about what the model has learned.
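To give a rough sense of the scale involved, the sketch below counts how many distinct variants a one-mutation-at-a-time approach would have to test. It assumes only single-base substitutions (each position can change to one of three other bases); the function names and the 1,000-base example length are illustrative and do not come from the paper.

```python
from math import comb

def single_substitution_count(length: int) -> int:
    # Each position can be swapped for any of the 3 other bases.
    return 3 * length

def double_substitution_count(length: int) -> int:
    # Choose 2 positions, then 3 alternative bases at each of them.
    return comb(length, 2) * 9

# Illustrative 1,000-base sequence (an assumed length, not a figure from the study).
print(single_substitution_count(1000))  # 3000
print(double_substitution_count(1000))  # 4495500
```

Even for this short stretch of DNA, testing pairs of mutations runs into the millions, which is why screening candidates on a computer first can be attractive.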
CREME perturbs large chunks of input sequence to see how model predictions change. It interrogates the model by measuring how changes in the input affect model outputs.
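The general idea of such a perturbation experiment can be sketched in a few lines of Python. This is a toy illustration, not CREME's actual code or interface: `toy_model` is a hypothetical stand-in for a trained predictor such as Enformer (here it simply counts a "GATA" motif), and the tile size, shuffle count, and function names are assumptions made for the example.

```python
import random
from typing import Callable, List, Tuple

def toy_model(seq: str) -> float:
    # Hypothetical stand-in for a trained model: counts "GATA" motifs.
    return float(sum(1 for i in range(len(seq) - 3) if seq[i:i + 4] == "GATA"))

def shuffle_tile(seq: str, start: int, end: int, rng: random.Random) -> str:
    # Scramble the bases inside one tile, leaving the rest of the sequence intact.
    tile = list(seq[start:end])
    rng.shuffle(tile)
    return seq[:start] + "".join(tile) + seq[end:]

def tile_effects(seq: str, tile_size: int, model: Callable[[str], float],
                 n_shuffles: int = 20, seed: int = 0) -> List[Tuple[int, int, float]]:
    # For each tile, report how far the prediction drops, on average, when it is scrambled.
    rng = random.Random(seed)
    baseline = model(seq)
    effects = []
    for start in range(0, len(seq), tile_size):
        end = min(start + tile_size, len(seq))
        preds = [model(shuffle_tile(seq, start, end, rng)) for _ in range(n_shuffles)]
        effects.append((start, end, baseline - sum(preds) / n_shuffles))
    return effects

# Toy sequence: a motif-rich middle tile flanked by uninformative background.
sequence = "C" * 16 + "GATA" * 4 + "C" * 16
effects = tile_effects(sequence, tile_size=16, model=toy_model)
for start, end, drop in effects:
    print(f"tile {start}-{end}: mean drop in prediction = {drop:.2f}")
```

In this toy case only the middle tile shows a drop, flagging it as the region the model relies on. Scrambling whole tiles rather than single bases keeps the number of model queries small.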
“We need to interpret AI models to trust their deployment,” Toneyan said. “In the context of biological applications, we are also very interested in why they make a certain prediction so that we learn about the underlying biology.”
Using ineffective and untested predictive models will cause “more harm than good,” added Koo. “You need to interpret [the AI model’s] programs to trust them for their reliable deployment” in the context of genetic studies.
Enhancers
Named for Cis Regulatory Element Model Explanations, CREME can find on and off switches near genes, called enhancers and silencers, respectively.
It is not clear where these switches are, how many there are per gene and how they interact. CREME can help explore these questions, Toneyan suggested.
Cis regulatory elements are parts of non-coding DNA that regulate the transcription of nearby genes, altering whether these genes manufacture or stop producing proteins.
By combining an AI-powered model such as Enformer with CREME, researchers can narrow down the possible list of enhancers that might play an important genetic role.
Additionally, a series of enhancers can sometimes contribute to transcription. A wet lab experiment that only knocked one out might not reveal the potential role of this genetic code if other nearby areas can rescue the genetic behavior.
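A small simulation shows why single knockouts can miss redundant enhancers. The sketch below is a made-up example, not the paper's method: `toy_expression` is a hypothetical model in which either of two enhancers, labeled E1 and E2, is enough to switch a gene on, while a third, E3, does nothing.

```python
from itertools import combinations

ENHANCERS = ("E1", "E2", "E3")  # hypothetical enhancer labels

def toy_expression(active: frozenset) -> float:
    # Assumed redundancy: the gene is on if E1 or E2 is still present.
    return 1.0 if ("E1" in active or "E2" in active) else 0.0

def knockout_effects(enhancers, model, order):
    # Drop in predicted expression when `order` enhancers are removed together.
    full = frozenset(enhancers)
    baseline = model(full)
    return {combo: baseline - model(full - set(combo))
            for combo in combinations(enhancers, order)}

single = knockout_effects(ENHANCERS, toy_expression, 1)
double = knockout_effects(ENHANCERS, toy_expression, 2)
print(single)  # every single knockout has zero effect
print(double)  # only removing E1 and E2 together silences the gene
```

Removing enhancers one at a time suggests none of them matter here; the effect only appears when E1 and E2 are removed together, which is the kind of combination that is cheaper to screen in silico than at the bench.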
Ideally, these models would mimic the processes in a cell. At this point, they are still going through improvements and are not in perfect agreement with each other or with live cells, Toneyan added.
Scientists can use AI models to aid in the search for enhancers, but they can’t blindly trust them because of their black box nature.
Still, tools like CREME help design genetic perturbation experiments for more efficient discovery.
At this point, the program doesn’t have a graphical user interface. Researchers can use Python scripts released as packages for different models.
In the longer term, Koo is hoping to build on the work he and Toneyan did to develop CREME.
“This is just opening the initial doors,” he said. “One could do it more efficiently in the future. We’re working on those methods.”
Koo is pleased with the contribution Toneyan made to his lab. Koo suggested that Toneyan, the first graduate student who worked with him after he came to Cold Spring Harbor Laboratory, “shaped my lab into what it is.”