CSHL’s Peter Koo studies how computers find genetic patterns

Peter Koo Photo from CSHL

By Daniel Dunaief

The goal sounds like a dystopian version of a future in which computers make critical decisions that may or may not help humanity.

Peter Koo, Assistant Professor and Cancer Center Member at Cold Spring Harbor Laboratory, would like to learn how to design neural networks so they are more interpretable, which will help build trust in the networks.

The neural networks he’s describing are artificial intelligence programs designed to link a molecular function to DNA sequences, which can then inform how mutations to the DNA sequences alter the molecular function. This can help “propose a potential mechanism that plays a causal role” for a mutation in a given disease, he explained in an email.
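
One common way to turn such a network's predictions into mutation effects is in silico mutagenesis: change a base, re-run the model, and compare the scores. The sketch below illustrates the idea with a toy stand-in for a trained network (the `predict` function here is hypothetical, not Koo's actual model); it scores a sequence by how closely it contains the motif "TGCA".

```python
# Toy stand-in for a trained network: scores a DNA sequence by its
# best match to a hypothetical 4-base motif. A real model would be a
# deep neural network trained on experimental data.
def predict(seq, motif="TGCA"):
    return max(
        sum(a == b for a, b in zip(seq[i:i + len(motif)], motif))
        for i in range(len(seq) - len(motif) + 1)
    ) / len(motif)

def mutation_effect(seq, pos, new_base):
    """Effect of a point mutation = change in predicted molecular activity."""
    mutant = seq[:pos] + new_base + seq[pos + 1:]
    return predict(mutant) - predict(seq)

wild_type = "ACGTTGCAACGT"  # contains one exact TGCA match

print(mutation_effect(wild_type, 5, "A"))  # hits the G inside TGCA: -0.25
print(mutation_effect(wild_type, 0, "T"))  # outside the motif: 0.0
```

A negative effect flags the mutation as disrupting the predicted function, which is the kind of signal that could suggest a causal mechanism for a disease variant.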

Researchers have created numerous programs that learn a range of tasks. In computer vision, for example, neural networks can perform object recognition, such as differentiating between a wolf and a dog.

Koo when he received a COVID vaccination.

With the pictures, people can double check the accuracy of these programs by comparing the program’s results to their own observations about different objects they see.

While the artificial intelligence might get most or even all of the head-to-head comparisons between dogs and wolves correct, the program might arrive at the right answer for the wrong reason. The pictures of wolves, for example, might have all been taken during the winter, with snow in the background. The photos of dogs, on the other hand, might have cues that include green grass.

The neural network program can arrive at the right answer for the wrong reason if it is focused on snow and grass rather than on the features of the animal in a picture.

Extending this example to the world of disease, researchers would like computer programs to process information at a pace far quicker than the human brain as they look for mutations or genetic variability that suggests a predisposition for a disease.

The problem is that these programs learn through so-called black box processing. Even the people who designed the programs don’t necessarily know how the machine learned to emphasize one alteration over another, which might mean that the machine is focused on the snow instead of the wolf.

Koo, however, would like to understand the artificial intelligence processes that lead to these conclusions.

In research published in the journal Nature Machine Intelligence, Koo provides a way to access one level of information learned by the network, particularly DNA patterns called motifs, which are short sequences associated with protein binding. The approach also makes the current tools that look inside black boxes more reliable.

“My research shows that just because the model’s predictions are good doesn’t mean that you should trust the network,” Koo said. “When you start adding mutations, it can give you wildly different results, even though its predictions were good on some benchmark test set.”

Indeed, a performance benchmark is usually how scientists evaluate networks. Some of the data is held out so the network never sees it during training. This allows researchers to evaluate how well the network can generalize to data it has never seen before.
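
The held-out benchmark described above can be sketched in a few lines. This is a toy illustration of the evaluation procedure only, with made-up data and a trivial threshold rule standing in for a trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 200 labeled examples with 10 features each; the label
# depends only on the first feature.
X = rng.normal(size=(200, 10))
y = (X[:, 0] > 0).astype(int)

# Hold out 25% of the data; the model (here, a threshold rule fit on
# the training portion) never sees it until evaluation.
idx = rng.permutation(len(X))
train, test = idx[:150], idx[150:]

threshold = X[train, 0].mean()          # "fit" on training data only
pred = (X[test, 0] > threshold).astype(int)
accuracy = (pred == y[test]).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

Koo's point is that a high number here is not enough: a model can score well on held-out examples while still assigning noisy, unreliable importance scores to individual mutations.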

When Koo tests how well the predictions do with mutations, they can “vary significantly,” he said. The networks are “giving arbitrary DNA positions importance scores, but those aren’t [necessarily] important. They are just really noisy.”

Through something Koo calls an “exponential activation trick,” he reduces the network’s false positive predictions, cutting back the noise dramatically.
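
The article doesn't detail Koo's implementation, but the general idea of an exponential activation can be illustrated with a toy first-layer scan over a DNA sequence. In this hypothetical example, a filter rewards each base of a motif; under a ReLU activation, partial matches still register strongly relative to the true hit, while the exponential suppresses them to nearly nothing:

```python
import numpy as np

def one_hot(seq):
    """One-hot encode a DNA string as a (len, 4) array in A, C, G, T order."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        out[i, idx[base]] = 1.0
    return out

def first_layer_scan(x, filt, activation):
    """Slide a (k, 4) filter along a one-hot sequence, then activate."""
    k = filt.shape[0]
    scores = np.array([np.sum(x[i:i + k] * filt) for i in range(len(x) - k + 1)])
    return activation(scores)

# A hypothetical filter that rewards each base of the motif "TGCA".
filt = 2.0 * one_hot("TGCA")

x = one_hot("ACGTTGCAACGT")  # contains one exact TGCA match

relu_out = first_layer_scan(x, filt, lambda s: np.maximum(0.0, s))
exp_out = first_layer_scan(x, filt, np.exp)

# Relative to the true motif hit, partial matches still register at 25%
# under ReLU but fall below 1% under the exponential.
print(relu_out / relu_out.max())
print(exp_out / exp_out.max())
```

Because the exponential magnifies strong matches so much more than weak ones, background positions contribute almost nothing in relative terms, which is consistent with the noise suppression described here.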

“What it’s showing you is that you can’t only use performance metrics like how accurate you are on examples that you’ve never seen before as a way to evaluate the model’s ability to predict the importance of mutations,” he explained.

Like using the snow to choose between a wolf and a dog, some models are using shortcuts to make predictions.

“While these shortcuts can help them make predictions that [seem more] accurate, like with the data you trained it on, it may not necessarily have learned the true essence of what the underlying biology is,” Koo said.

By learning the essence of the underlying biology, the predictions become more reliable, which means that the neural networks will be making predictions for the right reason.

The exponential activation is a noise suppressor, allowing the artificial intelligence program to focus on the biological signal.

The data Koo trains the program on come from ENCODE, which is the ENCyclopedia Of DNA Elements.

“In my lab, we want to use these deep neural networks on cancer,” Koo said. “This is one of the major goals of my lab’s research at the early stages: to develop methods to interpret these things to trust their predictions so we can apply them in a cancer setting.”

At this point, the work he’s doing is more theoretical than practical.

“We’re still looking at developing further tools to help us interpret these networks down the road so there are additional ways we can perform quality control checks,” he said.

Koo feels well-supported by others who want to understand what these networks are learning and why they are making a prediction.

From here, Koo would like to move to the next stage of looking into specific human diseases, such as breast cancer and autism spectrum disorder, using techniques his lab has developed.

He hopes to link disease-associated variants with a molecular function, which can help understand the disease and provide potential therapeutic targets.

While he’s not a doctor and doesn’t conduct clinical experiments, Koo hopes his impact will involve enabling more trustworthy and useful artificial intelligence programs.

Artificial intelligence is “becoming bigger and it’s undoubtedly impactful already,” he said. “Moving forward, we want to have transparent artificial intelligence we can trust. That’s what my research is working towards.”

He hopes the methods he develops to make artificial intelligence models more interpretable and trustworthy will help doctors learn more about diseases.

Koo has increased the size and scope of his lab amid the pandemic. He currently has eight people in his lab: postdoctoral researchers, graduate students, undergraduates and a master’s candidate.

Some people in his lab have never met in person, Koo said. “I am definitely looking forward to a normal life.”