Peter Koo

CSHL’s Peter Koo, former student Shushan Toneyan develop AI tool to test AI genetic program

by Daniel Dunaief - October 14, 2024

Shushan Toneyan and Peter Koo at Cold Spring Harbor Laboratory. Photo by Gina Motisi/CSHL

The real and virtual worlds are filled with so-called “black boxes,” which are often impenetrable to light and contain mysteries, secrets, and information that is not available to the outside world.

Sometimes, people design these black boxes to keep concepts, ideas or tools outside the public realm. Other times, they are a part of a process, such as the thinking behind why we do certain things even when they cause us harm, that would benefit from an opening or a better understanding.

In the world of artificial intelligence, programs learn from a collection of information, often compiling and comparing enormous collections of data, to make a host of predictions.

Companies and programmers have written numerous types of code to analyze genetic data, trying to determine which specific mutations or genetic alterations might lead to conditions or diseases.

Left on their own, these programs develop associations and correlations in the data, without providing any insights into what they may have learned.

That’s where Peter Koo, Assistant Professor at Cold Spring Harbor Laboratory, and his former graduate student Shushan Toneyan come in.

The duo recently published a paper in Nature Genetics in which they explain a new AI-powered tool they designed called CREME, which explored the genetic analysis tool Enformer.

A collaboration between DeepMind and Calico, which is a unit of Google owner Alphabet, Enformer takes DNA sequences and predicts gene expression, without explaining what it is learning or how.

CREME is “a tool that performs like large-scale experiments in silico [through computer modeling] on a neural network model that’s already been trained,” said Koo.

“There are a lot of these models already in existence, but it’s a mystery why they are making their predictions. CREME is bridging that gap.”

Award winning research

Indeed, for her work in Koo’s lab, including developing CREME, Toneyan recently was named a recipient of the International Birnstiel Award for Doctoral Research in Molecular Life Sciences.

“I was genuinely surprised and happy that they selected my thesis and I would get to represent CSHL and the Koo lab at the ceremony in Vienna,” Toneyan, who graduated from the School of Biological Sciences, explained.

Toneyan, who grew up in Yerevan, Armenia, is currently a researcher in The Roche Postdoctoral Fellowship Programme in Zurich, Switzerland.

She said that the most challenging part of designing this tool was to focus on the “interesting and impactful experiments and not getting sidetracked by more minor points more likely to lead to a dead end.”

She credits Koo with providing insights into the bigger picture.

New knowledge

Without taking DNA, running samples in a wet lab, or looking at the combination of base pairs that make up a genetic code from a live sample, CREME can serve as a way to uncover new biological knowledge.

CREME interrogates AI models that predict gene expression levels from DNA sequences.

“It essentially replicates biological or genetic experiments in silico through the lens of the model to answer targeted questions about genetic mechanisms,” Toneyan explained. “We mainly focused on analyzing the changes in model outputs depending on various perturbations to the input.”

By using computers, scientists can save considerable time and effort in the lab, enabling those who conduct these experiments to focus on the areas of the genome that are involved in various processes and, when corrupted, diseases.

If scientists conducted these experiments one mutation at a time, even a short sequence would require many experiments to analyze.

The tool Koo and Toneyan created can make precise claims about what the model has learned.

CREME perturbs large chunks of input sequence to see how model predictions change. It interrogates the model by measuring how changes in the input affect model outputs.
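The idea of perturbing chunks of input and watching the prediction change can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CREME's actual code or interface: `toy_model`, the motif `GATAAG` and the all-T background are hypothetical stand-ins chosen so the result is easy to read, where a real analysis would query a trained network such as Enformer.

```python
import random

def toy_model(seq: str) -> float:
    """Hypothetical stand-in for a trained model: counts a made-up motif."""
    return float(seq.count("GATAAG"))

def shuffle_tile(seq: str, start: int, end: int, rng: random.Random) -> str:
    """Return seq with the bases in [start, end) randomly shuffled."""
    tile = list(seq[start:end])
    rng.shuffle(tile)
    return seq[:start] + "".join(tile) + seq[end:]

def tile_effects(seq, tile_size, model, n_shuffles=20, seed=0):
    """Mean change in the prediction when each tile is shuffled in turn."""
    rng = random.Random(seed)
    baseline = model(seq)
    effects = []
    for start in range(0, len(seq), tile_size):
        end = min(start + tile_size, len(seq))
        preds = [model(shuffle_tile(seq, start, end, rng))
                 for _ in range(n_shuffles)]
        effects.append(sum(preds) / n_shuffles - baseline)
    return effects

# Assumed toy input: an uninformative background with one motif in the middle tile.
sequence = "T" * 10 + "TTGATAAGTT" + "T" * 10
effects = tile_effects(sequence, tile_size=10, model=toy_model)
for i, effect in enumerate(effects):
    print(f"tile {i}: mean change in prediction = {effect:+.2f}")
```

Shuffling the outer tiles leaves the toy prediction unchanged, while shuffling the middle tile lowers it, which flags that tile as the one the model depends on.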

“We need to interpret AI models to trust their deployment,” Toneyan said. “In the context of biological applications, we are also very interested in why they make a certain prediction so that we learn about the underlying biology.”

Using ineffective and untested predictive models will cause “more harm than good,” added Koo. “You need to interpret [the AI model’s] programs to trust them for their reliable deployment” in the context of genetic studies.

Enhancers

Named for Cis Regulatory Element Model Explanations, CREME can find on and off switches near genes, called enhancers and silencers, respectively.

It is not clear where these switches are, how many there are per gene and how they interact. CREME can help explore these questions, Toneyan suggested.

Cis regulatory elements are parts of non-coding DNA that regulate the transcription of nearby genes, altering whether these genes manufacture or stop producing proteins.

By combining an AI powered model such as Enformer with CREME, researchers can narrow down the possible list of enhancers that might play an important genetic role.

Additionally, a series of enhancers can sometimes contribute to transcription. A wet lab experiment that only knocked one out might not reveal the potential role of this genetic code if other nearby areas can rescue the genetic behavior.
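A small Python sketch shows why knocking out one enhancer at a time can miss redundancy. Everything here is an assumption made for illustration: `toy_expression` is a hypothetical model in which enhancers A and B can each rescue the other, and C makes a small independent contribution. It is not drawn from Enformer or CREME.

```python
from itertools import combinations

def toy_expression(active: frozenset) -> float:
    """Hypothetical model: A and B are redundant, C adds a small boost."""
    level = 1.0
    if "A" in active or "B" in active:
        level += 8.0
    if "C" in active:
        level += 1.0
    return level

def knockout_effects(enhancers, model, max_order=2):
    """Change in predicted expression for every knockout of up to max_order enhancers."""
    full = frozenset(enhancers)
    baseline = model(full)
    results = {}
    for order in range(1, max_order + 1):
        for removed in combinations(sorted(enhancers), order):
            results[removed] = model(full - set(removed)) - baseline
    return results

effects = knockout_effects({"A", "B", "C"}, toy_expression)
for removed, change in effects.items():
    print("+".join(removed), change)
```

In this toy setup, removing A alone or B alone changes nothing, while removing both together causes the largest drop. That is the kind of combined effect that is cheap to test in a computer and laborious to test in a wet lab.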

Ideally, these models would mimic the processes in a cell. At this point, they are still going through improvements and are not in perfect agreement with each other or with live cells, Toneyan added.

Scientists can use AI models to aid in the search for enhancers, but they can’t blindly trust them because of their black box nature.

Still, tools like CREME help design genetic perturbation experiments for more efficient discovery.

At this point, the program doesn’t have a graphical user interface. Researchers can use Python scripts released as packages for different models.

In the longer term, Koo is hoping to build on the work he and Toneyan did to develop CREME.

“This is just opening the initial doors,” he said. “One could do it more efficiently in the future. We’re working on those methods.”

Koo is pleased with the contribution Toneyan made to his lab. Toneyan was the first graduate student to work with him after he came to Cold Spring Harbor Laboratory, and Koo suggested that she “shaped my lab into what it is.”

CSHL’s Peter Koo studies how computers find genetic patterns

by Daniel Dunaief - March 27, 2021

Peter Koo Photo from CSHL

The goal sounds like a dystopian version of a future in which computers make critical decisions that may or may not help humanity.

Peter Koo, Assistant Professor and Cancer Center Member at Cold Spring Harbor Laboratory, would like to learn how to design neural networks so they are more interpretable, which will help build trust in the networks.

The neural networks he’s describing are artificial intelligence programs designed to link a molecular function to DNA sequences, which can then inform how mutations to the DNA sequences alter the molecular function. This can help “propose a potential mechanism that plays a causal role” for a mutation in a given disease, he explained in an email.

Researchers have created numerous programs that learn a range of tasks. In computer vision, for instance, scientists have developed neural networks that perform tasks such as object recognition, which might differentiate between a wolf and a dog.

Koo when he received a COVID vaccination.

With the pictures, people can double check the accuracy of these programs by comparing the program’s results to their own observations about different objects they see.

While the artificial intelligence might get most or even all of the head-to-head comparisons between dogs and wolves correct, the program might arrive at the right answer for the wrong reason. The pictures of wolves, for example, might have all been taken during the winter, with snow in the background. The photos of dogs, on the other hand, might have cues that include green grass.

The neural network program can arrive at the right answer for the wrong reason if it is focused on snow and grass rather than on the features of the animal in a picture.

Extending this example to the world of disease, researchers would like computer programs to process information at a pace far quicker than the human brain as it looks for mutations or genetic variability that suggests a predisposition for a disease.

The problem is that the programs are learning in the same way as their programmers, developing an understanding of patterns based on so-called black box thinking. Even when people have designed the programs, they don’t necessarily know how the machine learned to emphasize one alteration over another, which might mean that the machine is focused on the snow instead of the wolf.

Koo, however, would like to understand the artificial intelligence processes that lead to these conclusions.

In research presented in the journal Nature Machine Intelligence, Koo provides a way to access one level of information learned by the network, particularly DNA patterns called motifs, which are sites associated with proteins. It also makes the current tools that look inside black boxes more reliable.

“My research shows that just because the model’s predictions are good doesn’t mean that you should trust the network,” Koo said. “When you start adding mutations, it can give you wildly different results, even though its predictions were good on some benchmark test set.”

Indeed, a performance benchmark is usually how scientists evaluate networks. Some of the data is held out so the network has never seen these during training. This allows researchers to evaluate how well the network can generalize to data it’s never seen before.

When Koo tests how well the predictions do with mutations, they can “vary significantly,” he said. They are “given arbitrary DNA positions important scores, but those aren’t [necessarily] important. They are just really noisy.”

Through something Koo calls an “exponential activation trick,” he reduces the network’s false positive predictions, cutting back the noise dramatically.

“What it’s showing you is that you can’t only use performance metrics like how accurate you are on examples that you’ve never seen before as a way to evaluate the model’s ability to predict the importance of mutations,” he explained.

Like using the snow to choose between a wolf and a dog, some models are using shortcuts to make predictions.

“While these shortcuts can help them make predictions that [seem more] accurate, like with the data you trained it on, it may not necessarily have learned the true essence of what the underlying biology is,” Koo said.

By learning the essence of the underlying biology, the predictions become more reliable, which means that the neural networks will be making predictions for the right reason.

The exponential activation is a noise suppressor, allowing the artificial intelligence program to focus on the biological signal.
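The noise-suppressing intuition can be sketched with a toy motif scan in Python. This assumes the trick amounts to swapping a standard activation such as ReLU for an exponential. The sequence, the motif and the `signal_fraction` measure are hypothetical choices for illustration, and the sketch does not reproduce how a trained network behaves.

```python
import math

def scan(seq: str, motif: str) -> list:
    """Number of matching bases between the motif and each window of seq."""
    k = len(motif)
    return [float(sum(a == b for a, b in zip(seq[i:i + k], motif)))
            for i in range(len(seq) - k + 1)]

def relu(x: float) -> float:
    return max(0.0, x)

def signal_fraction(activations, peak_index):
    """Share of the total activation that sits at the true motif position."""
    total = sum(activations)
    return activations[peak_index] / total if total else 0.0

# Assumed toy input: one exact copy of the motif plus partial matches elsewhere.
seq = "ACGTTGACGTCATGCAGTCAGGTAACG"
motif = "TGACGTCA"
scores = scan(seq, motif)
peak = scores.index(max(scores))

# Scores are non-negative here, so ReLU passes every partial match through unchanged.
relu_frac = signal_fraction([relu(s) for s in scores], peak)
exp_frac = signal_fraction([math.exp(s) for s in scores], peak)
print(f"ReLU: {relu_frac:.2f} of activation at the motif; exponential: {exp_frac:.2f}")
```

With ReLU, the many weak partial matches together outweigh the one true site. With the exponential, the strong match dominates, so the background contributes far less to whatever comes next.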

The data Koo trains the program on come from ENCODE, which is the ENCyclopedia Of DNA Elements.

“In my lab, we want to use these deep neural networks on cancer,” Koo said. “This is one of the major goals of my lab’s research at the early stages: to develop methods to interpret these things to trust their predictions so we can apply them in a cancer setting.”

At this point, the work he’s doing is more theoretical than practical.

“We’re still looking at developing further tools to help us interpret these networks down the road so there are additional ways we can perform quality control checks,” he said.

Koo feels well-supported by others who want to understand what these networks are learning and why they are making a prediction.

From here, Koo would like to move to the next stage of looking into specific human diseases, such as breast cancer and autism spectrum disorder, using techniques his lab has developed.

He hopes to link disease-associated variants with a molecular function, which can help researchers understand the disease and provide potential therapeutic targets.

While he’s not a doctor and doesn’t conduct clinical experiments, Koo hopes his impact will involve enabling more trustworthy and useful artificial intelligence programs.

Artificial intelligence is “becoming bigger and it’s undoubtedly impactful already,” he said. “Moving forward, we want to have transparent artificial intelligence we can trust. That’s what my research is working towards.”

He hopes the methods he develops in making the models for artificial intelligence more interpretable and trustworthy will help doctors learn more about diseases.

Koo has increased the size and scope of his lab amid the pandemic. He currently has eight people in his lab: postdoctoral researchers, graduate students, undergraduates and a master’s candidate.

Some people in his lab have never met in person, Koo said. “I am definitely looking forward to a normal life.”

CSHL’s Peter Koo tackles deep learning

by Daniel Dunaief - October 23, 2019

Peter Koo. Photo by ©Gina Motisi, 2019/CSHL

We built a process that works, but we don’t know why. That’s what one of the newest additions to Cold Spring Harbor Laboratory hopes to find out.

Researchers have applied artificial intelligence in many areas in biology and health care. These systems are making useful predictions for the tasks they are trained to perform. Artificial intelligence, however, is mostly a hands-off process. After these systems receive training for a particular task, they learn patterns on their own that help them make predictions.

How these machines learn, however, has become as much of a black box as the human brains that created these learning programs in the first place. Deep learning is a way to build hierarchical representations of data, explained Peter Koo, an assistant professor at the Simons Center for Quantitative Biology at CSHL, who studies the way each layer transforms data and the next layer builds upon this in a hierarchical manner.

Koo, who earned his doctorate at Yale University and performed his postdoctoral research at Harvard University, would like to understand exactly what the machines we created are learning and how they are coming up with their conclusions.

“We don’t understand why [these artificial intelligence programs] are making their predictions,” Koo said. “My postdoctoral research and future research will continue this line of work.”

Koo is interested not only in applying deep learning to improve performance on biological problems, but also in extracting the knowledge these machines learn from the data sets, to understand why they perform better than some of the traditional methods.

“How do we guide black box models to learn biologically meaningful” information? he asked. “If you have a data set and you have a predictive model that predicts the data well, you assume it must have learned something biologically meaningful,” he suggested. “It turns out, that’s not always the case.”

Deep learning can pick up other trends or links in the data that might not be biologically meaningful. In a simplistic example, an artificial intelligence weather system that tracked rain patterns during the spring might conclude, after seven rainy Tuesdays, that it rains on Tuesdays, even if the day of the week and the rain don’t have a causative link.

“If the model is trained with limited data that is not representative, it can easily learn patterns that are correlative in the training data,” Koo said. He tries to combat this in practice by holding out some data, which is called validation data. Scientists use it to evaluate how well the model generalizes to new data.
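The rainy-Tuesday example and the held-out check can be put together in a short Python sketch. The data are invented for illustration: the seven training weeks are constructed so that it rains only on Tuesdays, and the held-out days assume a 30% chance of rain that is independent of the weekday.

```python
import random

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical training set: seven spring weeks in which, by coincidence,
# it rained on every Tuesday and on no other day.
train = [(day, day == "Tue") for _ in range(7) for day in DAYS]

def fit_weekday_rule(examples):
    """Learn which weekdays were rainy more than half the time in training."""
    rainy = {day: 0 for day in DAYS}
    seen = {day: 0 for day in DAYS}
    for day, rained in examples:
        seen[day] += 1
        rainy[day] += int(rained)
    return {day for day in DAYS if seen[day] and rainy[day] / seen[day] > 0.5}

def accuracy(rainy_days, examples):
    """Fraction of days on which the weekday rule matches what happened."""
    hits = sum((day in rainy_days) == rained for day, rained in examples)
    return hits / len(examples)

# Held-out data: rain is independent of the weekday (assumed 30% chance per day).
rng = random.Random(0)
validation = [(DAYS[i % 7], rng.random() < 0.3) for i in range(700)]

rule = fit_weekday_rule(train)
train_acc = accuracy(rule, train)
val_acc = accuracy(rule, validation)
print(f"learned rainy days: {rule}")
print(f"training accuracy: {train_acc:.2f}, validation accuracy: {val_acc:.2f}")
```

The rule looks perfect on the weeks it was fit to and noticeably worse on days it has not seen, which is the gap that held-out data is meant to expose.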

Koo plans to collaborate with numerous biologists at Cold Spring Harbor Laboratory, as well as other quantitative biologists, like assistant professors Justin Kinney and David McCandlish.

In an email, Kinney explained that the Simons Center is “very interested in moving into this area, which is starting to have a major impact on biology just as it has in the technology industry.”

The quantitative team is interested in high-throughput data sets that link sequence to function, which includes assays for protein binding, gene expression, protein function and a host of others. Koo plans to take a “top down” approach to interpret what the models have learned. The benefit of this perspective is that it doesn’t set any biases in the models.

Deep learning, Koo suggested, is a rebranding of artificial neural networks. Researchers create a network of simple computational units and collectively they become a powerful tool to approximate functions.

A physicist by training, Koo taught himself his expertise in deep learning, Kinney wrote in an email. “He thinks far more deeply about problems than I suspect most researchers in this area do,” he wrote. Kinney is moving in this area himself as well, because he sees a close connection between the problem of how artificial intelligence algorithms learn to do things and how biological systems mechanistically work.

While plenty of researchers are engaged in the field of artificial intelligence, interpretable deep learning, which is where Koo has decided to make his mark, is a considerably smaller field.

“People don’t trust it yet,” Koo said. “They are black box models and people don’t understand the inner workings of them.” These systems learn some way to relate input function to output predictions, but scientists don’t know what function they have learned.

Koo chose to come to Cold Spring Harbor Laboratory in part because he was impressed with the questions and discussions during the interview process.

Koo, daughter Evie (left) and daughter Yeonu (right) during Halloween last year. Photo by Soohyun Cho

He started his research career in experimental physics. As an undergraduate, he worked in the condensed matter lab of John Clarke at the University of California at Berkeley. He transitioned to genomics, in part because he saw a huge revolution in next-generation sequencing. He hopes to leverage what he has learned to make an impact toward precision medicine.

Biological researchers were sequencing all kinds of cancers and were trying to make an impact toward precision medicine. “To me, that’s a big draw,” Koo said, “to make contributions here.”

A resident of Jericho, Koo lives with his wife, Soohyun Cho, their 6-year-old daughter Evie and their 4-year-old daughter Yeonu.

Born and raised in the Los Angeles area, he joined the Army Reserves after high school, attended community college and then transferred to UC Berkeley to get his bachelor’s degree in physics.

As for his decision to join Cold Spring Harbor Laboratory, Koo said he is excited about the opportunity to combine his approach to his work with the depth of research in other areas.

“Cold Spring Harbor Laboratory is one of those amazing places for biological research,” Koo said. “What brought me here is the quantitative biology program. It’s a pretty new program” that has “incredibly deep thinkers.”
