BNL’s Yoo uses machine learning to climb mountains of data

Shinjae Yoo with his son Erum

By Daniel Dunaief

He works with clouds, solar radiation and nanoparticles, just to name a few. The subjects Shinjae Yoo, a computational scientist at Brookhaven National Laboratory, tackles span a broad range of arenas, primarily because his focus is taking large volumes of information and making sense of them.

Yoo helps refine and make sense of searches. He develops big data streaming algorithms that can apply to any domain where data scalability issues arise. Yoo did his doctoral research at Carnegie Mellon University, where he also earned a master’s degree, integrating text analysis with social network analysis to create systems that helped prioritize email messages.

“If you are [traveling and] in the airport, before you get into your plane, you want to check your email and you don’t have much time,” he said. While this isn’t the main research work he is doing at the lab, it is one type of application for his work. Yoo developed his technical background in machine learning when he was at Carnegie Mellon. He said he continues to learn, improve and develop machine learning methods in various science domains. By using a statistical method that combines computational science skills, statistics and applied math, he can offer a comprehensive and, in some cases, rapid analysis of information.

Colleagues and collaborators suggested Yoo has made an impact with his work in a wide range of fields. His “contribution is not only in the academic field, but also means a lot on the industrial and academic field,” Hao Huang, a machine learning scientist at GE Global Research, wrote in an email. “He always focuses on making good use of data mining and machine learning theory on real world [areas] such as biology, renewable energy and [in the] material science domain.”

Yoo explained how a plant biologist can do stress conditioning for a plant with one goal in mind. That scientist can collect data over the course of 20 years and then “crunch the data, but they can’t always analyze it,” because the volume might be too unwieldy for a bench scientist to handle. Using data from numerous experiments, scientists can develop a new hypothesis. Exploring the information in greater detail, and with increased samples, can also lead to suggestions for the best way to design future experiments.

Yoo said he can come to the scientist and use machine learning to help “solve their science data problem,” giving the researchers a clearer understanding of the broad range of information they collected. “Nowadays, generated data is very easy,” but understanding and interpreting that information presents bigger challenges. Take the National Synchrotron Light Source II at BNL. The $912 million facility, which went online earlier this year, holds considerable promise for future research. It can look at the molecules in a battery as the battery is functioning, offering a better understanding of why some batteries last considerably longer than others. It can also offer a look at the molecular intermediaries in biochemical reactions, offering a clearer and more detailed picture of the steps in processes that might have relevance for disease, drug interactions or even the creation of biological products like shells. He usually helps automate data analytics or bring new hypotheses to scientists, Yoo said. One of the many challenges in experiments at facilities like the NSLS II and the Center for Functional Nanomaterials, also at BNL, is managing the enormous flow of information that comes through these experiments.

Indeed, at the CFN, the transmission electron microscopy generates 3 gigabytes per second for the image stream. Using streaming analysis, he can provide an approximate understanding of the information. Yoo received a $1.9 million, three-year Advanced Scientific Computing Research grant this year. The grant is a joint proposal for which Yoo is the principal investigator. This grant, which launched this September, is about high-performance-computing-enabled machine learning for spatio-temporal data analysis. The primary application, he said, is in climate. He plans to extend it to other data later, including, possibly, NSLS II experiments.

Yoo finds collaborators through emails, phone calls, seminars or anywhere he meets other researchers. Huang, who started working with Yoo in 2010 when Huang was a doctoral candidate at Stony Brook, appreciates Yoo’s passion for his work. Yoo is “dedicated to his research,” Huang explained. “When we [ran] our proposed methods and got results that [were] better than any of the existing work, he was never satisfied and [was] always trying to further explore to get even better performance.”

When he works with collaborators in many disparate fields, he has found that the fundamental data analysis methodologies are similar. He needs to do some customization and varied preprocessing steps, and to learn domain-specific terms. When Yoo came to BNL seven years ago, some of his scientific colleagues around the country were not eager to embrace his approach to sorting and understanding large pools of data. Now, he said, other researchers have heard about machine learning and what artificial intelligence can do and they are eager to “apply those methods and publish new papers.”

Born and raised in South Korea, Yoo is married to Hayan Lee, who earned her PhD at Stony Brook and studies computational biology, specializing in genome assembly. They have a four-year-old son, Erum. Yoo calls his son “his great joy” and said he “gives me a lot of happiness. Hanging around my son is a great gift.”

When Yoo was entering college in South Korea, he said his father, who had worked at the National Institute of Forest Science, played an important role. After his father consulted with people about different fields, he suggested Yoo choose computer science over chemistry, which would have been his first choice. “He concluded that computer science would be a new field that would have a great future, which is true, and I appreciate my dad’s suggestion,” Yoo said.