CSHL’s Jason Sheltzer and Google’s Joan Smith team up in cancer study

Jason Sheltzer. Photo by ©Gina Motisi, 2018/CSHL

By Daniel Dunaief

A diagnosis of cancer brings uncertainty and anxiety, as a patient and his or her family confront a new reality. But not all cancers are the same and not all patients are the same, making it difficult to know the severity of the disease.

As doctors increasingly focus on individual patient care, researchers are looking to use the wealth of information that new technology provides to help with everything from assessing cancer risk to making early diagnoses to providing treatment and aftercare.

Jason Sheltzer, a fellow at Cold Spring Harbor Laboratory, and his partner Joan Smith, a senior software engineer at Google, have sought to use the genetic fingerprints of cancer to determine the likely course of the disease.

By looking at genetic data from 20,000 cancer patients, Sheltzer and Smith found that a phenomenon called copy number variation, in which stretches of DNA are duplicated or deleted so that cells carry extra or missing copies of particular genes, is often a good indicator of how aggressive a cancer is. Cancers with higher levels of copy number variation were likely to be the most aggressive. They recently published their research in the journal eLife.

While the research, which took four years, is still at a preliminary stage, this kind of prognostic biomarker could give doctors and patients more information on which to base treatment decisions. It could also offer a better understanding of the course of a disease, because copy number variation changes as a cancer progresses.

“The main finding is simply that copy number variation is a much more potent prognostic biomarker than people had realized,” Sheltzer said. “It appears to be more informative than mutations in most single genes.”

Despite having data from thousands of patients, however, Sheltzer and Smith don’t yet know whether there is a tipping point at which a cancer crosses a critical threshold.

Some copy number changes were also more harmful than others. “Our analysis suggested that copy number alterations affecting a few key oncogenes and tumor suppressors seemed to be particularly bad news for patient prognosis,” Sheltzer said, adding that the researchers weren’t able to do a clinical follow-up to determine how genes changed as the cancer progressed.

“Hopefully, we can follow up this study, where we can do a longitudinal analysis,” he said.

Joan Smith. Photo by Seo-young Silvia Kim

Smith, who has written computer code for Twitter, Oracle and now Google, wrote code specifically for this project. “The analysis we’ve done here is new and is on a much more significant scale than the analysis we did in the past,” she said.

For the paper, Smith was able to reuse parts of her code across several related experiments. Some of the reusable code cleaned up the data and put it into a useful format, while other code searched for genetic patterns.

“There is considerable refinement that went into writing this code, and into writing code in general,” she explained in an email.

Smith works full time at Google and must get the company’s approval for any work she does with Sheltzer. Before publication, she sent the paper to Google for approval. She works with Sheltzer “on her personal time,” and her efforts have “nothing to do with Google or Google Tools.”

The search engine company “tends to be supportive of employees doing interesting and valuable external work, as long as it doesn’t make use of any Google confidential information or Google owned resources,” including equipment, supplies or facilities, she explained in an email.

Copy number variation occurs frequently in people’s somatic cells, even in those who aren’t battling a deadly disease, Sheltzer said. “People in general harbor a lot of normal copy number variation,” he added.

Other types of repetitive changes in the genome have also played a role in various conditions.

Some copy number variations, coupled with deletions, can be especially problematic. Cancers that lose a tumor suppressor gene called P53, which is widely studied in research labs around the world, tend to accumulate copy number variations.

“Patients who have deletions in P53 tend to accumulate more copy number alterations than patients who don’t,” Sheltzer said. “A surprising result from our paper is that copy number variation goes above and beyond P53 mutations. You can control for P53 status and still find copy number variations that act as significant prognostic biomarkers.”

The copy number variations Sheltzer and Smith examined affected whole genes, stretches of about 10,000 bases or longer.

“We think that is because cancers are Darwinian,” explained Sheltzer. “The cells are competing against one another to grow the fastest and be the most aggressive. If a cancer amplifies one potent oncogene, it’s good for the cancer. If the cancer amplifies 200 others, it conveys a fitness penalty in the context of cancer.”

Smith is delighted to have the opportunity to contribute her informatics expertise to Sheltzer’s research. Linking their skill sets is becoming increasingly important as technology makes it possible to gather a wealth of data in much less time and at considerably lower cost.

Smith has a physics degree from MIT and has been in the technology world ever since.

“It’s been super wonderful and inspiring to get to do both” technology work and cancer research, she said.

The scientific duo live in Mineola, which they chose because it sits roughly midway between their workplaces; each commute takes about 35 minutes. When they are not working, the couple, who have been together for eight years and have collaborated on research for almost all of that time, enjoy biking, usually 30 to 60 miles at a time.

Sheltzer greatly appreciates Smith’s expertise in using computer programs to mine through enormous amounts of data.

They are working on the next steps in exploring patient data.
