March 19, 2024 | Scientists in the Georgia Tech Integrated Cancer Research Center (ICRC) are proposing a new, more realistic approach to diagnosing cancer that provides a probabilistic statement about the likelihood of developing it, much as cholesterol tests are used to assess heart disease risk. Probabilities are based on an individual’s metabolic profile, and the first population group to be targeted is women at high risk for ovarian cancer, according to John McDonald, Ph.D., professor emeritus in the School of Biological Sciences at Georgia Institute of Technology and founding director of the ICRC.
“We chose [blood] metabolites because they’re the endpoints of all molecular processes going on in the cell,” McDonald says. “The metabolic profiles are very close to the phenotype.”
Since only about 7% of the more than 34,000 metabolites in the bloodstream have been characterized, the testing focus is on patterns in metabolic profiles that correlate with the presence of cancer in large datasets, says McDonald. Teasing out that sort of information is an ideal exercise for artificial intelligence (AI).
“The whole idea was to use a series of [machine learning] tools and quickly train them using state-of-the-art algorithms to extract robust results which are generalizable,” explains Regents’ Professor Jeffrey Skolnick, Ph.D., who also serves as chair in Georgia Tech’s School of Biological Sciences and Georgia Research Alliance Eminent Scholar in Computational Systems Biology. The tool is effectively looking for the proverbial needle in the haystack (the one in every 78 women who develops ovarian cancer during her lifetime) based on the distinct metabolic pattern seen in afflicted individuals.
In a newly published study, the predictive model achieved 93% accuracy when tested on blood samples from 564 women across four different geographies in the U.S. and Canada (Gynecologic Oncology, DOI: 10.1016/j.ygyno.2023.12.030). Most of them were active ovarian cancer patients.
The test had better accuracy in detecting ovarian cancer than existing tests for women clinically classified as normal, says Skolnick. It was particularly good at identifying early-stage ovarian disease in that cohort.
Up to now, efforts to develop diagnostics for ovarian cancer just haven’t worked, says McDonald. “In fact, diagnostics in any cancer is not that good.” Even with the widely used PSA test for prostate cancer, false-positive test results are common—and concerning.
A probabilistic statement is both a more accurate and personalized approach to cancer diagnostics than the traditional binary (yes/no) tests, he says. “If you have cancer according to some test but the test is only 60% accurate, where does that leave you? You don’t know what to do.”
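McDonald’s question about a 60%-accurate test can be made concrete with Bayes’ rule. The short sketch below is an illustration only, not the study’s method. It assumes that “60% accurate” means both sensitivity and specificity are 0.60, which the article does not specify, and it borrows the one-in-78 lifetime risk cited above as a stand-in for the prior probability. The function name `post_test_probability` is hypothetical.

```python
def post_test_probability(prior: float, sensitivity: float, specificity: float) -> float:
    """Probability of disease given a positive result, by Bayes' rule."""
    for name, value in (("prior", prior), ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_positive = sensitivity * prior                  # diseased and flagged
    false_positive = (1.0 - specificity) * (1.0 - prior)  # healthy but flagged
    if true_positive + false_positive == 0.0:
        raise ValueError("a positive result is impossible under these inputs")
    return true_positive / (true_positive + false_positive)


# ASSUMPTIONS: prior = 1/78 (lifetime risk used as a stand-in for prevalence);
# sensitivity = specificity = 0.60 (one reading of "60% accurate").
probability = post_test_probability(prior=1 / 78, sensitivity=0.60, specificity=0.60)
print(f"Probability of cancer after a positive result: {probability:.1%}")
```

Under those assumptions, a “yes” from the binary test corresponds to a probability of only about 1.9%, which is why a bare positive result leaves a patient with little to act on.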
Cancer is a complex disease having multiple paths of development within each subtype, McDonald continues. “It is [therefore] difficult to find a single biomarker that is going to uniformly characterize the disease across the full spectrum of patients.”
Little is known about the causes and effects of cancers, both because of this heterogeneity and because molecular problems can arise on many levels, including genetic mutations, epigenetic changes, and protein modifications, says McDonald. Using one over the other as a disease biomarker identifies only the cases that started for that reason.
Metabolites, on the other hand, collectively reflect all changes that could be occurring on the molecular level to affect the phenotype, he adds.
From a computational standpoint, it is critical that AI tools be matched to the size of the datasets on which they are deployed, says Skolnick. They should also be trained using a variety of relatively sensitive methods to ensure that “all roads lead to Rome” with as few false positives as possible. In that way, the vast world of metabolites could be reduced to the hundreds that correlate with ovarian cancer with some degree of frequency and importance.
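The article does not describe how the outputs of the different methods are combined. One common way to build a consensus, shown here purely as a hedged sketch, is to average each model’s predicted probability and to note whether the models land on the same side of a decision threshold. The model names, the 0.5 threshold, and both function names are hypothetical.

```python
from statistics import mean
from typing import Dict


def consensus_probability(model_probs: Dict[str, float]) -> float:
    """Average the cancer probabilities reported by several models."""
    if not model_probs:
        raise ValueError("at least one model prediction is required")
    for name, p in model_probs.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} returned an invalid probability: {p}")
    return mean(list(model_probs.values()))


def models_agree(model_probs: Dict[str, float], threshold: float = 0.5) -> bool:
    """True when every model falls on the same side of the threshold."""
    calls = {p >= threshold for p in model_probs.values()}
    return len(calls) == 1


# Hypothetical per-model outputs for one blood sample (not study data).
sample = {"random_forest": 0.9, "svm": 0.8, "logistic_regression": 0.7}
print(round(consensus_probability(sample), 2), models_agree(sample))
```

Reporting the averaged probability rather than a yes/no call fits the probabilistic framing, while the agreement check is one simple way to flag samples on which the methods disagree.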
Equally imperative is that the metabolites be accurately measured by mass spectrometry, which is subject to significant “instrument drift,” says McDonald. “Even if you start an experiment in the morning, by the afternoon the instrument will have drifted making the baseline values change, so ... how can I compare results I do on Monday with results I do on Wednesday, never mind comparing assays done in a laboratory in Atlanta to assays that were done in a laboratory in New York?”
His answer was to run a control sample for every 10 patient samples analyzed and do so across instruments and laboratories to normalize values to that standard, McDonald says. The best computer scientist in the world won’t get anywhere without reliable data, he notes.
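The article does not say how values are normalized to the control, so the following is only a minimal sketch of one plausible scheme: divide each patient sample’s metabolite intensity by the intensity measured in the most recent control run. The data layout, the metabolite labels, and the function name `normalize_to_control` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# One instrument run: ("control" or "patient", {metabolite: raw intensity}).
Run = Tuple[str, Dict[str, float]]


def normalize_to_control(runs: List[Run]) -> List[Dict[str, float]]:
    """Express each patient intensity as a ratio to the latest control run.

    ASSUMPTION: runs are listed in acquisition order, and a control sample
    precedes each block of patient samples (e.g. one control per 10 patients).
    """
    normalized = []
    control = None
    for kind, intensities in runs:
        if kind == "control":
            control = intensities  # new reference for the following block
            continue
        if control is None:
            raise ValueError("patient sample acquired before any control run")
        normalized.append({m: v / control[m] for m, v in intensities.items()})
    return normalized


# Hypothetical example: the same patient sample measured on two days, with
# the instrument reading 50% higher on the second day because of drift.
monday = [("control", {"m1": 100.0, "m2": 50.0}),
          ("patient", {"m1": 120.0, "m2": 40.0})]
wednesday = [("control", {"m1": 150.0, "m2": 75.0}),
             ("patient", {"m1": 180.0, "m2": 60.0})]
print(normalize_to_control(monday))
print(normalize_to_control(wednesday))
```

Because the control drifts along with the patient samples, the ratios from the two days match even though the raw intensities do not. Real drift is rarely a single scale factor, so a production pipeline would likely need something more elaborate.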
McDonald and Skolnick have set their sights on the early detection of cancer, but not necessarily on all early-stage cancers. It’s a subtle but important difference based on recent discoveries about the disease.
Clinically speaking, early stage means the tumor is confined to the tissue of origin and late stage means the cancer has spread throughout the body, McDonald says. The somewhat “unjustified assumption” is that cancers go from stage 1 to stage 4 as a reflection of their temporal progression. “We now know that may not always be the case.”
With type 1 ovarian cancer, for instance, tumors remain in the ovaries for an indefinite period and some will never progress, making them less medically concerning, says McDonald. Type 2 ovarian cancer, by contrast, will quickly spread from the day of its arrival.
It’s the same story with prostate cancer, Skolnick points out. With the slow-growing variety, “you can live to 106 and something else kills you.”
In the latest study, the AI-based consensus classifier was not only stage-agnostic in its predictive accuracy but also performed a bit better on the early-stage than the late-stage samples, he adds. Overall, its positive predictive value was better for cancer than non-cancer with very few false positives. Consistent with the idea of two ovarian cancer types, the metabolic profiles of the stage 1 ovarian cancers also didn’t all look the same.
Machine learning only learns what you put in it, says McDonald, referencing a decade-old experiment where highly accurate predictions made on 40 patient samples from Atlanta couldn’t be repeated on samples collected elsewhere by different groups. “So, the larger your dataset the better because we are trying to capture all of the variable profiles that exist among ovarian cancer patients.”
This is likely one of the big reasons the predictive model worked so well in the latest study, he says. Samples were collected from hundreds of women in different age and ethnic groups to capture as much heterogeneity as possible. Accuracy of the test would likely improve considerably with, say, a dataset of 10,000 patients, adds McDonald. It might then be possible to tease out any differences based on demographic characteristics.
As it is, about 1,000 patients have been profiled to date by the new probabilistic model. The plan is to expand the dataset by an additional 1,000 patients next year, focusing on women at high risk of developing ovarian cancer, he reports. That means women who have a better than one-in-78 chance of getting ovarian cancer in their lifetime, says Skolnick, a group that includes those with a BRCA2 mutation, who have a roughly 50% lifetime probability of developing the disease. Such women might be identified through genetic testing services like 23andMe and Ancestry.com.
“The goal initially is to offer the diagnostic tool for free to women... through their clinician” to learn their probability of having ovarian cancer, he says. Their doctor might then use the results in choosing their next steps, which might include a recommendation that they periodically be retested or referred for advanced screening.
The large-scale study with the high-risk women could provide the proof of principle needed to support further clinical development, says Skolnick. “This could be like a Pap smear where a woman goes to see her gynecologist, gets her blood drawn... and within a week or so the clinician knows the likelihood of that patient having ovarian cancer.”
OC Dx, as it is being called, will initially be a laboratory-developed test (LDT), he adds. Longer term, he and his team plan to seek its approval by the Food and Drug Administration (FDA).
Patient samples are already being collected via partnerships with clinicians in Atlanta, McDonald says, and similar collaborations will be established around the country to reach the enrollment goal for the pilot test. Since many women who test BRCA positive opt for prophylactic surgery, the test-takers may also include individuals at high risk by virtue of having a family history of ovarian or breast cancer regardless of BRCA status.
McDonald says he invites contact from clinicians interested in making this testing option available to their high-risk patients, which requires informed consent and other regulatory paperwork. With a proposed rule announced last fall, the FDA has made plain its intentions to start providing greater oversight of LDTs.
He and Skolnick are in the process of establishing a startup company, called MyOncoDx, through which clinical testing of the diagnostic will be done. Over the next year, they will start looking at multiple other cancer types—specifically, triple-negative breast cancer and prostate cancer—where they hope to recapitulate the accuracy of the probabilistic approach.