By Melissa Pandika
March 10, 2020 | Although artificial intelligence has generated plenty of buzz in biomedical research, thought leaders at the Molecular Medicine Tri-Conference in San Francisco last week showed that it’s more than just a flashy catchphrase. In a session entitled “Decoding Diseases in the Era of Precision Health Using AI and Machine Learning,” they discussed how applying AI to big data can yield valuable insights into disease that could enable more precise, proactive healthcare.
Atul Butte, the Priscilla Chan and Mark Zuckerberg Distinguished Professor at the University of California, San Francisco, opened the session by sharing his team’s efforts to integrate electronic health records (EHRs) across the six University of California health systems, which together form a single accountable care organization and clinically integrated network. He also discussed the use of analytics to transform this EHR data into evidence of drug efficacy.
Early in his talk, Butte pointed out the United States’ massive spending on EHRs, citing headlines about how Sutter Health, Kaiser Permanente, and Partners HealthCare have invested billions in the Epic EHR system. “If we don’t use this data to improve the practice of medicine, it will be a national tragedy given how much money we’ve spent on this,” he said.
Butte’s team has already used this data to visualize the sequence of drugs prescribed to each type 2 diabetes patient across the UC Health system. “We don’t just do comparative effectiveness of the drugs with each other, but comparative effectiveness of different strategies,” he explained. “We have so much natural experimentation going on here.” In fact, Butte and colleagues have used their approach to conduct multi-center, long-term comparative effectiveness studies, including financial data, for type 2 diabetes and other conditions. Their findings will appear in forthcoming papers.
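For readers curious about the mechanics, the sequence-mining step can be sketched in a few lines of Python. The records, field names, and drug choices below are hypothetical stand-ins, not UC Health data; the idea is simply to order each patient’s prescriptions chronologically and count first-line-to-second-line transitions across the cohort.

```python
from collections import Counter

# Hypothetical, simplified prescription records: (patient_id, date, drug).
# Real analyses would draw on de-identified EHR data, not toy tuples.
records = [
    (1, "2015-01-10", "metformin"), (1, "2016-03-02", "glipizide"),
    (2, "2015-06-21", "metformin"), (2, "2017-01-15", "insulin"),
    (3, "2015-09-05", "metformin"), (3, "2016-11-30", "glipizide"),
    (4, "2016-02-14", "sulfonylurea"),
]

# Group prescriptions by patient and order them chronologically.
sequences = {}
for pid, date, drug in sorted(records, key=lambda r: (r[0], r[1])):
    seq = sequences.setdefault(pid, [])
    if not seq or seq[-1] != drug:  # collapse refills of the same drug
        seq.append(drug)

# Count first-line -> second-line transitions across the cohort.
transitions = Counter(
    (seq[0], seq[1]) for seq in sequences.values() if len(seq) >= 2
)
for (first, second), n in transitions.most_common():
    print(f"{first} -> {second}: {n} patients")
```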
The next step is to use these data to predict who will do well on a given drug, potentially through a decision tree. Computers “are great at chasing down these moves to figure out how to win,” he said. “I think we can do this with medicine too.” He also described a concept his team has developed, the deep-learning healthcare system, in which deep-learning methods learn the best treatment decisions from EHR data, potentially faster and more accurately than a human expert could.
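As an illustration of the decision-tree idea, here is a minimal Python sketch using scikit-learn on simulated patients. The features (age, baseline HbA1c, eGFR) and the response labels are invented for the example, not drawn from any EHR, and the tree here is a stand-in for whatever models Butte’s team actually uses.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Hypothetical features: age, baseline HbA1c, eGFR. The response label
# (did the patient do well on the drug?) is simulated, not real EHR data.
n = 500
X = np.column_stack([
    rng.integers(30, 85, n),      # age (years)
    rng.uniform(6.5, 11.0, n),    # baseline HbA1c (%)
    rng.uniform(30, 120, n),      # eGFR (mL/min/1.73 m^2)
])
# Simulated rule: younger patients with lower HbA1c respond more often.
p_respond = 1 / (1 + np.exp(0.05 * (X[:, 0] - 55) + 0.8 * (X[:, 1] - 8)))
y = rng.random(n) < p_respond

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Print the learned decision rules and held-out accuracy.
print(export_text(tree, feature_names=["age", "hba1c", "egfr"]))
print("held-out accuracy:", tree.score(X_test, y_test))
```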
Butte said that he also wanted to use this EHR data to create maps of patients’ progression from disease to death, and to pinpoint where individual patients lie on those maps. He shared one such map, which illustrated how patients who present with a heart attack can return a year later with heart failure, then diseases of the lung, and eventually septicemia, or blood poisoning, before death. Many end up dying of septicemia three years later, not of the heart attack itself, suggesting that a fever a few years after a heart attack should probably raise more concern than it currently does.
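A progression map of this kind can be approximated as a first-order Markov chain over diagnoses. The sketch below, using invented timelines, counts transitions between consecutive diagnoses and normalizes them into empirical probabilities; Butte’s actual maps are mined from far richer coded encounter data.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified diagnosis timelines, one list per patient.
timelines = [
    ["heart attack", "heart failure", "septicemia", "death"],
    ["heart attack", "heart failure", "lung disease", "septicemia", "death"],
    ["heart attack", "recovery"],
]

# Count transitions between consecutive diagnoses.
counts = defaultdict(Counter)
for t in timelines:
    for a, b in zip(t, t[1:]):
        counts[a][b] += 1

# Normalize into empirical transition probabilities (a first-order
# Markov-chain view of the progression map).
for state, nxt in counts.items():
    total = sum(nxt.values())
    for b, n in nxt.items():
        print(f"P({b} | {state}) = {n / total:.2f}")
```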
“Now you start to realize we need to predict what’s going to happen in the next 90 days, what’s going to happen in the next year, and what are we going to do with [this information]?” Butte said. “That, to me, is the definition of an accountable care organization, using data and digital health to accountably care for all its patients.”
A central tenet of precision medicine is that individual gene mutations offer insight into disease, yet in cancer and other complex diseases, many genetic variants together shape disease and disease risk. John Quackenbush, professor and chair of biostatistics at the Harvard T.H. Chan School of Public Health, discussed his team’s use of network methods to tease apart this complexity.
Quackenbush described a method that displays single nucleotide polymorphisms (SNPs) and their associated genes as a bipartite graph, and uses the graph’s modular structure to reveal how SNPs influence a given phenotype. He and his team applied this method to chronic obstructive pulmonary disease and found 52 communities of highly interconnected SNPs and genes, 11 of them enriched for genes in specific functional categories. “These genes aren’t correlated in their expression, but they are correlated in how the SNPs regulate them,” Quackenbush said. In other words, rather than a single gene being controlled by a single SNP, his team showed that a family of SNPs controls a specific process. “They work together to change a process that leads you toward a phenotypic state.” These findings appeared in PLOS Computational Biology in 2016.
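In rough outline, the approach looks like the following Python sketch. The SNP and gene identifiers are made up, and generic greedy modularity maximization via networkx stands in for the bipartite-modularity algorithm that Quackenbush’s team actually used.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy bipartite eQTL network: edges link SNPs (hypothetical rs IDs) to
# genes whose expression they are associated with.
edges = [
    ("rs1", "GENE_A"), ("rs1", "GENE_B"), ("rs2", "GENE_A"),
    ("rs2", "GENE_B"), ("rs3", "GENE_C"), ("rs3", "GENE_D"),
    ("rs4", "GENE_C"), ("rs4", "GENE_D"), ("rs2", "GENE_C"),
]

G = nx.Graph()
G.add_nodes_from({s for s, _ in edges}, bipartite="snp")
G.add_nodes_from({g for _, g in edges}, bipartite="gene")
G.add_edges_from(edges)

# Each community mixes SNPs and genes; the gene members can then be
# tested downstream for enrichment in functional categories.
for i, community in enumerate(greedy_modularity_communities(G)):
    print(f"community {i}: {sorted(community)}")
```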
Quackenbush and colleagues used a similar approach to look at cancer-associated SNPs. As detailed in PNAS in 2017, they found that cancer risk SNPs map onto a small number of communities of genes whose functions are associated with disease development and progression. In skin, for instance, they map onto communities associated with cell division, epithelium differentiation, and immunity, not just oncogenes or tumor suppressor genes. In other words, groups of SNPs regulate groups of genes involved in similar biological processes. What’s more, cancer risk is modified not only by gene mutations but also by how those mutated genes are regulated.
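The enrichment claims behind such findings typically rest on a standard hypergeometric test: given a community’s genes, is a functional category overrepresented among them? The counts in this sketch are illustrative only, not figures from either study.

```python
from scipy.stats import hypergeom

# Hypothetical enrichment test: are a community's genes overrepresented
# in a functional category (e.g., "cell division")? Numbers are made up.
M = 20000   # genes in the background (genome)
n = 300     # background genes annotated to the category
N = 50      # genes in the community
k = 8       # community genes annotated to the category

# P(X >= k) under the hypergeometric null: survival function at k - 1.
p_value = hypergeom.sf(k - 1, M, n, N)
print(f"enrichment p-value: {p_value:.2e}")
```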
Next, Jie Cheng, director of exploratory statistics at AbbVie, discussed the pharmaceutical company’s novel machine learning-based method to mine clinical trial datasets to identify subgroups of patients with different treatment effects. “What we’re interested in is finding predictive factors — factors that predict treatment effect,” Cheng said.
He explained that AbbVie’s method considers patient subgroups defined by simple rules. During training, the algorithm exhaustively evaluates all possible subgroups up to a certain search depth using an appropriate statistic, and selects the subgroup with the best z-score. Once the training and testing procedures are defined, AbbVie runs cross-validation to estimate how well that subgroup performs, and repeats the cross-validation to ensure the result is reliable. The team also applies multiplicity control to safeguard against data fishing.
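A toy version of the search conveys the idea. The sketch below simulates a two-arm trial, scores every depth-1 subgroup (a single covariate rule) with a two-proportion z-statistic, and prints the results; all covariate names and effect sizes are invented, and AbbVie’s production method searches deeper and layers cross-validation and multiplicity control on top.

```python
import numpy as np

rng = np.random.default_rng(1)

def two_proportion_z(events_t, n_t, events_c, n_c):
    """z-statistic for the difference in event rates, treatment vs. control."""
    p_t, p_c = events_t / n_t, events_c / n_c
    p = (events_t + events_c) / (n_t + n_c)          # pooled rate
    se = np.sqrt(p * (1 - p) * (1 / n_t + 1 / n_c))
    return (p_c - p_t) / se                          # > 0 favors treatment

# Simulated trial: binary covariates, random treatment assignment, and an
# outcome whose treatment benefit is concentrated in the "no_afib" stratum.
n = 2000
covariates = {
    "age_gt_67": rng.random(n) < 0.5,
    "no_afib": rng.random(n) < 0.8,
    "no_leg_deficit": rng.random(n) < 0.6,
}
treated = rng.random(n) < 0.5
base = 0.20 - 0.08 * (treated & covariates["no_afib"])
event = rng.random(n) < base   # e.g., death within 14 days

# Depth-1 exhaustive search: score every single-rule subgroup.
for name, in_group in covariates.items():
    t, c = in_group & treated, in_group & ~treated
    z = two_proportion_z(event[t].sum(), t.sum(), event[c].sum(), c.sum())
    print(f"subgroup {name}: z = {z:.2f}")
```

In the full procedure, the winning subgroup’s z-score would be re-estimated on held-out folds and adjusted for the number of subgroups examined before anyone claimed a signal.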
Cheng described applying AbbVie’s method to data from the International Stroke Trial, a large, prospective, randomized, open trial designed to determine whether early administration of aspirin, heparin, both, or neither affected the clinical course of acute ischemic stroke. The trial included patients clinically diagnosed with acute ischemic stroke with onset within the previous 48 hours. The primary endpoints were death within 14 days, and death or dependency at six months.
Using this machine learning-based method, AbbVie scientists identified clinical trial patient subgroups with high z-scores. The top two candidate subgroups were patients older than 67 without atrial fibrillation, and patients without leg or foot deficits upon admission. The findings suggested that aspirin may benefit these subgroups. Indeed, among patients older than 67 without atrial fibrillation, the rate of death within 14 days was lower in those who had taken aspirin than in those who hadn’t. Among patients without a leg or foot deficit, the rate of death or dependency at six months was lower in those who had taken aspirin than in those who hadn’t.
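A confirmatory comparison of this kind amounts to a standard two-proportion z-test. The snippet below shows the shape of the calculation with statsmodels; the counts are invented for illustration, not the trial’s actual figures.

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts for one candidate subgroup (older than 67, no atrial
# fibrillation): deaths within 14 days, aspirin vs. no aspirin.
deaths = [95, 130]        # aspirin arm, no-aspirin arm
patients = [1200, 1150]

z, p = proportions_ztest(deaths, patients)
print(f"z = {z:.2f}, p = {p:.4f}")
```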
Cheng then described more broadly how AbbVie scientists apply this subgroup identification method. For instance, they apply it retrospectively to data from failed clinical trials and identify subgroups from current studies to be validated in future studies. They also prospectively incorporate subgroup identification into clinical trial design. Additionally, they can design a clinical trial without a predesignated subgroup; if the trial fails, and the final sample size is large enough, they can try to identify a subgroup and potentially claim success for that subgroup.
Together, these discussions point to AI and machine learning as promising approaches to delivering healthcare that’s precise not only in its treatment of patients, but also in its prediction of their disease risk, response to therapy, and other outcomes.