By Allison Proffitt
June 19, 2018 | 2018 Bio-IT World Best Practices Award-Winner | About three years ago, Alexion’s Head of Research and Development asked Paul McDonagh and other Alexion data scientists how many rare diseases there were. It turned out not to be a simple question, and it launched a database project to record disease characteristics for 4,000 to 5,000 rare diseases.
The next step was prompted by a toy: University Games’ 20 Questions Ball. The game asks players 19 questions about a secret word before delivering the right answer. Could the same be done for rare diseases, the Alexion team wondered? Was there a 20-questions approach that could deliver diagnoses to physicians?
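The article doesn’t detail the game’s mechanics, but the underlying idea is easy to sketch: ask whichever question best splits the remaining candidates, the greedy, information-gain strategy behind any 20-questions game. Here is a minimal Python illustration with an invented toy table of diseases and findings; none of this is Alexion’s data or code.

```python
# Hypothetical toy data: disease -> yes/no findings. Invented for illustration.
DISEASES = {
    "disease_A": {"seizures": True,  "anemia": False, "hearing_loss": True},
    "disease_B": {"seizures": True,  "anemia": True,  "hearing_loss": False},
    "disease_C": {"seizures": False, "anemia": True,  "hearing_loss": False},
    "disease_D": {"seizures": False, "anemia": False, "hearing_loss": True},
}

def best_question(candidates):
    """Pick the finding whose yes/no split of the remaining candidates is
    closest to 50/50, i.e. the question carrying the most information."""
    findings = {f for profile in candidates.values() for f in profile}
    return min(findings,
               key=lambda f: abs(sum(p[f] for p in candidates.values())
                                 - len(candidates) / 2))

def diagnose(candidates, answer):
    """Narrow candidates by asking questions; `answer(finding)` returns a bool."""
    while len(candidates) > 1:
        q = best_question(candidates)
        a = answer(q)
        remaining = {d: p for d, p in candidates.items() if p[q] == a}
        if len(remaining) in (0, len(candidates)):  # no further split possible
            break
        candidates = remaining
    return sorted(candidates)

# A patient with seizures and anemia but no hearing loss resolves to disease_B.
patient = {"seizures": True, "anemia": True, "hearing_loss": False}
print(diagnose(DISEASES, lambda finding: patient[finding]))  # ['disease_B']
```

Real rare-disease presentation is nothing like this clean, which is exactly the complication McDonagh raises next.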
Paul McDonagh, Senior Director of Data Sciences and Informatics at Alexion, was skeptical. Rare diseases are notoriously complicated. Disease presentation changes by age and sex. Sometimes a disease can take months or even years to fully present itself; physicians talk about diseases “blossoming,” McDonagh explained. And some patients have more than one rare disease.
With 4,000 to 5,000 rare diseases on the table, McDonagh expected a successful diagnostic tool to rely on some pretty sophisticated machine learning, fed by massive amounts of patient data, not just a database of textbook definitions. “But we’d already made this investment [in a database], and the data was already organized,” he said. “It seemed we should at least try. If it worked in some capacity, it would be great.”
The approach actually worked quite well and earned the team a 2018 Bio-IT World Best Practices Award.
Finding Play In The Machine
With help from the EPAM Software Engineering team, McDonagh and his Alexion colleagues built SmartPanel as a three-legged platform: the Alexion database of rare diseases, patient presentation data, and math. The three together comprise the predictive algorithm, McDonagh explained. By tweaking one leg, the team can watch outputs change. “I knew the approach would give birth to where the answer was,” he said.
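As a rough mental model of those three legs (not Alexion’s code), think of the database and the patient data as fixed inputs and the “math” as a pluggable scoring function; swap any leg and the ranking that comes out changes. A minimal sketch in Python, with invented data:

```python
from typing import Callable, Dict, List, Set, Tuple

# The three legs, as hypothetical stand-ins:
# 1) disease database: disease -> characteristic phenotypes
# 2) patient presentation: the phenotypes observed in one patient
# 3) math: a pluggable scoring function comparing the two
DiseaseDB = Dict[str, Set[str]]
ScoreFn = Callable[[Set[str], Set[str]], float]

def rank_diseases(db: DiseaseDB, patient: Set[str], score: ScoreFn) -> List[Tuple[str, float]]:
    """Score every disease against the patient and rank the results.
    Swapping any argument ("tweaking one leg") changes the output."""
    return sorted(((d, score(patient, phenos)) for d, phenos in db.items()),
                  key=lambda pair: pair[1], reverse=True)

def jaccard(patient: Set[str], disease: Set[str]) -> float:
    """One simple choice of math: set overlap between phenotype profiles."""
    return len(patient & disease) / len(patient | disease)

db = {"disease_A": {"seizure", "ataxia"}, "disease_B": {"anemia", "fatigue"}}
print(rank_diseases(db, {"seizure", "ataxia", "fatigue"}, jaccard))
# disease_A ranks first (score of roughly 0.67), disease_B second (0.25)
```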
For the patient presentation data, the team used two sets of de-identified patient phenotypes plus a third, simulated dataset: “how these patients would probably present if the disease descriptions were really good,” McDonagh explained.
For the final leg, the team solicited 25 different approaches from mathematicians at EPAM, Tessella, and SEMA4. “A friendly competition was proposed,” McDonagh said, and each group worked with the SmartPanel platform to present and test their own mathematical approaches.
“At the start, the trick was to keep all of these collaborators working toward a common goal, while preserving the independence of ideas and the integrity of the real patient data,” McDonagh said. The SmartPanel platform insulates the real data from the developers, keeping privacy intact. Developers could see the simulated data, and when they submitted an algorithm to the SmartPanel framework, they received a statistical score for its performance against the real data. It was a bit frustrating for them, McDonagh acknowledged. “They always want the real data. But that’s the name of the game.”
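The insulation mechanism is simple to sketch; the code below is illustrative, since the article doesn’t describe SmartPanel’s internals. Submitted algorithms run against the held-back real patients on the platform’s side, and only an aggregate statistic comes back to the submitting team.

```python
# Stand-in "real" dataset, invented here, and never exposed to submitters.
REAL_PATIENTS = [
    {"phenotypes": {"seizure", "ataxia"}, "true_diagnosis": "disease_A"},
    {"phenotypes": {"anemia", "fatigue"}, "true_diagnosis": "disease_B"},
]
DISEASE_DB = {"disease_A": {"seizure", "ataxia"}, "disease_B": {"anemia", "fatigue"}}

def evaluate_submission(algorithm, real_patients, db, k=5):
    """Run a submitted ranking algorithm on the held-back real data and
    return only a top-k accuracy, never the patient records themselves."""
    hits = sum(p["true_diagnosis"] in algorithm(p["phenotypes"], db)[:k]
               for p in real_patients)
    return hits / len(real_patients)

def overlap_ranker(phenotypes, db):
    """An example submission: rank diseases by raw phenotype overlap."""
    return sorted(db, key=lambda d: len(phenotypes & db[d]), reverse=True)

print(evaluate_submission(overlap_ranker, REAL_PATIENTS, DISEASE_DB, k=1))  # 1.0
```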
After the initial round of competition, the teams worked together. “At the end, we shared some of the methodology, and hybrid and ensemble approaches started to form. It was one of these hybrid approaches that was the narrow winner.”
But this is where it got interesting for McDonagh.
McDonagh purposefully kept the mathematicians isolated to get as much diversity as possible. By varying the math, he hoped “to see whether it was a math problem or predominantly a problem of disease and patient information.”
Against the simulated data, most of the contributed algorithms performed well. “But when we use real data—and the only thing that we change is the real presentation of the patients—then we see rather large differences in performance,” McDonagh said.
“We have a range of some really simple statistics through to some bordering-on-artificial-intelligence reasoning and differential diagnosis,” McDonagh said. “Different math can handle the play in the machine to different levels.” Simple statistical tests don’t do very well; AI methods do better, though there’s still room for improvement, he said, and the different artificial intelligence approaches delivered “about the same performance.”
The approaches revealed a range of answers with some surprising distinctions. “Some performed better with real patients from [certain] institutions, and some of them work better on patients from other institutions,” McDonagh said.
To isolate the institutional impact, the teams standardized the math and the disease database and compared results across different patient inputs. “We’ll see different performance of those algorithms between institutions, which tells us that the way that those patients are being described in the medical record is also important,” McDonagh explained.
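The experiment is easy to picture in code, again with invented records: hold the algorithm and the disease database fixed, swap only the patient source, and watch the score move when one site records the same symptoms in nonstandard terms.

```python
DISEASE_DB = {"disease_A": {"seizure", "ataxia"}, "disease_B": {"anemia", "fatigue"}}

def top1_accuracy(patients, db):
    """Fixed math (raw phenotype overlap) and fixed database; only the patients vary."""
    def best(phenos):
        return max(db, key=lambda d: len(phenos & db[d]))
    return sum(best(p["phenotypes"]) == p["true_diagnosis"]
               for p in patients) / len(patients)

# Invented records: two sites describing the same disease in different words.
BY_INSTITUTION = {
    "institution_1": [{"phenotypes": {"anemia", "fatigue"},
                       "true_diagnosis": "disease_B"}],
    "institution_2": [{"phenotypes": {"pallor", "tiredness"},  # nonstandard terms
                       "true_diagnosis": "disease_B"}],
}
for site, patients in BY_INSTITUTION.items():
    print(site, top1_accuracy(patients, DISEASE_DB))
# institution_1 scores 1.0; institution_2 scores 0.0 with identical math
```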
It turns out, McDonagh said, some of the most important work in rare disease diagnoses lies in standardizing how we describe patients and diseases in the medical record. “If there was a standard process to do that, it would make the math behind patient matching so much easier and so much more accurate,” he said. “In our opinion, if we could standardize the way the patients are being turned into a computable format, that’s going to take out one of the variables in the process. It’s going to be faster than someone reading your notes and hand-coding it.”
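The article doesn’t name a standard, but the Human Phenotype Ontology (HPO) is one widely used way to turn patient descriptions into a computable format. A toy sketch of the idea follows, with a hand-made three-entry lexicon; real systems use full ontology-aware text mining rather than substring matching.

```python
# Illustrative only: a tiny lexicon mapping note phrases to real HPO term IDs.
LEXICON = {
    "seizure": "HP:0001250",        # Seizure
    "hearing loss": "HP:0000365",   # Hearing impairment
    "anemia": "HP:0001903",         # Anemia
}

def encode_note(note: str) -> set:
    """Turn a free-text clinical note into a set of HPO term IDs."""
    text = note.lower()
    return {term_id for phrase, term_id in LEXICON.items() if phrase in text}

note = "Patient presents with recurrent seizures and mild hearing loss."
print(encode_note(note))  # {'HP:0001250', 'HP:0000365'}
```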
So far, SmartPanel has used mostly phenotype data for patients, but genomic data is an obvious next step. “We’re now using it to evaluate different ways of processing genetic data. Beyond mutations, there are also structural variants, small deletions, and microdeletions. We can use that information from the genome sequencing to sort of further correct and constrain the diseases that the algorithms are thinking over,” McDonagh said.
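A sketch of what “correct and constrain” could look like in practice; the gene-disease links below are invented, since McDonagh doesn’t describe the mechanism. Variants found in the patient’s genome restrict which diseases the phenotype-matching algorithm still has to consider.

```python
# Hypothetical gene -> disease links, invented for illustration.
GENE_TO_DISEASES = {
    "GENE1": {"disease_A"},
    "GENE2": {"disease_B", "disease_C"},
}

def constrain_candidates(candidates, variant_genes):
    """Keep only candidate diseases linked to a gene carrying a variant."""
    supported = set().union(*(GENE_TO_DISEASES.get(g, set())
                              for g in variant_genes))
    return candidates & supported

candidates = {"disease_A", "disease_B", "disease_C", "disease_D"}
print(constrain_candidates(candidates, {"GENE2"}))  # {'disease_B', 'disease_C'}
```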
World Records & Real Patients
Findings from the SmartPanel competition platform are already being put to good use. In February of this year, The Manton Center for Orphan Disease Research and Alexion announced a collaboration to pair one of the algorithms born of the SmartPanel competition, the 20 Rare-Disease Questions (20RDQ) algorithm, with The Manton Center’s own internally developed software. Together, the tools create computable descriptions of patients that can be combined with rapid genome sequencing to produce a prioritized list of suspect genetic variants for a diagnosing physician to consider.
The Manton Center and Alexion will create new computable descriptions of hundreds of diagnosed and undiagnosed patients with rare diseases to train the artificial intelligence engine within 20RDQ, improving diagnosis speed and accuracy while creating actionable insights and disease intervention options in a clinically meaningful timeframe.
Alexion also returned SmartPanel findings for patients to Rady Children’s Hospital, and was part of the team awarded a Guinness World Records title earlier this year for the fastest genetic diagnosis, McDonagh said.
“For Alexion, this is a mission for all rare diseases; we’re not just focused on diseases that Alexion medicines can treat. We’re focused on just over 3,400 computationally diagnosable diseases, wherever they are.”