By Paul Nicolaus 
December 8, 2020 | A group of researchers from IBM and Pfizer has developed an AI model that uses small samples of language to predict the onset of Alzheimer’s disease (AD) in healthy individuals.  
The model, which uses a one- to two-minute speech sample gathered from a standard cognitive test, made its predictions with an accuracy of 70%. The results suggest that language performance can reveal early signs of progression to AD and could help lead to eventual breakthroughs in prevention and treatment. 
As the research team explained in their paper, published Oct. 22 in The Lancet eClinicalMedicine (DOI: 10.1016/j.eclinm.2020.100583), one priority in AD research is the ability to harness early intervention strategies that could reduce risk, delay onset, or slow progression of this disease. However, a significant challenge is that early interventions can only be effectively implemented if the people who stand to benefit can be identified. 
Plenty of variables have been tied to the risk of AD, but there is still a need for inexpensive and reliable biomarkers, and language might be a solution. Age-related cognitive decline reveals itself in many aspects of language because even seemingly mundane linguistic tasks, like the naming of objects, involve extensive brain networks. Because of this, linguistic capabilities can easily become disrupted, turning language competence into “a sensitive indicator of mental dysfunction.”  
The researchers trained their AI models on data from the Framingham Heart Study, which began in 1948. The study included about 700 samples from 270 participants, and a single sample from each of 80 individuals was held out for testing. All samples in the test set were gathered while participants were still cognitively normal. Half the participants in the test set developed AD symptoms before the age of 85; the other half did not. 
The most significant aspect of this newly published work is that it “proves the potential for using automated language analysis for tracking cognitive decline, even before the clinically accepted onset of mild cognitive impairment,” IBM Research’s Elif Eyigoz told Diagnostics World.
AI Model Picks Up on Subtleties Humans May Overlook 
To arrive at their findings, the research team analyzed the transcriptions of participants’ language samples as they described a picture that depicts a woman washing dishes while two children raid a cookie jar behind her back. The researchers made use of natural language processing, which made it possible to pick up on subtleties that may otherwise go unnoticed.   
According to Eyigoz, one example is that access to many descriptions of the same image from different participants made it possible to infer a representation for how typical (or atypical) any given description is and to use that to score the description with language modeling.  
For a human to make a similar observation, that person would need to sit and read a large number of picture descriptions beforehand, accurately remember what they had read, and then consistently provide scoring for descriptions based on how typical the wording is. And that scoring could vary depending on various factors, like whether that person is irritated, hungry, or tired. On the other hand, a computer can analyze millions of picture descriptions without getting bored or forgetting what it had observed regardless of the size of the data involved.  
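To make the idea of scoring typicality concrete, here is a minimal sketch in Python. It is not the researchers' method: it assumes a simple word-frequency (unigram) language model with add-one smoothing, and the function names and sample descriptions are invented for illustration. A description scores higher when its words are common across the reference descriptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase alphabetic words only.
    return re.findall(r"[a-z]+", text.lower())


def fit_unigram(descriptions):
    # Count how often each word appears across all reference descriptions.
    return Counter(tok for d in descriptions for tok in tokenize(d))


def typicality(description, counts):
    # Average log-probability per word under an add-one smoothed unigram model.
    # Values closer to zero mean the wording is more typical of the references.
    tokens = tokenize(description)
    if not tokens:
        return float("-inf")
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserves probability for unseen words
    log_probs = [math.log((counts[t] + 1) / (total + vocab)) for t in tokens]
    return sum(log_probs) / len(tokens)


# Invented reference descriptions of the picture, for illustration only.
reference = [
    "The mother is washing dishes while the boy takes cookies from the jar",
    "A woman washes dishes and the children steal cookies from the cookie jar",
    "The boy is on a stool reaching for the cookie jar while the mother dries dishes",
]
counts = fit_unigram(reference)

typical_score = typicality("The mother is washing dishes and the boy takes cookies", counts)
atypical_score = typicality("Somebody does something near that thing over there", counts)
print(typical_score > atypical_score)
```

A real system would use a far larger set of descriptions and a richer language model, but the principle is the same: the score is computed the same way every time, no matter how many descriptions came before.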
Another example is the use of more and less specific terminology. The use of more general terms (like boy, girl, or woman) instead of more specific terms (like son, daughter, or mother) to refer to the subjects in the picture can be a discriminating variable for an AI model, Eyigoz explained. In contrast, a human medical professional speaking with a patient may not pay attention to or be able to accurately quantify the frequency of each type of use. 
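Counting how often each kind of term appears is the sort of bookkeeping a computer handles easily. The sketch below is illustrative only and is not taken from the study: the two word lists contain just the examples named above, and the function name and the returned ratio are assumptions.

```python
import re
from collections import Counter

# Word lists limited to the examples mentioned in the article (an assumption;
# a real feature set would be much broader and handle plurals and synonyms).
GENERAL_TERMS = {"boy", "girl", "woman"}
SPECIFIC_TERMS = {"son", "daughter", "mother"}


def term_specificity(transcript):
    # Tally general and specific terms and report the share that is specific.
    counts = Counter(re.findall(r"[a-z]+", transcript.lower()))
    general = sum(counts[t] for t in GENERAL_TERMS)
    specific = sum(counts[t] for t in SPECIFIC_TERMS)
    total = general + specific
    ratio = specific / total if total else None  # None when no listed term occurs
    return {"general": general, "specific": specific, "specific_ratio": ratio}


sample = "The mother is washing dishes while her son and daughter take cookies. The boy falls."
print(term_specificity(sample))
```

A count like this could then serve as one input among many to a predictive model.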
Study Moves AD Research Forward, Builds Upon Larger Effort
This work moves the field forward in several ways, according to the research team. This is one of the first major studies, for instance, to predict outcomes in healthy people who have no other risk factors. By contrast, most existing Alzheimer’s prediction research has zeroed in on people who are already beginning to show signs of cognitive decline or those with risk factors like family history.  
Also, access to data from original participants of the Framingham Study, in addition to their spouses and children, made for a larger dataset than those used in most other studies, Guillermo Cecchi of IBM Research pointed out in a blog post dated Oct. 22. It also made it possible to verify the model’s predictions against actual outcomes. 
This work builds upon IBM Research’s broader undertaking of gathering biomarkers, like speech, from a person’s natural environment to provide insight into a whole range of mental health issues. 
According to Jeff Rogers, who leads Cognitive IoT for Healthcare, these study findings are an example of how the signs of a person’s health are often present in the things they do day in and day out. “With the right insights and tools,” he explained, “healthcare can be reimagined with doctors having the information they need to treat patients early when interventions can be the most helpful.”
Outside Experts Weigh In
Mallar Chakravarty, an associate professor of psychiatry at McGill University and a computational neuroscientist at the Douglas Research Centre (who was not involved with the study), found this work interesting because it makes use of data that could be gathered from anywhere using a simple task.  
Much of the work done in this early detection domain involves biologically driven hypotheses where researchers are collecting biomarkers from the blood or even more invasively from cerebral spinal fluid. “In my case, we’ve used genetics and MRI,” he told Diagnostics World.  
The real aspiration, though, is trying to find biomarkers that can be very easily accessed, and with this recently published study, there isn’t much more that’s needed for the work other than the task and a tape recorder. “That’s a lot cheaper than getting MRIs and is a lot less invasive than collecting cerebral spinal fluid,” he added. 
The study highlights the need to assess linguistic abilities as part of a diagnostic work-up for cognitive decline, according to Sarah C. McEwen, who was not involved with the study. Telegraphic speech, repetitiveness, and agraphia deserve particular attention, she said, since these may appear as some of the earliest and most relevant markers of a future diagnosis of dementia.
The accurate, early detection of AD is an important objective in both clinical research and neurology clinical practice, explained McEwen, a director of research and programming for Pacific Neuroscience Institute at Providence Saint John’s Health Center and associate professor of Translational Neurosciences and Neurotherapeutics at the John Wayne Cancer Institute in Santa Monica, Calif. 
A primary focus in this realm of research is figuring out how to identify and intervene at the earliest stages, she told Diagnostics World, when a personalized intervention could have the biggest benefit and help alter the cognitive trajectory for an individual.
“Clinically, the ability to identify a cognitive skill that would not be flagged during a standard neuropsychological assessment in an otherwise cognitively healthy individual is a promising and low-risk measurement that could help that individual prioritize their cognitive health if they scored poorly on this assessment and had other risk markers present,” McEwen said.  
In addition, this new research could help those who are conducting clinical trials in the realm of AD research to enrich their sample for people at a higher risk of converting to the disease—the same individuals who could benefit the most from the intervention being tested. 
“Although the test shows initial evidence of efficacy, it needs further replication in a larger and prospective sample which could use a more well-developed test of language competence while accounting for individual differences in linguistic competence,” McEwen pointed out. 
And although the initial data is promising, blood-based tests designed to detect the conversion and presence of AD could be more sensitive indicators of disease onset and pathology. Recent ones, she noted, show an accuracy of classification in the 89% to 98% range. By comparison, this study predicted conversion to AD with 70% accuracy. 
Short-Term Applications and Long-Term Aspirations 
The big hope is that these AI models will help lead to the creation of simple ways of enabling clinicians to determine a patient’s risk of developing AD even if they have no symptoms or risk factors. If the models are further developed and trained on racially and geographically diverse sets of data, this could lead to less invasive, more accessible preventative testing for AD. 
“The computational methods that made this study possible are mature,” Eyigoz explained, “but we do not have large scale datasets that combine language data with medical records.” This prevents any near-future application of these methods to predicting the likelihood of future onset of AD. 
Unfortunately, there aren’t any large-scale datasets that would enable the application of a richer set of AI methods to this research problem. But this study provides a strong argument for collecting such datasets, she added, which would take decades (because aging is a slow process).
This type of dataset ought to include language samples collected from participants while cognitively healthy along with diagnosis dates when they do eventually develop impairment and AD.
While this may sound fairly simple, it’s anything but. If the researchers began collecting this information in 2020, Eyigoz indicated it would be useful in 2040 for inferring the likelihood of the future onset of AD 20 years in advance. Similarly, the same data would be useful in 2060 for inferring the likelihood of future onset four decades in advance. 
“Personally, this last scenario is what I would like to see happen one day,” she said, noting her hopes that this study encourages the collection of large-scale longitudinal datasets of language production beginning from middle age.  
There are substantial challenges that would need to be overcome, including the amount of time and money required to collect large-scale longitudinal datasets. Beyond that, gathering datasets made up of language samples spanning decades involves legal complexities related to accessing medically sensitive data such as diagnosis dates. There are also privacy and security issues associated with collecting writing samples and voice recordings. 
Despite these difficulties, Eyigoz believes it can be accomplished. Meanwhile, she thinks there are possible applications in the more immediate future, like providing an approach for enhancing clinical trials or monitoring treatment. 
One practical application of the methods outlined in the study is the use of its results in a more extensive system that would include other automated assessments of cognitive decline, she added, since “sophisticated decision making in artificial intelligence usually involves combining results obtained from multiple methods.”
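As a rough illustration of what combining results from multiple methods can look like, the sketch below takes a weighted average of risk scores from several assessments. It is an assumption-laden example, not a description of any IBM system: the assessment names, the scores, and the weights are all hypothetical.

```python
def combine_scores(scores, weights=None):
    # Weighted average of per-assessment risk scores, each assumed to lie in [0, 1].
    # `scores` maps a (hypothetical) assessment name to its score.
    if not scores:
        raise ValueError("at least one score is required")
    if weights is None:
        weights = {name: 1.0 for name in scores}  # equal weighting by default
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical scores from two automated assessments.
scores = {"language_sample": 0.7, "memory_task": 0.5}
print(combine_scores(scores))
print(combine_scores(scores, weights={"language_sample": 3.0, "memory_task": 1.0}))
```

In practice the weights would be learned from data, and more sophisticated combination schemes exist, but the averaging step conveys the basic idea.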
Paul Nicolaus is a freelance writer specializing in science, nature, and health. Learn more at www.nicolauswriting.com.