December 19, 2019 | Researchers have known for a very long time that proteins circulating in human blood can, like the proverbial canary in the coal mine, be signals of health and disease risk. Until recently, the association has been exploited only in one-at-a-time experiments, usually with mass spectrometry technology. Protein biomarkers have also been considered an adjunct to other information sources, such as behavioral risk factors and imaging test results, not the sole focus.
SomaLogic has deliberately flipped that logic on its head, according to Chief Medical Officer Stephen Williams, M.D. The precision health information company prefers to “rigorously pre-plan” its analysis and deploy a lot of predictive, protein-based models simultaneously. Had proteomics been invented earlier, he says, everyone might be asking how much can be learned from proteins on their own.
The answer, it seems, is quite a lot. The company already has seven of its SomaSignal tests in clinical use by concierge physicians, currently in the Boulder, Colorado, area, with expansion to other areas nationwide expected in another few months, Williams says. Concierge practices emphasize preventive health.
The tests are specific to primary and secondary cardiovascular risk, liver fat, cardiorespiratory fitness, body composition, alcohol impact and glucose tolerance, and each is built on between 12 and 122 proteins, says Williams. In a just-published proof-of-concept study (DOI: 10.1038/s41591-019-0665-2) in Nature Medicine, the primary cardiovascular risk model performed well relative to gold-standard tests, and in a 2016 study (DOI: 10.1001/jama.2016.5951) in JAMA the secondary cardiovascular risk model was found to do even better than standard assessments, he notes.
Another three predictive tests for visceral fat (stored around the organs), glucose tolerance (without the need for giving glucose or fasting) and resting energy rate (number of calories burned at rest) are in the final stages of development and will be available next month, Williams says.
SomaLogic is a central laboratory certified under the Clinical Laboratory Improvement Amendments of 1988, allowing it to deliver laboratory-developed tests as regulated by the Centers for Medicare & Medicaid Services. Approval by the U.S. Food and Drug Administration (FDA) is therefore not required.
Proteomics is a potentially huge game-changer. “When you look at most diseases that kill people… the proportion of the disease that can be explained even by the biggest combination of genetic markers is typically 10% or 15%,” says Williams. Many diseases are also known to have a big lifestyle component that isn’t signaled by genetics, he adds, “so maybe it is not so surprising that proteins carry so much information.”
In the Nature Medicine study, the company’s SomaScan platform scanned 5,000 proteins from a single blood test on each of the almost 17,000 trial participants. A machine learning program was then used to decode the patterns and see how well they mimicked results of the “truth standard,” such as an ultrasound or treadmill test.
Eleven of the 13 protein-based models were successfully linked to a health indicator: liver fat, kidney filtration, percentage body fat, visceral fat mass, lean body mass, cardiopulmonary fitness, physical activity, alcohol consumption, cigarette smoking, diabetes risk and primary cardiovascular event risk. The two unsuccessful model attempts were for predicting future body weight and macronutrient intake.
The successful models each incorporated between 13 and 375 protein measurements, as reported in the paper. Across all models, nearly 900 unique human proteins were included.
In addition to the unexpectedly high success rate, Williams says, “we were shocked at how good some of the models really were.” The body fat model, for example, captured 90% of the information produced by a DEXA scan. Genetic testing company 23andMe, by comparison, offers a 381-gene score that accounts for 7% of a person’s body fat information, he adds.
Overall, the degree to which proteins in any one model overlapped with others was a mean of 12%, says Williams. The most frequently selected individual protein was leptin, which was informative for predictive models of percentage body fat, visceral fat, physical activity and fitness.
Williams says that it is relatively unimportant if proteins are represented across models or why they’re not. “I don’t mind what proteins get chosen, as long as they work and are consistent when applied in new situations.”
Study findings support Williams’ vision of a “liquid health check” that can be routinely deployed in clinics everywhere, which would be more convenient and less expensive than using standard techniques and testing.
Concierge physicians are paying an introductory price of $199 for the current battery of SomaSignal tests. To capture the same information in the traditional fashion would cost most patients thousands of dollars between physician visits, laboratory testing, exercise stress testing and imaging assessments, he notes.
“We’re launching the tests as a self-pay add-on,” says Williams. “People visiting a concierge doctor are already used to paying for consultations and not necessarily having them covered by insurance. It’s the easiest steppingstone to adoption and learning and getting feedback from patients and doctors.”
Roughly 20,000 genes make structural protein, but no one knows how many of those proteins are later modified or how many variants exist, because no current technology can interrogate the entire proteome, Williams says. So SomaScan looks for the 5,000 proteins most popular with researchers, meaning those for which a highly purified version is available for making the reagent that detects the protein in a plasma sample.
SomaLogic’s measurements depend on aptamers, which are lengths of DNA that can stick to proteins. To generate them, SomaLogic has built libraries of random aptamer sequences at a scale of ten to the fifteenth power, Williams says. “We don’t even know what they are. We made them by combinational chemistry in almost every imaginable sequence.”
The process of finding the best aptamer begins by repeatedly “incubating” a highly purified protein with the random library, keeping the good binders and discarding the rest. “With each cycle we’re enriching more for the aptamer that binds the most strongly to the target protein,” he explains. At the end, perhaps a few hundred aptamers remain in the enriched library and the “best bets” get genetically sequenced, produced at scale and added to the SomaScan “menu” when a new version of the assay is qualified (typically when at least 1,000 new reagents are ready to be added).
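The cycle-by-cycle enrichment Williams describes can be pictured with a toy simulation. This is only an illustrative sketch, not SomaLogic's procedure: the real selection is a wet-lab process on a library of roughly ten to the fifteenth power sequences, whereas here the library is 100,000 made-up "affinity" scores, and the function name `enrich` and the `keep_fraction` parameter are hypothetical.

```python
import random


def enrich(affinities, rounds, keep_fraction=0.1):
    """Toy selection: each round keeps only the strongest binders.

    `affinities` are invented scores standing in for binding strength.
    Returns the surviving pool and the mean affinity before selection
    and after every round.
    """
    pool = list(affinities)
    history = [sum(pool) / len(pool)]
    for _ in range(rounds):
        pool.sort(reverse=True)                      # strongest binders first
        keep = max(1, int(len(pool) * keep_fraction))
        pool = pool[:keep]                           # discard the rest
        history.append(sum(pool) / len(pool))
    return pool, history


rng = random.Random(0)                               # fixed seed for repeatability
library = [rng.random() for _ in range(100_000)]     # random starting library
survivors, mean_affinity = enrich(library, rounds=3)

print(len(survivors))                                # size of the enriched pool
print([round(m, 3) for m in mean_affinity])          # mean affinity per cycle
```

In this sketch the pool shrinks tenfold per cycle while its average score rises, which mirrors the idea of keeping the good binders and discarding the rest until a small enriched set remains.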
What’s new is use of the platform for discovery and medical information delivery to patients at the same time, eliminating all the translational risk, Williams continues. “If you’re using one platform to discover your marker of interest [i.e., mass spectrometry] and then deliver information to patients using something else [i.e., an immunoassay], the measurements aren’t the same.”
The combined approach also means any number of models can be added to the SomaScan platform to help patients, together with their medical care team, manage their health, says Williams. “The incremental cost of adding a new test is almost nothing because we’re already measuring 5,000 things.”
The other paradigm shift explored in the newly published study is that researchers were protein-agnostic when validating the predictive models, Williams says. “We didn’t choose proteins based on their name, history, favoritism or the literature.” Little is known about many individual proteins, let alone their interactions with one another, so “we just let machine learning choose the best combinations.”
The biology of the proteins was added as a table in the paper at the insistence of reviewers, Williams says. The original submission didn’t name a single protein. The reason the machine learning algorithm chose certain proteins for a model may be to “correct for a demographic factor or a piece of physiology like kidney function,” not because of their association with the target disease state. “We didn’t tell the algorithms whether [the proteins] were from a man or a woman, if they were tall or short, or what their ethnic background was.”
Beyond the initially discovered health-associated protein patterns, SomaLogic has another 100 in the pipeline, says Williams. Each can be turned into software that can be added to the information load delivered to patients—including the impact of lifestyle changes on their health status.
Risk-bearing health systems managing patients with chronic diseases, for whom a different suite of tests is under development, are the next target market, Williams says. The first tests to launch will be specific to the management of cardiovascular disease, heart failure and nonalcoholic steatohepatitis (NASH).
The NASH test will mimic each component of a liver biopsy using proteins in the blood, he says. NASH CRN, established by the National Institute of Diabetes and Digestive and Kidney Diseases, will be collaborating.
Very large sample sets are needed to develop the tests, says Williams. Since April 2018, SomaLogic has run roughly 150,000 samples from clinical assays obtained from academic collaborators, providing the data needed to build many new models predictive of health conditions, issues and behaviors. Practitioners will simply choose the bundle of tests most helpful to their panel of patients.
For the most highly consequential tests, SomaLogic will voluntarily seek FDA approval, he adds. “Tests are viewed as more dependable and valuable when they’ve been vetted by a trusted external entity.”