By Maxine Bookbinder
October 3, 2018 | Every year, 50,000 Americans die from pneumonia and 1 million more are hospitalized with the disease. Chest X-rays, one of the most widely performed imaging procedures in the US, are the best way to diagnose pneumonia. They are also subject to an unknown number of false negatives.
The Stanford University School of Medicine has completed a study, using technology developed by Unanimous AI, showing that a handful of doctors, connected by algorithms, achieved greater diagnostic accuracy than machine learning alone.
Conventional AI in radiology uses machine learning, training networks on hundreds of thousands of X-ray datasets. Stanford did this with its software, CheXNet, in 2017. CheXNet is a 121-layer convolutional neural network that detects pneumonia from chest X-rays and, for the first time, exceeded the ability of human radiologists (Rajpurkar, Irvin, et al., “CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning”).
The study with Unanimous AI took a different approach. Instead of replacing doctors with machine-learning algorithms, a group of eight board-certified radiologists was connected into a “super-intelligence” using the company’s Swarm AI technology. This swarm of doctors was asked to view 50 chest X-rays to determine the probability of each patient having pneumonia. The radiologists were scattered throughout the US and internationally, each working anonymously.
“Groups of doctors with different expertise can find the best possible treatment plan,” says Unanimous AI Chief Scientist and CEO Louis Rosenberg. “The goal is to amplify human diagnostic accuracy. We can build a super doctor by combining the wisdom, knowledge, and insights of a medical team in optimal ways. Any instance in which doctors need to draw on their own experience to make subjective judgements on complex matters is a good place to apply technology, build a better system, and find the best possible diagnosis and treatment plan.”
Rosenberg uses terms like “hive mind” and “swarm” that are derived from nature. He says countless species make themselves smarter in groups, such as bird flocks and bee swarms, whether determining how to fly south or where to establish a hive. “They don’t take a vote or use Survey Monkey to make decisions. They work better together as a system than individually. This is a natural path for social organisms.”
Swarming empowers animals, insects, and humans to optimize their collective knowledge, wisdom, and intuition in real time. Rosenberg says that Swarm AI enables the same benefits in human groups. Despite negative depictions by overzealous science fiction writers, swarming significantly amplifies collective human abilities while maintaining the individuality of each member.
“Most AI research is aimed at taking people out of the loop; I think this is a mistake,” says Rosenberg. “We need to develop systems that leverage AI but keep people relevant by connecting human groups into super-intelligent [systems] rather than replacing people altogether. Let’s use AI in a different way. This is a safer approach for AI systems.”
Swarm Of Radiologists
For the Stanford pneumonia study, the goals were threefold: to determine if Swarm AI could increase diagnostic accuracy and subsequently reduce errors; to determine if Swarm AI could achieve greater accuracy than Stanford’s CheXNet; and to validate the hypothesis that people, in this case radiologists, make better decisions as a system than individually.
“We use a lot of AI and machine learning at Stanford,” says Safwan Halabi, Clinical Associate Professor of Pediatric Radiology at Stanford. “We wanted to see how Swarm AI would do against multiple radiologists looking at X-rays at the same time.”
The study took about two hours and began with the radiologists simultaneously viewing one chest X-ray at a time online. After each X-ray, the radiologists chose a percentage probability that the patient had pneumonia: 0-40%, 40-60%, 60-80%, and 80-100%. If the range of probability was greater than 40% for a particular X-ray, the radiologists immediately reassessed the image to see if that probability was lower or higher (for example, closer to 40% or 60%) and to determine if the swarm could agree within a 20% consensus. The doctors each had equal weight and could see the others’ annotations as well as the final consensus.
The results were impressive: Swarm AI, integrating algorithms and humans, performed 22% more accurately than CheXNet. “The sum is greater than individual parts,” says Halabi. The result suggests that AI should be viewed not as a replacement for humans but as a way to augment their intelligence. It also shows that humans are needed to improve AI. “Like replays, AI enhances, makes [data] more structured and standardized. It still needs a human component.”
It also means that radiologists can sleep better. “It’s keeping human radiologists relevant in a world in which AI is becoming more real,” says Halabi. “AI brings radiologists together. AI connects people to allow them to form a diagnosis.”
Rosenberg concurs. “The idea is to make people smarter, not to replace them, and in the process keep human values and sensibilities in the loop. After all, the idea that we can instill human values into AI algorithms is misguided. Whose values are we talking about? If researchers had taken this on 30 years ago, they would have modeled very different social values than what we see in today’s society. It is much safer to have people integrated alongside AI and keeping up to date with values and perspectives. We are making decisions about patients. We must keep people in the loop.”
X-Rays And Race Horses
Rosenberg is an entrepreneur and a professor at California Polytechnic, and he holds more than 300 patents. He didn’t start with a focus on diagnostics. He hypothesized that by imitating animal systems, he could create a safer and more technologically advanced artificial intelligence (AI) and amplify human intelligence in a whole host of applications. He launched Unanimous AI in 2014 to do this.
In 2016, CBS Interactive challenged Rosenberg to predict the first four horses in the Kentucky Derby. He created a swarm of 20 horse enthusiasts. None of the individuals in the swarm named all four winners; brought together in real time on computers, however, they did. (Rosenberg’s $20 horse bet won him $11,000.) In 2017, Newsweek challenged Unanimous AI to predict that year’s Oscar winners. Rosenberg gathered a swarm of 50 non-expert movie fans. Individually, they were 40% accurate. By taking a vote, they were 47% accurate. By thinking together as a Swarm AI system, they were 76% accurate. One year later, in 2018, Unanimous AI was 94% accurate in predicting the Oscars, outperforming every major news organization that uses professional movie critics.
The Stanford X-ray study was Unanimous AI’s first in-depth medical study, and the company is planning more testing with Stanford researchers and at least one study with Harvard. Future studies will test for different variables. Do smaller swarms achieve the same accuracy as larger ones? Do more X-rays improve accuracy? How do changes within the swarm affect accuracy, such as giving some members more weight, revealing swarm participant identities, examining decision-making trends, or adding a “plant” to determine, for instance, if radiologists remain independent or follow a world-renowned doctor? Future tests could also compare first-year students with their higher-level cohorts and subspecialty radiologists with general radiologists, or derive a consensus as a team on an unknown case.
Planned studies will also extend to more types of imaging, including CT scans and MRIs, as well as tumor boards and treatment planning, to reduce diagnostic errors. Mammograms, in particular, are subject to a higher rate of false positives, says Unanimous AI’s Rosenberg. “If Swarm AI can reduce the error rate even by a small amount, then we can eliminate large numbers of procedures, patient trauma, and billions of dollars spent on false positives.”
The current study used images from NIH public data; the next test will substitute Stanford data, but Halabi says there are currently no plans to use Swarm AI for live diagnoses or medical assessments on real patients.
Unanimous AI will also continue to experiment in non-medical ventures, including sales forecasting and financial, marketing, and competitive analyses, such as predicting if a marketing message will work, if a price change will impact sales, or if certain TV ads will increase a restaurant’s business. “Anywhere there is a business team,” says Rosenberg, “a swarm can make them significantly smarter. We’re smarter together than alone.”