With Machine Learning, Acoustical Phenomenon Could Be Diagnostic

By Deborah Borfitz

December 20, 2022 | The application of sound in diagnostic medicine was among a miscellany of topics on the agenda at the recent Acoustical Society of America meeting in Nashville, Tennessee. These included the use of machine learning (ML) to detect diarrhea and pneumonia by listening to, respectively, excretion events and coughs.

First up with a “feces thesis” presentation was Maia Gatlin, an aerospace engineer at Georgia Tech Research Institute (GTRI), who helped come up with an ML algorithm to identify diarrhea noises. The idea here, she says, is not to assist people in recognizing when they have diarrhea—most people know when they do—but to track spikes in cases within a community so health officials can take appropriate action to slow the spread of water-borne illnesses such as cholera that affect millions of people annually and result in up to 143,000 deaths worldwide.

Specifically, in low-resource areas a sensor might be deployed in public toilets and latrines to capture excretion events in a discreet fashion, she explains. The prototype system uses a microphone to collect the audio data, connected to a small single-board computer such as a Raspberry Pi with a pretrained ML algorithm onboard.

Temporal signals are recorded in 10-second increments and converted to mel (melody) spectrograms, or image representations, which the algorithm evaluates to decide whether the event was or was not diarrhea, Gatlin continues. Data for training the ML model came from publicly available sources that notably included YouTube.
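To make the conversion step concrete, here is a minimal sketch of cutting audio into 10-second clips and turning each into a log-scaled mel spectrogram. It is not the GTRI team's code: the sample rate, FFT size, hop length, number of mel bands, and all function names are assumptions chosen for illustration, and a synthetic tone stands in for real recordings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose centers are evenly spaced on the mel scale.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def mel_spectrogram(clip, sr, n_fft=1024, hop=512, n_mels=64):
    # Short-time Fourier transform with a Hann window, then mel pooling.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(clip) - n_fft) // hop
    frames = np.stack([clip[i * hop: i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(n_mels, n_fft, sr) @ power.T     # (n_mels, n_frames)
    return 10.0 * np.log10(mel + 1e-10)                   # decibel scale

def split_into_clips(audio, sr, seconds=10):
    # Non-overlapping fixed-length clips; a trailing partial clip is dropped.
    n = sr * seconds
    return [audio[i: i + n] for i in range(0, len(audio) - n + 1, n)]

# Synthetic stand-in for a recording: 25 seconds of a 440 Hz tone (assumed 8 kHz sampling).
sr = 8000
t = np.arange(25 * sr) / sr
audio = np.sin(2 * np.pi * 440.0 * t)

clips = split_into_clips(audio, sr)
images = [mel_spectrogram(c, sr) for c in clips]
print(len(clips), images[0].shape)
```

Each resulting array can be treated as a single-channel image, with mel frequency on one axis and time on the other, which is the form an image classifier expects.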

“We did not know anything about the folks that made those sounds originally,” she emphasizes. Some of the full-length YouTube videos were 10 hours long, making them a surprisingly rich source of sound data for the training exercise.

Image Input

The ML algorithm chosen for the job is called a convolutional neural network (CNN) and it takes images as input, says Gatlin. It was trained on data in different classes—including defecation, urination, and flatulence as well as diarrhea—none of which sound the same or look the same in a spectrogram. The harmonics (aka “trills”) are specific to each event type. All this information was passed, in a single bucket, to the ML algorithm to help it learn to differentiate between the four classes.

Training of the ML algorithm consumed 70% of the online sound data, and validation, which occurred simultaneously, took another 10%, she adds. The remaining 20% was reserved exclusively for evaluating the model’s performance.
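A 70/10/20 split of this kind can be sketched in a few lines. The clip names, the random seed, and the `split_dataset` helper are hypothetical; the article does not describe how the team actually partitioned its data.

```python
import random

def split_dataset(items, train=0.7, val=0.1, seed=0):
    """Shuffle once, then cut into train / validation / test portions."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(round(train * len(items)))
    n_val = int(round(val * len(items)))
    train_set = items[:n_train]
    val_set = items[n_train:n_train + n_val]
    test_set = items[n_train + n_val:]      # whatever remains is held out
    return train_set, val_set, test_set

# Hypothetical clip identifiers standing in for real audio files.
clips = [f"clip_{i:03d}" for i in range(100)]
train_set, val_set, test_set = split_dataset(clips)
print(len(train_set), len(val_set), len(test_set))  # 70 10 20
```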

Gatlin and her team also did augmentation, she says, fabricating new data points by slightly altering real ones. This allowed them to create more realistic data samples by, for example, masking time or frequency or adding in background noise. “It is a deterrent to something called overfitting, where the model is essentially just memorizing the input instead of actually learning the features. So, if it gets random things... it is not expecting, then it is able to be a little more robust in its learning.”
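The three augmentations named here (time masking, frequency masking, and added background noise) can be sketched roughly as follows. The mask widths, the 10 dB signal-to-noise ratio, the array sizes, and the function names are assumptions for illustration only, and random arrays stand in for real spectrograms and recordings.

```python
import numpy as np

def mask_time(spec, width, rng):
    # Blank out a random block of consecutive time frames (columns).
    out = spec.copy()
    start = rng.integers(0, spec.shape[1] - width + 1)
    out[:, start:start + width] = spec.min()
    return out

def mask_frequency(spec, width, rng):
    # Blank out a random block of consecutive frequency bands (rows).
    out = spec.copy()
    start = rng.integers(0, spec.shape[0] - width + 1)
    out[start:start + width, :] = spec.min()
    return out

def add_background(clip, noise, snr_db):
    # Scale the noise so the mix has the requested signal-to-noise ratio.
    signal_power = np.mean(clip ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clip + scale * noise

rng = np.random.default_rng(0)
spec = rng.normal(size=(64, 155))                         # placeholder spectrogram
clip = np.sin(2 * np.pi * 440.0 * np.arange(8000) / 8000)  # placeholder audio
noise = rng.normal(size=clip.shape)                       # placeholder background sound

time_masked = mask_time(spec, width=20, rng=rng)
freq_masked = mask_frequency(spec, width=8, rng=rng)
noisy_clip = add_background(clip, noise, snr_db=10.0)
```

Masking operates on the spectrogram image, while background noise is mixed into the waveform before the spectrogram is computed.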

For the training data, every real data point had seven augmented ones, Gatlin reports. But for the test data, the ratio was two to one and, to keep it real, the data were augmented only with the background sounds one might expect to hear in a bathroom environment, such as people shuffling or talking a bit.

In the first of two tests, where the ML algorithm was given previously unseen data with no background noise, the model accurately predicted whether an excretion event was diarrhea 98% of the time, says Gatlin. In the second, harder test where some background noise was introduced, performance slipped a bit to 96%.

Since the project needed to be completed within 10 weeks, she continues, the team wanted to generate some fake sounds that might trick the ML model during the final testing phase. For this they devised the synthetic human acoustic reproduction testing (SHART) machine. “We had loads of misclassifications because it’s not real data, but we were able to trick the ML algorithm about 72% of the time.”

In a brief demo of the sensor attached to a public toilet, Gatlin points out a green light indicating the diarrhea detector is recording the sounds that a microphone right above it is taking in. It takes only seconds for the light to turn red, signaling diarrhea.

Privacy Advantage

Admittedly, the device has room for improvement. Researchers want to strengthen the ML model with more real data and to improve their simulated noise, says Gatlin, noting the unavoidable time constraint of getting institutional review board approval to conduct such studies. “The acoustical environment that we look to deploy the sensor might be something more like a latrine” than the toilets they dealt with in the study, she adds.

During the Q&A, Gatlin points to the privacy advantage of acoustic monitoring for detecting diarrhea. “We wanted it to be noninvasive, so folks are hopefully a little more comfortable than potentially having pictures of their excrement taken... or anything identifiable.” With the sensor system, it would be hard to tell who the noise came from unless people used their name while in the restroom.

In the future, the tracking device could have individual-level utility, she says. As reported by others, “there can be some geometric changes in the rectum that might also have changes acoustically. You may not know that your farts or your poo... is changing, but we could look at tracking that in terms of rectal cancer.”

The project was funded by the GTRI Research Internship Program and was the inspiration of fellow research engineer Alexis Noel, Gatlin says. Noel was already working on a “reinventing the toilet” project for the Bill & Melinda Gates Foundation that aims to bring safe, clean sanitation services to poor people in the developing world.

Coming up with the algorithm was only half the problem. Moving forward, says Gatlin, further development of the diarrhea-detecting sensor will need to address how to wirelessly transmit the data, as well as generate a database and keep it updated.

Characterizing Coughs

Presenting on the pneumonia diagnosis algorithm was Jin Yong Jeon, an academic researcher in the department of architectural engineering at Hanyang University in Seoul, South Korea. The scientific contribution described is enhanced performance of a CNN algorithm designed to analyze the pattern of cough sounds to identify a variety of respiratory diseases.

He explains that pattern analysis is accomplished in part with image data conversion using Gramian Angular Field (GAF), whereby time series signals get mapped to an image. Other enabling factors are a data augmentation technique that applies room impulse responses to cough sounds, with the responses generated by computer simulation of spatial models according to sound source and receiver, and an efficient process configuration using CNN transfer learning.
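For readers unfamiliar with the GAF, the mapping from a time series to an image can be sketched as below. This shows the summation variant of the Gramian Angular Field; the article does not say which variant Jeon's group used, so that choice, the function name, and the toy input are assumptions.

```python
import numpy as np

def gramian_angular_field(series):
    """Summation-type GAF: entry (i, j) is cos(phi_i + phi_j)."""
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    # Rescale to [-1, 1] so each value can be read as a cosine.
    # Assumes the series is not constant (hi > lo).
    scaled = np.clip(2.0 * (x - lo) / (hi - lo) - 1.0, -1.0, 1.0)
    phi = np.arccos(scaled)                       # polar-angle encoding
    return np.cos(phi[:, None] + phi[None, :])    # (n, n) image

toy_series = [0.0, 1.0, 2.0]
gaf = gramian_angular_field(toy_series)
print(gaf.shape)  # (3, 3)
```

A series of n samples yields an n-by-n symmetric image, so long signals are typically downsampled or summarized first; how that was done in the study is not stated.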

The algorithm seeks to enhance recognition accuracy through feature extraction and image classification, Jeon says. As suggested by a recent study, “better performance can be found by considering the characteristics of acoustic signals and data application methods for different environments.”

For the “cough detection network” (GAF-CNN model) deployed here, clinical cough sounds were collected from pneumonia patients and augmented in various rectangular and non-rectangular spaces, says Jeon. Time-dependent psychoacoustic data was then calculated and converted into two-dimensional image data through the GAF while mel spectral images were being created. The two heterogeneous data were then combined to form a three-dimensional image dataset and the CNN model was implemented.

“The concept of room impulse response is used in architectural acoustics, to identify the sound environment of a space, and to obtain various acoustic information,” Jeon continues. “In our study, impulse responses were collected in rectangular and irregular spaces, then psychoacoustic parameters were used to characterize the cough sounds.”

Those were classified into three categories—modulation, strength, and spectral content—based on roughness, loudness, and pitch. “The key part of our study is to combine the indicators,” he notes.

Researchers collected impulse responses for data augmentation by configuring different room sizes in time increments ranging from 0.5 to 2 seconds, capturing changes in volume, says Jeon. Classification test results indicated an approximately 10% improvement in accuracy, sensitivity, and specificity when using impulse response convolution compared to the normal image shifting technique.
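Impulse response convolution, the augmentation being compared here, can be illustrated with a rough sketch. The study used impulse responses tied to room models; the version below swaps in a much simpler stand-in, exponentially decaying noise, and reads the 0.5 to 2 second range as reverberation times. Both of those choices, along with the sample rate, the synthetic "cough," and the function names, are assumptions.

```python
import numpy as np

def synthetic_impulse_response(rt60, sr, rng):
    # Decaying noise whose amplitude envelope falls by 60 dB after rt60 seconds.
    n = int(rt60 * sr)
    t = np.arange(n) / sr
    envelope = 10.0 ** (-3.0 * t / rt60)
    ir = rng.standard_normal(n) * envelope
    ir[0] = 1.0                                   # direct sound
    return ir / np.max(np.abs(ir))

def reverberate(clip, ir):
    # Convolve the dry sound with the impulse response, keep the original length.
    wet = np.convolve(clip, ir)[:len(clip)]
    return wet / (np.max(np.abs(wet)) + 1e-12)

sr = 8000
rng = np.random.default_rng(0)

# Placeholder for a recorded cough: a half-second noise burst that dies away.
n = sr // 2
dry = rng.standard_normal(n) * np.exp(-np.arange(n) / (0.05 * sr))

# One augmented copy per assumed reverberation time.
augmented = {
    rt60: reverberate(dry, synthetic_impulse_response(rt60, sr, rng))
    for rt60 in (0.5, 1.0, 1.5, 2.0)
}
```

Each augmented copy sounds as if the same cough had been recorded in a differently reverberant space, which is the variation the classifier is meant to learn to ignore.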

Space Agnostic

When looking at changes in accuracy over time, impulse response in most irregular spaces showed good performance, he says, whereas rectangular space data was less reliable due to acoustic fluctuations. To get good performance with data augmentation in a rectangular space required all psychoacoustic indicators to be present together, while in the case of irregular space high accuracy could be achieved when only two acoustic parameters were combined: tonality and fluctuations in strength.

Importantly, the longer the reverberation time, the more effective the augmentation, he adds. So, spaces with shorter reverberations exhibit higher reliability of the data.

Jeon additionally shares that augmentation with impulse responses achieved an accuracy rate of 99.5% in irregular spaces and 97.5% in rectangular ones.
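Accuracy, sensitivity, and specificity are all computed from the same four counts of correct and incorrect predictions. The sketch below shows the standard definitions for a two-class case; the labels are invented for illustration and are not data from either study.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for labels coded 1 (disease) / 0 (no disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of true cases that were caught
        "specificity": tn / (tn + fp),   # share of non-cases correctly cleared
    }

# Made-up labels: four positive and six negative examples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics["accuracy"], metrics["sensitivity"])  # 0.8 0.75
```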

In the future, Jeon says, he imagines the ML model being used to detect various respiratory diseases, including chronic obstructive pulmonary disease and asthma as well as pneumonia. It may be possible to pick up on subtle differences in the cough sounds of patients as they go about their daily activities and thereby address problems sooner.

The research team has datasets specific to different kinds of lung diseases, he explains, enabling the algorithm to tell a patient’s cough sound from any sort of background noise. The approach is targeted to individuals with a particular condition who would record their own cough sounds and is agnostic to the environment, indoors or out, they happen to be in.