An international team of computing scientists and engineers has developed a new system that can read lips accurately even when the speaker is wearing a face mask. The system combines artificial intelligence with radio-frequency sensing to identify lip movements.
Integrated with hearing aid technology, the system could help tackle the ‘cocktail party effect’, a common shortcoming of conventional hearing aids.
Currently, hearing aids help hearing-impaired people by amplifying the sounds around them. But in noisy environments like cocktail parties, they amplify such a broad spectrum of sound that users find it difficult to focus on specific sounds, such as a conversation with one person.
One solution is to create ‘smart’ hearing aids, which combine traditional audio amplification with a second device that gathers additional data to enhance performance.
Some researchers have successfully used cameras to help with lip reading, but filming people without their consent raises privacy concerns. Cameras also cannot read lips through masks, a practical problem for people who wear face coverings for cultural reasons or to protect against COVID-19.
In a paper published in the journal Nature Communications, the team led by the University of Glasgow explained how they harnessed edge sensing technology to read lips. Their system collects only radio-frequency data, with no video footage, thereby preserving privacy.
To design the system, the researchers asked volunteers to repeat the five vowel sounds, both while unmasked and while wearing a surgical mask. As the volunteers repeated the vowel sounds, their faces were scanned using radio-frequency signals from a radar sensor and a Wi-Fi transmitter. Their faces were also scanned while their lips were still.
The 3,600 samples of data collected during the scans were used to teach machine learning and deep learning algorithms how to identify the characteristic mouth and lip movements associated with each vowel sound. The algorithms could learn to read the vowel formations of masked users because radio-frequency signals pass easily through masks.
The algorithms correctly interpreted the Wi-Fi data 95% of the time for unmasked lips and 80% of the time for masked lips. They correctly interpreted the radar data 91% of the time without a mask and 83% of the time with a mask.
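The train-then-evaluate process described above can be illustrated with a toy sketch. This is not the team's method: it swaps in a simple nearest-centroid classifier for the deep learning and machine learning algorithms, and it runs on made-up feature vectors rather than radio-frequency data. The function names, the six-value feature layout, the noise level and the 80/20 split are all assumptions chosen for illustration. Only the six classes (five vowel sounds plus still lips) come from the study as reported.

```python
import random

# Six classes: the five vowel sounds plus still lips, as in the study.
LABELS = ["a", "e", "i", "o", "u", "still"]


def make_synthetic_samples(n_per_class, noise=0.3, seed=0):
    """Hypothetical stand-in data: one noisy 6-value feature vector per sample."""
    rng = random.Random(seed)
    samples = []
    for k, label in enumerate(LABELS):
        # Class k's prototype has a single raised feature at index k.
        prototype = [3.0 if j == k else 0.0 for j in range(len(LABELS))]
        for _ in range(n_per_class):
            features = [p + rng.gauss(0.0, noise) for p in prototype]
            samples.append((features, label))
    rng.shuffle(samples)
    return samples


def train_centroids(samples):
    """Average the feature vectors of each class to get one centroid per label."""
    sums, counts = {}, {}
    for features, label in samples:
        total = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            total[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(centroids, f) == label for f, label in samples)
    return correct / len(samples)


# Train on 80% of the synthetic samples and score the held-out 20%.
samples = make_synthetic_samples(n_per_class=100)
split = int(0.8 * len(samples))
train, test = samples[:split], samples[split:]
centroids = train_centroids(train)
print(f"held-out accuracy: {accuracy(centroids, test):.2f}")
```

Scoring on samples held out from training mirrors how figures such as the accuracy percentages above are usually produced, although the article does not say how the team split its data.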
Dr. Qammer Abbasi of the University of Glasgow’s James Watt School of Engineering, the paper’s lead author, said: ‘Around five percent of the world’s population, about 430 million people, have some kind of hearing impairment.’
‘Hearing aids have provided transformative benefits for many hearing-impaired people. A new generation of technology which collects a wide spectrum of data to augment and enhance the amplification of sound could be another major step in improving hearing-impaired people’s quality of life,’ he added.
‘With this research, we have shown that radio-frequency signals can be used to accurately read vowel sounds on people’s lips, even when their mouths are covered. While the results of lip-reading with radar signals are slightly more accurate, the Wi-Fi signals also demonstrated impressive accuracy,’ he said. ‘Given the ubiquity and affordability of Wi-Fi technologies, the results are highly encouraging, which suggests that this technique has value both as a standalone technology and as a component in future multimodal hearing aids.’
Professor Muhammad Imran, co-author, added that ‘this technology is an outcome from two research projects funded by the Engineering and Physical Sciences Research Council (EPSRC), called COG-MHEAR and QUEST.’
‘Both aim to find new methods of creating the next generation of healthcare devices, and this development will play a major role in supporting that goal.’
By Marvellous Iwendi.
Source: University of Glasgow