Google details AI work behind Project Euphonia’s more inclusive speech recognition

As part of new efforts towards accessibility, Google announced Project Euphonia at I/O in May: an attempt to make speech recognition capable of understanding people with non-standard speaking voices or impediments. The company has just published a post and an accompanying paper explaining some of the AI work enabling the new capability.
The problem is simple to observe: The speaking voices of those with motor impairments, such as those produced by degenerative diseases like amyotrophic lateral sclerosis (ALS), simply are not understood by existing natural language processing systems.
You can see it in action in the following video of Google research scientist Dimitri Kanevsky, who himself has impaired speech, attempting to interact with one of the company’s own products (and eventually doing so with the help of a related project, Parrotron):

The research team describes it as follows:
ASR [automatic speech recognition] systems are most often trained from ‘typical’ speech, which means that underrepresented groups, such as those with speech impairments or heavy accents, don’t experience the same degree of utility.
…Current state-of-the-art ASR models can yield high word error rates (WER) for speakers with only a moderate speech impairment from ALS, effectively barring access to ASR reliant technologies.
It’s notable that they at least partly blame the training set. That’s one of those implicit biases we find in AI models that can lead to high error rates in other places, like facial recognition or even noticing that a person is present. While failing to include major groups like people with dark skin isn’t a mistake comparable in scale to building a system not inclusive of those with impaired speech, both can be addressed by more inclusive source data.
For Google’s researchers, that meant collecting dozens of hours of spoken audio from people with ALS. As you might expect, each person is affected differently by their condition, so accommodating the effects of the disease is not the same process as accommodating, say, a merely uncommon accent.
