Apple training Siri to detect and adapt to users who stutter using research and podcast catalog

Zac Hall | Feb 25 2021 - 12:03 pm PT

Apple is investigating ways to enhance its Siri voice assistant for users with atypical speech patterns, the company confirms to the Wall Street Journal. According to the report, Apple is leveraging its podcast library for speech samples that could train Siri to adapt to users who speak with a stutter.

Here’s the news from the Wall Street Journal report:

The company is now researching how to automatically detect if someone speaks with a stutter, and has built a bank of 28,000 audio clips from podcasts featuring stuttering to help do so, according to a research paper due to be published by Apple employees this week that was seen by the Wall Street Journal.

For now, Apple relies on its Hold to Talk feature as a way to interact with Siri without the voice assistant cutting off users whose speech patterns are slower than it’s tuned for, but physically interacting with a device isn’t always convenient.

Siri can be voice activated on iPhones, iPads, and Macs, and especially HomePod and HomePod mini, using the “Hey Siri” voice command followed by a request. For users who stutter, however, the current version of Siri commonly interprets pauses in speech as the end of a voice command. In turn, this prevents the voice assistant from reaching its full potential for these customers.

Friend of the site Steve Aquino pointed to the Apple research paper referenced in the WSJ report.

As a lifelong stutter who is extraordinarily self-conscious about it, this @KatieDeighton story for the WSJ is huge news. Apple has done extensive research on this, as has Amazon and Google.

Apple research paper: https://t.co/DbJMSCv3du https://t.co/ScjNZMH5pt
— Steven Aquino (he/him) (@steven_aquino) February 25, 2021

Here’s the abstract for Apple’s research:

The ability to automatically detect stuttering events in speech could help speech pathologists track an individual’s fluency over time or help improve speech recognition systems for people with atypical speech patterns. Despite increasing interest in this area, existing public datasets are too small to build generalizable dysfluency detection systems and lack sufficient annotations. In this work, we introduce Stuttering Events in Podcasts (SEP-28k), a dataset containing over 28k clips labeled with five event types including blocks, prolongations, sound repetitions, word repetitions, and interjections. Audio comes from public podcasts largely consisting of people who stutter interviewing other people who stutter. We benchmark a set of acoustic models on SEP-28k and the public FluencyBank dataset and highlight how simply increasing the amount of training data improves relative detection performance by 28% and 24% F1 on each. Annotations from over 32k clips across both datasets will be publicly released.

The research paper acknowledges that the current approach to tuning Siri for dysfluency is only one approach, and that there remains an opportunity to improve the effort using language models and other methods.

Lastly, Apple concludes that while its current research focuses on users who stutter, future research should explore other categories like dysarthria that have different characteristics.

Statement by Jane Fraser, President, the Stuttering Foundation on tech company efforts to include stuttering in speech patterns recognized by voice assistants:

“We’re thrilled to learn of recent efforts by tech companies to be more inclusive of the stuttering community in their voice assistant technologies. For people who stutter, being heard and understood can be a lifelong struggle. The evolution of technology to account for what people say, rather than how they say it, opens the door for tens of millions of people who struggle with stuttering.”