Apple details personalized ‘Hey Siri’ voice recognition in latest Machine Learning Journal entry

Zac Hall | Apr 16 2018 - 8:23 am PT

Apple’s Siri team has published a new Machine Learning Journal entry that details some of the process behind making voice-activated ‘Hey Siri’ respond only to its owner’s voice. Apple previously documented part of the process behind pulling off voice-activated Siri in general last fall, and the first Machine Learning Journal entry of this year focuses on the challenge of speaker recognition.

As referenced in the previous entry, Apple says the phrase ‘Hey Siri’ was chosen in part because a number of users were already using it naturally when activating Siri with a hardware button.

The phrase “Hey Siri” was originally chosen to be as natural as possible; in fact, it was so natural that even before this feature was introduced, users would invoke Siri using the home button and inadvertently prepend their requests with the words, “Hey Siri.”

The new entry describes three challenges with activating Siri by voice: the main user saying a similar phrase to Hey Siri, another user saying Hey Siri, or another user saying a similar phrase to Hey Siri.

By limiting activation to the main user’s voice, the design ideally prevents two out of those three issues, namely the two that involve another speaker. The entry scratches the surface of how Apple approaches that problem:

We measure the performance of a speaker recognition system as a combination of an Imposter Accept (IA) rate and a False Reject (FR) rate. It is important, however, to distinguish (and disambiguate) these values from those used to measure the quality of a key-phrase trigger system.
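
To make those two metrics concrete, here is a minimal sketch of how IA and FR rates could be computed from a batch of similarity scores. It is not Apple's implementation: the function names, the example scores, and the 0.5 threshold are all assumptions made for illustration, and it assumes a higher score means a closer match to the enrolled user.

```python
from typing import Sequence


def imposter_accept_rate(imposter_scores: Sequence[float], threshold: float) -> float:
    """Fraction of utterances from other speakers that are wrongly accepted."""
    if not imposter_scores:
        raise ValueError("need at least one imposter score")
    accepted = sum(1 for score in imposter_scores if score >= threshold)
    return accepted / len(imposter_scores)


def false_reject_rate(target_scores: Sequence[float], threshold: float) -> float:
    """Fraction of utterances from the enrolled user that are wrongly rejected."""
    if not target_scores:
        raise ValueError("need at least one target score")
    rejected = sum(1 for score in target_scores if score < threshold)
    return rejected / len(target_scores)


# Hypothetical scores, invented for this example only.
target_scores = [0.9, 0.8, 0.4, 0.7]    # the enrolled user saying the phrase
imposter_scores = [0.1, 0.6, 0.3, 0.2]  # other people saying the phrase
threshold = 0.5                         # assumed decision threshold

ia = imposter_accept_rate(imposter_scores, threshold)
fr = false_reject_rate(target_scores, threshold)
print(f"IA rate: {ia:.2f}, FR rate: {fr:.2f}")
```

Raising the threshold in a sketch like this trades one error for the other: fewer imposters get through, but the real user is rejected more often.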

As with each Machine Learning Journal entry, the piece then takes a relatively detailed look at Apple’s implementation before touching on the unsolved problems with the feature: using Hey Siri in a noisy environment or large room.

One of our current research efforts is focused on understanding and quantifying the degradation in these difficult conditions in which the environment of an incoming test utterance is a severe mismatch from the existing utterances in a user’s speaker profile.

Voice-activated Siri started with the iPhone 6 as the piece notes, although the original version only worked when the device was charging. Today Hey Siri works on new iPhones, iPads, and Apple Watches without charging, and it’s the primary controller for HomePod. In the future, the same Hey Siri feature may be how we interact with AirPods as well.

The full entry — which is based on research submitted for the International Conference on Acoustics, Speech, and Signal Processing — offers a rare close look at the amount of thinking behind a feature that hopefully feels natural to the user.