The United States Patent and Trademark Office (USPTO) on Tuesday granted Apple a patent related to advanced text-to-speech features, which may or may not be fully realized in future mobile devices and computers or brought to the company’s existing lineup via a software update. Apple first filed for this patent in February 2006, almost a year before the original iPhone announcement. This suggests the company sought to improve the quality of machine-generated speech on its devices well ahead of this month’s introduction of Siri, whose text-to-speech and speech-to-text interfaces are outlined in Apple’s patent applications from 2009 and 2011. Even though Mac OS X has had text-to-speech capabilities for years, the quality of its machine-generated voice and pronunciation didn’t improve notably until Lion was let out of the cage this summer. Lion’s high-quality text-to-speech led watchers to suspect that Apple had licensed Nuance technology. Patently Apple’s Jack Purcher told me via email that “in time we’ll figure out (or not) as to why Apple felt that they had to go to Nuance to get the job done, but it does indicate that they just don’t have what it takes alone to have a viable solution for the iPhone”.
This month’s introduction of the iPhone 4S and its personal assistant Siri, which has text-to-speech as one of its components, is another indication that Apple may have licensed Nuance technology for a wide-scale rollout across its entire lineup. The new patent is entitled “Multi-unit approach to text-to-speech synthesis” and describes the process of matching units of a received input string to a library of audio segments that include metadata such as articulation relationships between phrases and words. Noting that speech from conventional text-to-speech applications typically sounds artificial or machine-like when compared to human speech, Apple claims that its invention provides more human-sounding speech. It also supports a client-server architecture, a perfect fit for iCloud.
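To make the matching idea concrete, here is a minimal Python sketch of how units of an input string could be matched against a library of recorded segments. It is not Apple's implementation. The greedy longest-match strategy, the `AudioSegment` fields (including the `follows` metadata standing in for an articulation relationship), the function names and the clip file names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSegment:
    text: str          # the words this recording covers
    clip: str          # hypothetical file name of the recording
    follows: str = ""  # hypothetical metadata: word the clip was recorded after


def build_library(segments):
    """Index segments by the text they cover; one text may have several clips."""
    library = {}
    for seg in segments:
        library.setdefault(seg.text, []).append(seg)
    return library


def select_units(sentence, library):
    """Cover the input with the longest recorded units available (greedy)."""
    words = sentence.lower().split()
    chosen = []
    i = 0
    while i < len(words):
        match = None
        # Try the longest span starting at word i first, then shrink it.
        for j in range(len(words), i, -1):
            candidates = library.get(" ".join(words[i:j]), [])
            if candidates:
                prev = words[i - 1] if i > 0 else ""
                # Prefer a clip recorded in the same context, else the first one.
                match = next((c for c in candidates if c.follows == prev),
                             candidates[0])
                i = j
                break
        if match is None:
            raise KeyError(f"no recorded unit for {words[i]!r}")
        chosen.append(match)
    return chosen


# A toy library: one phrase-level unit, plus word-level fallbacks.
LIBRARY = build_library([
    AudioSegment("good morning", "good_morning.wav"),
    AudioSegment("good", "good.wav"),
    AudioSegment("morning", "morning.wav"),
    AudioSegment("apple", "apple_neutral.wav"),
    AudioSegment("apple", "apple_after_morning.wav", follows="morning"),
])

clips = [seg.clip for seg in select_units("Good morning Apple", LIBRARY)]
print(clips)
```

In this toy run the phrase-level recording of "good morning" wins over the two single-word clips, and the version of "apple" recorded after "morning" is preferred over the neutral one. That is the general intuition behind favoring larger units and context metadata: fewer joins between clips, and joins that were recorded in a similar context.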
As Siri co-founder Norman Winarsky exclusively told 9to5Mac, Siri’s modular architecture allows Apple to replace Nuance’s text-to-speech component with any other speech synthesis technology, including, eventually, its own. Given Apple’s reluctance to use technologies it doesn’t own, it’s fair to speculate that the company is at least researching a potential Nuance replacement for future iOS and Mac OS X releases. It wouldn’t be unheard of. Remember, Apple booted the Skyhook location-gathering service in April 2010, replacing it with its own crowd-sourced solution that would later spark the iPhone location tracking scandal. What else is there to be excited about in Apple’s patent?
For starters, it’s mostly a highly technical read. However, Apple does praise in numerous places the quality of the resulting speech, which takes into account prosody characteristics, including the tune and rhythm of the speech. Even better, Apple’s solution can be trained with a human voice, resulting in even more convincing speech. This also means the system could, theoretically, learn from and adapt itself to the user’s voice, much as the AI-driven Siri gets better over time the more you use it. The company says its text-to-speech can run on “both general and special purpose microprocessors, and any one or more processors of any kind of digital computer”, which suggests it is meant to run on a wide range of hardware. The issued patent credits Apple engineers Matthias Neeracher, Devang K. Naik, Kevin B. Aitken, Jerome R. Bellegarda and Kim E.A. Silverman. To retrieve a detailed description of the patent, type its number, 8036894, into the USPTO search engine.