The United States Patent and Trademark Office (USPTO) on Tuesday granted a new Apple patent covering advanced text-to-speech features that may or may not be fully realized in future mobile devices and computers, or brought to the company’s existing lineup via a software update. Apple first filed for this patent in February 2006, almost a year before the original iPhone announcement. This suggests the company sought to improve the quality of machine-generated speech on its devices well ahead of this month’s introduction of Siri, whose text-to-speech and speech-to-text interfaces are outlined in Apple’s patent applications from 2009 and 2011.

Even though Mac OS X has had text-to-speech capabilities for years, the quality of machine-generated voice and pronunciation didn’t improve notably until Lion was let out of the cage this summer. Lion’s high-quality text-to-speech led watchers to suspect that Apple had licensed Nuance technology. Patently Apple’s Jack Purcher told me via email: “in time we’ll figure out (or not) as to why Apple felt that they had to go to Nuance to get the job done, but it does indicate that they just don’t have what it takes alone to have a viable solution for the iPhone”.

This month’s introduction of the iPhone 4S and its personal assistant Siri – with text-to-speech as one of its components – is another indication of a possible licensing of Nuance technology for a wide-scale rollout across Apple’s entire lineup. The new patent is entitled “Multi-unit approach to text-to-speech synthesis” and describes the process of matching units of a received input string against a library of audio segments that carry metadata such as articulation relationships between phrases and words. Noting that speech from conventional text-to-speech applications typically sounds artificial or machine-like compared to human speech, Apple claims its invention produces more human-sounding speech. It also supports a client-server architecture – a perfect fit for iCloud.
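To make the patent's core idea more concrete, here is a minimal sketch of that unit-selection process: input text is matched against a library of recorded audio segments, and each segment's metadata about its original articulation context is used to prefer joins that sound natural. Everything here – the names, the library entries, and the simple scoring scheme – is a hypothetical illustration, not Apple's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    text: str        # the word this audio segment covers
    audio_id: str    # handle to the stored waveform
    prev_ctx: str    # word that preceded this unit in the source recording
    next_ctx: str    # word that followed it

# Toy library: two recordings each of "good" and "morning",
# captured in different articulation contexts.
LIBRARY = [
    Unit("good", "seg001", "", "morning"),
    Unit("good", "seg002", "very", "job"),
    Unit("morning", "seg003", "good", ""),
    Unit("morning", "seg004", "every", "I"),
]

def select_units(words):
    """For each input word, greedily pick the library unit whose recorded
    context best matches the surrounding input words."""
    chosen = []
    for i, word in enumerate(words):
        candidates = [u for u in LIBRARY if u.text == word]
        if not candidates:
            raise ValueError(f"no unit for {word!r}")
        prev_word = words[i - 1] if i > 0 else ""
        next_word = words[i + 1] if i + 1 < len(words) else ""
        # Score: reward units recorded in a matching articulation context,
        # so concatenated segments join more smoothly.
        def score(u):
            return (u.prev_ctx == prev_word) + (u.next_ctx == next_word)
        chosen.append(max(candidates, key=score))
    return [u.audio_id for u in chosen]

print(select_units(["good", "morning"]))  # → ['seg001', 'seg003']
```

A real system would score candidates over phoneme-level units with prosodic features rather than whole words, but the principle is the same: the richer the metadata on each stored segment, the more natural the stitched-together speech.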

As Siri co-founder Norman Winarsky exclusively told 9to5Mac, Siri’s modular architecture allows Apple to replace Nuance’s text-to-speech component with any other speech synthesis technology – including, eventually, its own. Given Apple’s reluctance to rely on technologies it doesn’t own, it’s fair to speculate the company is at least researching a potential Nuance replacement for future iOS and Mac OS X releases. It wouldn’t be unheard of. Remember, Apple booted the Skyhook location-gathering service in April 2010, replacing it with its own crowd-sourced solution that would later spark the iPhone location tracking scandal. What else is there to be excited about in Apple’s patent?

Admittedly, it’s mostly a highly technical read. However, Apple repeatedly emphasizes the quality of the resulting speech, which takes into account prosody characteristics, including the tune and rhythm of the speech. Even better, Apple’s solution can be trained with a human voice, resulting in even more convincing speech. This also means the system could, theoretically, learn from and adapt itself to the user’s voice, much as the AI-driven Siri gets better over time the more you use it. The company says its text-to-speech can run on “both general and special purpose microprocessors, and any one or more processors of any kind of digital computer”, suggesting a portable and well-optimized implementation. The issued patent credits Apple engineers Matthias Neeracher, Devang K. Naik, Kevin B. Aitken, Jerome R. Bellegarda and Kim E.A. Silverman. For a detailed description of the patent, enter its number, 8,036,894, into the USPTO search engine.

