The iPod shuffle’s new voice feature is a great navigation tool for screenless browsing of your music collection. But how does that little device read all of your song titles, artists, genres, etc. to you? That takes a lot of CPU power for a device the size of a tie clip.
The answer? It doesn’t. All of the voice rendering will be done on the PC or Mac in iTunes 8.1. That’s why the PC voice will be a woman’s (from the built-in voice system) and the Mac voice will be a man’s. iTunes 8.1 will put some extra voice data in your music files (see below) covering the name of the band and the title of the song. Each clip is relatively small, but the added data will still grow the size of your library. Eventually, Apple might include this audio data in iTunes downloads. Heck, they might even have Jimmy Page’s voice tell you the next song is Led Zeppelin. A value add!
Apple also filed a patent application for this a few years ago. See more below.
From the Patent Application
In order to achieve portability, many hand-held devices use user interfaces that present various display screens to the user for interaction that is predominantly visual. Users can interact with the user interfaces to manipulate a scroll wheel and/or a set of buttons to navigate display screens to thereby access functions of the hand-held devices. However, these user interfaces can be difficult to use at times for various reasons. One reason is that the display screens tend to be small in size and form factor and therefore difficult to see. Another reason is that a user may have poor reading vision or otherwise be visually impaired. Even if the display screens can be perceived, a user will have difficulty navigating the user interface in “eyes-busy” situations when a user cannot shift visual focus away from an important activity and towards the user interface. Such activities include, for example, driving an automobile, exercising, and crossing a street.
It is noted that text strings that correspond to standard text strings can have pre-recorded audio files. Such text strings may correspond to common user interface controls, such as “play”, “stop”, “previous”, etc., and to common menu items such as “Music”, “Extras”, “Backlight.” These audio files can be created using a voice talent or speech synthesized from the voice talent’s recordings. The other text displayed as part of the media player user interface that is usually user specific, such as contacts and customized playlist names can all be synthesized by building a voice from the voice talent recordings. This provides consistency by having the same voice for all textual data to be presented to the user.
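To make the split described in that last paragraph concrete, here is a minimal sketch in Python of how it could work: standard interface strings map to pre-recorded clips, and anything user-specific falls back to a synthesizer. This is our illustration, not Apple's implementation. The `PRERECORDED` table, the file paths, and the `audio_for_text` and `fake_synthesize` names are all hypothetical, and the "synthesizer" here only returns a made-up file path instead of rendering audio.

```python
from typing import Callable, Dict

# Hypothetical table of pre-recorded clips for standard UI strings.
# The keys come from the examples quoted in the patent text; the paths are invented.
PRERECORDED: Dict[str, str] = {
    "play": "clips/play.wav",
    "stop": "clips/stop.wav",
    "previous": "clips/previous.wav",
    "music": "clips/music.wav",
    "extras": "clips/extras.wav",
    "backlight": "clips/backlight.wav",
}


def audio_for_text(text: str, synthesize: Callable[[str], str]) -> str:
    """Return the audio clip path for a piece of interface text.

    Standard strings use a pre-recorded clip; everything else
    (playlist names, contacts, song titles) is handed to the synthesizer.
    """
    key = text.strip().lower()  # assumption: lookup ignores case and outer whitespace
    if key in PRERECORDED:
        return PRERECORDED[key]
    return synthesize(text)


def fake_synthesize(text: str) -> str:
    """Stand-in for a real speech synthesizer: just builds a pretend output path."""
    return "synth/" + "_".join(text.lower().split()) + ".wav"


if __name__ == "__main__":
    for label in ["Play", "Backlight", "Road Trip Mix"]:
        print(label, "->", audio_for_text(label, fake_synthesize))
```

In a setup like the one described above, this lookup would run on the computer during sync, so the player itself only has to play back finished clips.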