
Apple looking to add character to text-to-speech voices

post #1 of 12
Thread Starter 
An Apple patent application discovered on Thursday outlines an invention that uses metadata from emails, texts and other communications to determine how a synthesized voice sounds in a text-to-speech (TTS) system.

Figure: Text-to-speech. Source: USPTO

The filing, titled "Voice assignment for text-to-speech output," looks to create "speaker profiles" which can change the voice characteristics of TTS output to match parsed-out metadata like age, sex, dialect and other variables.

As the application notes, many systems exist today to aid the visually impaired, including the one on Apple's iPhone. However, most TTS engines "generate synthesized speech having voice characteristics of either a male speaker or a female speaker. Regardless of the gender of the speaker, the same voice is used for all text-to-speech conversion regardless of the source of the text being converted." Apple's invention proposes a different solution.

Instead of hearing the same voice for every message, the invention obtains metadata "directly from the communication or from a secondary source identified by the directly obtained metadata" to create the most suitable speaker profile.

According to the patent filing, "Providing a speech output that is associated with a speaker profile allows speaker recognition while providing a more enjoyable and entertaining experience for the listener."

An example is provided in which a user receives a message from "Charles Prince," who has an email address of charles.prince@isp.uk, regarding a party for "Albert." In this case, the system could use the ".uk" address as primary metadata. Secondary metadata can be gathered if a contact card is attached to the message, or if Charles Prince's information is already in the user's address book.
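
To make the primary/secondary split concrete, here is a minimal Python sketch. It is an illustration only, not the method claimed in the filing: the `TLD_TO_ACCENT` table, the `Metadata` fields and the dictionary-style address book are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup table. The filing's example only mentions ".uk";
# the second entry is here purely for illustration.
TLD_TO_ACCENT = {"uk": "British English", "au": "Australian English"}


@dataclass
class Metadata:
    sender_name: str
    accent: Optional[str] = None      # primary: read from the message itself
    birth_year: Optional[int] = None  # secondary: read from a contact card


def primary_metadata(sender_name: str, email: str) -> Metadata:
    """Derive an accent hint from the last label of the email address."""
    tld = email.rsplit(".", 1)[-1].lower()
    return Metadata(sender_name=sender_name, accent=TLD_TO_ACCENT.get(tld))


def add_secondary(meta: Metadata, address_book: dict) -> Metadata:
    """Fill in extra fields if the sender is already in the address book."""
    card = address_book.get(meta.sender_name)
    if card is not None:
        meta.birth_year = card.get("birth_year")
    return meta
```

Calling `primary_metadata("Charles Prince", "charles.prince@isp.uk")` would tag the message with a British accent hint, and `add_secondary` would add whatever the (assumed) address book entry holds.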

Figure: Metadata samples.

The data from the text and the corresponding metadata are then fed into a TTS engine, which assigns a speaker profile to convert the text into speech.

After converting each word in the text to a phonetic transcription, made up of the distinct sounds that comprise a given language, the TTS engine divides the text into rhythmic units like phrases, clauses and sentences, and marks them.
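
The dividing step can be pictured with a rough Python sketch. It assumes punctuation alone marks the boundaries, which is a simplification; a real TTS front end would use far more than this. The function name `mark_units` is hypothetical.

```python
import re


def mark_units(text: str) -> list:
    """Split text into sentences, then split each sentence into clauses.

    Returns a list of sentences, each a list of clause strings.
    Boundaries are guessed from punctuation only (an assumption).
    """
    # A sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    units = []
    for sentence in sentences:
        # Clauses are separated by commas, semicolons or colons.
        clauses = [c.strip() for c in re.split(r"[,;:]", sentence) if c.strip()]
        units.append(clauses)
    return units
```

Each inner list is one sentence, so a later stage could insert a longer pause between sentences than between clauses.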

In some implementations, speech can be created by piecing together pre-recorded voice fragments, including sounds, entire words or even sentences, that are stored on a mobile device or in an off-site database.
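
A toy Python sketch of that piecing-together approach is below. It is not taken from the filing: the `fragments` dictionary stands in for the on-device or off-site database, file names stand in for audio clips, and single letters stand in for individual sounds.

```python
def synthesize(text: str, fragments: dict) -> list:
    """Return the list of stored clips needed to speak `text`.

    Tries the largest unit first: the whole sentence, then each word,
    then falls back to per-letter clips (a stand-in for real phonemes).
    """
    key = text.lower().strip()
    if key in fragments:
        return [fragments[key]]

    clips = []
    for word in key.split():
        if word in fragments:
            clips.append(fragments[word])
        else:
            # No recording of the whole word: stitch it from smaller pieces.
            clips.extend(fragments[ch] for ch in word if ch in fragments)
    return clips
```

With a database holding a clip for "albert" plus clips for "h" and "i", `synthesize("Hi Albert", fragments)` would stitch "hi" from the two small clips and reuse the whole-word recording for the name.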

In other implementations, the TTS engine can include a synthesizer that "incorporates a model of the human vocal tract or other human voice characteristics to create a synthetic speech output according to the speaker profile."

One of the more interesting implementations notes that "a speaker's voice can be recorded and analyzed to generate voice data."

From the patent filing's description:

For example, the speaker's voice can be recorded by a recording application running on the device or during a telephone call (with permission). The voice characteristics of the speaker can be obtained using known voice recognition techniques. In this implementation, a speaker profile may not be necessary as the speaker's name can be directly associated with voice data stored in voice database.


As for output, the system may use the ".uk" email address as primary metadata and draw on contact card information, such as a birthday, to determine sex and age, then output a speaker profile matching an older male with a British accent. Charles Prince's physical address, phone number, or picture can also be used to determine a speaker profile. The more metadata available, the more refined the output.
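
As a hedged illustration of that last step, the Python sketch below turns a few metadata fields into a profile. The `SpeakerProfile` type, the `choose_profile` function and the age thresholds (60 for "older", 18 for "adult") are assumptions made for the example, not details from the filing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpeakerProfile:
    accent: str
    sex: str
    age_band: str


def choose_profile(accent: Optional[str], sex: Optional[str],
                   birth_year: Optional[int], current_year: int) -> SpeakerProfile:
    """Pick a profile from whatever metadata is available.

    Missing fields fall back to neutral defaults, so more metadata
    yields a more specific profile.
    """
    if birth_year is None:
        age_band = "unknown"
    else:
        age = current_year - birth_year
        # Thresholds are arbitrary choices for this sketch.
        age_band = "older" if age >= 60 else "adult" if age >= 18 else "young"
    return SpeakerProfile(
        accent=accent or "default",
        sex=sex or "unspecified",
        age_band=age_band,
    )
```

Given a British accent hint, a male contact and a birth year more than 60 years back, the function returns the "older male with a British accent" profile described above; with no metadata at all it returns the neutral defaults.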

Figure: Flowchart of TTS system.

It is unclear whether Apple plans to deploy such a system, but the company already has a similar, albeit less advanced, feature in Siri. Although availability is limited to certain regions, Siri lets users choose dialects like "English (United States)" or "English (United Kingdom)" for recognizing incoming voice commands and for delivering responses in the selected accent.
post #2 of 12
"Hello, sweetheart. I'm so excited by your picture, please write me asap on sexygirl.apple.xxx" ?

Social Capitalist, dreamer and wise enough to know I'm never going to grow up anyway... so not trying anymore.

 

http://m.ign.com/articles/2014/07/16/7-high-school-girls-are-kickstarting-their-awa...

post #3 of 12
Very clever invention! That's a patent I'd love to see working! Voice synthesis adapted from metadata could come first; then, once people get used to the tech, voice recordings could be used for synthesis. The phone would need to be extremely secure, though. I wouldn't want speech patterns for all my contacts to get loose in the wild.
post #4 of 12
Quote:
Originally Posted by ClemyNX

Very clever invention! That's a patent I'd love to see working! Voice synthesis adapted from metadata could come first; then, once people get used to the tech, voice recordings could be used for synthesis. The phone would need to be extremely secure, though. I wouldn't want speech patterns for all my contacts to get loose in the wild.

This is great and all but it seems very "sci-fi". I'd expect to see many other changes in how text-to-speech works long before this patent gets implemented.

For starters, I hate that artists in my Music are pronounced incorrectly even when the name is well known. The system should have a digital phonetic spelling for every artist so that it can be as accurate as possible.

Next, I'd like the system to let me record the names of people in my contacts. Not so that my recording is played back to me when Siri reads the name, but so that the pattern I use can be processed and used to generate the system's playback. For instance, the name Jim is being pronounced as |gim| by Siri. But even names it gets right for the masses might be pronounced differently in other dialects, languages or cultures, and it would be nice if Siri tried to learn the proper one once corrected. This is much like the first one, except it's more individual and therefore would be harder to implement.

Finally, I'd like Apple to work with linguists to create a paragraph that covers all the phonemes of a language. When you first sign up for Siri, it would have you speak each sentence and record every part of your voice, then process and store it with your online profile so that it better understands your accent, your dialect, and/or any speech aberrations you may have.

"The real haunted empire?  It's the New York Times." ~SockRolid

"There is no rule that says the best phones must have the largest screen." ~RoundaboutNow

post #5 of 12

Speech to text?

Citing unnamed sources with limited but direct knowledge of the rumoured device - Comedy Insider (Feb 2014)
post #6 of 12

It would be wonderful to have "speaker profiles" for everyone I know, so I can send "voice" messages from them to others, destroying their relationships and lives. That's going to be awesome.

post #7 of 12
Quote:
Originally Posted by SolipsismX

Finally, I'd like Apple to work with linguists to create a paragraph that covers all the phonemes of a language. When you first sign up for Siri, it would have you speak each sentence and record every part of your voice, then process and store it with your online profile so that it better understands your accent, your dialect, and/or any speech aberrations you may have.

 

That would be a way to do it, but it's not very elegant. The user shouldn't notice when Siri is learning about his speech characteristics.

 

 

Quote:
Originally Posted by NotScott

It would be wonderful to have "speaker profiles" for everyone I know, so I can send "voice" messages from them to others, destroying their relationships and lives. That's going to be awesome.

 

"Darling, I'm leaving you, I think our sex life is no longer what it used to be"

Sent from grandma to grandpa.

post #8 of 12
Quote:
Originally Posted by NotScott

It would be wonderful to have "speaker profiles" for everyone I know, so I can send "voice" messages from them to others, destroying their relationships and lives. That's going to be awesome.

It's a shame Michael Crichton left us. He always seemed to get a novel out about the latest tech advance early enough for it to seem like sci-fi. I'm sure he'd have had fun with this topic, as he did with images that could be manipulated ... OMG ... at the pixel level!!! (Remember when that seemed sci-fi?)
Been using Apple since Apple ][ - Long on AAPL so biased
nMac Pro 6 Core, MacBookPro i7, MacBookPro i5, iPhones 5 and 5s, iPad Air, 2013 Mac mini, SE30, IIFx, Towers; G4 & G3.
post #9 of 12

"My wife's real voice isn't as sexy as the speaker profile I created for her. So... phone sex it is!"

post #10 of 12
It would be cool if they turned recordings of Steve Jobs into an iPhone voice. In Maps:

"The road ahead is a well-worn path, take the next exit."
"There's an insanely great restaurant coming up in 5 miles."
"You're driving it wrong. Make a U-turn."
"You can't connect the dots looking forward but stop checking your makeup in the mirror and keep your eyes on the road ahead"
"Innovation distinguishes between a leader and a follower so innovate and overtake that bus in front of you or you're going to be late."
"Being the richest man in the cemetery doesn't matter. Going to bed at night saying you've done something wonderful is what matters so slow down, you are breaking the speed limit."
"Boom. You have reached your destination."
post #11 of 12
Originally Posted by Marvin
It would be cool if they turned recordings of Steve Jobs into an iPhone voice. In Maps:
"The road ahead is a well-worn path, take the next exit."
"There's an insanely great restaurant coming up in 5 miles."
"You're driving it wrong. Make a U-turn."
"You can't connect the dots looking forward but stop checking your makeup in the mirror and keep your eyes on the road ahead"
"Innovation distinguishes between a leader and a follower so innovate and overtake that bus in front of you or you're going to be late."
"Being the richest man in the cemetery doesn't matter. Going to bed at night saying you've done something wonderful is what matters so slow down, you are breaking the speed limit."
"Boom. You have reached your destination."

 

"You know… I think there's a better way to go." (for when there's traffic and it's rerouting you)

"The back of your car looks better than the front of theirs!" (at random times)

Originally Posted by Slurpy

There's just a TINY chance that Apple will also be able to figure out payments. Oh wait, they did already… …and you’re already fucked.

 

post #12 of 12

The thing about TTS for blind users is that a synthesized voice with no inflections or tonal variations can be understood at very high rates, up to 400 words per minute. If users had to listen to a different voice for each email, they would need to slow the rate down to a normal speaking pace. That is OK if you are listening for the sake of entertainment, but for pure transmission of information this new patent would not be practical.

Life is too short to drink bad coffee.
