Inside OS X 10.8 Mountain Lion GM: Dictation & speech

Posted:
in macOS edited January 2014
In Mountain Lion, Macs are getting system-wide speech recognition, the same "Dictation" feature Apple gave the new iPad at the beginning of the year. While it works well, it does require a network connection.

Apple's cloud-based Dictation feature, currently supported on the new iPad and as part of the broader Siri voice assistant feature of iPhone 4S, converts speech to text virtually anywhere.

It works by sending audio recordings of captured speech to Apple's servers, which respond with plain text. While it doesn't go as far as the more intelligent Siri, Dictation does cross-reference the names and assigned nicknames of your contacts in order to better understand what you are saying.

Similar to Siri or Dictation on the new iPad, Dictation on Macs running OS X Mountain Lion pops up a simple mic icon when activated, which listens until you click or type the key to finish.

Just as with Siri or Dictation on the new iPad, Dictation under Mountain Lion is quite fast and highly accurate, but it does require a network connection to function. If you don't have a network connection, the Dictation input icon will simply shake, indicating that it is not available.

Anything you say can be used to improve your dictation

Apple appears to be exercising great caution in highlighting the privacy issues related to using Dictation. The service is turned off by default, and turning it on from System Preferences requires clicking through a notice that various types of local data, including Contacts, are sent to Apple's servers in order to recognize the speech you're trying to convert to text.





Privacy on parade

For an even longer discussion of what's involved, you can click the "About Dictation & Privacy" button, which presents the following explanation:



"When you use the keyboard dictation feature on your computer, the things you dictate will be recorded and sent to Apple to convert what you say into text. Your computer will also send Apple other information, such as your first name and nickname; and the names, nicknames, and relationship with you (for example, 'my dad') of your address book contacts. All of this data is used to help the dictation feature understand you better and recognize what you say. Your User Data is not linked to other data that Apple may have from your use of other Apple services.

"Information collected by Apple will be treated in accordance with Apple's Privacy Policy, which can be found at www.apple.com/privacy.

"You can choose to turn off the dictation feature at any time. To do so, open System Preferences, click Dictation & Speech, and then click Off in the Dictation section. If you turn off Dictation, Apple will delete your User Data, as well as your recent voice input data. Older voice input data that has been disassociated from you may be retained for a period of time to generally improve Dictation and other Apple products and services. This voice input data may include audio files and transcripts of what you said and related diagnostic data, such as hardware and operating system specifications and performance statistics.

"You can restrict access to the Dictation feature on your computer in the Parental Controls pane of System Preferences."



Using Dictation

Unlike the virtual keyboard on iOS devices, your Mac has no ability to sprout an extra mic key just to initiate Dictation. However, you do have a little-used key that Apple has assigned by default to serve as a conveniently accessible way to begin Dictation.

Once enabled, the new Dictation feature can be activated by pressing the lower-left Function (fn) key twice. This brings up a microphone popup at the insertion point in the current text field, whether in a document, a Finder search field, a web page, or any other standard region for entering text.



Alternatively, you can select Start Dictation from the Edit menu within the active app. Below is the Edit menu from TextEdit, showing the shortcut of hitting the function key twice.



You can also assign either the right or left Command (Apple's "propeller" key), or both, to serve as the double-tap signal to begin Dictation, or you can enter some arbitrary other set of keys to trigger the event.

You can also assign Dictation to use either the internal microphone or a plugged in mic, or leave it to its default setting, which is automatic. Plug in an iPhone-style pair of headphones with an integrated mic or connect a dedicated USB microphone or line-in mic, and Dictation will automatically begin using it as the most appropriate input device unless you specify otherwise.

Siri on the horizon?

The new iPad got the Dictation subset of Siri features when it arrived at the beginning of the year, but by the end of 2012, it will join iPhone 4S in getting the full Siri experience, thanks to iOS 6.

This suggests that Mountain Lion Macs will also eventually get an upgrade from basic Dictation to the full Siri feature set, although many of the features of Siri may seem more useful in a mobile device.

Apple is also working to improve upon Siri's bag of tricks for iOS users, having promised new sport scores, movie information with reviews, expanded restaurant responses including table reservations, integration with updating Facebook and Twitter feeds, and the ability to launch apps by name.

The expansion potential for Siri on desktop computers would likely benefit from a different set of features aimed more at voice control of the desktop, such as commands to invoke Mission Control or perform a Spotlight search.

Speech Recognition replaced with Dictation

Apple previously focused on voice control of the desktop environment, rather than accurate voice dictation, in the feature set currently presented in OS X Lion as "Speech Recognition."

Apple's feature set of "speakable items" that could be used to navigate menu bar items and switch between applications was first made part of the Mac system software back in 1993 on the Macintosh Quadra AV, part of an ambitious, pioneering effort to deliver advanced speech recognition under the program known as "PlainTalk."

That was almost 20 years ago, at a time when even the fastest desktop computers lacked the resources needed to rapidly and accurately decipher speech into text. Apple therefore chose a highly resource-efficient design centered on commands to invoke tasks rather than on turning natural voice into paragraphs of text.

In bringing iOS-style Dictation to the Mac, Apple has discontinued the seldom-used, rather outdated "Speakable Items" system, which was complex to configure, navigate and use. Dictation is, in contrast, incredibly simple. However, unlike the previous Speech Recognition system, Dictation relies on Apple's cloud services to work. This leaves open an opportunity for dedicated, specialized voice recognition systems that work locally and don't require a network link to function.

Text to Speech

In the other direction of turning text into synthesized speech, Mountain Lion retains the same default System Voice of Alex, which was first introduced in OS X 10.5 Leopard in late 2007. Alex replaced Vicki, the previous default voice that had introduced natural sounding speech in OS X 10.3 Panther in late 2003.

The current release of OS X Lion introduced a series of new, very high quality voices in both American English and other English accents, from British to Australian and Irish, as well as 21 other languages. AppleInsider first broke news of these new optional voices, which can be downloaded from Apple as desired from the System Voice/ Customize popup window.



Among the new voices are Tom and Jill, both very natural-sounding American English voices. As with Dictation, Text to Speech is turned off by default. It is also invoked with the clumsier Option+Esc key sequence, or the Speech menu hidden away in most apps' Edit menu. Ironically, one very useful task Siri on the Mac could provide is to allow users to convert selected text to high quality speech, using their voice.

Comments

  • Reply 1 of 62
    saarek Posts: 1,540 member


    This is a big feature for my wife, she is dyslexic and finds it far easier to type an email or document via speech. We tried DragonDictate but it was not nearly as accurate as the dictation feature built into the iPad.

  • Reply 2 of 62
    mjtomlin Posts: 2,678 member

    Quote:

    Originally Posted by saarek View Post


    This is a big feature for my wife, she is dyslexic and finds it far easier to type an email or document via speech. We tried DragonDictate but it was not nearly as accurate as the dictation feature built into the iPad.



     


    DragonDictate is a Nuance product. The Nuance speech recognition engine is a learning system. The more you use it, the more accurate it gets. Since Apple's use of the engine is through their servers, it of course would be far more accurate after the millions and millions of translations it has performed since last October.


     


    I am amazed at how much better it has become since its release; it almost always recognizes what I say.

  • Reply 3 of 62
    blitz1 Posts: 442 member
    Why recognition is done on the Apple Servers iso at home is beyond me.
  • Reply 4 of 62
    zozman Posts: 393 member


    The only thing I'm worried about (even tho i do use dictate on my iPad & iPhone & soon on my macs) how will this compete in the broader market, now that google has introduced offline dictation in Jelly Bean.


    I hope that at some point we have the option to have an offline dictate, even if it isn't as accurate as being online, id like it to be there.

  • Reply 5 of 62
    lightknight Posts: 2,312 member

    Quote:

    Originally Posted by Zozman View Post


    The only thing I'm worried about (even tho i do use dictate on my iPad & iPhone & soon on my macs) how will this compete in the broader market, now that google has introduced offline dictation in Jelly Bean.


    I hope that at some point we have the option to have an offline dictate, even if it isn't as accurate as being online, id like it to be there.





    Will now Apple make itself incompatible with Dragon? I do NOT want to have the choice forced between "no voice recognition" and "Internet-based voice recognition". I want Dragon, with its ondisk voice database trained to my voice... besides, it wasn't exactly free, so I'd be annoyed to lose ability to use it.


     


    Apple already has used that kind of annoying tactic. I just hope I'm needlessly scared.

  • Reply 6 of 62
    There's still one important question - will Alex finally be able to tell me some new jokes?!

    Actually, note to self, I need to try that feature in other languages. I would assume it still works. Does anyone know if the jokes are the same?
  • Reply 7 of 62
    umrk_lab Posts: 550 member
    one more nail in the keyboard coffin, Microsoft, uh ?
  • Reply 8 of 62
    umrk_lab Posts: 550 member

    Will now Apple make itself incompatible with Dragon? I do NOT want to have the choice forced between "no voice recognition" and "Internet-based voice recognition". I want Dragon, with its ondisk voice database trained to my voice... besides, it wasn't exactly free, so I'd be annoyed to lose ability to use it.

    Apple already has used that kind of annoying tactic. I just hope I'm needlessly scared.

    dragon constantly crashed my Mac. i won't regret it
  • Reply 9 of 62
    ascii Posts: 5,936 member
    Regarding the voices, some of them have 2 levels of quality, the lower end one is called "compact" and I think that might be what is downloaded by default. If you go in to VoiceOver Utility in /Applications/Utilities you can download a higher quality version. But be warned the high quality voices are 375+ MB.
  • Reply 10 of 62
    ireland Posts: 17,798 member
    Moira, lol. No one here talks like that.
  • Reply 11 of 62
    rtm135 Posts: 310 member


    Microsoft Windows has had extensive speech capabilities since Vista, including dictation, vocal feedback, and extensive control of the operating system via commands, all without requiring an internet connection.  Without Siri, why does Apple require an internet connection?

  • Reply 12 of 62
    ascii Posts: 5,936 member
    rtm135 wrote: »
    Microsoft Windows has had extensive speech capabilities since Vista, including dictation, vocal feedback, and extensive control of the operating system via commands, all without requiring an internet connection.  Without Siri, why does Apple require an internet connection?

    That's a good question. I think they are taking a scientific approach: by doing it this way they will have a massive database of phrases the computer didn't understand, which can be used to improve the product. In future, once it is "almost perfect," there will be no benefit doing it on the server-side any more and they can deploy a purely client side solution.
  • Reply 13 of 62
    tallest skil Posts: 43,388 member
    rtm135 wrote: »
    Microsoft Windows has had extensive speech capabilities since Vista, including dictation, vocal feedback, and extensive control of the operating system via commands, all without requiring an internet connection.  Without Siri, why does Apple require an internet connection?

    Because Apple's actually works.
  • Reply 14 of 62
    rtm135 Posts: 310 member


    That's an old video.  It works fine now.

  • Reply 15 of 62
    solipsismx Posts: 19,566 member
    I love dictation in Mountain Lion. In fact i'm using it right now. OK now how do I turn it off. No that's not it. Ah there it...
  • Reply 16 of 62
    brutus009 Posts: 356 member

    Quote:

    Originally Posted by ascii View Post





    In future, once it is "almost perfect," there will be no benefit doing it on the server-side any more and they can deploy a purely client side solution.


     


    I hope so.

  • Reply 17 of 62
    solipsismx Posts: 19,566 member
    mjtomlin wrote: »
    DragonDictate is a Nuance product. The Nuance speech recognition engine is a learning system. The more you use it, the more accurate it gets. Since Apple's use of the engine is through their servers, it of course would be far more accurate after the millions and millions of translations it has performed since last October.

    I am amazed at how much better it has become since its release; it almost always recognizes what I say.

    It's great that it's learning on the aggregate and with specific users but I wish they would do more faster. For instance, during those initial linen-background setup screens on a new iOS 5+ device when you acknowledge that you want to use the service I wish it would have carefully constructed paragraph by linguists to get a foundation for your speech pattern by recording how you say most of the phonemes for common words for a given language.

    Outside of that, my only other major request is to update Contacts on iOS and Mac OS to allow the use of the voice option already built into the vCard system. I hate that I have, what I think are common given and surnames, that are pronounced incorrectly. Let me hit a button to record the name and then Siri will get a really good idea of how to say it back to me next time.

    blitz1 wrote: »
    Why recognition is done on the Apple Servers iso at home is beyond me.

    How big is the full software package to make it a local service? How would that affect the $20 ML download from the Mac App Store? It's not just the amount of data for all those many millions of Macs but how Apple and Nuance have licensed the service. How well does it perform locally with, say, a CULV processor in oldest MBA that can get ML compared to the server side connection with a standard broadband connection? What about the most common Mac used today or presumed used in the future?
  • Reply 18 of 62
    flowney Posts: 53 member


    I hope that Apple exposes this technology to third parties via an API (may already be available).  This could power speech-to-text apps that create captions and subtitles for video,  I can see this  in iMovie (for home movies) and in conferencing software such as Bb Collaborate and in webcasting apps such as WireCast.  Legislation relating to media accessibility is being enforced more rigorously and accessibility groups are suing non-compliant entities.

  • Reply 19 of 62


    So Apple is up front in disclosing exactly what information is transferred to its servers and the feature is turned off by default. Are there any other companies that have features with possible privacy concerns that default to the service turned off? I'm looking at you Google and Facebook.

  • Reply 20 of 62


    How much does anyone want to bet that the new MacBook Pro has the hardware for full Siri support?
