MacSpeech's Dictate: high quality voice recognition for the Mac

Posted:
in Mac Software edited January 2014
At this week's Macworld Expo, MacSpeech unveiled Dictate, its new speech recognition and voice command software, which is currently in beta and slated for release in mid-February. The new product replaces and improves upon the existing iListen.



Dictate is based upon the highly accurate Dragon NaturallySpeaking speech recognition engine developed by Nuance; iListen was based upon technology licensed from Philips. MacSpeech supplies the user interface and rich integration with AppleScript and other Mac technologies.



A $29 crossgrade is available to any registered iListen customer who purchases or obtains a copy of iListen in 2008. Any registered iListen customer from 2007 and earlier can pre-order a crossgrade for $79.



Speech Recognition Accuracy



Representatives demonstrated the accuracy and intelligence of the new system by dictating into it live. Once switched on, the system allows the user to both dictate and issue voice commands, and it determines which the user is doing by analyzing the context of the words. Dictate requires only a five-minute profile creation session, which profiles the mic used and then analyzes the speaker's speech patterns and diction. In addition, the user can supply text that the software will analyze for unfamiliar words, and then speak those words to expand the system's dictionary.



The advanced recognition engine lets the software accurately transcribe natural speech, correctly interpreting text such as "the patient was in a coma, comma" or "the end of the medieval period period." It also correctly formatted phone numbers and currency amounts, complete with a dollar sign, a thousands comma, and a decimal point, even when spoken in different ways, such as "five thousand dollars and twenty cents."
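
To make the currency formatting concrete, here is a minimal Python sketch of how a spoken amount might be turned into formatted text. It is purely illustrative and is not the algorithm used by MacSpeech or Nuance; the function names `words_to_int` and `format_spoken_currency` and the small vocabulary tables are assumptions made for this example.

```python
import re

# Hypothetical vocabulary tables; a real engine would cover far more cases.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_int(words):
    """Convert a list of English number words into an integer."""
    total = 0      # value of completed thousand/million groups
    current = 0    # value of the group being built
    for word in words:
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unrecognized number word: {word!r}")
    return total + current


def format_spoken_currency(phrase):
    """Format a phrase like 'five thousand dollars and twenty cents'."""
    tokens = re.findall(r"[a-z]+", phrase.lower())
    dollars, cents, buffer = 0, 0, []
    for token in tokens:
        if token in ("dollar", "dollars"):
            dollars, buffer = words_to_int(buffer), []
        elif token in ("cent", "cents"):
            cents, buffer = words_to_int(buffer), []
        else:
            buffer.append(token)
    if any(word != "and" for word in buffer):
        raise ValueError("amount has no currency unit")
    if cents >= 100:
        raise ValueError("cents must be below 100")
    # Dollar sign, thousands comma, and two-digit cents.
    return f"${dollars:,}.{cents:02d}"


print(format_spoken_currency("five thousand dollars and twenty cents"))
```

The sketch only handles whole dollars and cents spoken with explicit units; phrasings such as "five grand" or "twenty bucks" would need additional rules.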



Dictate can enter text into any application that supports text entry from the keyboard, even including Windows apps running in a virtual environment such as Parallels or Fusion. To take a quick dictation without opening another application, Dictate also provides a simple text entry window of its own.



The software will support a variety of English language families, including American English, UK English, and Australian, Indian, and Southeast Asian variants. MacSpeech also has immediate plans to release German, Italian, Spanish, and French versions, and can match developments in new speech engine models released by Nuance.







Voice Control



In addition to entering text, Dictate can also be used to control the desktop interface. Reps demonstrated the software being used to launch applications, edit entered text, and even open Safari bookmarks.



When a new application is installed, Dictate rapidly scans it to set up a table of commands, allowing the user to launch it by name and then activate any of its menu commands by voice. The voice command features can also be extended using AppleScript. Among other features, Dictate can also be used to launch Spotlight and rapidly search the system.



Dictation Hardware



Dictate ships with a microphone, but can be used with any standard mic. Company reps recommended against using a Bluetooth mic because that protocol limits the bandwidth of sound input to 8 kHz, reducing the overall accuracy of dictation. Other wireless microphones, such as professional-quality RF equipment, can be used at full quality.
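
As a rough back-of-the-envelope sketch of why narrowband audio hurts recognition, the Python snippet below compares two audio formats. It assumes the 8 kHz figure refers to a sampling rate (the reps' statement only says "bandwidth"), in which case the Nyquist limit means only frequencies up to about 4 kHz are captured. The function names and the 16 kHz, 16-bit comparison format are illustrative choices, not details supplied by MacSpeech.

```python
def raw_pcm_bitrate(sample_rate_hz, bits_per_sample, channels=1):
    """Bits per second needed for uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def nyquist_limit_hz(sample_rate_hz):
    """Highest audio frequency a given sampling rate can represent."""
    return sample_rate_hz / 2


# Assumed example formats: (sample rate in Hz, bits per sample).
formats = {
    "narrowband voice (assumed 8 kHz, 8-bit)": (8_000, 8),
    "wideband voice (assumed 16 kHz, 16-bit)": (16_000, 16),
}

for name, (rate, bits) in formats.items():
    kbps = raw_pcm_bitrate(rate, bits) / 1000
    top_hz = nyquist_limit_hz(rate)
    print(f"{name}: {kbps:.0f} kbps raw, frequencies up to {top_hz:.0f} Hz")
```

Under these assumptions the narrowband format discards everything above 4 kHz, a range that carries part of the energy of consonants such as "s" and "f", which is one plausible reason a recognizer would do worse on it.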


Comments

  • Reply 1 of 26
    coolfactor Posts: 1,582 member
    Glad to see developments in this area. I expect the next version of OS X to better support speech recognition, which falls in line with the need for voice dialing on the iPhone.
  • Reply 2 of 26
    ireland Posts: 17,686 member
    Apple should buy these guys up right away.
  • Reply 3 of 26
    solipsism Posts: 25,726 member
    Quote:
    Originally Posted by Ireland View Post


    Apple should buy these guys up right away.



    Agreed.





    Can it also read text back to you using Apple's software or their own voice synthesizers?
  • Reply 4 of 26
    bageljoey Posts: 1,763 member
    I too am glad to see progress, but I, personally, am waiting for subvocal input...
  • Reply 5 of 26
    Quote:
    Originally Posted by Ireland View Post


    Apple should buy these guys up right away.



    MacSpeech is using technology from Nuance, essentially the same technology currently present in the Dragon NaturallySpeaking engine. Nothing too novel there, I think.



    I wish Apple would realize the importance of speech recognition too and start investing money in it like it did back in the 90s. The potential of speech recognition for enabling voice commands and accurate dictation in devices like the iMac, iPhone and iPod is huge.
  • Reply 6 of 26
    Quote:
    Originally Posted by Bageljoey View Post


    I too am glad to see progress, but I, personally, am waiting for subvocal input...



    I can't believe there is already a patent out for that!



  • Reply 7 of 26
    There were rumors that Microsoft's 1997 $150,000 investment in Apple came with some conditions including that Apple not compete in the area of voice recognition. I could imagine Apple agreeing to these terms given the state of the technology at that time. However, it's hard to believe, even if there was such an agreement, that there is not some sunset on the period of time until Apple can enter this arena. Hopefully, we will see voice recognition addressed by Apple soon.
  • Reply 8 of 26
    solipsism Posts: 25,726 member
    Quote:
    Originally Posted by penchanted View Post


    There were rumors that Microsoft's 1997 $150,000 investment in Apple came with some conditions including that Apple not compete in the area of voice recognition. I could imagine Apple agreeing to these terms given the state of the technology at that time. However, it's hard to believe, even if there was such an agreement, that there is not some sunset on the period of time until Apple can enter this arena. Hopefully, we will see voice recognition addressed by Apple soon.



    Since they no longer install IE as the default browser I'd say whatever deal was made is now complete.



    edit: It was 150,000 shares, then valued at $150 million. MS sold those shares pretty much as soon as they could.
  • Reply 9 of 26
    Quote:
    Originally Posted by penchanted View Post


    There were rumors that Microsoft's 1997 $150,000 investment in Apple came with some conditions including that Apple not compete in the area of voice recognition. I could imagine Apple agreeing to these terms given the state of the technology at that time. However, it's hard to believe, even if there was such an agreement, that there is not some sunset on the period of time until Apple can enter this arena. Hopefully, we will see voice recognition addressed by Apple soon.



    That would explain why Microsoft has gotten really good on speech recognition lately



    http://www.youtube.com/watch?v=2Y_Jp6PxsSQ



    "I think it's picking up a little bit of echo here."



  • Reply 10 of 26
    hmurchison Posts: 12,310 member
    It's nice to see iListen drop that turd of an engine and move to Nuance technology. If Apple isn't interested in Speech Rec at a serious level they're on crack. I wince every time I see a mini chiclet qwerty keyboard on a phone. Stone Age comes to mind.



    I like the price of Dictate. It leads me to believe that they are basically delivering Dragon Preferred on Mac. However I'd love to see features that come in Professional. There needs to be robust support for scripting and creating Macros. That's where the fun...and efficiency really kick in.
  • Reply 11 of 26
    I'm also glad to see things improving in this area, although all my Macs still use IBM chips and won't work with the new software.



    I used to use IBM's software for speech recognition. It was very good at learning new words; in fact, the accuracy was often better for some of the long words I had to teach the program than for small words. Having to dictate into a separate app and copy over into a word processor etc. was a pain.



    I own iListen, but have never found it useful. It worked very well for simple language but it completely failed to learn some of the jargon I use in my writing. That killed it for me.



    Look forward to trying the new version one day, which combined with the ever increasing speeds of modern Macs should one day make this software actually useful.
  • Reply 12 of 26
    Quote:
    Originally Posted by hmurchison View Post


    There needs to be robust support for scripting and creating Macros. That's where the fun...and efficiency really kick in.



    Doesn't the old iListen (and presumably Dictate) support the use of Macros?
  • Reply 13 of 26
    1) cbt:



    apple leaves soooo much money on the table by ignoring the no-brainers like cbt -- especially for second language learning!



    apple needs to create a common reference point for language learning by licensing the Dragon voxreco engine (as well as pay the NRE costs for cepstral to port their engine to asian languages!) ... along with oem-ing the various pen input tools (cf: http://www.yale.edu/chinesemac/pages/palm.html) -- and yes, a BT stylus is inevitable (hopefully like an Anoto "digital pen"!)



    then hopefully the cbt big players - like rosetta stone and plecto - will deliver their desktop experience in the pre-eminent mobile platform!



    however, the biggest opportunity is for ASIAN LANGUAGES to be added to reco engines on the mac!



    -- however dont hold your breath: apple has a LONG history of pissing away golden opportunities.



    2) mic:



    it is unclear if the current limitations of BT (in terms of audio quality) will be solved in the new version 2.1 (which the Airbook supports!).



    the info on wikipedia does not stress whether the 8khz encoding is hard-wired into the spec or not ...



    certainly the higher bitrates of EDR (2.1 Mbps) are entirely ample to support high fidelity stereo (i'm looking at you iMuffs!), so one would presume that high quality audio-in (22 kHz @ 16 bit = 352 kbps) would also not be a problem for EDR!? (if it were custom hardware programmed to use the whole available channel, so that it need not be limited to a pre-defined max link rate for audio).



    so, hey RF bitheads! -- some clarification would be useful!



    cf: http://en.wikipedia.org/wiki/Bluetooth

    cf: http://www.bluetooth.com/Bluetooth/T...__Baseband.htm



    note: the current spec does specifically state that BT hardware is expected to provide for hard-wired support for audio "at least" at 64kbps (or equivalent quality) ...



    "On the air-interface, either a 64 kb/s log PCM (Pulse Code Modulation) format (A-law or μ-law) may be used, or a 64 kb/s CVSD (Continuous Variable Slope Delta Modulation) may be used. The latter format applies an adaptive delta modulation algorithm with syllabic companding. The voice coding on the line interface is designed to have a quality equal to or better than the quality of 64 kb/s log PCM. The table below summarizes the voice coding schemes supported on the air interface."



    However, the AV profiles seem to support a wide variety of codecs that could make good use of that constrained audio bandwidth (64k?) ... especially mpeg4 audio (AAC ... which is actually an enhancement of mpeg2 audio, but let's not quibble ;-)



    "This (stereo) profile relies on GAVDP. It includes mandatory support for low complexity subband codec (SBC) and supports optionally MPEG-1,2 Audio, MPEG-2,4 AAC and ATRAC.



    The audio data is compressed in a proper format for efficient use of the limited bandwidth. Surround sound distribution is not included in the scope of this profile."



    soooo, the upshot of a quick perusal of the BT spec seems hopeful but not definitive: 8 kHz sampling is used only for legacy encoding (ie for PCM equivalents such as A-law/mu-law) ... the physical link rate reserved for audio -- (64k?) -- would be enough to handle a reasonable sample rate (16 kHz) and a reasonable quantization (16 bits) => 256 kbps raw with at least 4X compression for AAC produces a bitrate ≤ 64 kbps! (ie within the link rate reserved for audio).



    if this is correct - then the optional codecs in the BT spec could deliver sufficient quality for vox reco!?



    again, the RF bitheads can help us all understand the options (and the commercially available chipsets) ;-)
  • Reply 14 of 26
    Considering that Nuance uses this engine for SMS on mobiles, I hope MacSpeech adapts this to the iPhone. It would require the use of the mic for this purpose... an Apple update for the mic to function while phone inactive?
  • Reply 15 of 26
    My dreams are coming true.



    Computer, hello computer!



    http://uk.youtube.com/watch?v=v9kTVZiJ3Uc



    Apple seem far more aligned with the physical manipulation of a machine, not verbal, at the moment.
  • Reply 16 of 26
    willrob Posts: 203 member
    MacSpeech claims (on their web site) that Bluetooth is not currently accurate for dictation. The second generation iPod Nano however can be fitted with a mic and used as a portable dictation unit which iListen can "type." [the Nano actually comes with voice recording capabilities built in] Progressing to the iPhone would be an obvious step, IF Apple allows it. The Touch unfortunately has no mic and currently MacSpeech doesn't support it, so it's not clear if it has the same internal ability to record. There is a third party app (jailbreak required) that records the voice if one has a specially made mic, but no way to translate that into type automatically (so far).



    If you order iListen now (Buy.com has lowest price of $122.99, with mic), you can crossgrade to the new engine for $29 once it ships. I'm not sure where this article is getting the $140 price; there's no sign of that offer on the MacSpeech website.
  • Reply 17 of 26
    Quote:
    Originally Posted by willrob View Post




    If you order iListen now (Buy.com has lowest price of $122.99, with mic), you can crossgrade to the new engine for $29 once it ships. I'm not sure where this article is getting the $140 price; there's no sign of that offer on the MacSpeech website.



    I bought/pre-ordered "Dictate" for the $149 price at the MacWorld show earlier this week. I don't know whether this was a show-only price, or whether you can order it at this price directly from MacSpeech.com (you could try calling them?). I would guess that if you can order it, the price may only be good during MWSF (i.e. not after today).
  • Reply 18 of 26
    Quote:
    Originally Posted by Delfoniq View Post


    That would explain why Microsoft has gotten really good on speech recognition lately



    http://www.youtube.com/watch?v=2Y_Jp6PxsSQ



    "I think it's picking up a little bit of echo here."







    Hahaha, I almost crapped my pants when I saw that! You made my day! Maybe I should give Vista a try, seems to be really funny...
  • Reply 19 of 26
    ecking Posts: 1,588 member
    This is pretty cool, I'd love something like this to use with final draft.
  • Reply 20 of 26
    kkerst Posts: 330 member
    Roger Roger, what's your vector Victor, do we have clearance Clarence?



    Let's hope it would know how to interpret that.