Mac mainstay Audio Hijack adds automatic transcription


The go-to app for recording absolutely anything on a Mac, Audio Hijack, has been updated to add Whisper-based AI transcription for any recordings.

Audio Hijack from Rogue Amoeba is a Mac favorite

It's the pro audio app used by any Mac user who needs to record any audio. And it's been like that since Steve Jobs decided to save it against the wishes of the recording industry.

Now, with Audio Hijack 4.3, the app can transcribe audio at the same time as it records it, using what's called a Transcribe block. The app works as a series of building blocks representing what you want to record and what you want to do with that audio.

So, to record your microphone, for instance, you can drag an Input Device block onto the Audio Hijack canvas, then say what connected device you want recorded.

Then you might have just an Output Device, and through that, you tell Audio Hijack to route the recording to your headphones instead of your speaker.

What makes Audio Hijack so useful is that you can string together any number of blocks. So you can have a setup that records sound from Safari, saves it as an MP3 file, and plays it out through the Mac's speakers, and that now transcribes it at the same time.

To set up transcription for the first time, users have to drag the Transcribe block out onto the canvas and then click on it. When they do, the app offers options for the quality of transcription and for where to save the transcribed text files.

Audio Hijack has added a Transcribe block that can be just dragged into any recording setup

There are two quality options: High Accuracy and Low Resources. The latter is quicker and takes up less disk space with its AI-based Large Language Model (LLM), while the former is the one to use unless time and disk space are short.

It can then take a few minutes for the LLM to download, but it's always ready after that.

From then, users can add the Transcribe block to any canvas or any selection of audio recording settings. Once the Run button is pressed to record the audio, the transcription just happens automatically.

There's no extra cost for the transcription, nor any limitation on the recording length or the number of recordings that can be transcribed.

Rogue Amoeba's Audio Hijack 4.3 costs $77 direct from the developer. It's a free update to existing version 4 users, and version 3 owners can upgrade for $35.

Comments

  • Reply 1 of 6
    22july2013 Posts: 3,573 member
    I just tried it. It's only in "beta" (the article doesn't mention this) so any criticism (from me) has to keep that in mind. Anything I criticise could be fixed soon.

    Its accuracy is poor. At least 10% of the words are wrong, probably more. And not just wrong: the same phrase can appear twice in the same sentence, separated by other words. Why should a phrase that I speak once be entered into the output twice, in different locations?

    Apple's transcription, which is built into macOS, is far more accurate (maybe 99%?), but its problem is that it's very difficult to transfer the text coming from Apple's service into an application. I've done it, but it's tough, and suffers from some issues, because it wasn't designed for that.

    Audio Hijack doesn't quite suffer from the same problem as Apple here, but Audio Hijack's Transcribe Block still doesn't have the options it needs to send the data into another program in real time. It should, and it easily could. And I opened a ticket with them to ask for more features in this regard. Their website says they want our feedback, and so that suggests that they may take some of our feedback. Which is something I can't say about Apple.
    edited November 2023
  • Reply 2 of 6
    How do I use apple's transcription? I didn't know that existed. 
  • Reply 3 of 6
    22july2013 Posts: 3,573 member
    nadkavs said:
    How do I use apple's transcription? I didn't know that existed. 
    It's called Voice Control and can be enabled in the System Settings under Accessibility.

    It can be used either to control things on the macOS screen, which is interesting, or to enter words into an application, such as dictating text into Apple Pages. 

    With some work, I've been able to route the text into an application of my own, and let my application "listen" to the text words and take appropriate action.
  • Reply 4 of 6
    This is interesting as I am looking for a better way to transcribe my voice interviews into a Word document. I use Audio Hijack for various podcasting things but have not updated to 4.0. I will do so and start testing.
  • Reply 5 of 6
    Marvin Posts: 15,326 moderator
    nadkavs said:
    How do I use apple's transcription? I didn't know that existed. 
    It's called Voice Control and can be enabled in the System Settings under Accessibility.

    It can be used either to control things on the macOS screen, which is interesting, or to enter words into an application, such as dictating text into Apple Pages. 

    With some work, I've been able to route the text into an application of my own, and let my application "listen" to the text words and take appropriate action.
    There's a replacement for the old SoundFlower audio router here:

    https://github.com/ExistentialAudio/BlackHole/
    https://existential.audio/blackhole/

    There's an installer on the page and after install, it shows up as an input/output source in sound settings. The Audio Hijack devs have an app for this too:

    https://rogueamoeba.com/loopback/
    https://www.jeffgeerling.com/blog/2022/how-transcribe-audio-text-using-dictation-on-mac

    The page for BlackHole says to set up a multi-output device to be able to listen and loop the audio back.

    To loop the audio back, select BlackHole in the dictation dropdown in keyboard settings and as the input/output in sound settings. Then, with something like TextEdit open, press the start dictation button on the keyboard and have a YouTube video playing in the background. Pressing stop dictation will transcribe the audio.

    The system audio has to be kept from muting during dictation, e.g. by enabling Voice Control:

    https://apple.stackexchange.com/questions/110839/how-to-keep-sound-from-muting-while-using-dictation

    I guess Apple doesn't want to have audio loopback built in, to avoid people recording copyrighted audio, but a built-in transcription service that takes system audio as the dictation source would be useful and would help accessibility for deaf people.
  • Reply 6 of 6
    The transcription feature works pretty well for me!

    Tips & Ideas:

    • Download the entire Whisper offline model
    • Make sure you are processing the correct input
      • Use monitors in Loopback and rely on SoundSource just for mixing
      • Set up pre-processing effects in SoundSource but keep post-processing effects in Audio Hijack
    • Use filter chains to remove noise and other sounds, EQ your voice and stream that directly to the transcription block
      • then either use Loopback passthroughs, or copy over effects settings with saved presets
      • Pay attention to ordering; I use a chain of lo/hi-pass, de-noise, de-hum, EQ, compression etc.
    • Route the "full" normal recording to a normal output for general use, and the super crisp one to transcribe in parallel

    So then, test it out a bit and make sure your files are going where you expect, are organized, are in the correct sound formats, and play back without feedback or issues. Make sure you're not skipping an effect in your effects chain.

    NOTE: Careful with feedback, low volume at first, lol, trust me.

    Another idea is to practice enunciating and talking a bit slower to see how well this works. Very useful for transcribing handwritten notes once you get the hang of it. Be sure to give it a sec to process too, and try to limit other open apps using memory or getting in the way of your audio card/GPU. This helps a lot actually. Super pro tip: You can set up a `tail -f` on the log file, which usually works for me to watch in real time. It may lag a bit behind but this is useful for setup at least.
    tail -f ~/Documents/20240407\ 0241\ Transcription.txt
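
A small helper can save retyping the timestamped file name for every new recording. This is only a sketch: it assumes the transcripts all land in one folder and end in `Transcription.txt`, as in the example path above. The name `latest_transcript` is made up for illustration, and the default folder is a guess at the save location, not something Audio Hijack guarantees.

```shell
# Print the most recently modified "*Transcription.txt" file in a folder.
# Assumptions: the folder (default ~/Documents) and the file-name pattern
# are taken from the example path above, not from Audio Hijack's docs.
latest_transcript() {
    transcript_dir="${1:-$HOME/Documents}"
    # ls -t lists newest first; head keeps only the first entry.
    ls -t "$transcript_dir"/*Transcription.txt 2>/dev/null | head -n 1
}

# Usage: follow the newest transcript while a session is running.
# tail -f "$(latest_transcript)"
```

The quotes around `$(latest_transcript)` matter because the file names contain spaces. In zsh, the default macOS shell, a pattern that matches nothing raises a "no matches found" error instead of printing nothing.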
    edited April 24