Apple set to deliver AI assistant for transcribing, summarizing meetings and lectures

Posted:
in iOS edited May 14

Apple later this year hopes to make real-time audio transcription and summarization available system-wide on many of its devices, as the iPhone maker looks to harness the power of AI in delivering efficiency boosts to several of its core applications, AppleInsider has learned.

iPhone showing a voice recording app interface with playback controls, flanked by a Voice Memos Transcription logo.
Voice Memo Transcription in iOS 18



People familiar with the matter have told us that Apple has been working on AI-powered summarization and greatly enhanced audio transcription for several of its next-gen operating systems. The new features are expected to enable significant improvements in efficiency for users of its staple Notes, Voice Memos, and other apps.

Apple is currently testing the capabilities as feature additions to several app updates scheduled to arrive with the release of iOS 18 later in 2024. They're also expected to make their way to the corresponding apps in macOS 15 and iPadOS 18.

The default Voice Memos application that Apple includes across its device portfolio will be among the first to receive upgraded capabilities. Early versions of the app provide a running transcript of each audio recording, operating similarly to the company's recent Live Voicemail feature.

The transcriptions occupy the central area of the application window, replacing the larger graphical representation of recorded audio found in the existing version of the app.

Transcription is also being pulled into the next version of Notes. Pre-release versions of both apps feature a dedicated transcription button in the form of a speech bubble, according to those familiar with the software. Tapping the new speech bubble will display a transcription of audio recorded within the app.

Screenshot of a voice recording application with an audio waveform, playback controls, and the text 'This is a test recording.'
Rendition of the Voice Memos transcription in Notes in iOS 18



The transcription tool will go hand-in-hand with -- and provide new context to -- upcoming audio recording features in Notes, which were first detailed by AppleInsider in April. Specifically, the update will add an option for AI-generated summarization of recorded audio that instantly provides a basic text summary of the key focal points and action items.

The AI summarization feature, coupled with the new in-app audio recording and real-time transcription options, is expected to make Apple's built-in Notes app a true powerhouse. The trio of features will serve a wide array of practical applications, doing the heavy lifting of condensing large amounts of data down to key focal points. This all translates to convenience and clarity at a glance for users.

Students would be able to easily record lectures and classes without relying on third-party tools. If recording from the new Notes app, there's an option to include a transcription and summary within a note, alongside other media such as images, links, and data structures like tables.

The features will also pay dividends for professionals who regularly attend conference calls, virtual business meetings, or seminars as part of their line of work. Such events often divulge large amounts of information, various statistics, detailed business plans, dates, and schedules that Apple's AI technology will analyze and reorganize into properly structured summary briefs.

The same applies to classes or lectures at more advanced levels that frequently include an assortment of information, such as definitions, explanations of complex ideas or theoretical principles, illustrative examples, and much more.

Meanwhile, journalists would gain an extremely efficient way of transcribing and summarizing lengthy interviews. Creatives such as authors and screenwriters could easily record key ideas and look through them later, without having to play back and listen to the majority of the recordings simply to isolate key data points.

Although Apple has gone to great lengths to ensure that its transcription and summarization features generate accurate results, mistakes are inevitable. Thus, maintaining the original audio alongside the transcript and AI-generated summary ensures that none of the source information is lost in the transcription or summarization process.

Summarization is only part of a larger Apple AI effort



The new transcription and summarization features will be part of Apple's broader AI push this year. Similar summarization features are also expected to make their way to Safari 18 via Intelligent Browsing, and to the built-in Messages app -- through integration with Apple's on-device AI software.

The use cases and overall purpose of AI-powered summarization features in Safari and Messages are completely different. Whereas Notes will give users the option to summarize meetings, conference calls, and lectures, Safari will allow for webpage summarization, while Messages will offer a condensed version of message contents.

Apple's AI software could also serve to protect its users' privacy, as certain AI features are expected to function entirely on-device. In the case of audio transcription and advanced AI summarization, however, server-side processing may be required for the time being.

By incorporating summarization and audio transcription into its system applications, the company looks to demonstrate some of the best use-case advantages of deploying AI to tackle real-world scenarios. The goal of Apple's AI endeavors is to provide developer features that promise to empower its customers to be more efficient and successful in their daily tasks.

At the same time, the company is hoping to better position itself against the proliferation of competitive third-party applications now utilizing AI technology, several of which have seen healthy adoption rates as consumers weave them into their digital lives.

The Otter application, for example, is another recipient of Apple's Editors' Choice Award. It offers similar functionality to the features discussed in this article. With it, users can record, transcribe, and summarize meetings via generative AI, all in one app.

Microsoft's OneNote also offers support for audio recording in the form of voice notes, serving as another potential rival for Apple's Notes and Voice Memos applications.

It's worth emphasizing, however, that not all software features that Apple tests in pre-release builds of software make it into the existing release cycle. Apple has been known to cancel projects or delay features to subsequent operating system releases and apps at the last minute, so there are ultimately no guarantees on timing and availability.

That said, the new AI summarization and real-time transcription features still appear to be on track for an expected unveiling alongside Apple's next-generation operating systems at the company's Worldwide Developers Conference (WWDC) in June. They're expected to be joined by improved Calendar and Calculator apps, among others.




Comments

  • Reply 1 of 22
    jas99 Posts: 158 member
    THIS is the sort of AI enhancement that is actually meaningful. 
    I have no use for hallucinating LLMs handling my e-mail or writing error-laden book reports. 
    I have no use for gimmicks. 
    Thank you, Apple for making something that improves my life.
  • Reply 2 of 22
    JamesCude Posts: 52 member
    Sure but we’ve had many of these tools already for years- economical but not exactly innovative.
  • Reply 3 of 22
    So I assume transcript is on device and summarization is server-side? Anyways i did a voice to text note when I got trained to work closings in my job and it was great during the first couple of times and I didn’t have to double check with a coworker and bother them. So a summarized transcript would have been even better during that time.
  • Reply 4 of 22
    lam92103 Posts: 135 member
    MacOS has had a decent summarize feature for quite some time. Microsoft Office has also had background removal from photos for quite some time. All these AI features, while useful, are not really anything new
  • Reply 5 of 22
    gatorguy Posts: 24,328 member
    On the surface this would seem to support the rumor of Apple using Google's on-device Gemini GenAI. The Pixel 8's were the first smartphones to do this on-device, and it required an update that installed Gemini Nano to accomplish it. 
  • Reply 6 of 22
    omasou Posts: 595 member
    Now that is a useful and practical implementation of AI.

    But what will all the administrative assistance do with their new found time. Oh, right, post on social media about how AI is taking their jobs, a job they probably never liked. /s
    edited May 10
  • Reply 7 of 22
    omasou Posts: 595 member

    JamesCude said:
    Sure but we’ve had many of these tools already for years- economical but not exactly innovative.
    Yes, but like OCR they have had limited utility b/c the success rate is not 100%. Will the AI version be 100%, probably not but it will improve and "learn" over time.
    edited May 10
  • Reply 8 of 22
    CheeseFreeze Posts: 1,276 member
    jas99 said:
    THIS is the sort of AI enhancement that is actually meaningful. 
    I have no use for hallucinating LLMs handling my e-mail or writing error-laden book reports. 
    I have no use for gimmicks. 
    Thank you, Apple for making something that improves my life.
    What are you talking about? I use ChatGPT 4.5 on a daily basis and it is amazing for many tasks, and not 'error-laden' or 'hallucinating'. 
    Don't pretend Apple is solving an issue here. They are playing catch-up. No doubt they'll nail the execution.
  • Reply 9 of 22
    jas99 Posts: 158 member
    jas99 said:
    THIS is the sort of AI enhancement that is actually meaningful. 
    I have no use for hallucinating LLMs handling my e-mail or writing error-laden book reports. 
    I have no use for gimmicks. 
    Thank you, Apple for making something that improves my life.
    What are you talking about? I use ChatGPT 4.5 on a daily basis and it is amazing for many tasks, and not 'error-laden' or 'hallucinating'. 
    Don't pretend Apple is solving an issue here. They are playing catch-up. No doubt they'll nail the execution.
    I suppose if your output consists of regurgitating what an LLM scrapes from the preexisting internet rather than
    making original contributions to human knowledge, an LLM can do your job perfectly well. 
  • Reply 10 of 22
    This is the biggest announcement of a truly useful feature that Apple has made in recent memory. Being able to transcribe meetings and record them is incredibly useful in the real world. Summaries maybe not so. Most ChatGPT output is garbage for technical discussions. Just a transcription is huge capability.
  • Reply 11 of 22
    eightzero Posts: 3,089 member
    Cool. Another reason to hate meetings and actually talking to people. 
  • Reply 12 of 22
    michelb76 Posts: 644 member
    jas99 said:
    THIS is the sort of AI enhancement that is actually meaningful. 
    I have no use for hallucinating LLMs handling my e-mail or writing error-laden book reports. 
    I have no use for gimmicks. 
    Thank you, Apple for making something that improves my life.
    What are you talking about? I use ChatGPT 4.5 on a daily basis and it is amazing for many tasks, and not 'error-laden' or 'hallucinating'. 
    Don't pretend Apple is solving an issue here. They are playing catch-up. No doubt they'll nail the execution.
    So do I and I would take nothing it produces at face value. It's often factually wrong and makes a lot of stuff up, even on your own data. This is well-known, and a 'feature' of any current model on the market. Denying that just makes you look silly.
  • Reply 13 of 22
    mpantone Posts: 2,071 member
    jas99 said:
    THIS is the sort of AI enhancement that is actually meaningful. 
    I have no use for hallucinating LLMs handling my e-mail or writing error-laden book reports. 
    I have no use for gimmicks. 
    Thank you, Apple for making something that improves my life.
    There are companies that already provide this service like Otter.ai.

    If Apple does bring this capability for free as a part of the next iOS, that puts a lot of pressure for other companies to provide differentiated value-add.

    There are likely other AI uses that will move into iOS/macOS that are currently being handled by third party providers. AI deniers will have weaker arguments that "AI is a big waste of time..." over the coming year and beyond.

    Remember that the current GPT LLM is not AI fully matured. AI is still very much in its infancy and is growing rapidly with every passing month. If you watch any YouTube video on anything AI that was uploaded over six months ago, it is likely out of date already.
    edited May 11
  • Reply 14 of 22
    22july2013 Posts: 3,607 member
    Since Photos has facial recognition, I would guess that it could be upgraded to identify voices from videos stored in the Photos app. And once it has recognized voices, it could automatically annotate transcriptions of meetings by using the name of the person speaking in the annotation. Not a stretch.
  • Reply 15 of 22
    mpantone Posts: 2,071 member
    Since Photos has facial recognition, I would guess that it could be upgraded to identify voices from videos stored in the Photos app. And once it has recognized voices, it could automatically annotate transcriptions of meetings by using the name of the person speaking in the annotation. Not a stretch.
    Hell, it can already associate faces with voices through FaceTime calls. It just needs to assign a faceprint and voiceprint to a name (whether it be CallerID or an address book entry).

    A lot of this data has been here for years. It's just taken something like AI/ML technologies to do something with that data.

    US CBP already uses facial recognition as a preliminary identification. At some point they will likely integrate voice identification to a person's profile to improve identification accuracy. GlobalEntry kiosks no longer have fingerprint scanners. My guess is their current technology can estimate weight within 1 kg and height within 0.5 cm, maybe even narrower.
    edited May 11
  • Reply 16 of 22
    CheeseFreeze Posts: 1,276 member
    I hope it becomes part of Final Cut Pro. An auto-caption feature is very much welcome.
  • Reply 17 of 22
    CheeseFreeze Posts: 1,276 member
    michelb76 said:
    jas99 said:
    THIS is the sort of AI enhancement that is actually meaningful. 
    I have no use for hallucinating LLMs handling my e-mail or writing error-laden book reports. 
    I have no use for gimmicks. 
    Thank you, Apple for making something that improves my life.
    What are you talking about? I use ChatGPT 4.5 on a daily basis and it is amazing for many tasks, and not 'error-laden' or 'hallucinating'. 
    Don't pretend Apple is solving an issue here. They are playing catch-up. No doubt they'll nail the execution.
    So do I and I would take nothing it produces at face value. It's often factually wrong and makes a lot of stuff up, even on your own data. This is well-known, and a 'feature' of any current model on the market. Denying that just makes you look silly.
    I use it for presentations, coding, marketing and more and it is rarely factually incorrect. Where it goes off the rails is usually over time (e.g in coding), but as long as you direct it properly, these issues are contained. Same goes for the larger models on Huggingface. 

    In fact my job is to bring generative AI to the company’s product suite.

    So yes I’m “denying” as you put it, and I leave it up to others whether that is “silly”.
  • Reply 18 of 22
    AI is Apples new stock market growth, everything mentioned by leaks or anything related to AI to keep those perspective investors open their stupid dumb no idea of technology mind s happy. We all know AI is just bollocks tag given to anything related to an app or a platform or anything people buy into it. AI is just another tool.
  • Reply 19 of 22
    I’ve enjoyed this voice transcription capability for years with the app Just Press Record. The real grabber is the ability to record on one device (typically your watch) and have the recording and transcription appear promptly on all the cloud-connected devices where you might want to view, copy, move, edit and/or delete them.
  • Reply 20 of 22
    mikethemartian Posts: 1,389 member
    I want an AI that can pretend it is me during MS Team meetings at work so I can spend that time actually working on the problem at hand than talking about it.