How Gemini's and OpenAI's updates play into Apple's AI strategy

Google and OpenAI have announced significant updates for their AI models and features, creating more competition for Apple ahead of WWDC.

Abstract glowing Siri voice assistant graphic with overlapping colorful circles on a dark background.
Apple has a lot of catching up to do if it wants to compete with Google and OpenAI

On Monday, OpenAI announced its GPT-4o AI model and an all-new Mac app, while Google previewed major improvements to its Gemini software on Tuesday. The two companies showcased a variety of features, making the market even more competitive as a result.

While Apple has seemingly fallen far behind in its AI endeavors, a partnership with Google or OpenAI could prove to be an easy way of offering generative AI features to its user base. Rumors, at least, suggest that's a path Apple is willing to take.

OpenAI updates

OpenAI recently introduced GPT-4o, a new multi-modal version of the company's GPT AI model with enhanced capabilities for processing different input types.

Unlike its predecessors, GPT-4o will be able to use a single neural network to process audio, images and text, offering significant improvements over earlier models as a result. Increases in speed and language processing were also touted during the product announcement.

OpenAI's GPT-4o will be able to understand and convey emotions. During the company's recent event, team members demonstrated this by asking the model to analyze facial expressions and determine the specific emotions a user was expressing.

A computer screen displaying a bar graph comparing major building projects by Roman emperors, with a cursor pointing at a bar labeled 'Constantine'.
OpenAI's ChatGPT is now officially available on macOS

With the improved Voice Mode feature, which provides audio output in the form of speech, GPT-4o can adjust the tone of its voice, making it more robotic or more natural depending on the user's request.

The company has also launched a new desktop application for ChatGPT, which is available on macOS, and has introduced a new API for developers. GPT-4o will be available to users through a gradual rollout process.

Google's Gemini updates

At its I/O developer conference on Tuesday, Google revealed a multitude of enhancements to its Gemini model. The new-and-improved Google Gemini will be able to understand more complex user input and images while taking into account the context behind them.

The Google Gemini logo with lines being threaded below a star shape
Google Gemini is a generative AI tool

The AI software will feature new context-aware capabilities, meaning that it can see everything on screen, whether it's a PDF, a video, or a series of text messages. Gemini will be able to gather information and generate output, but only on select Android devices.

With its new Circle to Search option, for instance, users will be able to select individual objects within an image and instantly receive Google Search results about said object.

Another feature available exclusively on Android will provide users with the option to analyze YouTube videos and PDFs via Gemini Advanced. With the paid service, users will be able to ask specific questions, and will receive answers taken from the content of said video or PDF.

Google's updated Gemini will be able to summarize lengthy conversations and isolate key information from documents, images and videos, all of which should be greatly beneficial to its end-users. Apple is pursuing similar features via its own products.

What we know about Apple's AI strategy so far

Apple is noticeably behind the competition when it comes to its AI offerings, but that could all change very soon with the announcement of iOS 18 in early June.

For well over a year, Apple has been working on its in-house large language model (LLM) known as Ajax. With its generative AI software, the company aims to offer new features similar to those announced by Google and OpenAI in early May.

As part of its recent AI push, Apple is expected to introduce several AI-powered features across its new operating systems. Document and webpage analysis, text summarization, image captioning, and response generation are all in the works.

The company seeks to embed generative AI technology into its existing assortment of core system applications. As a result, apps like Notes, Safari, Messages, Mail, Siri and Spotlight Search are all expected to receive AI-enabled enhancements in one way or another.

Colorful Siri icon, Safari and Messages icons, and Spotlight search bar on a dark background.
Apple's Ajax LLM will improve Safari, Spotlight and Messages

In terms of actual functionality, however, there are limits to what Apple has been able to achieve. The on-device AI model in testing is only capable of rudimentary text analysis and basic on-device response generation.

More advanced features will seemingly necessitate cloud-based processing, which is why Apple is reportedly looking to establish a licensing arrangement with OpenAI. This would allow Apple to offer a variety of AI-related enhancements which its own on-device models cannot facilitate.

A separate rumor claims that Apple wants to create an "AI App Store" through which users could purchase AI-themed applications and products from other companies. This would, in theory, give users the option to use paid versions of products, such as Gemini Advanced.

We will gain a better understanding of Apple's AI endeavors soon enough, as the company is expected to debut its new generative AI features at its annual Worldwide Developers Conference on June 10.

Read on AppleInsider


  • Reply 1 of 2
    lolliver Posts: 496 member
    I watched the full announcement from OpenAI along with some of the demo videos they posted to YouTube. Very impressive stuff.

    I’ve only watched a few clips from the Google announcement though. Did they clarify if their demos were being shown in real time or was it another edited/exaggerated demo like their last one?
    edited May 15
  • Reply 2 of 2
    avon b7 Posts: 7,833 member
    Even 'simple' mundane tasks that require huge backend effort like photo searching will come to the masses relatively soon.

    Something like 'find pictures of me wearing a red cap and holding my dog on a sunny day'. Where even my dog can be identified from other dogs of the same breed.

    For me the bigger AI question (parking to one side the ethical questions for a moment) is monetization.

    We used to pay for byte data/speed (downloads/uploads/bandwidth). Soon we'll be paying for the 'intelligent' analysis of that data/input and generation. There is a big revenue stream there. At least for a while. 