Apple AI research: ReALM is smaller, faster than GPT-4 when parsing contextual data

Posted in General Discussion, edited April 1
Apple AI research reveals a model that could make giving commands to Siri faster and more efficient by converting any given context into text, which is easier for a Large Language Model to parse.

Apple is working to bring AI to Siri

Apple keeps publishing artificial intelligence research as the company approaches the public launch of its AI initiatives at WWDC in June. The research published so far covers a variety of topics, including an image animation tool.

The latest paper, first reported by VentureBeat, details ReALM (Reference Resolution As Language Modeling).

Working out what a vague expression such as "this" or "that" refers to, so that a program can act on it, is called reference resolution. It's a complex problem because computers can't interpret images the way humans can, but Apple may have found a streamlined solution using LLMs.

When speaking to smart assistants like Siri, users might refer to many kinds of contextual information, such as background tasks, on-screen data, and other non-conversational entities. Traditional approaches rely on very large models and on visual inputs such as images, but Apple has streamlined the process by converting everything to text.
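To make the "convert everything to text" idea concrete, here is a minimal sketch of rendering on-screen elements as plain text an LLM can read. It assumes each element arrives with its text and the center of its bounding box; the `ScreenEntity` type, the `screen_to_text` function, the tab-separated row layout, and the 10-point row tolerance are hypothetical illustrations, not Apple's code.

```python
from dataclasses import dataclass


@dataclass
class ScreenEntity:
    """Hypothetical on-screen element: its text plus the center of its bounding box."""
    entity_id: int
    text: str
    x: float  # horizontal center, in screen points (assumed input)
    y: float  # vertical center, in screen points (assumed input)


def screen_to_text(entities: list[ScreenEntity], row_tolerance: float = 10.0) -> str:
    """Render on-screen entities as plain text in rough reading order.

    Entities are visited top to bottom; an entity whose vertical center is within
    `row_tolerance` of the current row's first entity joins that row. Each row is
    ordered left to right with tabs between entities, so some of the layout
    survives as text. Every entity is tagged with its ID so a model can name it.
    """
    rows: list[list[ScreenEntity]] = []
    for entity in sorted(entities, key=lambda e: (e.y, e.x)):
        if rows and abs(entity.y - rows[-1][0].y) <= row_tolerance:
            rows[-1].append(entity)
        else:
            rows.append([entity])
    lines = []
    for row in rows:
        row.sort(key=lambda e: e.x)  # left to right within a row
        lines.append("\t".join(f"[{e.entity_id}] {e.text}" for e in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up screen contents for a business web page.
    page = [
        ScreenEntity(1, "Joe's Pizza", x=50, y=20),
        ScreenEntity(2, "Open until 10pm", x=200, y=22),
        ScreenEntity(3, "(555) 010-0199", x=50, y=60),
        ScreenEntity(4, "123 Main St", x=200, y=58),
    ]
    print(screen_to_text(page))
```

For the sample page this yields two lines: the business name and opening hours on the first, the phone number and address on the second. The model then reads that text alongside the user's request, with no image input involved.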

Apple found that its smallest ReALM models performed similarly to GPT-4 with far fewer parameters, making them better suited to on-device use. Larger ReALM models substantially outperformed GPT-4.

One reason for this performance advantage is GPT-4's reliance on image parsing to understand on-screen information. Much of the image training data consists of natural imagery rather than artificial, code-based web pages filled with text, so reading text directly from screenshots (OCR) is less efficient.

Representations of screen capture data as text, listing information such as addresses and phone numbers as seen by screen parsers. Source: Apple research



Converting on-screen content into text lets ReALM do without these advanced image recognition parameters, making it smaller and more efficient. Apple also addresses hallucination by allowing for constrained decoding or simple post-processing of the model's output.

For example, if you're scrolling through a website and decide you'd like to call the business, simply saying "call the business" requires Siri to work out what you mean from the context. It would be able to "see" that there's a phone number on the page labeled as the business number and call it without further prompting.
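The "simple post-processing" mentioned above can be illustrated with the "call the business" case. This minimal sketch assumes the model was prompted with a text rendering of the screen and asked to answer with entity IDs only (for example, "3"). It discards any ID that does not exist on screen, which is one simple way to stop a hallucinated answer from triggering an action. The function names, the ID-only answer format, and the North American phone-number pattern are hypothetical, not part of ReALM.

```python
import re
from typing import Optional

# Loose North American phone-number pattern, e.g. "(555) 010-0199" (an assumption for this example).
PHONE_PATTERN = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def resolve_reference(model_output: str, valid_ids: set[int]) -> list[int]:
    """Turn raw model output into entity IDs that actually exist on screen.

    Assumes the model was prompted to answer with entity IDs only (e.g. "3").
    Any number that is not a known ID is treated as a hallucination and dropped;
    repeated IDs are kept once, in the order they first appear.
    """
    resolved: list[int] = []
    for token in re.findall(r"\d+", model_output):
        entity_id = int(token)
        if entity_id in valid_ids and entity_id not in resolved:
            resolved.append(entity_id)
    return resolved


def pick_number_to_call(resolved_ids: list[int], entity_text: dict[int, str]) -> Optional[str]:
    """Return the first phone number found among the resolved entities, if any."""
    for entity_id in resolved_ids:
        match = PHONE_PATTERN.search(entity_text.get(entity_id, ""))
        if match:
            return match.group(0)
    return None  # nothing callable: ask the user instead of guessing


if __name__ == "__main__":
    screen = {1: "Joe's Pizza", 2: "Open until 10pm", 3: "(555) 010-0199", 4: "123 Main St"}
    # Pretend the model answered "3, 9" to "call the business"; 9 is not on screen.
    ids = resolve_reference("3, 9", set(screen))
    print(ids, pick_number_to_call(ids, screen))
```

Filtering against the known entity list after generation is the cheap alternative to constrained decoding, which would instead prevent the model from emitting invalid IDs in the first place.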

Apple is preparing to unveil a comprehensive AI strategy at WWDC 2024. Some rumors suggest the company will rely on smaller on-device models that preserve privacy and security, while licensing other companies' LLMs for the more controversial off-device processing, which is fraught with ethical conundrums.



Read on AppleInsider


Comments

  • Reply 1 of 7
    coolfactor (Posts: 2,321, member)
    Makes sense since text is already being extracted from images displayed on-screen. 
  • Reply 2 of 7
    mattinoz (Posts: 2,445, member)
    coolfactor said:
    Makes sense since text is already being extracted from images displayed on-screen. 
    Also, images are increasingly tagged with ALT text as search engines favour sites that put work into being accessible. Apple has already put a lot of effort into Accessibility features that let screen readers do the same with apps on its devices, giving it a wealth of on-device text data to use for this at any time. I guess this means they can "normalise" or even anonymise a query using on-device smarts and then pass it on to other AI systems online.


  • Reply 3 of 7
    Meanwhile, GPT-5 is cooking Apple.

    Honestly, when does Apple launch it, then? In several years?
    After several years, their model will be outdated.

    I won't believe it until it's on the device. Until then, Apple isn't showing anything, just claiming it's better than GPT-4.
  • Reply 4 of 7
    danox (Posts: 3,263, member)
    If Microsoft and Google don't produce something in AI that's actually useful to the public by the end of 2024, the financial world and the public will move on to some other distraction. My bet: there will be nothing forthcoming.
  • Reply 5 of 7
    gatorguy (Posts: 24,582, member)
    danox said:
    If Microsoft and Google don't produce something in AI that's actually useful to the public by the end of 2024, the financial world and the public will move on to some other distraction. My bet: there will be nothing forthcoming.
    So Apple is wasting their time and money chasing Generative AI; it's just a flash in the pan? 
    edited April 2
  • Reply 6 of 7
    danox said:
    If Microsoft and Google don't produce something in AI that's actually useful to the public by the end of 2024, the financial world and the public will move on to some other distraction. My bet: there will be nothing forthcoming.
    Microsoft has already built its AI features into PowerPoint, Excel, and Teams.
    Google has already licensed Gemini for the Samsung S24, which is selling like hotcakes worldwide.
  • Reply 7 of 7
    danox (Posts: 3,263, member)
    Quoting Reply 6:
    Microsoft has already built its AI features into PowerPoint, Excel, and Teams.
    Google has already licensed Gemini for the Samsung S24, which is selling like hotcakes worldwide.
    The Samsung S24 is currently on life support in the USA and dead in Japan and China, and aside from a tiny market in the EU, it has nowhere else to go in the rest of the world.
    edited April 14