Apple's new AI model could help Siri see how iOS apps work


Apple's Ferret LLM could allow Siri to understand the layout of apps on an iPhone display, potentially expanding the capabilities of Apple's digital assistant.

A ferret in the wild [Pixabay/Michael Sehlmeyer]



Apple has been working on numerous machine learning and AI projects that it could tease at WWDC 2024. A newly released paper suggests that some of that work could help Siri understand what apps, and iOS itself, look like.

The paper, posted on Monday to arXiv, the preprint server run by Cornell University, is titled "Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs." It describes a new multimodal large language model (MLLM) designed to understand the user interfaces of mobile displays.

The Ferret name comes from an open-source multimodal LLM released in October by researchers from Columbia University working with counterparts from Apple. At the time, Ferret could detect and understand different regions of an image for complex queries, such as identifying the species of an animal in a selected part of a photograph.

An LLM advancement



The new paper for Ferret-UI explains that, while there have been noteworthy advancements in MLLM usage, they still "fall short in their ability to comprehend and interact effectively with user interface (UI) screens." Ferret-UI is described as a new MLLM tailored for understanding mobile UI screens, complete with "referring, grounding, and reasoning capabilities."

Part of the problem LLMs have in understanding a mobile interface stems from how the display is used. Phones are usually held in portrait orientation, so icons and other details often occupy only a small, compact part of the screen, which makes them difficult for machines to interpret.

To help with this, Ferret-UI uses a magnification system that upscales images to "any resolution," making icons and text more readable.

An example of Ferret-UI analyzing an iPhone's display



For processing and training, Ferret-UI also divides each screen into two smaller sub-images, cutting it in half. The paper states that other models tend to scan only a lower-resolution global image, which limits their ability to determine what icons look like.
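As a rough illustration of the split described above, the Python sketch below computes crop boxes for a full-screen global view plus two half-screen sub-images. It assumes, as a reading of the approach rather than Apple's actual code, that portrait screens are cut into top and bottom halves and landscape screens into left and right halves. The `Box` type, the function names, and the 336-pixel encoder input size are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Crop rectangle in pixels: (left, top) inclusive, (right, bottom) exclusive."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def longest_side(self) -> int:
        return max(self.right - self.left, self.bottom - self.top)


def split_screen(width: int, height: int) -> list[Box]:
    """Return [global view, first half, second half] for a screenshot.

    Assumption: portrait screens are split into top/bottom halves,
    landscape screens into left/right halves.
    """
    full = Box(0, 0, width, height)
    if height >= width:  # portrait: horizontal cut
        mid = height // 2
        halves = [Box(0, 0, width, mid), Box(0, mid, width, height)]
    else:  # landscape: vertical cut
        mid = width // 2
        halves = [Box(0, 0, mid, height), Box(mid, 0, width, height)]
    return [full, *halves]


def encoder_scale(box: Box, encoder_side: int = 336) -> float:
    """Resize factor that fits the box's longest side into a square encoder input.

    The 336-pixel default is a hypothetical value, not taken from the paper.
    """
    return encoder_side / box.longest_side
```

For a hypothetical 1170 × 2532 portrait screenshot, `split_screen(1170, 2532)` yields the full screen plus halves ending and starting at row 1266. Each half then needs to be shrunk only half as much as the full screen to fit the same encoder input, so small icons keep roughly twice the detail.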

Combined with significant curation of training data, this has resulted in a model that can sufficiently understand user queries, identify the nature of various on-screen elements, and offer contextual responses.

For example, a user could ask how to open the Reminders app and be told to tap the on-screen Open button. A follow-up query asking whether a 15-year-old could use an app could prompt the model to check the age guidelines, if they are visible on the display.
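The Reminders example can be pictured as two small lookups over a list of detected on-screen elements: one grounds a label to a tap location, the other reads an age rating if one is visible. The Python sketch below is a deliberately simple stand-in, since Ferret-UI answers such questions with a neural model while this code only matches text labels. The `UIElement` type, the element kinds, the "Ages N+" label format, and the sample screen are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UIElement:
    kind: str                       # hypothetical label set: "button", "text", ...
    text: str                       # visible label on screen
    box: tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def tap_point(elements: list[UIElement], label: str,
              kind: str = "button") -> Optional[tuple[int, int]]:
    """Ground a label to screen coordinates: centre of the first matching element."""
    wanted = label.strip().lower()
    for el in elements:
        if el.kind == kind and el.text.strip().lower() == wanted:
            left, top, right, bottom = el.box
            return ((left + right) // 2, (top + bottom) // 2)
    return None


def minimum_age(elements: list[UIElement]) -> Optional[int]:
    """Read an age rating such as "Ages 12+" if one is visible; otherwise None."""
    for el in elements:
        match = re.fullmatch(r"ages\s+(\d+)\+", el.text.strip().lower())
        if match:
            return int(match.group(1))
    return None


# Hypothetical elements a UI model might report for an app page.
screen = [
    UIElement("text", "Reminders", (40, 300, 400, 360)),
    UIElement("text", "Ages 4+", (40, 380, 200, 420)),
    UIElement("button", "Open", (900, 310, 1100, 370)),
]

print(tap_point(screen, "open"))      # (1000, 340)
age = minimum_age(screen)
print(age is not None and 15 >= age)  # True
```

Returning `None` when nothing matches mirrors the article's caveat that the age question can only be answered if the guidelines are actually visible on the display.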

An assistive assistant



It is not yet known whether Ferret-UI will be incorporated into systems like Siri, but it raises the possibility of advanced control over a device like an iPhone. By understanding user interface elements, it could allow Siri to perform actions for users in apps, selecting graphical elements within the app on its own.

There are also useful applications for people who are visually impaired. Such a model could explain what is on screen in greater detail, and potentially carry out actions on the user's behalf without the user needing to do anything more than ask.



Read on AppleInsider

Comments

  • Reply 1 of 9
    cpsro | Posts: 3,226 | member
    This technology could be a great boon to the app review process and, in turn, our security and satisfaction with 3rd party apps.
  • Reply 2 of 9
    AppleZulu | Posts: 2,140 | member
    This starts to get at where I think Apple's machine learning/artificial intelligence and Siri are going. 

    Despite it being front-and-center in the public consciousness for the past year, AI as currently implemented is a hot mess of questionable utility, privacy and security, and based on petabytes of stolen data and intellectual property. While everyone touts how cool their AI is and the peanut gallery throws shade at Apple for being late to the party, moving in "late" to supplant technological hot messes with something well thought out and useful is actually Apple's sweet spot. The Ferret LLM described above could result in users being able to make voice commands that require complex interactions with apps on their device to yield a desired result. If Siri is able to interface with and read from on-device apps, this does two important things. First, it eliminates a requirement for special code within the apps to allow for things like the current Shortcuts app to drive certain tasks... if the user can figure out how to make Shortcuts do it properly. Second, it allows the digital assistant to carry out functions and draw information from sources for which the user already has legitimate permission to access. 

    Legally and functionally it would be very much like handing a human PA your iPhone and then asking him or her to use it to carry out various tasks on your behalf. 

    Imagine waking in the morning and asking Siri to implement various smart home functions based on current conditions like the weather and what you've got on your schedule for the day. Then imagine asking Siri what the morning news is, and it pulls information from your Apple News subscription, along with other news sources to which you have subscribed or otherwise have access, and verbally gives you a news summary, citing each source. Then you ask Siri to bookmark a few of the source articles so you can read them during breakfast. You could ask Siri to order lunch, or flowers, or an Uber, and it simply interfaces with on-device apps and accounts on your behalf. Apple could implement this sort of thing without running roughshod over copyrights, and without selling out the user's privacy and security. 

    This could be how Apple once again enters a space seemingly late, but then implements its vision of that thing so well that others must regroup and scramble to catch up. 
    edited April 9
  • Reply 3 of 9
    avon b7 | Posts: 7,972 | member
    In reply to AppleZulu (Reply 2):
    That is very far from the truth. 

    A quote from MWC2024:

    "In addition to quantity though, we must also look at quality. The more accurate, reliable, relevant, and valuable our data, the more reliable our model input. This improves the availability and reliability of models. This is how data determines the power of AI,”

    A huge amount of work has already gone into LLMs using clean data. Official data. 

    https://readwrite.com/huawei-build-ai-model-for-accurate-weather-forecasting/

    The list of shipping solutions covers just about everything you can think of and not all of it was dredged off the internet. Far from it. That doesn't mean that dredging for data doesn't have its own value. There may be issues surrounding licencing in some cases but that is another story. 

    AI requires a lot of elements to tie it all together and Apple isn't producing many of those elements. 

    https://developingtelecoms.com/telecom-business/vendor-news/16365-huawei-urges-data-importance-for-ai-age.html

    We are also fast approaching the yottabyte era, and AI will be just one of the factors driving an ever-increasing need for data storage and processing. Almost unimaginable amounts. 

    https://www.huawei.com/en/news/2023/5/data-infrastructure-forum


  • Reply 4 of 9
    mattinoz | Posts: 2,448 | member
    So Apple's decades-long efforts and incremental improvements to accessibility will pay big dividends for AI as it helps the ML learn how the app works. 

    It makes you wonder if that might have been a reason for pushing hard in that space all along. You know, not just to be nice or sell lots of hardware. 
  • Reply 5 of 9
    AppleZulu | Posts: 2,140 | member
    In reply to avon b7 (Reply 3):
    Sure, because Huawei has an exemplary record in respecting IP. 
  • Reply 6 of 9
    avon b7 | Posts: 7,972 | member
    In reply to AppleZulu (Reply 5): "Sure, because Huawei has an exemplary record in respecting IP."
    Better than Apple's? 

    "Disputes over intellectual property are common in international business. According to public records, from 2009 to 2019, Apple was involved in 596 intellectual property lawsuits and Samsung in 519. Huawei was involved in 209. The US Department of Justice has insisted on filing a criminal lawsuit against Huawei over the kind of civil intellectual property disputes that are common across the industry."

    https://www.huawei.com/en/news/2020/2/huawei-statement-on-us-justice-department-indictment

    "Huawei is one of the world's largest patent holders, holding more than 120,000 active patents by the end of 2022." 

    https://www.huawei.com/en/sustainability/the-latest/stories/intellectual-property-and-trade-secret-protection

    With over 200,000 employees across the globe, you will inevitably get some problems. No company will ever be free of accusations, and some of them may turn out to be well founded, but how many? A minute fraction? 

    Just last year, Huawei filed more patents than anyone, Apple included. Obviously there is more to see here than your one-liner. 

    But what has that got to do with the point I raised? 
    edited April 10
  • Reply 7 of 9
    AppleZulu | Posts: 2,140 | member
    In reply to avon b7 (Reply 6): "But what has that got to do with the point I raised?"
    Plenty. 
  • Reply 8 of 9
    avon b7 | Posts: 7,972 | member
    In reply to AppleZulu (Reply 7): "Plenty."
    Now you are down to one-worders.

    Clean data is used for all manner of AI tasks. 

    "The system was engineered using an array of operations research methodologies – such as continuous optimization, integer programming, graph theory, scheduling, and network-flow problem solving, along with state-of-the-art machine learning algorithms"

    https://www.theregister.com/2024/04/10/huawei_cloud_streaming_network_optimizer/

    Literally all over the place and in many areas where Apple simply does not have business but nevertheless still relies on it to some degree.


  • Reply 9 of 9
    Looking at the first part of AppleZulu’s original post on 9th April, I agree.

    AI is nothing special or different. It is just more data dishing out information, mostly wrong and purely to generate more revenue. It is not designed to be actually useful. It is already pretty much impossible to get a simple answer to a simple question off the internet. You might get a million responses but, aside from them largely being repeats, they don’t answer the question. It does tell you that you can buy your question from Temu or Amazon, though, which is not very helpful if you want to know how long to cook lamb for (as I did last week). 

    Webpages, cluttered with adverts, have largely become unreadable.

    Probably 30% of the time some part of the system doesn’t work as it should to a greater or lesser extent, with system restarts being required several times a day.

    Then there are multiple unnecessary reminders, even when I’ve already acknowledged them and they have nothing to do with me anyway, and security settings so strict that they make a device impossible to use.

    What we need to do is get the basics right and not try to create ever more complex systems which serve no useful purpose.
