Apple insists its AI training is ethical and respects publishers
In a new research paper, Apple doubles down on its claim of not training its Apple Intelligence models on anything scraped illegally from the web.

Apple Intelligence -- image credit: Apple
It's a fair bet that artificial intelligence systems have been scraping every part of the web they can access, whether or not they should. In 2023, both OpenAI and Microsoft were sued by The New York Times for copyright infringement, and that was far from the only such suit.
In contrast, also in 2023, Apple was reported to have attempted to buy the rights to train its large language models (LLMs) on work from publishers including Condé Nast and NBC News. Apple was said to have offered publishers millions of dollars, although it was not clear at the time which, if any, had agreed.
Now in a newly published research paper, Apple says that if a publisher does not agree to its data being scraped for training, Apple won't scrape it.
Apple details its ethics
"We believe in training our models using diverse and high-quality data," says Apple. "This includes data that we've licensed from publishers, curated from publicly available or open-sourced datasets, and publicly available information crawled by our web-crawler, Applebot."
"We do not use our users' private personal data or user interactions when training our foundation models, it continues. "Additionally, we take steps to apply filters to remove certain categories of personally identifiable information and to exclude profanity and unsafe material. "
Most of the research paper is concerned with how Apple goes about doing this scraping, and specifically how its internal Applebot system ensures getting useful information despite "the noisy nature of the web." But it does return to the overall issues regarding copyright, and each time insists that Apple is respecting rights holders.
"[We] continue to follow best practices for ethical web crawling, including following widely-adopted robots. txt protocols to allow web publishers to opt out of their content being used to train Apple's generative foundation models," says Apple. "Web publishers have fine-grained controls over which pages Applebot can see and how they are used while still appearing in search results within Siri and Spotlight."
The "fine-grained controls" appear to be based around the long-standing robots.txt system. That is not any kind of standard privacy system, but it is widely adopted and involves publishers including a text file called robots.txt on their sites.

ChatGPT logo - image credit: OpenAI
If an AI system sees that file, it is supposed not to scrape the site, or the specific pages the file lists. It's as simple as that.
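To make that concrete, here is a sketch of what such a file might contain. It is an illustrative example for a hypothetical site, not any real publisher's file; Apple's documentation describes a separate Applebot-Extended agent that controls whether crawled data may be used for model training, and the sketch assumes a publisher using it:

    # Let Applebot index the site for Siri and Spotlight search results
    User-agent: Applebot
    Disallow:

    # But opt the whole site out of Apple's generative AI training
    User-agent: Applebot-Extended
    Disallow: /

    # And keep all crawlers out of the paid archive
    User-agent: *
    Disallow: /archive/

In this scheme, the plain Applebot rules decide which pages get fetched at all, while the Applebot-Extended rules only govern how the fetched data may be used -- which is what gives publishers the search-without-training option Apple describes.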
What companies say and what they do
It's easy to say that a company's AI systems will respect robots.txt, and OpenAI implies -- but only implies -- that it does too.
"Decades ago, the robots.txt standard was introduced and voluntarily adopted by the Internet ecosystem for web publishers to indicate what portions of websites web crawlers could access," said OpenAI in a May 2024 blog post called "Our approach to data and AI."
"Last summer," it continued, "OpenAI pioneered the use of web crawler permissions for AI, enabling web publishers to express their preferences about the use of their content in AI. We take these signals into account each time we train a new model."
Even that last part about taking signals into account is not the same as saying OpenAI respects them. And while that key paragraph about signals directly follows the one about robots.txt, it never explicitly says that OpenAI pays the file any attention.
And seemingly a great many AI companies do not adhere to any robots.txt instructions. Market analysis firm TollBit said that in March 2025, there were over 26 million disallowed scrapes where AI firms ignored robots.txt entirely.
The same firm also reports that the proportion is rising: in Q4 2024, 3.3% of AI scrapes ignored robots.txt, while by Q1 2025 it was around 13%.
While TollBit does not speculate on the reasons for this, it's likely that the entire available internet has already been scraped. So the companies are pressing on, and in June 2025, a US District Court said they could.
Robots.txt is more than a simple no
When a well-behaved crawler visits a website, it identifies itself with a user-agent string. So when Google scrapes a site, the site registers that Googlebot is accessing it, and the crawler consults the site's robots.txt file for a comprehensive list of permissions.
That list spells out which sections of the site the bot is not allowed to access. When Apple's crawler, Applebot, was revealed in 2015, Apple said that if a site's robots.txt did not mention Applebot, the crawler would follow any rules the site had set out for Googlebot.
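For a sense of how that permission check looks in practice, here is a minimal Python sketch using the standard library's robots.txt parser -- illustrative only, against a hypothetical site, and not Apple's actual implementation:

    # Minimal sketch: check robots.txt before fetching a page (illustrative only)
    from urllib.robotparser import RobotFileParser

    parser = RobotFileParser()
    parser.set_url("https://example.com/robots.txt")  # hypothetical site
    parser.read()  # download and parse the site's robots.txt

    page = "https://example.com/articles/latest.html"
    for agent in ("Applebot", "Googlebot"):
        # can_fetch() applies the rules for the named agent, falling back
        # to the wildcard "*" group when that agent has no rules of its own
        print(agent, "may fetch:", parser.can_fetch(agent, page))

One caveat: Python's parser falls back to the wildcard group for an unlisted agent, not to Googlebot's rules, so Apple's stated Applebot-to-Googlebot fallback is its own behavior that a crawler would have to implement itself.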
The BBC said in 2023 that "we have taken steps to prevent web crawlers like those from OpenAI and Common Crawl from accessing BBC websites." Around the same time, a study of 1,156 news publishers found that 626 had blocked AI scraping, including scraping by OpenAI and Google AI.

A court case against Anthropic has concluded that AI can train on any material
But a company can change the name of its scraping tool, or it can simply ignore blocks -- or at least be accused of doing so.
Perplexity.ai -- which Apple is repeatedly rumored to be buying -- marketed itself as an ethical AI as well, with a detailed blog post about why ethics are so necessary.
But that post was published in November 2024, and in June of that year, Forbes had threatened Perplexity over it having scraped its content anyway. Perplexity CEO Aravind Srinivas later admitted that its search and scraping had some "rough edges."
Apple stands out in AI
Unless Apple's claims about ethical AI training are challenged legally, as Forbes at least started to do with Perplexity.ai, we may never know whether they are true.
But OpenAI has been sued over this, Microsoft has too, and Perplexity has been publicly called out. So far, no one has claimed that Apple has done anything unethical.
That's not the same thing as publishers being happy about any firm training LLMs on their data, but so far, Apple may be the only one doing it all legally.
Comments
It's quite an innovative launch when you're not viewing it from the "AI will take over the world and replace all jobs" nonsense point of view. Grounded here in reality, Apple Intelligence is quite useful for the everyday consumer. Does Apple need a lying chatbot?
With ChatGPT I can take a photo of something and ask for assistance or directions, and that works great, too. That isn't something Apple Intelligence can do yet, though maybe it will in the future. But ChatGPT couldn't have done what I did with that phone call and kept it all private.
I was able to utilize this recently by taking a photo of a gift my friend received during a baby shower. Visual Intelligence let me do a quick reverse image search via Google and brought up the product in question. All private, with none of the data being used by Google or others. That's the innovation Apple is providing via its approach to accessible, on-device features and call outs to third-party AI.
No one else is doing this. And this feels way more useful than a Hatsune Miku I can have an affair with.
I'm building custom scripts for trading, looking forward to building apps for personal use, going to be using an AI agent to do my online shopping for me, completing tasks like building a Google sheet that will constantly scan the market and build me a watchlist based on my criteria, etc. I mean I'm really getting into learning all I can and I hope to one day get rid of Google Home and replace it with a custom built ChatGPT agent.
Apple is nowhere on this level and I honestly don't see them ever getting close! Apple piggybacking off ChatGPT is not them having a product in the field and I'm sure nobody in this field considers Apple competition. Apple Intelligence by itself is a nothing burger, and Apple writing some research paper about how ethical their non-existent product is means nothing, they need to get in the game.
This is so frustrating because instead of letting you replace Siri and Apple "Intelligence" with something more useful, we are stuck waiting on Apple to actually do something and not just talk about it
If you've been paying attention, Apple is building the framework for any and every task to run via Apple Intelligence. The on-device model being private and secure is an amazing boon, and soon developers will be able to use it for any general task that they can target the AI at. We'll see what people do and what the limitations are after the public release, but it sounds promising.
Again, "I don't want to use these tools" doesn't translate to "non-existent product." I don't want to let ChatGPT spend my money, but that doesn't mean that the feature doesn't exist. Apple Intelligence provides useful features and will only improve going forward. App Intents and contextual AI with Siri will be a game changer too, once it launches.
I use Apple Intelligence every day, but I don't use ChatGPT except in the rarest of occasions, and even then, I use it through Siri. Should I be declaring ChatGPT a useless AI tool? No, because that's silly.
AI chatbots, ChatGPT, etc are glorified features, not standalone products. They're a technology without a home. I wouldn't bet against Apple here.
Apple has been employing various levels of machine-learning tech for decades. The keyboard on the very first iPhone employed it ... 18 years ago.
It's only been this latest "GenAI" craze where they were caught with their pants down, but they still make money selling devices to consumers that access competing GenAI. So they win either way.
William, I appreciate your reporting in this area, but you have misinterpreted the earlier article about the Anthropic case. The Court was very clear that there is a difference between using materials acquired legally and those that were not. Also that case had nothing to do with web scraping; it was entirely about "pirated" books and books purchased legally.
The Court's conclusion specifically addressed Anthropic's purported "fair use" of the pirated books.
Now, ignoring a robots.txt file and scraping publicly available web pages is not the same as downloading pirated books. That would be a legal question about terms of use, presumably, and that was not at issue in that case. So, maybe a court will rule that ignoring robots.txt files is fine, but the case cited does not say that.