Apple licenses millions of Shutterstock images to train its AI models

AppleInsider · April 6, 2024 6:10PM

Apple has struck a deal to license millions of images from Shutterstock in order to train its AI models.

Other tech companies, including Google, Meta, and Amazon, have struck similar deals with Shutterstock to help develop visual AI engines. News of Apple's deal comes well after it was signed in late 2022; the deal is expected to cost Apple up to $50 million.

This follows news of earlier negotiations between Apple and various publishers over licensing news articles to train its large language models (LLMs). Conde Nast, IAC, and NBC are among the big media names that have allegedly been in talks with Apple about licensing their content.

Apple is expected to make some major announcements about its efforts to add more AI technologies into its operating systems this June, at WWDC. Though often perceived as being behind its rivals in AI integration, Apple has made some innovations of its own.

Over the past year, Apple device users may have noticed smaller improvements in Apple's "machine learning" technologies. Predictive text, for example, has grown steadily more accurate in adapting to a given user's preferred vocabulary, and Siri has improved its ability to translate common phrases.

The next generation of Apple's processors is rumored to include substantially more powerful neural engines.

Apple's Senior VP of Worldwide Marketing, Greg Joswiak, has quipped on social media that the next WWDC will be "Absolutely Incredible," hinting that the conference will focus heavily on AI features being added to iOS 18 and other Apple OSes.

The big challenge for Apple in using AI technologies is maintaining its standards on user privacy, a problem other big AI-using tech firms don't concern themselves with. Apple has recently revealed that it intends to develop LLMs that run on-device as much as possible.

cesar battistini maziero · April 6, 2024 8:31PM

Apple is too pure for this world.

OpenAI, Google, Meta and Microsoft trained their AI on the web, on everyone’s content. Even personal content.

They haven’t licensed most of the material. They straight up stole the web.

byronl · April 6, 2024 8:52PM

Cesar Battistini Maziero said:

Apple is too pure for this world.

OpenAI, Google, Meta and Microsoft trained their AI on the web, on everyone’s content. Even personal content.

They haven’t licensed most of the material. They straight up stole the web.

Apple definitely is not too pure for this world...

gatorguy · April 6, 2024 9:34PM

Cesar Battistini Maziero said:

Apple is too pure for this world.

OpenAI, Google, Meta and Microsoft trained their AI on the web, on everyone’s content. Even personal content.

They haven’t licensed most of the material. They straight up stole the web.

Did you read the AI article? Google and Meta also licensed Shutterstock's image library some time back, Google doing so in 2016. They didn't steal it.

If you're curious, prior to this 2022 licensing deal with Shutterstock for AI training images, Apple had been using undisclosed "other photo data sets" for AI training data for over 8 years, though none of them were images from their own customers... unless you or I agreed to it hidden somewhere in the multipage ToS for some Apple service or iOS app.

Further to that, as I read Apple's iOS 10 disclosures, Apple did say they may use other data from us for AI training, but anonymized with differential privacy so it could no longer be connected to us as personal data. That's quite similar to other tech companies using anonymized/differential data, which was considered acceptable since the data was no longer deemed identifiable.

ChatGPT, Canva, DALL-E, and other generative AI models have reopened the conversation, so what was once considered OK no longer is. Hence the relatively recent rush to pay to license data sets rather than scrape customer-contributed content, interactions, and the general web, even if the data is anonymized.

edited April 2024

bobolicious · April 7, 2024 12:06AM

Further to that, as I read Apple's iOS 10 disclosures, Apple did say they may use other data from us for AI training, but anonymized with differential privacy so it could no longer be connected to us as personal data.

What does this mean for human content creators...? Is there also a legal argument for moral rights requiring credit where credit is due?

https://en.wikipedia.org/wiki/Moral_rights

"The preserving of the integrity of the work allows the author to object to alteration, distortion, or mutilation of the work that is "prejudicial to the author's honor or reputation". Anything else that may detract from the artist's relationship with the work even after it leaves the artist's possession or ownership may bring these moral rights into play."

"Even if an artist has assigned his or her copyright rights to a work to a third party, he or she still maintains the moral rights to the work"

Does this also create an embedded risk for anyone using or relying on AI as a basis for derivative work?

Lawyers, start your engines...

... when Photos was introduced shortly after Steve Jobs died it 'featured' always-on image tagging and iCloud sync vs the 2' USB cable on the desk ...

... would this also apply to all the contact data customers sync which might include headshots without the contact's knowledge or permission ...?

edited April 2024

timmillea · April 7, 2024 5:55AM

I am sure an AI lawyer would be cheaper and more effective.
