Going to toss this out there:
- archive.org caches copyrighted images aplenty
- google and other search engines do the same
- your browser, same
There's something to be said for Terms of Service that allow specific use cases, where an automated system may access resources for specific purposes and with specific limitations (for example, search engines can't display the full text of articles - only a snippet). If the AI models were trained on images via a process that violated ToS, then yes, they'd be in violation of those terms, and probably copyright as well (IANAL). However, I'm not sure the plaintiffs can prove that their specific works were accessed in violation of ToS during the training process. If they can, they might have a leg to stand on, and the models might need to be retrained a bit more carefully, though they could still perform similarly.
I'm guessing not every published resource on the web needs a ToS explicitly permitting bulk machine access like spidering/crawling, or search engines could never exist. I therefore assume only websites that publish a ToS with language explicitly restricting machine access could be considered protected from it.
On the question of derivation, if a human can look at images, learn from them, and then have in mind the ideas of what a cat or a car looks like, or develop a heuristic for what a given artist's style is, then I think using a tool to do the same (look at images, learn from them, and generate something based on that learning) is no different. Unless I greatly misunderstand what the model contains or how it works, it doesn't retain a copy of all the images, nor does it reference them during generation - only its "learning". I'm fine with that, even if it upends some things.