AI, Content Scraping and the Negative Effects on Copyright

kat · **on:** July 21, 2023, 03:19:32 PM

[image courtesy CrAIyon.com based on "AI gobbling monster vacuuming artwork" text input]

How is AI content scraping affecting copyright?

As another exercise in asking AI what it essentially thinks about various 'AI' topics, BING AI is quizzed about copyright and content scraping, specifically how the latter affects the former.

AI content scraping is a complex issue that has been debated for years.

While it is true that content scraping in relation to AI "has been debated for years", the discussion had been limited to manually run and controlled scripts and bots, not autonomous programs able to function unaided or undirected. The tools utilised to scrape content were also restricted to small datasets, essentially what data scientists had on their hard drives.

Now that these tools are available to, and used by, multinational corporations with unlimited budgets, content scraping is on a whole 'nother level, and none of it is traceable (for image-based content production, AI does not yet publish a source list when compiling something based on text prompts).

Legally, AI systems cannot be considered the author of the material they produce. Their outputs are simply a culmination of human-made work, much of which has been scraped from the internet and is copyright protected in one way or another.

The argument has never been that AI is 'claiming' to be, or is being "considered the author" of, anything that's produced. The complaint is that its output is based on potential misappropriation of other people's content; this is especially so considering AI cannot function without initial like-for-like input replication - ten artists drawing the same egg will produce ten different results; an AI 'drawing' the same egg has to copy, duplicate, at least one source image pixel-by-pixel to (re)construct an iteration or derivation of that same egg.

While it is possible to make an 'influence' argument, the artist (human) will find it exceptionally difficult to make an exact duplicate of something without a direct copy/paste of a digital asset, something not possible in RL - 'influence' is not the same argument as 'duplicate' or 'copy', especially in the digital realm.

AI art tools like Stable Diffusion rely on human-created images for training data, which companies scrape from the web, often without their creators' knowledge or consent. AI firms claim this practice is covered by laws like the US fair use doctrine, but many rights holders disagree and say it constitutes copyright violation[1].

Per Title 17 §107 "Limitations on exclusive rights: Fair use" (PDF), the Fair Use doctrine states:

In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include -

(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

(2) the nature of the copyrighted work;

(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

(4) the effect of the use upon the potential market for or value of the copyrighted work.

The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.

As with all issues of copyright, individual or corporate, claims of misappropriation (née 'theft') are up to the owner of the copyright in a given work to bring, not those allegedly infringing or engaged in the misappropriation.

The legal implications of using generative AI are still unclear, particularly in relation to copyright infringement[2].

In practice the "legal implications of using generative AI" are not quite as "unclear" as AI itself would have creatives and the general public believe; scraping content without permission is a misappropriation. The issue is what happens after that, with the output: is the iterative output then subject to misappropriation? The source being scraped doesn't expressly mean the output isn't 'derivative' and subject to protection itself.

The broader issue with AI in relation to copyright infringement is that Fair Use tends to be tolerant of educational and/or research use, which are the underlying development environments for AI, but not of the subsequent commercialisation of that research, which, while it typically includes 'stock' or 'baseline' content, isn't (re)distributing others' work directly. In other words, although others' work is used to train AI processes, the end result isn't the image output but the generative or iterative 'machine' that compiles the input fed into it. That, the machine, is 'original'.

There are also concerns about how AI-generated content can be used to spread disinformation and propaganda[3].

And here is the actual worry: not the subsuming of creators' rights, but rather simply how AI can be used to spread "disinformation and propaganda". That is a topic for a separate discussion entirely.

I hope this helps!

No, it really doesn't.

Sources BING AI used to compile its answer to the question.

[1] Getty Images is suing the creators of AI art tool Stable Diffusion for scraping its content - The Verge
[2] Generative AI Has an Intellectual Property Problem - Harvard Business Review
[3] AI, machine learning and EU copyright law - CREATe
[4] AI content scraping copyright - Search (bing.com)
[5] The lawsuit against Microsoft, GitHub and OpenAI that could change the rules of AI copyright - The Verge
[6] AI and Copyright Law: What We Know - Built In