Opinion: The Copyright Office is making a mistake on AI-generated art

@[email protected] · 1 year ago

@[email protected] · edited 1 year ago

When determining whether something is fair use, the key questions are often whether the use of the work (a) is commercial, or (b) may substitute for the original work. Furthermore, the amount of the work copied is also considered.

Search engine scrapers are fair use, because they only copy a snippet of a work and a search result cannot substitute for the work itself. Likewise, copying an excerpt of a movie in order to critique it is fair use, because consumers don’t watch reviews as a substitute for watching movies.

On the other hand, OpenAI is accused of copying entire works, and its models are explicitly intended as a replacement for hiring actual writers. I think that is unlikely to be considered fair use.

And in practice, fair use is not easy to establish.

@[email protected] · 1 year ago

Removed by mod

@[email protected] · edited 1 year ago

I know the model doesn’t contain a copy of the training data, but it doesn’t matter.

If the copyrighted data is downloaded at any point during training, that’s an IP violation, even if it is immediately deleted after being processed by the model.

As an analogy, if you illegally download a Disney movie, watch it, write a movie review, and then delete the file … then you still violated copyright. The movie review doesn’t contain the Disney movie and your computer no longer has a copy of the Disney movie. But at one point it did, and that’s all that matters.

@[email protected] · 1 year ago

Removed by mod

@[email protected] · 1 year ago

No, it doesn’t.

It defends web scraping (downloading copyrighted works) as legal if necessary for fair use. But fair use is not a foregone conclusion.

In fact, there was a recent case in which a company was sued for scraping images and text from Facebook users. Its goal was to analyze them and create a database of advertising trackers, in competition with Facebook. The case settled, but not before the judge noted that the scraping was not fair use and was very likely infringing IP.

@[email protected] · 1 year ago

Removed by mod

@[email protected] · edited 1 year ago

Yes, it absolutely hinges on fair use. That’s why the very first page of the lawsuit alleges:

“Defendants’ LLMs endanger fiction writers’ ability to make a living, in that the LLMs allow anyone to generate—automatically and freely (or very cheaply)—texts that they would otherwise pay writers to create”

If the court agrees with that claim, it will basically kill the fair use defense.

@[email protected] · 1 year ago

Removed by mod

@[email protected] · edited 1 year ago

“the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use upon the potential market.”

Yes, and I named three of those factors:

“the key questions are often whether the use of the work (a) is commercial, or (b) may substitute for the original work. Furthermore, the amount of the work copied is also considered.”

And while you don’t need to meet all the criteria, the odds are pretty long when you fail three of the four (commercial nature, copying the complete work rather than a portion, and a negative effect on the market for the original).

Think of it this way: if it were legal to download books in order to train an AI, then it would also be legal to download books in order to train a human student. After all, why would a human have fewer rights than an AI?

Do you really think courts are going to decide that it’s ok to download books from The Pirate Bay or Z-Library, provided they are being read by the next generation of writers?