- cross-posted to:
- [email protected]
- [email protected]
- cross-posted to:
- [email protected]
- [email protected]
GenAI tools ‘could not exist’ if firms are made to pay copyright::undefined
GenAI tools ‘could not exist’ if firms are made to pay copyright::undefined
The Kit Walsh article purposefully handwaves around a couple of issues that could present larger issues as law suits in this arena continue.
He says that due to the size of training data and the model, only a byte of data per image could be stored in any compressed format, but this assumes all training data is treated equally. It’s very possible certain image artifacts are compressed/stored in the weights more than other images.
These models don’t produce exact copies. Beyond the Getty issue, nytimes recently released an article about a near duplicate - https://www.nytimes.com/interactive/2024/01/25/business/ai-image-generators-openai-microsoft-midjourney-copyright.html.
I think some of the points he makes are valid, but they’re making a lot of assumptions about what is actually going on in these models which we either don’t know for certain or have evidence to the contrary.
I didn’t read Katherine’s article so maybe there is something more there.
Removed by mod
I’m not sure she does, just read the article and it focuses primarily what models can train on. However, the real meat of the issue, at least I think, with GenAI is what it produces.
For example, if I built a model that just spit out exact frames from “Space Jam”, I don’t think anyone would argue that would be a problem. The question is where is the line?
Removed by mod
This goes back to my previous comment of handwaving away the details. There is a model out there that clearly is reproducing copyrighted materials almost identically (nytimes article), we also have issues with models spitting out training data https://www.wired.com/story/chatgpt-poem-forever-security-roundup/. Clearly people studying these models don’t fully know what is actually possible.
Additionally, it only takes one instance to show that these models, in general, can and do have issues with regurgitating copyrighted data. Whether that passes the bar for legal consequences we’ll have to see, but i think it’s dangerous to take a couple of statements made by people who don’t seem to understand the unknowns in this space at face value.
Removed by mod