The Irony of 'You Wouldn't Download a Car' Making a Comeback in AI Debates

@[email protected] · 6 months ago

The Irony of 'You Wouldn't Download a Car' Making a Comeback in AI Debates

@[email protected] · 6 months ago

The issue isn’t that you can coax AI into giving away unaltered copyrighted books out of their trunk, the issue is that if you were to open the hood, you’d see that the entire engine is made of unaltered copyrighted books.

All those “anti hacking” measures are just there to obfuscate the fact that that the unaltered works are being in use and recallable at all times.

@[email protected] · edit-2 6 months ago

This is an inaccurate understanding of what’s going on. Under the hood is a neutral network with weights and biases, not a database of copyrighted work. That neutral network was trained on a HEAVILY filtered training set (as mentioned above, 45 terabytes was reduced to 570 GB for GPT3). Getting it to bug out and generate full sections of training data from its neutral network is a fun parlor trick, but you’re not going to use it to pirate a book. People do that the old fashioned way by just adding type:pdf to their common web search.

@[email protected] · 6 months ago

Again: nobody is complaining that you can make AI spit out their training data because AI is the only source of that training data. That is not the issue and nobody cares about AI as a delivery source of pirated material. The issue is that next to the transformed output, the not-transformed input is being in use in a commercial product.

@[email protected] · 6 months ago

The issue is that next to the transformed output, the not-transformed input is being in use in a commercial product.

Are you only talking about the word repetition glitch?