L4sBotMB to

[email protected]English • 1 year ago

We Asked A.I. to Create the Joker. It Generated a Copyrighted Image.

www.nytimes.com

cross-posted to:
[email protected]

534

Artists and researchers are exposing copyrighted material hidden within A.I. tools, raising fresh legal questions.

@[email protected]
link
fedilink
English
6•1 year ago

But where is the infringement?

Do Training weights have the data? Are the servers copying said data on a mass scale, in a way that the original copyright holders don’t want or can’t control?
- @[email protected]
  link
  fedilink
  English
  14•1 year ago
  Data is not copyrighted, only the image is. Furthermore, you cannot copyright a number, even though a sufficiently large number could completely represent a specific image. There’s also the fact that copyright does not protect possession of works, only distribution of them. If I obtained a copyrighted work, no matter by what means, I’ve committed no crime so long as I don’t duplicate that work. This gets into a legal grey area around computers and the fundamental way they work, but it was already kind of fuzzy if you really think about it. Does viewing a copyrighted image violate copyright? The visual data of that image has been copied into your brain. You have the memory of that image. If you have the talent, you could even reproduce that copyrighted work, so clearly a copy of it exists in your brain.
  - @[email protected]
    link
    fedilink
    English
    5•
    edit-2
    1 year ago
    
    only distribution of them.
    
    Yeah. And the hard drives and networks that pass Midjourney’s network weights around?
    
    That’s distribution. Did Midjourney obtain a license from the artists to allow large numbers of “Joker” copyrighted data to be copied on a ton of servers in their data-center so that Midjourney can run? They’re clearly letting the public use this data.
    - @[email protected]
      link
      fedilink
      English
      7•1 year ago
      Because they’re not copying around images of the Joker, they’re copying around a work derived from many, many things, including images of the Joker. Copying a derived work does not violate the copyright of the work it was derived from. The wrinkle in this case is that you can extract something very similar to the original works back out of the derived work after the fact. It would be like if you could bake a cake, pass it around, and then down the line pull a whole egg back out of it. Maybe not the exact egg you started with, but one very similar to it. This situation is completely unlike anything that’s come before it, which is why it’s not actually covered by copyright. New laws will need to be drafted (or at a bare minimum legal judgements made) to decide how exactly this situation should be handled.
      - archomrade [he/him]
        link
        fedilink
        English
        5•1 year ago
        Someone already downvoted you but this is exactly the topic of debate surrounding this issue.
        
        Other recognized fair-use exemptions have similar interpretations: a computer model analyzes a large corpus of copyrighted work for the purposes of being able to search their contents and retrieve relevant snippets and works based on semantic and abstract similarities. The computer model that is the representation of those works for that purpose is fair use: it contains only factual information about those works. It doesn’t matter if the works used for that model were unlicensed: the model is considered fair use.
        
        AI models operate by a very similar method, albeit a much more complex one. But the model doesn’t contain copyrighted works; it is only a collection of factual information about the copyrighted works. The novel part of this case is that it can be used to reconstruct expressions very similar to the original (it should be pointed out that the fidelity is often very low, and the more detailed the output, the less like the original it becomes). It isn’t settled yet whether that fact changes this interpretation, but regardless, I think copyright is already not the right avenue to pursue if the goal is to remediate or prevent harm to creators and encourage novel expressions.
        
        @[email protected]
        link
        fedilink
        English
        3•
        edit-2
        1 year ago
        Right, you’re basically making the same points as me, although technically the model itself is a copyrighted work. Part of the problem we’re running into these days is that copyright, patent, trademark, and trade secret all date from a time when the difference between those things was fairly intuitive. In our modern digital world, with things like 3D printers and the ease with which you can rapidly change the formats and encodings of arbitrary pieces of data, the lines all start to blur together.
        
        If you have a 3D scan of a statue of Pikachu, what rights are involved there? What if you print it? What if you use that model to generate a PNG? What if you print that PNG? What if you encode the model file using base64 and embed it in the middle of a GIF of Rick Astley?
        
        Corporations have already utterly fucked all our IP laws, it might be time to go back to the drawing board and reevaluate the whole thing, because what we have now often feels like it has more cracks than actual substance.
        
        archomrade [he/him]
        link
        fedilink
        English
        4•1 year ago
        Yea, sorry if it wasn’t clear, but I was agreeing with you (defending against the downvote).
        
        There are a lot of things at play here, even if there seems to be a clear way to interpret copyright law (untested, but still) under which the models would be fair use. I think people are rightfully angry/frustrated with the size of the companies building these models, and the risk posed by private ownership over them. If I were inclined to be idealistic, I would say that the models should be in the public domain and that taxes should be used to provide a UBI to counter any job loss/efficiencies provided by the automation, but that’s a tall order.
      - @[email protected]
        link
        fedilink
        English
        1•1 year ago
        
        derived
        
        https://www.law.cornell.edu/wex/derivative_work
        
        Copyrights allow their owners to decide how their works can be used, including creating new derivative works off of the original product. Derivative works can be created with the permission of the copyright owner or from works in the public domain. In order to receive copyright protection, a derivative work must add a sufficient amount of change to the original work.
        
        Are you just making shit up?
- @[email protected]
  link
  fedilink
  English
  3•1 year ago
  
  Do Training weights have the data?
  
  The answer to that question is extensively documented by thousands of research papers - it’s not up for debate.
  - @[email protected]
    link
    fedilink
    English
    5•1 year ago
    If someone wants to read one of those papers, I can recommend “Extracting Training Data from Diffusion Models”. It shouldn’t be too hard for someone with little experience in the field to follow along.
- @[email protected]
  link
  fedilink
  English
  1•1 year ago
  Their response will be: we don’t know, we can’t understand what it’s doing.
  - @[email protected]
    link
    fedilink
    English
    3•
    edit-2
    1 year ago
    
    Their response will be: we don’t know, we can’t understand what it’s doing.
    
    What the fuck is this kind of response? It’s just a fucking neural network running on GPUs with convolutional kernels. For fuck’s sake, turn on your damn brain.
    
    Generative AI is actually one of the easier subjects to comprehend here. It’s just calculus: use derivatives to backpropagate the error and adjust the weights in a way that minimizes it. Lather-rinse-repeat for a billion iterations on a mass of GPUs (i.e., 20 TFLOP compute systems) for several weeks.
    
    Come on, this stuff is well understood by Comp. Sci by now. It was understood 20 years ago when I learned it, and today, now that AI is all hype, more and more people understand the basics.
    - @[email protected]
      link
      fedilink
      English
      9•1 year ago
      Understanding the math behind it doesn’t immediately mean understanding the decision process during forward propagation. Of course you can mathematically follow it, but you’re quickly gonna lose the overview with that many weights. There’s a reason XAI is an entire subfield in Machine Learning.
      - @[email protected]
        link
        fedilink
        English
        1•1 year ago
        
        Understanding the math behind it doesn’t immediately mean understanding the decision process during forward propagation.
        
        Ummm… it’s lossy compressed data from the training set.
        
        Is it a perfect copy? No. But copyright law covers “derivative data” so whatever, the law remains clear on this situation.
    - @[email protected]
      link
      fedilink
      English
      2•1 year ago
      Removed by mod
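
The training loop described in the thread (use derivatives to adjust weights so the error shrinks, then repeat) can be sketched at toy scale. This is only an illustration under loud assumptions: a single weight, a hand-derived gradient, and made-up data. It is plain gradient descent, not full backpropagation through layers, and nothing like the scale or architecture of a real image model. The function name `train` and every number below are hypothetical.

```python
def train(xs, ys, learning_rate=0.05, steps=500):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0  # single weight, arbitrary starting value
    n = len(xs)
    for _ in range(steps):
        # Derivative of mean((w*x - y)^2) with respect to w
        # is mean(2 * (w*x - y) * x).
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad  # step against the gradient to reduce error
    return w


# Made-up data generated from y = 3x; the loop should recover w close to 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]
w = train(xs, ys)
```

A real model repeats the same idea across billions of weights, with the chain rule carrying the error derivative back through every layer.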

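The point made earlier in the thread, that a sufficiently large number can completely represent a specific image, is easy to check. A minimal Python sketch, assuming a made-up 4-byte stand-in instead of a real image file; the helper names `bytes_to_number` and `number_to_bytes` exist only for this illustration:

```python
def bytes_to_number(data: bytes) -> int:
    """Interpret a byte string as one big-endian integer."""
    # Prefix a 0x01 byte so leading zero bytes in the data survive the round trip.
    return int.from_bytes(b"\x01" + data, "big")


def number_to_bytes(number: int) -> bytes:
    """Invert bytes_to_number: recover the original bytes from the integer."""
    length = (number.bit_length() + 7) // 8
    return number.to_bytes(length, "big")[1:]  # drop the 0x01 marker byte


# Stand-in for the raw bytes of an image file (hypothetical data, not a real image).
fake_image = bytes([0, 255, 128, 7])
as_number = bytes_to_number(fake_image)
assert number_to_bytes(as_number) == fake_image
```

Any file works the same way; a real image just yields a number with millions of digits. Whether that number is legally distinct from the image is exactly what the thread is arguing about.
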