L4sBotMB to

[email protected] English • 1 year ago

We Asked A.I. to Create the Joker. It Generated a Copyrighted Image.

www.nytimes.com

cross-posted to:
[email protected]

534

Artists and researchers are exposing copyrighted material hidden within A.I. tools, raising fresh legal questions.



Jilanico
English
11•1 year ago

Because this proves that the “AI”, at some level, is storing the data of the Joker movie screenshot somewhere inside of its training set.

Is it tho? Honest question.
- QubaXR
  English
  6•1 year ago
  Yes it is. Honest answer.
  - Jilanico
    English
    4•1 year ago
    So stable diffusion, midjourney, etc., all have massive databases with every picture on the Internet stored in them? I know the AI models are trained on lots of images, but are the images actually stored? I’m skeptical, but I’m no expert.
    - QubaXR
      English
      6•1 year ago
      These models were trained on datasets that, without compensating the authors, used their work as training material. It’s not every picture on the net, but a lot of it comes from scraping websites, portfolios and social networks wholesale.
      
      A similar situation happens with large language models. Recently Meta admitted to using pirated books (the Books3 database, to be precise) to train their LLM without any plans to compensate the authors, or even so much as pay for a single copy of each book used.
      - Jilanico
        English
        5•1 year ago
        Most of the stuff that inspires me probably wasn’t paid for. I just randomly saw it online or on the street, much like an AI.
        
        AI using straight up pirated content does give me pause tho.
        
        QubaXR
        English
        4•1 year ago
        I was on the same page as you for the longest time. I cringed at the whole “No AI” movement and artists’ protest. I used the very same idea: Generations of artists honed their skills by observing the masters, copying their techniques and only then developing their own unique style. Why should AI be any different? Surely AI will not just copy works wholesale and instead learn color, composition, texture and other aspects of various works to find its own identity.
        
        It was only when my very own prompts started producing results I recognized as “homages” at best and “rip-offs” at worst that I stopped to think.
        
        I suspect that earlier generations of text-to-image models had better moderation of training data. As the arms race heated up and the pace of development picked up, companies running these services started rapidly incorporating whatever training data they could get their hands on, ethics, copyright or artists’ rights be damned.
        
        I remember when MidJourney introduced Niji (their anime model) and I could often identify the manga and characters used to train it. The imagery Niji produced kept certain distinct and unique elements of character designs from that training data; as a result, a lot of characters exhibited “Chainsaw Man” pointy teeth and a stuck-out tongue, without so much as a mention of the source material or even the themes.
        
        @[email protected]
        English
        3•1 year ago
        How much profit do you make from this stuff?
        
        Jilanico
        English
        1•1 year ago
        The stuff I sell on jilanico.com? Enough to make it worth my while.
      - archomrade [he/him]
        English
        1•1 year ago
        
        These models were trained on datasets that, without compensating the authors, used their work as training material.
        
        Couple things:
        
        This doesn’t answer OP’s question about how the information is stored. In fact, OP is right that the images and source material are NOT stored in a database within the model; it basically just stores metadata about the source material as a whole in order to construct new material from text descriptions.
        
        The use of copyrighted works in training isn’t necessarily infringing if the model is found to be fair use, and there is a very strong fair use argument here.
        
        QubaXR
        English
        3•1 year ago
        “metadata” is such a pretty word. How about “recipe” instead? It stores all information necessary to reproduce work verbatim or grab any aspect of it.
        
        The legal issue of copyright is a tricky one, especially in the US where copyright is often weaponized by corporations. The gist of it is: The training model itself was an academic endeavor and therefore falls under fair use. Companies like StabilityAI or OpenAI then used these datasets and monetized products built on them, which in my understanding skirts a legal gray zone.
        
        If these private for-profit companies simply took the same data and built their own, identical dataset, they would be liable to pay the authors for use of their work in a commercial product. They get around it by using the existing model, originally created for research and not commercial use.
        
        Lemmy is full of open source and FOSS enthusiasts, I’m sure someone can explain it better than I do.
        
        All in all I don’t argue about the legality of AI, but as a professional creative I highlight ethical (plagiarism) risks that are beginning to arise in the majority of models. We all know the Joker, Marvel superheroes, popular Disney and WB cartoon characters - and can spot when “our” generations cross the line of copying someone else’s work. But how many of us are familiar with Polish album cover art, Brazilian posters, Chinese film superheroes or Turkish logos? How sure can we be that the work “we” produced using AI is truly original and not a perfect copy of someone else’s work? Does our ignorance excuse this second-hand plagiarism? Or should the companies releasing AI models stop adding features and fix that broken foundation first?
        
        archomrade [he/him]
        English
        1•1 year ago
        
        “metadata” is such a pretty word. How about “recipe” instead?
        
        Well, isn’t “recipe” another one of those pretty words? “Metadata” is specific to other precedents that deal with computer programs that gather data about works (see Authors Guild, Inc. v. HathiTrust and Authors Guild v. Google), but you’re welcome to challenge the verbiage if you don’t like it. Regardless, what we’re discussing is objectively something that describes copyrighted works, not copies or a copy of the works themselves. A computer program that is very good at analyzing textual/pixelated data is still only analyzing data; it is itself a novel, non-expressive factual representation of other expressive works, and because of this, it cannot be considered infringement on its own.
        
        It stores all information necessary to reproduce work verbatim or grab any aspect of it.
        
        This isn’t really true, at least not for the majority of works analyzed by the model, but granted. If a person uses a tool to copy the work of another person, it is the person who is doing the copying, not the tool. I think it is far more reasonable to hold an individual who uses an AI model to infringe on a copyright responsible. If someone chooses to author a work with the use of a tool that does the work for them (in part or in whole), it is more than reasonable to expect that individual to check the work that is being produced.
        
        All in all I don’t argue about the legality of AI, but as a professional creative I highlight ethical (plagiarism) risks that are beginning to arise in the majority of models.
        
        As a professional creative myself, I think this is a load of horseshit. We always hold individual authors responsible for the work that they publish, and it should be no different here. That some choose to be lazy and careless is more of a reflection of them.
        
        How sure can we be that the work “we” produced using AI is truly original and not a perfect copy of someone else’s work?
        
        If you have the words to describe a desired image/text response to the model that produces a ‘perfect copy of someone else’s work’, then you have the words to search for that work, too.
        
        Or should the companies releasing AI models stop adding features and fix that broken foundation first?
        
        How about we stop expanding the scope of an already broken copyright law and fix that broken foundation first?
- @[email protected]
  English
  4•1 year ago
  How did the Joker image get replicated?
  - Jilanico
    English
    4•1 year ago
    It’s too hard to type up how generative AIs work, but look up a video on “how stable diffusion works” or something like that. I seriously doubt they have a massive database with every image from the Internet inside it, with the AI just spitting those pics out, but I’m no expert.
- @[email protected]
  English
  3•1 year ago
  Sure, but so is your memory; you could study the originals and redraw them in a similar way.
  - Jilanico
    English
    4•1 year ago
    I agree, but I don’t think these generative AIs actually store image files off the Internet in a massive database. I could be wrong.
    - @[email protected]
      English
      5•1 year ago
      That’s correct. The structure of the information isn’t remotely similar to a file or database. Pixel-by-pixel information isn’t stored; the model more loosely remembers correlations, similarities and facts about the content, as opposed to storing and copying it.
      - @[email protected]
        English
        2•1 year ago
        Which is also very similar to how your brain stores things.
        
        @[email protected]
        English
        2•1 year ago
        Yeah, much more similar to the brain than a database or file anyway

