L4sBotMB to

[email protected]English • 1 year ago

GenAI tools ‘could not exist’ if firms are made to pay copyright

www.computerweekly.com

398

Artificial intelligence firm Anthropic hits out at copyright lawsuit filed by music publishing corporations, claiming the content ingested into its models falls under ‘fair use’ and that any licensing regime created to manage its use of copyrighted material in training data would be too complex and costly to work in practice

Valen
link
fedilink
English
152•1 year ago
So they’re admitting that their entire business model requires them to break the law. Sounds like they shouldn’t exist.
- @[email protected]
  link
  fedilink
  English
  61•
  edit-2
  1 year ago
  Removed by mod
  - @[email protected]
    link
    fedilink
    English
    6•1 year ago
    The Kit Walsh article purposefully handwaves around a couple of issues that could become bigger problems as lawsuits in this arena continue.
    
    He says that due to the size of training data and the model, only a byte of data per image could be stored in any compressed format, but this assumes all training data is treated equally. It’s very possible certain image artifacts are compressed/stored in the weights more than other images.
    
    These models don’t produce exact copies. Beyond the Getty issue, the New York Times recently published an article about a near duplicate: https://www.nytimes.com/interactive/2024/01/25/business/ai-image-generators-openai-microsoft-midjourney-copyright.html
    
    I think some of the points he makes are valid, but they’re making a lot of assumptions about what is actually going on in these models which we either don’t know for certain or have evidence to the contrary.
    
    I didn’t read Katherine’s article so maybe there is something more there.
    - @[email protected]
      link
      fedilink
      English
      5•
      edit-2
      1 year ago
      Removed by mod
      - @[email protected]
        link
        fedilink
        English
        4•1 year ago
        I’m not sure she does. I just read the article, and it focuses primarily on what models can train on. However, the real meat of the issue with GenAI, at least I think, is what it produces.
        
        For example, if I built a model that just spit out exact frames from “Space Jam”, I don’t think anyone would argue that would be a problem. The question is where is the line?
        
        @[email protected]
        link
        fedilink
        English
        4•
        edit-2
        1 year ago
        Removed by mod
        
        @[email protected]
        link
        fedilink
        English
        3•1 year ago
        This goes back to my previous comment about handwaving away the details. There is a model out there that is clearly reproducing copyrighted materials almost identically (the New York Times article), and we also have issues with models spitting out training data: https://www.wired.com/story/chatgpt-poem-forever-security-roundup/. Clearly, people studying these models don’t fully know what is actually possible.
        
        Additionally, it only takes one instance to show that these models, in general, can and do have issues with regurgitating copyrighted data. Whether that passes the bar for legal consequences we’ll have to see, but I think it’s dangerous to take at face value a couple of statements made by people who don’t seem to understand the unknowns in this space.
        
        @[email protected]
        link
        fedilink
        English
        3•1 year ago
        Removed by mod
- @[email protected]
  link
  fedilink
  English
  51•1 year ago
  Reproduction of copyrighted material would be breaking the law. Studying it and using it as reference when creating original content is not.
  - Optional
    link
    fedilink
    English
    25•1 year ago
    Humans studying it is fair use.
    - @[email protected]
      link
      fedilink
      English
      22•1 year ago
      So if a tool is involved, it’s no longer ok? So, people with glasses cannot consume copyrighted material?
      - @[email protected]
        link
        fedilink
        English
        5•1 year ago
        No. A tool already makes it unnatural. /S
    - @[email protected]
      link
      fedilink
      English
      11•1 year ago
      Copyright can only be granted to works created by a human, but I don’t know of any such restriction for fair use. Care to share a source explaining why you think only humans are able to use fair use as a defense for copyright infringement?
      - @[email protected]
        link
        fedilink
        English
        4•1 year ago
        Because a human has to use talent and effort to make something that’s fair use. They adapt a product into something that, while similar, is noticeably different. AI will make things that are not just similar but not noticeably different.
        
        There’s no effort in the creation. There’s human thought behind a prompt, but not in the AI following it.
        
        If allowed to, AI companies will basically copyright everything…
        
        @[email protected]
        link
        fedilink
        English
        6•1 year ago
        Your reply has nothing to do with fair use doctrine.
        
        @[email protected]
        link
        fedilink
        English
        4•1 year ago
        You are aware of the insane amounts of research, human effort and the type of human talent that is required to make a simple piece of software, let alone a complex artificial neural network model whose function is to try and solve whatever stuff…right?
        
        @[email protected]
        link
        fedilink
        English
        3•1 year ago
        And that is human effort, not the AI’s.
        
        @[email protected]
        link
        fedilink
        English
        2•1 year ago
        Good point. I say the software can be copyright protected, but not the content the program generates.
    - @[email protected]
      link
      fedilink
      English
      8•1 year ago
      Removed by mod
    - @[email protected]
      link
      fedilink
      English
      5•1 year ago
      I don’t agree. The publisher of the material does not get to dictate what it is used for. What are we protecting at the end of the day and why?
      
      In the case of a textbook, someone worked hard to explain certain materials in a certain way to make the material easily digestible. They produced examples to explain concepts. Reproducing and disseminating that material would be unfair to the author who worked hard to produce it.
      
      But the author does not have jurisdiction over the knowledge gained. They cannot tell the reader that they are forbidden from using the knowledge gained to tutor another person in calculus. That would be absurd.
      
      IP law protects the works of the creator. The author of a calculus textbook did not invent calculus. As such, copyright law does not apply.
  - @[email protected]
    link
    fedilink
    English
    11•1 year ago
    
    “Reproduction of copyrighted material would be breaking the law. Studying it and using it as reference when creating original content is not.”
    
    I’m curious why we think otherwise when it is a student obtaining an unauthorized copy of a textbook to study, or researchers getting papers from sci-hub. Probably because it benefits corporations and they say so?
    - @[email protected]
      link
      fedilink
      English
      9•1 year ago
      While I would like to be in a world where knowledge is free, this is apples and oranges.
      
      OpenAI can purchase a textbook and read it. If their AI uses the knowledge gained to explain maths to an individual, without reproducing the original material, then there’s no issue.
      
      The difference is the student in your example didn’t buy their textbook. Someone else bought it and reproduced the original for others to study from.
      
      If OpenAI was pirating textbooks, that would be a wholly separate issue.
      - @[email protected]
        link
        fedilink
        English
        2•1 year ago
        I agree that the issues of (1) whether AI output is a derivative work of its input, and (2) whether input to AI is fair use and requires no compensation are separate, but I think they are related, in that AI companies are trying to impose whatever interpretation of copyright is convenient to them on the rest of society.
        
        And indeed Meta pirated books to feed its AI.
        
        https://www.techspot.com/news/101507-meta-admits-using-pirated-books-train-ai-but.html
      - @[email protected]
        link
        fedilink
        English
        2•1 year ago
        I was under the impression they mentioned torrenting things at some point.
        
        @[email protected]
        link
        fedilink
        English
        1•1 year ago
        Don’t know about OpenAI, but Meta used pirated books to train its AI.
        
        https://www.techspot.com/news/101507-meta-admits-using-pirated-books-train-ai-but.html
- @[email protected]
  link
  fedilink
  English
  31•1 year ago
  It doesn’t break the law at all. The courts have already ruled that copyrighted material can be fed into AI/ML models for training:
  
  https://towardsdatascience.com/the-most-important-supreme-court-decision-for-data-science-and-machine-learning-44cfc1c1bcaf
  - @[email protected]
    link
    fedilink
    English
    17•1 year ago
    This ruling only applies to the 2nd Circuit, and SCOTUS has yet to take up a case. As soon as there’s a good fact pattern for the Supreme Court or a circuit split, you’ll get nationwide information. You’ll also note that the decision is deliberately written to provide an extremely narrow precedent and is likely restricted to Google Books and near-identical sources of information.
    - @[email protected]
      link
      fedilink
      English
      6•1 year ago
      Has there been any US ruling stating something along the lines of “The training of general purpose LLMs and/or image generation AIs does not qualify as fair use,” even in a lower court?
    - @[email protected]
      link
      fedilink
      English
      2•1 year ago
      Hell, that article is also all about Google Books, which is an entirely different beast from generative AI. One of the key points from the circuit judge was that Google Books’ use of copyrighted material “…[maintains] respectful consideration for the rights of authors and other creative individuals, and without adversely impacting the rights of copyright holders.” The appeals court, in upholding the ruling that Google Books’ use of copyrighted content is fair use, ruled “the revelations do not provide a significant market substitute for the protected aspects of the originals.”
      
      If you think that gen AI doesn’t provide a significant market substitute for the artwork created by the artists and authors used to train these models, or that it doesn’t adversely impact their rights, then you’re utterly delusional.
- @[email protected]
  link
  fedilink
  English
  21•1 year ago
  You might want to read this post from one of the EFF’s senior lawyers on the topic who has previously litigated IP cases:
  
  https://www.eff.org/deeplinks/2023/04/how-we-think-about-copyright-and-ai-art-0
- @[email protected]
  link
  fedilink
  English
  13•1 year ago
  I guess I can’t read anything and learn from it.
- iquanyin
  link
  fedilink
  English
  2•1 year ago
  i don’t think it’s need rules against the law…
