[email protected] • English • 1 year ago

Meta admits using pirated books to train AI, but won't pay for it

www.techspot.com

cross-posted to:
[email protected]

832

A group of authors filed a lawsuit against Meta, alleging the unlawful use of copyrighted material in developing its Llama 1 and Llama 2 large language models....

Snot Flickerman
14•1 year ago
Removed by mod
- @[email protected]
  13•1 year ago
  To me it always seems to come back to nobility. Big corpo is the new nobility and they have certain privileges not available to the common folk. In theory it shouldn’t exist but in practice it most certainly does.
  - Snot Flickerman
    12•1 year ago
    Removed by mod
- @[email protected]
  2•1 year ago
  
  > So why are Meta, and say, Sci-Hub treated so differently?
  
  They are not. Meta is being sued, just like Sci-Hub was sued. So, one difference is that the suit involving Meta is still ongoing.
  
  In any case, Meta did not create the dataset. IDK if they even shared it. The researcher who did is also being sued. The dataset has been taken down in response to a copyright complaint. IDK if it is available anywhere anymore. So the dataset was treated just like Sci-Hub. The sharing of the copyrighted material was stopped.
  
  Meta downloading these books for AI training seems fairly straight-forward fair use to me. I don’t see how what Meta did is anything like what Sci-Hub did.
  - Snot Flickerman
    8•1 year ago
    Removed by mod
    - @[email protected]
      3•1 year ago
      ISPs may block sites to prevent unauthorized copying. It’s not a punishment for past wrongdoing. I’m not sure about the details; I think this differs a lot between jurisdictions. But basically, as ISPs they are involved in the unauthorized act of copying. Their servers copy the data to the end user/customer. So, they may be on the hook for infringement themselves if they don’t act.
      
      Again, I am not aware of Meta sharing the copyrighted books in question. So, I don’t know what the legal basis for blocking Meta would be. If ISPs block a site without a legal basis, they are probably on the hook for breach of contract.
      
      IDK on what basis the sharing of Meta’s LLMs could be stopped. If anyone could claim copyright it would be Meta itself and they allow sharing them. (I have doubts if AI models are copyrightable under current US law.)
      - Snot Flickerman
        3•1 year ago
        Removed by mod
        
        @[email protected]
        4•1 year ago
        I expect ISPs would get into a lot of legal trouble if they did.
        
        The NYT sued OpenAI and MS. a) That doesn’t involve Meta. b) It’s a claim by the NYT.
        
        Why should ISPs deny their paying customers access to Meta sites or sites hosting LLMs released by Meta? These customers have contracts with their service providers. On what grounds would ISPs be in the right to stop providing these internet services?
        
        Snot Flickerman
        3•1 year ago
        Removed by mod
        
        @[email protected]
        2•1 year ago
        
        > but that does not mean that they couldn’t.
        
        IDK why you believe this. Breaking contracts is illegal. You get sued and have to pay damages. Some contracts, in some jurisdictions, may allow such arbitrary decisions. In other jurisdictions such clauses may be unenforceable.
        
        > altruistic groups
        
        Well, that’s not something that copyright law cares about very much. Unfortunately, this community seems very pro-copyright; very Ayn Rand even. You’re not likely to get much agreement for any sensible reforms; quite the opposite. I don’t think arguing that Meta is doing the same as TPB is going to win anyone over. It’s more likely to get people here to call for more onerous and more harmful IP laws.
        
        > Both Meta and ChatGPT used books3, it’s functionally the same type of case.
        
        FWIW, no. The NYT case and this one are different in some crucial ways.
  - @[email protected]
    3•1 year ago
    
    > Meta downloading these books for AI training seems fairly straight-forward fair use to me.
    
    They pirated the books. Is that not legally relevant?
    - @[email protected]
      2•1 year ago
      “Straight-forward” may be too strong regarding these books. If they inadvertently picked up unauthorized copies while scraping the web, that would definitely not be a problem. That’s what search engines do.
      
      The question is if it is a problem that the researchers knowingly downloaded these copyrighted texts. Owners don’t seem to go after downloaders. IDK if there is case law establishing that the mere act of downloading copyrighted material is infringement. I don’t think there’s anything to suggest that knowing about the copyright status should make a difference in civil law.
      
      In any case, researchers must be able to share copyrighted material, not just for AI training but also for any other purpose that needs it. If this is not fair use, then Common Crawl may not be fair use either. IDK if there is case law regarding the sharing of copyrighted materials as research material, rather than for their content. But I find it hard to see how it could not be fair use, as the alternative would be extremely destructive. So even if the download would normally be infringement, I doubt that it is in this case.
      
      Ultimately, we are only talking about a single copy of each book. So, even if researchers were forced to purchase these books, all of AI training would yield only a few extra sales for each title. The benefit to the owners would be very small in relation to the damage to the public.
