Lee Duna@lemmy.nz to Technology@lemmy.world · English · 2 years ago

Meta admits using pirated books to train AI, but won't pay for it

www.techspot.com

cross-posted to:
[email protected]

832

A group of authors filed a lawsuit against Meta, alleging the unlawful use of copyrighted material in developing its Llama 1 and Llama 2 large language models....

Snot Flickerman@lemmy.blahaj.zone
English · 188 upvotes, 6 downvotes · edited · 2 years ago
Removed by mod
- kibiz0r@lemmy.world
  English · 48 upvotes · 2 years ago
  
  The main reason they were able to prosecute TPB admins was the claim they were making money.
  
  I think in the Darknet Diaries episode about TPB, the guy said they never even made enough off of ads to pay for the server costs.
  - Snot Flickerman@lemmy.blahaj.zone
    English · 32 upvotes · edited · 2 years ago
    Removed by mod
- Dr. Moose@lemmy.world
  English · 21 upvotes, 11 downvotes · edited · 2 years ago
  They’re the same issue tho. Piracy and using books for corporate AI training should both be fine. The same people going after data freedom are pushing this AI drama too. There’s too much money in copyright holding and it’s not being held by your favorite DeviantArt artists.
  - kibiz0r@lemmy.world
    English · 53 upvotes, 5 downvotes · 2 years ago
    It’s not the same issue at all.
    
    Piracy distributes power. It allows disenfranchised or marginalized people to access information and participate in culture, no matter where they live or how much money they have. It subverts a top-down read-only culture by enabling read-write access for anyone.
    
    Large-scale computing services like these so-called AIs consolidate power. They displace access to the original information and the headwaters of culture. They are for-profit services, tuned to the interests of specific American companies. They suppress read-write channels between author and audience.
    
    One gives power to the people. One gives power to 5 massive corporations.
    - Snot Flickerman@lemmy.blahaj.zone
      English · 24 upvotes, 2 downvotes · edited · 2 years ago
      Removed by mod
    - Dr. Moose@lemmy.world
      English · 9 upvotes · edited · 2 years ago
      It’s the opposite. Closing down public resources would be regulatory capture and that would be consolidation of power.
      
      Who do you think can afford to pay billions in copyright to produce models? Only mega corporations and pirates. No more small AI companies. No more open source models.
    - archomrade [he/him]@midwest.social
      English · 7 upvotes · 2 years ago
      I wish we could be talking about the power imbalances of corporate bodies exercised through the use of capital ownership, instead of squabbling about how that differential is manifested through a specific act of piracy.
      
      The reason we view acts of piracy differently when they are committed by corporate bodies is the power of their capital, not that the act itself is any different. The issue with Meta and OpenAI using pirated data in the production of LLMs is that they maintain ownership of the final product to be profited from, not that the LLM comes to exist in the first place (even if it is through questionable means). Had they created these models from data that they already owned (I need not remind you that they have already claimed their right to a truly sickening amount of it, without having paid a cent), their profiting from it wouldn’t be any less problematic: LLMs would still undermine the security of the working class and consolidate wealth into fewer and fewer hands. If we were to apply copyright here as it’s being advocated, nothing fundamental would change in that dynamic; in fact, it would only reinforce the basis of that power imbalance (ownership over capital being the primary vehicle) and delay the inevitable (continued consolidation).

      If you’re really concerned with these corporations growing larger and their influence spreading further, then you should be directing your efforts at disrupting that vehicle of influence, not legitimizing it. I understand there’s an enraging double standard at play here, but the solution isn’t to double down on private ownership; it is to undermine and seize it for common ownership so that everyone benefits from the advancement.
    - Flying Squid@lemmy.world
      English · 2 upvotes, 1 downvote · 2 years ago
      I wonder if piracy could even benefit these corporations in the long term? Do people who pirate games and movies in their teens and twenties frequently go on to purchase such things when they’re older? I honestly don’t know, but I would love to see a study. I certainly have seen people make that claim.
      - Snot Flickerman@lemmy.blahaj.zone
        English · 2 upvotes · 2 years ago
        Removed by mod
        
        Flying Squid@lemmy.world
        English · 2 upvotes · 2 years ago
        There you go. Piracy helps. I’m sure game companies and TV producers and so on feel the same way quite often. People who pirate are free marketing for them because they’ll tell other people about the product.
        
        Snot Flickerman@lemmy.blahaj.zone
        English · 3 upvotes · edited · 2 years ago
        Removed by mod
  - Snot Flickerman@lemmy.blahaj.zone
    English · 15 upvotes, 1 downvote · edited · 2 years ago
    Removed by mod
    - SlopppyEngineer@lemmy.world
      English · 13 upvotes · 2 years ago
      To me it always seems to come back to nobility. Big corpo is the new nobility and they have certain privileges not available to the common folk. In theory it shouldn’t exist but in practice it most certainly does.
      - Snot Flickerman@lemmy.blahaj.zone
        English · 14 upvotes, 2 downvotes · edited · 2 years ago
        Removed by mod
    - General_Effort@lemmy.world
      English · 3 upvotes, 1 downvote · 2 years ago
      
      “So why are Meta, and say, Sci-Hub treated so differently?”
      
      They are not. Meta is being sued, just like Sci-Hub was sued. So, one difference is that the suit involving Meta is still ongoing.
      
      In any case, Meta did not create the dataset. IDK if they even shared it. The researcher who did is also being sued. The dataset has been taken down in response to a copyright complaint. IDK if it is available anywhere anymore. So the dataset was treated just like Sci-Hub. The sharing of the copyrighted material was stopped.
      
      Meta downloading these books for AI training seems fairly straight-forward fair use to me. I don’t see how what Meta did is anything like what Sci-Hub did.
      - Snot Flickerman@lemmy.blahaj.zone
        English · 8 upvotes · edited · 2 years ago
        Removed by mod
        
        General_Effort@lemmy.world
        English · 3 upvotes · 2 years ago
        ISPs may block sites to prevent unauthorized copying. It’s not a punishment for past wrong-doing. I’m not sure about the details; I think this differs a lot between jurisdictions. But basically, as ISPs they are involved in the unauthorized act of copying. Their servers copy the data to the end user/customer. So, they may be on the hook for infringement themselves if they don’t act.
        
        Again, I am not aware of Meta sharing the copyrighted books in question. So, I don’t know what the legal basis for blocking Meta would be. If ISPs block a site without a legal basis, they are probably on the hook for breach of contract.
        
        IDK on what basis the sharing of Meta’s LLMs could be stopped. If anyone could claim copyright it would be Meta itself and they allow sharing them. (I have doubts if AI models are copyrightable under current US law.)
        
        Snot Flickerman@lemmy.blahaj.zone
        English · 5 upvotes, 2 downvotes · edited · 2 years ago
        Removed by mod
        
        General_Effort@lemmy.world
        English · 4 upvotes · 2 years ago
        I expect ISPs would get into a lot of legal trouble if they did.
        
        The NYT sued OpenAI and MS. a) That doesn’t involve Meta. b) It’s a claim by the NYT.
        
        Why should ISPs deny their paying customers access to Meta sites or sites hosting LLMs released by Meta? These customers have contracts with their service providers. On what grounds would ISPs be in the right to stop providing these internet services?
        
        Snot Flickerman@lemmy.blahaj.zone
        English · 3 upvotes · 2 years ago
        Removed by mod
      - antonim@lemmy.dbzer0.com
        English · 3 upvotes · 2 years ago
        
        “Meta downloading these books for AI training seems fairly straight-forward fair use to me.”
        
        They pirated the books. Is that not legally relevant?
        
        General_Effort@lemmy.world
        English · 2 upvotes · 2 years ago
        “Straight-forward” may be too strong regarding these books. If they inadvertently picked up unauthorized copies while scraping the web, that would definitely not be a problem. That’s what search engines do.
        
        The question is whether it is a problem that the researchers knowingly downloaded these copyrighted texts. Owners don’t seem to go after downloaders. IDK if there is case law establishing that the mere act of downloading copyrighted material is infringement. I don’t think there’s anything to suggest that knowing about the copyright status should make a difference in civil law.

        In any case, researchers must be able to share copyrighted material, not just for AI training but also for any other purpose that needs it. If this is not fair use, then Common Crawl may not be fair use either. IDK if there is case law regarding the sharing of copyrighted materials as research material, rather than for their content. But I find it hard to see how it could not be fair use, as the alternative would be extremely destructive. So even if the download would normally be infringement, I doubt that it is in this case.

        Ultimately, we are only talking about a single copy of each book. So, even if researchers were forced to purchase these books, all of AI training would yield only a few extra sales for each title. The benefit to the owners would be very small in relation to the damage to the public.
- The Hobbyist@lemmy.zip
  English · 9 upvotes, 3 downvotes · 2 years ago
  Perhaps I’m misunderstanding, but it sounds like you’re suggesting we side with Meta to set a precedent under which pirating content is legal, allowing websites like TPB to keep existing, but legitimately? Or are you rather taking the opposite stance, which would further entrench the illegality of TPB’s activities and at the same stroke prevent Meta from performing these actions?

  I don’t know if we can simultaneously oppose Meta while protecting TPB. Is there a way?
  - Tedrow@lemmy.world
    English · 17 upvotes, 2 downvotes · 2 years ago
    I think what they are saying is that Meta is powerful enough to get away with it. You are attempting to equate two different things.
    
    Meta isn’t using the books for entertainment purposes. They are using another IP to develop their own product. There has to be a distinction here.
    - The Hobbyist@lemmy.zip
      English · 5 upvotes · 2 years ago
      We are in agreement, but I was attempting to launch a discussion about how we want the laws to actually be applied and possibly how they should be reformulated.
  - Snot Flickerman@lemmy.blahaj.zone
    English · 10 upvotes, 3 downvotes · edited · 2 years ago
    Removed by mod
    - The Hobbyist@lemmy.zip
      English · 2 upvotes · 2 years ago
      Of course we should have consistent laws, but which way should we have it? We can either defend both pirates and Meta, or neither of them, so what are you saying? Unless there’s a third option I’m missing?
      - Snot Flickerman@lemmy.blahaj.zone
        English · 3 upvotes, 3 downvotes · edited · 2 years ago
        Removed by mod
- Flying Squid@lemmy.world
  English · 5 upvotes, 1 downvote · 2 years ago
  
  “To the extent a response is deemed required, Meta denies that its use of copyrighted works to train Llama required consent, credit, or compensation,” Meta writes.
  
  Cool, so I can train my AI on Facebook and Instagram posts and you’re fine if I don’t consent, credit or compensate you either, right Meta? It’s not even copyrighted in the first place, so you shouldn’t have a single complaint.
- The Barto@sh.itjust.works
  English · 4 upvotes · 1 year ago
  One of the founders of The Pirate Bay.
- yesdogishere@kbin.social
  3 upvotes · 1 year ago
  The only solution is vigilante justice. Bezos and all the directors and snr execs. Bring them all to justice. Exile to Mars.
