L4sBot@lemmy.world to Technology@lemmy.world · English · 2 years ago

Stephen King: My Books Were Used to Train AI

www.theatlantic.com

243

One prominent author responds to the revelation that his writing is being used to train artificial intelligence.


Chat

Drewelite
8 up, 3 down · 2 years ago
I would argue we do have a legal precedent for this sort of thing. Companies hire creatives all the time and ask them to do things in the style of other creatives. You can’t copyright a style. You don’t own what you inspire.
- IchNichtenLichten@lemmy.world
  10 up, 3 down · 2 years ago
  That’s not what’s happening though. His works are being incorporated into an LLM without permission. I hope he sues the hell out of these people.
  - BetaDoggo_@lemmy.world
    4 up · 2 years ago
    Is that illegal though? As long as the model isn’t reproducing the original then copyright isn’t being violated. Maybe in the future there will be laws against it but as of now the grounds for a lawsuit are shaky at best.
    - IchNichtenLichten@lemmy.world
      2 up, 2 down · 2 years ago
      There are already laws around what you can and can’t do with copyrighted material. If the owners of the LLM didn’t obtain written permission I’d say they are on very shaky ground here.
      - BetaDoggo_@lemmy.world
        4 up · 2 years ago
        What laws specifically? The only ones I can find refer to limits on redistribution, which isn’t happening here. If the models were able to reproduce the contents of the books that would be another issue that would need to be resolved. But I can’t find anything that would prohibit training.
        
        IchNichtenLichten@lemmy.world
        2 up, 2 down · 2 years ago (edited)
        
        What laws specifically?
        
        Existing laws to protect copyrighted material.
        
        “AI systems are “trained” to create literary, visual, and other artistic works by exposing the program to large amounts of data, which may consist of existing works such as text and images from the internet. This training process may involve making digital copies of existing works, carrying a risk of copyright infringement. As the U.S. Patent and Trademark Office has described, this process “will almost by definition involve the reproduction of entire works or substantial portions thereof.” OpenAI, for example, acknowledges that its programs are trained on “large, publicly available datasets that include copyrighted works” and that this process “necessarily involves first making copies of the data to be analyzed.” Creating such copies, without express or implied permission from the various copyright owners, may infringe the copyright holders’ exclusive right to make reproductions of their work.”
        
        https://crsreports.congress.gov/product/pdf/LSB/LSB10922
        
        BetaDoggo_@lemmy.world
        2 up, 1 down · 2 years ago
        By that definition of copying Google is infringing on millions of copyrights through their search engine, and anyone viewing a copyrighted work online is also making an unauthorized copy. These companies are using data from public sources that others have access to. They are doing no more copying than a normal user viewing a webpage.
        
        IchNichtenLichten@lemmy.world
        1 up, 3 down · 2 years ago
        I don’t think so. Your comparisons aren’t really relevant. If Google inadvertently scrapes a page containing copyrighted material and serves it to a user, there are mechanisms for getting that content taken down, and Google faces a lawsuit if it doesn’t comply. Try posting a movie on YouTube: if a copyright holder notifies Google, that content will be taken down.
        
        Training an LLM is different: that material was used to help build the model and is now a part of that product. That creates a legal liability.
  - Drewelite
    5 up, 1 down · 2 years ago (edited)
    But that is what’s happening in the minds of creatives. Reading a book and taking inspiration is functionally the same mechanism that an LLM uses to learn. Creatives read Stephen King and copy some part of his style, potentially very closely and for a corporation’s gain, if that’s what’s asked of them.
    - IchNichtenLichten@lemmy.world
      4 up, 2 down · 2 years ago
      One person being influenced by a prose style isn’t the same as a company using a copyrighted work without permission to train an LLM.
      - Drewelite
        3 up, 1 down · 2 years ago
        Every learning material a company or university has ever used has been used to train an LLM. Us.
        
        Okay I’m being a bit facetious here. I know people and ChatGPT aren’t equivalent. But the gap is closing. Maybe LLMs will never bridge the gap, but something will. I hesitate to write into law now that no work can ever be ingested or emulated by another intelligent entity. While the difference between a machine and a human is clear to you now, one day it won’t be.
        
        The longer we hold onto the idea that our brains are somehow magically different from the way computers are learning (or will learn) to think, the harder we’ll get blindsided by reality when they’re indistinguishable from us.
        
        IchNichtenLichten@lemmy.world
        2 up, 2 down · 2 years ago
        An LLM has very little in common with the human brain. We can’t do AGI yet and there’s no evidence that we will be able to create AGI any time soon.
        
        The main issue as I see it is that we have companies trying to make money by creating LLMs. The people who created the source materials for these LLMs are not only not getting paid, they’re not even being asked permission. To me that’s dead wrong and I hope the courts agree.
        
        Drewelite
        4 up · 2 years ago (edited)
        I agree AGIs aren’t going to happen soon. But it sounds like we agree they WILL happen. LLMs do have one important thing in common with humans: their output is transformative based on what they learn.
        
        I think what you take issue with is the scale. People wouldn’t care if this was something that existed on one computer somewhere, where someone could type, “Write me a spooky story about Top Ramen in the style of Stephen King”. It’s that anyone can get a story in Stephen King’s style when all OpenAI had to do is buy a couple of digital copies of Cujo. However, no one is upset that James Cameron bought one ticket to Pocahontas and thought, “What if that were on another planet?”. But 400 million people saw the resulting movie.
        
        People want to protect creatives by casting a net over machines, saying they can’t use the works of artists, even when transforming them, without payment to the original creator. While that sounds like it makes sense now, what happens when the distinction between human and machine disappears? That net will be around us too. Corporations will just use this to empower their copyright rule even further.
        
        Stephen King was largely inspired by Ray Bradbury and H.P. Lovecraft. I doubt he paid them beyond the original price of a couple of books.
        
        BTW thanks for the thought-provoking conversation. None of my friends care about this stuff 😅
        
        IchNichtenLichten@lemmy.world
        2 up, 1 down · 2 years ago
        You’re welcome!
        
        I think we disagree in that you’re concerned with something that may happen at some time in the future and how we legislate for it, but I see that as a job for future politicians and lawyers.
        
        Right now, all I see is a bunch of tech bros with the “move fast and break things” mindset, running around all over the internet and sweeping up data, with no thought to where it came from, whether it’s legally protected or sensitive material, or whether this might bite them in the ass in the very near future.
        
        I’ve seen this story before.
