OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling’s Harry Potter series

A new research paper laid out ways in which AI developers should try and avoid showing LLMs have been trained on copyrighted material.

  • RadialMonster
    11 months ago

    What if they scraped a whole lot of the internet, and those excerpts were in random blogs, posts, quotes, memes, etc. all over the place? They didn’t ingest the material directly, or knowingly.

    • @[email protected]
      11 months ago

      Not knowing something is a crime doesn’t stop you from being prosecuted for committing it.

      It doesn’t matter if someone else is sharing copyrighted works and you don’t know it; if you use them in ways that infringe on that copyright, you are still infringing.

      “I didn’t know that was copyrighted” is not a valid defence.

      • @[email protected]
        11 months ago

        Is reading a passage from a book actually a crime though?

        Sure, you could try to regenerate the full text from quotes you read online, much like you could watch a lot of video reviews and piece together larger portions of the original text. But you would not blame the video editing program for that; you would blame the person who did it and decided to post it online.

      • @[email protected]
        11 months ago

        If trained models are considered IP, then shouldn’t other models be allowed to view and learn from the competition? If learning from other people’s copyrighted IP is okay, why should trained models be treated differently?