OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling’s Harry Potter series::A new research paper laid out ways in which AI developers should try and avoid showing LLMs have been trained on copyrighted material.

  • @[email protected]
    link
    fedilink
    English
    38
    edit-2
    1 year ago

    I think a lot of people are not getting it. AI/LLMs can train on whatever they want but when then these LLMs are used for commercial reasons to make money, an argument can be made that the copyrighted material has been used in a money making endeavour. Similar to how using copyrighted clips in a monetized video can make you get a strike against your channel but if the video is not monetized, the chances of YouTube taking action against you is lower.

    Edit - If this was an open source model available for use by the general public at no cost, I would be far less bothered by claims of copyright infringement by the model

    • @[email protected]
      link
      fedilink
      English
      251 year ago

      AI/LLMs can train on whatever they want but when then these LLMs are used for commercial reasons to make money, an argument can be made that the copyrighted material has been used in a money making endeavour.

      And does this apply equally to all artists who have seen any of my work? Can I start charging all artists born after 1990, for training their neural networks on my work?

      Learning is not and has never been considered a financial transaction.

      • @[email protected]
        link
        fedilink
        English
        5
        edit-2
        1 year ago

        Actually, it has. The whole consept of copyright is relatively new, and corporations absolutely tried to have people who learned proprietary copyrighted information not be able to use it in other places.

        It’s just that labor movements got such non-compete agreements thrown out of our society, or at least severely restricted on humanitarian grounds. The argument is that a human being has the right to seek happiness by learning and using the proprietary information they learned to better their station. By the way, this needed a lot of violent convincing that we have this.

        So yes, knowledge and information learned is absolutely withing the scope of copyright as it stands, it’s only that the fundamental rights that humans have override copyright. LLMs (and companies for that matter) do not have such fundamental rights.

        Copyright by the way is stupid in its current implementation, but OpenAI and ChatGPT does not get to get out of it IMO just because it’s “learning”. We humans ourselves are only getting out of copyright because of our special legal status.

          • @[email protected]
            link
            fedilink
            English
            11 year ago

            https://en.wikipedia.org/wiki/Fair_use

            Fair use only works if what you create is to reflect on the original and not to supercede it. For example if ChatGPT gobbled up a work on the reproduction of firefies, if you ask it a question about the topic and it just answers, that’s not fair use since you made the original material redundant. If it did what a search engine would do and just tell you that “here’s where you can find it, you might have to pay for it”, that’s fair use. This is of course US law, so it may be different everywhere, and US law is weird so the courts may say anything.

            That’s the gist of it, fair use is fine as long as you are only creating new information and only use the copyrighted old work as is absolutely necessary for your new information to make sense, and even then, you can’t use so much of the copyrighted work that it takes away from the value of it.

            Otherwise if I pirated a movie and put subtitles on it, I could argue it’s fair use since it’s new information and transformative. If I released the subtitles separately, that would be a strong argument for fair use. If I included a 10 sec clip in it to show my customers what the thing is like in action, then that may be argued. If it’s the pivotal 10 seconds that spoils the whole movie, that’s not fair use, since I took away from the value of the original.

            ChatGPT ate up all of these authors’ works and for some, it may take away from the value they have created. It’s telling that OpenAI is trying to be shifty about it as well. If they had a strong argument, they’d want to settle it as soon as possibe as this is a big stormcloud on their company IP value. And yeah it sucks that people created something that may turn out to not be legal because some people have a right to profit from some pieces of capital assets, but that’s the story of the world the past 50 years.

              • @[email protected]
                link
                fedilink
                English
                1
                edit-2
                1 year ago

                I am not a lawyer by the way, I don’t even live in the US, so what I write is just my opinion.

                But fair use seems a ridiculous defense when we talk about the Github Copilot case, which is the first tangible lawsuit about it that I know of. The plaintiffs lay out the case of a book for Javascript developers as their example. The objective of the book to give you excercises in Javascript development, I would get the book if I wanted to do Javascript excercises. The book is copyrighted under a share-alike attribution required licence. The defendants Github and OpenAI don’t honour the license with Copilot and Codex. They claim fair use.

                So with the four factors:

                • the purpose and character of your use: .Well, they present their Javascript excercises as original work while it’s obvious they are not, they are reproducing the task they want letter by letter. It is even missing critical context that makes it hard to understand without the book, so their work does not even stand on its own. Also, they do this for monetary compensation, while not respecting the original license, which if someone was giving a commentary or criticism covered by fair use, would be as trivial as providing a citation of the book. They are also not producing information beyond what’s available in the book. Quite funnily, the plaintiffs mention that the “derivative” work is also not quite valuable, as the model answered with an example from a “what’s wrong with this, can you fix it?” section for a question about how to determine if a number is even.

                • the nature of the copyrighted work: It’s freely available, the licence only requires if you republish it, you should provide proper attribution. It is not impossible to provide use cases based on fair use while honouring the license. There is no monetary or other barrier.

                • the amount and substantiality of the portion taken: All of it, and it is reproduced verbatim.

                • the effect of the use upon the potential market: Github Copilot is in the same market as the original work and is competing with it, namely in showing people how to use Javascript.

                And again, I feel this is one layer. Copyright enforcement has never been predictable, and US courts are not predictable either. I think anything can come of this now that it’s big tech that is on the defendant side, and they have the resources to fight, not like random Joe Schmoes caught with bootleg DVDs. Maybe they abolish copyright? Maybe they get an exception? Since US courts have such wide jurisdiction and can effectively make laws, it is still a toss-up. That said, the Github Copilot class action case is the one to watch, and so far, the judge denied orders to dismiss the case, so it may go either way.

                Also by the way, the EU has no fair use protections, it only allows very specific exceptions for public criticism and such, none of which fits AI. Going by the example of Copilot, this would mean that EU users can’t use Copilot, and also that anything that was produced with the assistance of Copilot (or ChatGPT for that matter) is not marketable in the EU.

      • @[email protected]
        link
        fedilink
        English
        21 year ago

        Ehh, “learning” is doing a lot of lifting. These models “learn” in a way that is foreign to most artists. And that’s ignoring the fact the humans are not capital. When we learn we aren’t building a form a capital; when models learn they are only building a form of capital.

        • @[email protected]
          link
          fedilink
          English
          51 year ago

          Artists, construction workers, administrative clerks, police and video game developers all develop their neural networks in the same way, a method simulated by ANNs.

          This is not, “foreign to most artists,” it’s just that most artists have no idea what the mechanism of learning is.

          The method by which you provide input to the network for training isn’t the same thing as learning.

          • @[email protected]
            link
            fedilink
            English
            31 year ago

            Artists, construction workers, administrative clerks, police and video game developers all develop their neural networks in the same way, a method simulated by ANNs.

            Do we know enough about how our brain functions and how neural networks functions to make this statement?

            • @[email protected]
              link
              fedilink
              English
              21 year ago

              Do we know enough about how our brain functions and how neural networks functions to make this statement?

              Yes, we do. Take a university level course on ML if you want the long answer.

              • @[email protected]
                link
                fedilink
                English
                11 year ago

                My friends who took computer science told me that we don’t totally understand how machine learning algorithms work. Though this conversation was a few years ago in college. Will have to ask them again

          • @[email protected]
            link
            fedilink
            English
            21 year ago

            ANNs are not the same as synapses, analogous yes, but different mathematically even when simulated.

            • @[email protected]
              link
              fedilink
              English
              11 year ago

              This is orthogonal to the topic at hand. How does the chemistry of biological synapses alone result in a different type of learned model that therefore requires different types of legal treatment?

              The overarching (and relevant) similarity between biological and artificial nets is the concept of connectionist distributed representations, and the projection of data onto lower dimensional manifolds. Whether the network achieves its final connectome through backpropagation or a more biologically plausible method is beside the point.

        • @[email protected]
          link
          fedilink
          English
          11 year ago

          When we learn we aren’t building a form a capital; when models learn they are only building a form of capital.

          What do you think education is? I went to university to acquire knowledge and train my skills so that I could later be paid for those skills. That was literally building my own human capital.

    • @[email protected]
      link
      fedilink
      English
      151 year ago

      But wouldn’t this training and the subsequent output be so transformative that being based on the copyrighted work makes no difference? If I read a Harry Potter book and then write a story about a boy wizard who becomes a great hero, anyone trying to copyright strike that would be laughed at.

      • @[email protected]
        link
        fedilink
        English
        6
        edit-2
        1 year ago

        Your probability of getting copyright strike depends on two major factors -

        • How similar your story is to Harry Potter.

        • If you are making money of that story.

        • @[email protected]
          link
          fedilink
          English
          41 year ago

          It doesn’t matter how similar. Copyright doesn’t protect meaning, copyright protect form. If you read HP and then draw a picture of it, said picture becomes its separate work, not even derivative.

    • 1ird
      link
      fedilink
      English
      10
      edit-2
      1 year ago

      How is it any different from someone reading the books, being influenced by them and writing their own book with that inspiration? Should the author of the original book be paid for sales of the second book?

      • @[email protected]
        link
        fedilink
        English
        71 year ago

        Again that is dependent on how similar the two books are. If I just change the names of the characters and change the grammatical structure and then try to sell the book as my own work, I am infringing the copyright. If my book has a different story but the themes are influenced by another book, then I don’t believe that is copyright infringement. Now where the line between infringement and no infringement lies is not something I can say and is a topic for another discussion

        • @[email protected]
          link
          fedilink
          English
          1
          edit-2
          1 year ago

          change the grammatical structure

          I.e. change form. Copyright protect form, thus in coutries that judge either by spirit or letter of law instead of size of moneybags this is ok.

    • Affine Connection
      link
      fedilink
      English
      91 year ago

      using copyrighted clips in a monetized video can make you get a strike against your channel

      Much of the time, the use of very brief clips is clearly fair use, but the people who issue DMCA claims don’t care.

    • ciwolsey
      link
      fedilink
      English
      7
      edit-2
      1 year ago

      You could run a paid training course using a paid-for book, that doesn’t mean you’re breaking copyright.

    • Schadrach
      link
      fedilink
      English
      31 year ago

      I think a lot of people are not getting it. AI/LLMs can train on whatever they want but when then these LLMs are used for commercial reasons to make money, an argument can be made that the copyrighted material has been used in a money making endeavour.

      Only in the same way that I could argue that if you’ve ever watched any of the classic Disney animated movies then anything you ever draw for the rest of your life infringes on Disney’s copyright, and if you draw anything for money then the Disney animated movies you have seen in your life have been used in a money making endeavor. This is of course ridiculous and no one would buy that argument, but when you replace a human doing it with a machine doing essentially the same thing (observing and digesting a bunch of examples of a given kind of work, and producing original works of the general kind that meet a given description) suddenly it’s different, for some nebulous reason that mostly amounts to creatives who believed their jobs could not at least in part be automated away trying to get explicit protection from their jobs being at least in part automated away.