@[email protected] to

[email protected] • 11 months ago

OpenAI says it’s “impossible” to create useful AI models without copyrighted material

arstechnica.com

"Copyright today covers virtually every sort of human expression" and cannot be avoided.

Apparently, stealing other people’s work to create a product for money is now “fair use”, according to OpenAI, because they are “innovating” (stealing). Yeah. Move fast and break things, huh?

“Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials,” wrote OpenAI in the House of Lords submission.

OpenAI claimed that the authors in that lawsuit “misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.”


frog 🐸
English
64•11 months ago
I wish I could upvote this more than once.

What people always seem to miss is that a human doesn’t need billions of examples to be able to produce something that’s kind of “eh, close enough”. Artists don’t look at billions of paintings. They look at a few, but do so deeply, absorbing not just the most likely distribution of brushstrokes, but why the painting looks the way it does. For a basis of comparison, I did an art and design course last year and looked at about 300 artworks in total (course requirement was 50-100). The research component on my design-related degree course is one page a week per module (so basically one example from the field the module is about, plus some analysis). The real bulk of the work humans do isn’t looking at billions of examples: it’s looking at a few, and then practicing the skill and developing a process that allows them to convey the thing they’re trying to express.

If the AI models were really doing exactly the same thing humans do, the models could be trained without any copyright infringement at all, because all of the public domain and creative commons content, plus maybe licencing a little more, would be more than enough.
- Phanatik
  24•11 months ago
  Exactly! You can glean so much from a single work, not just about the work itself but about who created it, what ideas they were trying to express, and what that tells us about the world they live in and how they see that world.
  
  This doesn’t even touch the fact that I’m learning to draw not by looking at other drawings but by looking at exactly what I’m trying to draw. I know at a base level, a drawing is a series of shapes made by hand, whether it’s through a digital medium or traditional pen/pencil and paper. But the skill isn’t being able to replicate other drawings, it’s being able to convert something I can see into a drawing. If I’m drawing someone sitting in a wheelchair, then I’ll get the pose of them sitting in the wheelchair but I can add details I want to emphasise or remove details I don’t want. There’s so much that goes into creative work and I’m tired of arguing with people who have no idea what it takes to produce creative works.
  - frog 🐸
    English
    26•11 months ago
    It seems that most of the people who think what humans and AIs do is the same thing are not actually creatives themselves. Their level of understanding of what it takes to draw goes no further than “well anyone can draw, children do it all the time”. They have the same respect for writing, of course, equating the ability to string words together to write an email, with the process it takes to write a brilliant novel or script. They don’t get it, and to an extent, that’s fine - not everybody needs to understand everything. But they should at least have the decency to listen to the people that do get it.
    - @[email protected]
      1•11 months ago
      Well, that’s not me. I’m a creative, and I see deep parallels between how LLMs work and how my own mind works.
      - frog 🐸
        English
        6•11 months ago
        Either you’re vastly overestimating the degree of understanding and insight AIs possess, or you’re vastly underestimating your own capabilities. :)
        
        Veloxization
        2•11 months ago
        This whole AI craze has just shown me that people are losing faith in their own abilities and their ability to learn things. I’ve heard so many who use AI to generate “artwork” argue that they tried to do art “for years” without improving, and hence have come to the conclusion that creativity is a talent that only some have, instead of a skill you can learn and hone. Just because they didn’t see results as fast as they’d have liked.
        
        frog 🐸
        English
        2•11 months ago
        Very well said! Creativity is definitely a skill that requires work, and for which there are no short cuts. It seems to me that the vast majority of people using AI for artwork are just looking for a short cut, so they can get the results without having to work hard and practice. The one valid exception is when it’s used by disabled people who have physical limitations on what they can do, which is a point that’s brought up occasionally - and if that was the one and only use-case for these models, I think a lot of artists would actually be fine with that.
        
        Veloxization
        English
        1•11 months ago
        I started drawing seriously when I was 14. Looking at my old artwork, I didn’t start improving fast until I was around 19 or 20. Not to say I didn’t improve at all during those five to six years, but the pace did get faster once I had “learned to learn”, so to speak. That is to say, it can take a lot of patience to get to a point where you actually start seeing improvement fast enough to stay motivated. But it is 100% worth it because at the end you have a lot of things you have created with your own two hands.
        
        And regarding the point on physical limitations, I can’t blame anyone in a situation like that for using AI if they have no other chance for realising their imaginations. For others, it is completely possible and not reserved for people who have some mythical innate talent. Just grab a pen or a brush and enjoy the process of honing a fine skill regardless of the end result. ❤️
        
        @[email protected]
        1•11 months ago
        Alternatively, you might be vastly overestimating human “understanding and insight”, or how much of it is really needed to create stuff.
        
        frog 🐸
        English
        3•11 months ago
        Average humans, sure, don’t have a lot of understanding and insight, and little is needed to be able to draw a doodle on some paper. But trained artists have a lot of it, because part of the process is learning to interpret artworks and work out why the artist used a particular composition or colour or object. To create really great art, you do actually need a lot of understanding and insight, because everything in your work will have been put there deliberately, not just to fill up space.
        
        An AI doesn’t know why it’s put an apple on the table rather than an orange, it just does it because human artists have done it - it doesn’t know what apples mean on a semiotic level to the human artist or the humans that look at the painting. But humans do understand what apples represent - they may not pick up on it consciously, but somewhere in the backs of their minds, they’ll see an apple in a painting and it’ll make the painting mean something different than if the fruit had been an orange.
        
        @[email protected]
        1•11 months ago
        
        > it doesn’t know what apples mean on a semiotic level
        
        Interestingly, LLMs seem to show emergent semiotic organization. Analysis of the neural network’s activation space suggests that related concepts get trained into similar activation patterns, which is what allows LLMs to infer relationships zero-shot when run at a “temperature” (randomness level) in the right range.
        
        Pairing an LLM with a stable diffusion model allows the resulting AI to… well, judge for yourself: https://llm-grounded-diffusion.github.io/
        
        frog 🐸
        English
        2•11 months ago
        I’m unconvinced that the fact they’re getting better at following instructions, like putting objects where the prompter specifies, or changing the colour, or putting the right number of them, etc means the model actually understands what the objects mean beyond their appearance. It doesn’t understand the cultural meanings attached to each object, and thus is unable to truly make a decision about why it should place an apple rather than an orange, or how the message within the picture changes when it’s a red sports car rather than a beige people-carrier.
- Quokka
  English
  9•11 months ago
  Children learn by watching others. We are trained from millions of examples starting from before birth.
- @[email protected]
  English
  5•11 months ago
  Removed by mod
  - @[email protected]
    3•11 months ago
    It makes sense to judge how closely LLMs mimic human learning when people are using it as a defense to AI companies scraping copyrighted content, and making the claim that banning AI scraping is as nonsensical as banning human learning.
    
    But when it’s pointed out that LLMs don’t learn very similarly to humans, and require scraping far more material than a human does, suddenly AIs shouldn’t be judged by human standards? I don’t know if it’s intentional on your part, but that’s a pretty classic example of a motte-and-bailey fallacy. You can’t have it both ways.
    - @[email protected]
      English
      1•11 months ago
      Removed by mod
  - @[email protected]
    3•11 months ago
    In general I agree with you, but AI doesn’t learn the concept of what a circle is. AI reproduces the most fitting representation of what we call a circle. But there is no understanding of the concept of a circle. This may sound like nitpicking, but I think it’s important to make the distinction.
    
    That is why current models aren’t regarded as actual intelligence, although people already call them that…
    - @[email protected]
      English
      1•11 months ago
      Removed by mod
- @[email protected]
  4•11 months ago
  What you count as “one” example is arbitrary. In terms of pixels, you’re looking at millions right now.
  
  The ability to train faster using fewer examples in real time, similar to what an intelligent human brain can do, is definitely a goal of AI research. But right now, we may be seeing from AI what a below average human brain could accomplish with hundreds of lifetimes to study.
  
  > If the AI models were really doing exactly the same thing humans do, the models could be trained without any copyright infringement at all, because all of the public domain and creative commons content, plus maybe licencing a little more, would be more than enough.
  
  I mean, no, if you only ever look at public domain stuff you literally wouldn’t know the state of the art, which has historically been produced for profit. Even the most untrained artist “doing their own thing” watches Disney/Pixar movies and listens to copyrighted music.
  - frog 🐸
    English
    9•11 months ago
    If we’re going by the number of pixels being viewed, then you have to use the same measure for both humans and AIs - and because AIs have to look at billions of images while humans do not, the AI still requires far more pixels than a human does.
    
    And humans don’t require the most modern art in order to learn to draw at all. Sure, if they want to compete with modern artists, they would need to look at modern artists (for which educational fair use exists, and again the quantity of art being used by the human for this purpose is massively lower than what an AI uses - a human does not need to consume billions of artworks from modern artists in order to learn what the current trends are). But a human could learn to draw, paint, sculpt, etc purely by only looking at public domain and creative commons works, because the process for drawing, say, the human figure (with the right number of fingers!) has not changed in hundreds of years. A human can also just… go outside and draw things they see themselves, because the sky above them and the tree across the street aren’t copyrighted. And in fact, I’d argue that a good artist should go out and find real things to draw.
    
    OpenAI’s argument is literally that their AI cannot learn without using copyrighted materials in vast quantities - too vast for them to simply compensate all the creators. So it genuinely is not comparable to a human, because humans can, in fact, learn without using copyrighted material. If OpenAI’s argument is actually that their AI can’t compete commercially with modern art without using copyrighted works, then they should be honest about that - but then they’d be showing their hand, wouldn’t they?
    - @[email protected]
      English
      2•11 months ago
      Removed by mod
      - @[email protected]
        2•11 months ago
        I think the difference in artistic expression between modern humans and humans in the past comes down to the material available (like the actual material to draw with).
        
        Humans can draw without seeing any image ever. Blind people can create art and draw things because we have a different understanding of the world around us than AI has. No human artist needs to look at a thousand or even at 1 picture of a banana to draw one.
        
        The way AI sees and “understands” the world and how it generates an image is fundamentally different from how the human brain converts the object “banana” into an image of a banana.
        
        @[email protected]
        English
        1•11 months ago
        Removed by mod
    - @[email protected]
      2•11 months ago
      
      > Sure, if they want to compete with modern artists, they would need to look at modern artists
      
      Which is the literal goal of Dall-E, SD, etc.
      
      > But a human could learn to draw, paint, sculpt, etc purely by only looking at public domain and creative commons works
      
      They could definitely learn some amount of skill, I agree. I’d be very interested to see the best that an AI could achieve using only PD and CC content. It would be interesting. But you’d agree that it would look very different from modern art, just as an alien who has only been consuming earth media from 100+ years ago would be unable to relate to us.
      
      > the sky above them and the tree across the street aren’t copyrighted.
      
      Yeah, I’d consider that PD/CC content that such an AI would easily have access to. But obviously the real sky is something entirely different from what is depicted in Starry Night, Star Wars, or H.P. Lovecraft’s description of the cosmos.
      
      > OpenAI’s argument is literally that their AI cannot learn without using copyrighted materials in vast quantities
      
      Yeah, I’d consider that a strong claim on their part; what they really mean is, it’s the easiest way to make progress in AI, and we wouldn’t be anywhere close to where we are without it.
      
      And you could argue “convenient that it both saves them money, and generates money for them to do it this way”, but I’d also point out that the alternative is they keep the trained models closed source, never using them publicly until they advance the tech far enough that they’ve literally figured out how to build/simulate a human brain that is able to learn as quickly and human-like as you’re describing. And then we find ourselves in a world where one or two corporations have this incredible proprietary ability that no one else has.
      
      Personally, I’d rather live in the world where the information about how to do all of this isn’t kept for one or two corporations to profit from. I would rather live in the version where they publish their work publicly, early, and often, show that it works, and people are able to reproduce it, open source it, train their own models, and advance the technology in a space where anyone can use it.
      
      You could hypothesize a middle ground where they do the research, but aren’t allowed to profit from it without licensing every bit of data they train on. But the reality of AI research is that it only happens to the extent that it generates revenue. It’s been that way for the entire history of AI. Douglas Hofstadter has been asking deep, important questions about AI as it relates to consciousness for like 60 years (e.g. GEB, I Am a Strange Loop), but there’s a reason he didn’t discover LLMs and tech companies did. That’s not to say his writings are meaningless, in fact I think they’re more important than ever before, but he just wasn’t ever going to get to this point with a small team of grad students, a research grant, and some public domain datasets.
      
      So, it’s hard to disagree with OpenAI there, AI definitely wouldn’t be where it is without them doing what they’ve done. And I’m a firm believer that unless we figure our shit out with energy generation soon, the earth will be an uninhabitable wasteland. We’re playing a game of climb the Kardashev scale, we opted for the “burn all the fossil fuels as fast as possible” strategy, and now we’re at the point where either we spend enough energy fast enough to figure out the tech needed to survive this, or we suffocate on the fumes. The clock is ticking, and AI may be our best bet at saving the human race that doesn’t involve an inordinate number of people dying.
      - frog 🐸
        English
        4•11 months ago
        OpenAI are not going to make the source code for their model accessible to all to learn from. This is 100% about profiting from it themselves. And using copyrighted data to create open source models would seem to violate the very principles the open source community stands for - namely that everybody contributes what they agree to, and everything is published under a licence. If the basis of an open source model is a vast quantity of training data from a vast quantity of extremely pissed off artists, at least some of the people working on that model are going to have an “are we the baddies?” moment.
        
        The AI models are also never going to produce a solution to climate change that humans will accept. We already know what the solution is, but nobody wants to hear it, and expecting anyone to listen to ChatGPT and suddenly change their minds about using fossil fuels is ludicrous. And an AI that is trained specifically on knowledge about the climate and technologies that can improve it, with the purpose of innovating some hypothetical technology that will fix everything without humans changing any of their behaviour, categorically does not need the entire contents of ArtStation in its training data. AIs that are trained to do specific tasks, like the ones trained to identify new antibiotics, are trained on a very limited set of data, most of which is not protected by copyright and any that is can be easily licenced because the quantity is so small - and you don’t see anybody complaining about those models!
        
        @[email protected]
        2•11 months ago
        
        > OpenAI are not going to make the source code for their model accessible to all to learn from
        
        OpenAI isn’t the only company doing this, nor is their specific model the knowledge that I’m referring to.
        
        > The AI models are also never going to produce a solution to climate change that humans will accept.
        
        It is already being used to further fusion research beyond anything we’ve been able to do with standard algorithms
        
        > We already know what the solution is, but nobody wants to hear it
        
        Then it’s not a solution. That’s like telling your therapist, “I know how to fix my relationship, my partner just won’t do it!”
        
        > expecting anyone to listen to ChatGPT and suddenly change their minds about using fossil fuels is ludicrous
        
        Lol. Yeah, I agree, that’s never going to work.
        
        > categorically does not need the entire contents of ArtStation in its training data.
        
        That’s a strong claim to make. Regardless of the ethics involved, or the problems the AI can solve today, the fact is we’re seeing rapid advances in AI research as a direct result of these ethically dubious models.
        
        In general, I’m all for the capitalist method of artists being paid their fair share for the work they do, but on the flip side, I see a very possible mass extinction event on the horizon, which could cause suffering the likes of which humanity has never seen. If we assume that is the case, and we assume AI has a chance of preventing it, then I would prioritize that over people’s profits today. And I think it’s perfectly reasonable to say I’m wrong.
        
        And then there’s the problem of actually enforcing any sort of regulation, which would be so much more difficult than people here are willing to admit. There’s basically nothing you can do even if you wanted to. Your Carlin example is exactly the defense a company would use: “I guess our AI just happened to create a movie that sounds just like Paul Blart, but we swear it’s never seen the film. Great minds think alike, I guess, and we sell only the greatest of minds”.
        
        frog 🐸
        English
        1•11 months ago
        Personally I think the claim that the entire contents of ArtStation will lead to working technology that fixes climate change is the bolder claim - and if there was any merit to it, there would be some evidence for it that the corporations who want copyright to be disapplied to artists would be able to produce. And if we’re saying that getting rid of copyright protections will save the planet, then perhaps Disney should give up theirs as well. Because that’s the reality here: we’re expecting humans to be obliterated by AI but are not expecting the rich and powerful to make any sacrifices at all. And art is part of who we are as a species, and has been for hundreds of thousands of years. Replacing artists with AI because somehow that will fix climate change is not only a massive stretch, but what would we even be saving humanity for at that point? So that everybody can slave away in insecure, meaningless work so the few can hoard everything for themselves? Because the Star Trek utopia where AI does all the work and humans can pursue self-enrichment is not an option on the table. The tech bros just want you to think it is.
  - @[email protected]
    4•11 months ago
    Humans learn mostly from real life. Go touch some grass
- @[email protected]
  2•11 months ago
  When you look at one painting, is that the equivalent of one instance of the painting in the training data? There is an infinite amount of information in the painting, and each time you look you process more of that information.
  
  I’d say any given painting you look at in a museum, you process at least a hundred mental images of aspects of it. A painting on your wall could be seen ten thousand times easily.




This community’s icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.
