@[email protected] to

[email protected] • English • 10 months ago

Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT

www.tomshardware.com

1.71K

Stack Overflow is overflowing with salt.

@[email protected]
English
90•10 months ago
Messages that people post on Stack Exchange sites are literally licensed CC-BY-SA, the whole point of which is to enable them to be shared and used by anyone for any purpose. One of the purposes of such a license is to make sure knowledge is preserved by allowing everyone to make and share copies.
- @[email protected]
  English
  109•10 months ago
  That license would require ChatGPT to provide attribution every time it used training data from anyone there, and would also require every output using that training data to be placed under the same license. This would legally prevent anything ChatGPT created, even in part, using this training data from being closed source. Since they obviously aren’t planning on doing that, this is massively shitting on the concept of licensing.
  - JohnEdwa
    English
    25•
    edit-2
    10 months ago
    CC attribution doesn’t necessarily require you to have the credits immediately with the content, but it would result in one of the world’s longest web pages: it would need the name of the poster and a link to every single question, answer, or comment they used as training data, and Stack Overflow has roughly 60 million questions and answers combined.
    - @[email protected]
      English
      1•10 months ago
      They don’t need to republish the 60 million questions, they just have to credit the authors, who are surely far fewer (but IANAL).
      - JohnEdwa
        English
        1•10 months ago
        
        appropriate credit — If supplied, you must provide the name of the creator and attribution parties, a copyright notice, a license notice, a disclaimer notice, and a link to the material. CC licenses prior to Version 4.0 also require you to provide the title of the material if supplied, and may have other slight differences.
        
        Maybe that could be just a link to the user page, but otherwise I would see it as needing to link to each message or comment they used.
  - @[email protected]
    English
    18•10 months ago
    IF its outputs are considered derivative works.
    - @[email protected]
      English
      22•10 months ago
      Ethically and logically it seems like output based on training data is clearly derivative work. Legally I suspect AI will continue to be the new powerful tool that enables corporations to shit on and exploit the works of countless people.
      - @[email protected]
        English
        2•10 months ago
        The problem is that the legal system, and thus IP law enforcement, is heavily biased towards very large corporations. Until that changes, corporations will continue exploiting people’s work, as they already were.
        
        I don’t see AI making it worse.
    - @[email protected]
      English
      3•10 months ago
      They are not. A derivative would be a translation or a theater play or, nowadays, a game or movie. Even stuff set in the same universe.
      
      Expanding the meaning of “derivative” so massively would mean that pretty much any piece of code ever written is a derivative of technical documentation and even textbooks.
      
      So far, judges simply throw out these theories, without even debating them in court. Society would have to move a lot further to the right, still, before these ideas become realistic.
  - @[email protected]
    English
    7•10 months ago
    Maybe, but I don’t think that is well tested legally yet. For instance, I’ve learned things from there, but when I share some knowledge I don’t attribute it to all the underlying sources of my knowledge. If, on the other hand, I shared a quote or copypasta from there, I’d be compelled to do so, I suppose.
    
    I’m just not sure how neural networks will be treated in this regard. I assume they’ll conveniently claim that they can’t tie answers directly to underpinning training data.
- @[email protected]
  English
  64•10 months ago
  Share Alike
  
  I can’t wait to download my own version of the latest GPT model
  - @[email protected]
    English
    11•10 months ago
    It does help to know what those funny letters mean. Now we wait for regulators to catch up…
    
    /tangent
    
    If anything, we’re a very long way from anything close to intelligent. OpenAI (and subsequently MS, being publicly traded) sold investors on the pretense that LLMs are close to being “AGI”, and now more and more data is necessary to achieve that.
    
    If you know the internet, you know there’s a lot of garbage. I for one can’t wait for garbage-in garbage-out to start taking its toll.
    
    Also, I’m surprised how well open source models have shaped up; they’re certainly worth a look. I occasionally use a local model for “brainstorming” in the loosest terms, as I generally know what I’m expecting, but it’s sometimes helpful to read tasks laid out. There’s also comfort in that nothing even needs to leave my network, and in a pinch I even got some answers when my network was offline.
    
    It gives a little hope, while corps get to blatantly violate copyright while wielding it so heavily themselves, that advancements have been so great in open source.
