FearTheCron@lemmy.world to Lemmy.World Announcements@lemmy.world · 2 years ago

What is your opinion of the Large Language Model (LLM) argument made by Reddit?

One of the arguments made for Reddit’s API changes is that Reddit is now the go-to place for LLM training data (e.g. for ChatGPT).

https://www.reddit.com/r/reddit/comments/145bram/addressing_the_community_about_changes_to_our_api/jnk9izp/?context=3

I haven’t seen a whole lot of discussion around this and would like to hear people’s opinions. Are you concerned about your posts being used for LLM training? Do you not care? Do you prefer that your comments are available to train open source LLMs?

(I will post my personal opinion in a comment so it can be up/down voted separately)

FearTheCron@lemmy.world (OP) · 15 upvotes · 2 years ago
My personal opinion is that high API usage fees hurt open source LLMs (e.g. GPT4All). I would rather not see this new technology monopolized by those who can pay API fees.
- realslef@fedia.io · 2 upvotes · 2 years ago
  Yes, LLMs are a problem for server operators, but Reddit’s attempted cure has horrible side-effects.
  - FearTheCron@lemmy.world (OP) · 1 upvote · 2 years ago
    I totally agree that Reddit’s approach has horrible side effects. However, if hosting costs were not an issue, how would you feel about people using your comments for LLM training?
jon@lemmy.tf · 10 upvotes · 2 years ago
Sure, the LLMs are more than welcome to make use of my memes and random comments. Anything on the internet is public for all to use, after all.
lynny@lemmy.world · 8 upvotes · 2 years ago
I think it’s a good idea to monetize the API for that, but not to the extreme Reddit has gone. When your API is so expensive that it kills off anything other than data-scraping bots, you messed up.
meli nasa@feddit.de · 5 upvotes · 2 years ago
I don’t really care, to be honest. If something’s public on social media, it’s public, and it’s no longer on you to decide how it will be used. I really like the Stack Exchange policy that all posts are published under a Creative Commons license. Though they seem hell-bent on killing that, too.
- FearTheCron@lemmy.world (OP) · 3 upvotes · 2 years ago
  Yeah, I think a Creative Commons-style license makes sense, and that was always my intent when posting things. However, when you post Creative Commons content, you do get to decide the restrictions (e.g. commercial or noncommercial).
  
  I think it’s currently an open question how this applies to generative AI and LLMs. Perhaps the output of generative AI should retain the license of the training data? Or perhaps that is overly restrictive? There are those who believe that training commercial generative AI on data under permissive licenses is a problem.
  
  https://www.theregister.com/2023/05/12/github_microsoft_openai_copilot/
  
  https://slate.com/technology/2022/12/lensas-a-i-avatars-the-uncomfortable-places-their-magic-comes-from.html
  
  I am not really sure where I stand on the overall issue. But the worst-case scenario, in my opinion, is one where open source generative AI is hobbled by regulation, paving the way for corporate control. My biggest fear about the Reddit API changes is that they prevent anyone except Google, Facebook, Microsoft, Amazon, etc. from using user comments as a training set.
  - meli nasa@feddit.de · 3 upvotes · 2 years ago
    I don’t know either. I’ll agree with you, though, that restricting AI so that only big tech companies with lots of lawyers can research it (and not release it) is the worst-case scenario. And I fear that it’s either that or complete deregulation. OpenAI etc. just have too much money for lobbying, and given that this is all happening in the US, which seems to be quite susceptible to monetary influence in politics, I doubt any laws are going to be passed to restrict them. Besides, there’s the national interest in not letting China take the lead.
OptimusPrime · 4 upvotes · edited · 2 years ago
deleted by creator
- FearTheCron@lemmy.world (OP) · 3 upvotes · 2 years ago
  Certainly the archived Reddit posts will be used for that for years to come regardless. What I am curious about is how you feel about your posts contributing to the output of an LLM (independent of API usage costs).
  
  LLMs can be specialized for particular tasks by training them further on a curated set of data. For example, an LLM trained specifically on your posts will sound more like you than the same LLM did before that training. Does it bother you that someone may use your posts for this purpose?
  - OptimusPrime · 1 upvote · edited · 2 years ago
    deleted by creator
TheBananaKing@lemmy.world · 1 upvote · 2 years ago
I don’t care if people train models off my posts. I released the content into the wild; I don’t much care what happens to it after that. Attribution of direct quotes is nice to have, but twiddling some weights in a language model is far too abstruse for me to care about.

And sure, if OpenAI is inhaling all of Reddit, it’s reasonable to charge for that.

But shutting down third-party apps was never about that.
