@[email protected] to

[email protected]English • 9 months ago

OpenAI strikes Reddit deal to train its AI on your posts

www.theverge.com

526

@[email protected] to

[email protected]English • 9 months ago

Reddit’s deal with OpenAI will plug its posts into “ChatGPT and new products”

www.theverge.com

Reddit’s signed AI licensing deals with Google and OpenAI.

yeehaw
10•9 months ago
If only snapshots and backups were a thing…
- @[email protected]
  6•9 months ago
  It’s theoretically possible, but the issue that anyone trying to do that would run into is consistency.
  
  How do you restore the snapshots of a database to recover deleted comments but also preserve other comments newer than the snapshot date?
  
  The answer is that it’s nearly impossible. Not impossible, but not worth the monumental effort when you can just focus on existing comments, which greatly outnumber any deleted ones.
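To make the consistency question concrete, here is a minimal Python sketch of the naive merge. The dictionary layout (`body`, `deleted`) and the function name `merge_snapshot` are assumptions made up for illustration, not anything known about Reddit's schema. It restores any comment that is missing or deleted in the live data and leaves everything else alone. Note what it glosses over: it cannot tell a protest deletion from an ordinary one or from a moderator removal, which is exactly the kind of edge case the comment above is worried about.

```python
def merge_snapshot(snapshot, live):
    """Merge an old snapshot into live data.

    Both arguments map comment_id -> {"body": str, "deleted": bool}
    (a hypothetical layout used only for this sketch).
    """
    merged = dict(live)  # comments newer than the snapshot survive untouched
    for comment_id, old in snapshot.items():
        current = live.get(comment_id)
        # Restore only what is gone or marked deleted in the live data.
        if current is None or current["deleted"]:
            merged[comment_id] = old
    return merged
```
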
  - yeehaw
    1•9 months ago
    It’s a piece of cake. Some code along the lines of:
    
    if ($user.modifyCommentRecentlyCount -gt 50) {
        # Too many recent edits: log it and keep the previous text.
        Write-Host "user is nuking comments"
        $comment = $previousComment
    }
    
    Or some shit. It can be done quite easily, trust me.
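The counter in that pseudocode has to come from somewhere. One way to get it, sketched here in Python, is a sliding window over each user's recent edit timestamps. The class name, the 60-second window, and the in-memory storage are all assumptions for illustration; only the threshold of 50 comes from the comment above.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60.0  # hypothetical look-back window
MAX_EDITS = 50         # threshold borrowed from the pseudocode above


class EditBurstDetector:
    """Tracks recent edit timestamps per user (in memory, for illustration)."""

    def __init__(self, window=WINDOW_SECONDS, max_edits=MAX_EDITS):
        self.window = window
        self.max_edits = max_edits
        self._recent = defaultdict(deque)  # user -> timestamps of recent edits

    def record_edit(self, user, timestamp):
        """Record one edit; return True if the user now looks like a mass editor."""
        recent = self._recent[user]
        recent.append(timestamp)
        # Drop edits that have fallen out of the window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.max_edits
```

A script firing an edit every 100 ms trips the detector on its 51st edit, while someone fixing a typo every couple of minutes never accumulates more than one entry in the window.
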
    - @[email protected]
      1•9 months ago
      
      > It can be done quite easily, trust me.
      
      The words of every junior dev right before I have to spend a weekend undoing their crap.
      
      I’ve been there too many times.
      
      There are always edge cases you need to account for, and you can’t account for them until you run tests and then verify the results.
      
      And you’d be parsing billions upon billions of records. That’s not a trivial thing to do when you’re running multiple tests to verify, and ultimately the payoff is trivial.
      
      You don’t screw around with your business’s invaluable prod data without first exhausting every way the change could modify that data.
      
      > It’s a piece of cake.
      
      It hurts how often I’ve heard this and how often it’s followed by a massive screwup.
      - yeehaw
        1•9 months ago
        
        > The words of every junior dev right before I have to spend a weekend undoing their crap.
        
        There are so many ways this can be done that I think you are not thinking of.

        Say a user goes to “shreddit” their comments (or uses some similar app). They likely have thousands. On every comment edit, it’s quite easy to check the last time the user edited one of their comments. All they need is some check on whether the last 10 consecutive comments were edited over hours or within milliseconds/seconds. After that, reddit could easily just tell the user it’s editing their comments when it’s not. Like a shadowban kind of method.

        Another way would be at the data structure level. We don’t know what their databases and hardware are like, but I can speculate. What if each user-edited comment is not an update query on a database, but an add/insert? Then all you need to do is point the live comments back at the versions dated before the malicious date, where username=$username. Not to mention when you start talking Nimble storage and stuff like that, the storage is extremely quick to respond. Hell, I would wager it didn’t even hit storage yet, probably still on some all-flash cache or in memory.

        Another way could be at the filesystem level. Ever heard of ZFS? What if each user had their own dataset or something? It’s extremely easy and quick to roll back a snapshot, or to clone the previous snapshot. There are so many ways.
        
        At the end of the day a user is triggering this action, so we don’t necessarily need to parse “billions” of records. Just the records for a single user.
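The "insert instead of update" idea can be sketched with Python's built-in `sqlite3`. Everything here is an assumption for illustration: the `comment_versions` table, its columns, and the helper names are made up, and nothing is known about how Reddit actually stores edits. The point is only that if every edit is a new row, "rolling back" one user is a matter of ignoring that user's rows at or after a cutoff time.

```python
import sqlite3


def make_db():
    """Create an in-memory database with an append-only versions table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE comment_versions ("
        " comment_id INTEGER, username TEXT, body TEXT, edited_at REAL)"
    )
    return db


def save_edit(db, comment_id, username, body, edited_at):
    # An edit is an INSERT of a new version, never an UPDATE of the old one.
    db.execute(
        "INSERT INTO comment_versions VALUES (?, ?, ?, ?)",
        (comment_id, username, body, edited_at),
    )


def live_comments(db, username, cutoff=None):
    """Latest version of each comment by `username`.

    Versions with edited_at >= cutoff are ignored, which "rolls back"
    that user's comments to their state just before the cutoff.
    """
    rows = db.execute(
        "SELECT comment_id, body, edited_at FROM comment_versions"
        " WHERE username = ? ORDER BY edited_at",
        (username,),
    )
    live = {}
    for comment_id, body, edited_at in rows:
        if cutoff is None or edited_at < cutoff:
            live[comment_id] = body  # later versions overwrite earlier ones
    return live
```

Only the rows for the one username are read, which is the single-user scope described above. Whether a production system at that scale is laid out this way is pure speculation.
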
        
        @[email protected]
        1•9 months ago (edited)
        
        > There are so many ways this can be done that I think you are not thinking of.
        
        No, I can think of countless ways to do this. I do this kind of thing every single day.
        
        What I’m saying is that you need to account for every possibility. You need to isolate all the deleted comments that fit the criteria of the “Reddit Exodus”.
        
        How do you do that? Do you narrow it down to a timeframe?
        
        The easiest way to do this is identify all deleted accounts, find the backup with the most recent version of their profile with non-deleted comments, and insert that user back into the main database (not the prod db).
        
        Now you need to parse billions upon billions upon billions of records. And yes, it’s billions because you need the system to search through all the records to know which record fits the parameters. And you need to do that across multiple backups for each deleted profile/comment.
        
        It’s a lot of work. And what’s the payoff? A few good comments and a ton of “yes this ^” comments.
        
        I sincerely doubt it’s worth the effort.
        
        Edit: formatting
        
        yeehaw
        1•9 months ago
        
        > How do you do that? Do you narrow it down to a timeframe?
        
        When a user edits a comment, they submit a request. When they submit a request, they trigger an action. An action can do validation steps and call methods, just like I said above, for example. When the edit action is triggered, check the timestamp against the previously edited comment’s timestamp. If the previous edit, or the previous 5, fall within a given timeframe, flag it. “Shadowban” the user. Make it look to them like they’ve updated their comments, when in reality the comments are unchanged.
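A minimal Python sketch of the shadowban half of that idea, assuming the flagging has already happened somewhere else. The `Comment` and `CommentStore` classes and the `shadow_body` field are hypothetical names invented for this example. A flagged user's edit is stored separately and shown only to that user, while everyone else keeps seeing the original text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Comment:
    author: str
    body: str                          # what everyone else sees
    shadow_body: Optional[str] = None  # what only the author sees after a flagged edit


@dataclass
class CommentStore:
    flagged_users: set = field(default_factory=set)
    comments: dict = field(default_factory=dict)

    def edit(self, comment_id, user, new_body):
        comment = self.comments[comment_id]
        if user in self.flagged_users:
            comment.shadow_body = new_body  # pretend the edit succeeded
        else:
            comment.body = new_body
        return "ok"  # same response either way, so the user can't tell

    def render(self, comment_id, viewer):
        comment = self.comments[comment_id]
        if viewer == comment.author and comment.shadow_body is not None:
            return comment.shadow_body
        return comment.body
```

The sketch skips permission checks and persistence. It only shows that serving a different view to the author is mechanically simple once the user has been flagged.
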
        
        We’ve had detection methods for this sort of thing for a long time. Think about how spam filtering works. If you’re using some tool to scramble your data, it likely leaves patterns. To think reddit doesn’t have some means to protect itself against this is naive. It’s their whole business. All these user-submitted comments are worth money.
        
        > Now you need to parse billions upon billions upon billions of records. And yes, it’s billions because you need the system to search through all the records to know which record fits the parameters. And you need to do that across multiple backups for each deleted profile/comment.
        
        This makes me think you don’t understand my meaning. I think you’re talking about reddit one day deciding to search for and restore obfuscated and deleted comments. Yes, that would be a large undertaking. This is not what I’m suggesting at all. Stop it while it’s happening, not later. Patterns and trends can easily identify when a user is doing something like shreddit or the like, then the code can act on it.
        
        > It’s a lot of work. And what’s the payoff? A few good comments and a ton of “yes this ^” comments.
        
        this
        
        @[email protected]
        1•9 months ago
        
        > This makes me think you don’t understand my meaning. I think you’re talking about reddit one day deciding to search for and restore obfuscated and deleted comments.
        
        Yes, that is what we’re talking about. A large number of users updated their comments to something basic and then deleted those comments. I’m fairly confident that before that happened, reddit had zero need to implement a spam prevention system like the one you’re suggesting. The fact that all those users’ comments (including mine) are still <deleted> is evidence of that.
        
        They may have implemented something like that recently, but not before.
  - @[email protected]
    1•9 months ago
    Just collate them based on edit/deletion date… Each post will have a last-edited attribute that can be used for sorting. Even more so once the AI is bootstrapped enough to start recognizing the standard protest edit messages. At that point you hardly even need human oversight anymore, because the bot will be able to recognize “that’s a fuck spez edit, ignore that; this post looks good; that’s a Shreddit/PowerDelete edit, ignore that” and so on. Can even have it fetch the previous edit automatically when it comes across something like that, to a point where a comment removed by a PowerDelete tool is nothing more than a cover letter that states “there was once a real human-generated comment in this location”.
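The "recognize the protest edit, fetch the previous one" step could look roughly like this in Python. The regex patterns are hypothetical stand-ins for whatever a trained model would learn, and the revision format (a list of `(edited_at, text)` tuples) is an assumption for illustration.

```python
import re

# Hypothetical patterns; real protest edits vary far more than this.
PROTEST_PATTERNS = [
    re.compile(r"fuck\s+spez", re.IGNORECASE),
    re.compile(r"shreddit|power\s*delete", re.IGNORECASE),
]


def is_protest_edit(text):
    """True if the text matches one of the known protest-edit patterns."""
    return any(pattern.search(text) for pattern in PROTEST_PATTERNS)


def last_genuine_revision(revisions):
    """Return the newest revision text that is not a protest edit.

    `revisions` is a list of (edited_at, text) tuples in any order.
    Returns None if every revision looks like a protest edit.
    """
    for _, text in sorted(revisions, key=lambda r: r[0], reverse=True):
        if not is_protest_edit(text):
            return text
    return None
```

This only works if earlier revisions are still stored somewhere, which is the open question in this thread.
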
- @[email protected]
  3•9 months ago
  Yeah, that’s the problem, isn’t it? I had a great idea involving bullshit-ifying my comments by editing them slowly with an LLM via a long-running script, repeatedly over months.
  
  I realised that they probably don’t delete the original text on edit anyway, and, as you say, it’s probably buried in a backup someplace.
  - Ace! _SL/S
    2•9 months ago
    I don’t think it is in backups only. My guess is they store your full edit history for each comment/post/whatever. Newest one will be shown on the frontend, rest is for data vampires
    - yeehaw
      2•9 months ago
      This is it exactly. To us, an edit “changes” the comment. To the back end it’s just another iteration, while the rest still exist.
