@[email protected] to

[email protected]English • 7 months ago

Reddit is full of bots: thread reposted exactly the same, comment by comment, 10 months later

1.45K

For the threads with the older one on the left: https://lemmy.world/post/14859950

(Thank you @[email protected] )

Anti-Face Weapon
link
fedilink
217•7 months ago
My understanding of how this works is that the left one is real accounts making real comments, at least in the majority.

Then when the link gets reposted, either by a bot or naturally, potentially depending on the title, the bots scrape the old comments and post them.

It’s content farming. And Reddit is probably okay with this.
- @[email protected]
  link
  fedilink
  177•7 months ago
  The right one is the “real” accounts. Notice how the left one is newer and all the accounts have names ending with four digits, except where they aren’t copies from the right.
  - @[email protected]
    link
    fedilink
    34•7 months ago
    No, the left one is older, and most of the names on the right contain four numbers.
    
    What’s going on here?
    
    Maybe op updated the picture?
    - @[email protected]
      link
      fedilink
      24•7 months ago
      I did, because other people complained in another comment that it was confusing to not have the older thread on the left.
      
      Anyway, it’s pretty obvious which one is which one
      - @[email protected]
        link
        fedilink
        17•7 months ago
        Thanks, I almost thought I was delusional.
        
        @[email protected]
        link
        fedilink
        3•7 months ago
        I also thought you were, lmao.
      - @[email protected]
        link
        fedilink
        1•6 months ago
        deleted by creator
    - @[email protected]
      link
      fedilink
      2•7 months ago
      yeah they did for some reason it seems
  - @[email protected]
    link
    fedilink
    10•7 months ago
    The list of names at the left creeps me the fuck out.
    - @[email protected]
      link
      fedilink
      14•7 months ago
      I saw this exact same style of bot account years ago on Tumblr. They always follow the same naming scheme: one word or two words combined and then a string of 4 digits. I bet if you go to any of their profiles, you’ll find like 4 comments that are all copied from old threads and a bunch of upvotes on completely random subs, possibly even all of them being on other bot accounts’ posts and comments.
      
      The real question is whether they’re being used to fake activity on Reddit, to sway public opinion by posting this sort of political slant, or to look legitimate now so they can be used to advertise scams later.
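A rough sketch of the naming scheme described above (one word or several words combined, then exactly four digits), written as a regular expression. The pattern and the function name `looks_auto_generated` are assumptions for illustration; nothing in the thread says any real detector works this way.

```python
import re

# Hypothetical pattern: one or more words (optionally joined by '-' or '_'),
# an optional separator, then exactly four digits at the end of the name.
BOT_STYLE_NAME = re.compile(r"[A-Za-z]+(?:[-_][A-Za-z]+)*[-_]?\d{4}")


def looks_auto_generated(username: str) -> bool:
    """Return True if the whole username fits the word(s)+4-digits shape."""
    return BOT_STYLE_NAME.fullmatch(username) is not None


if __name__ == "__main__":
    for name in ["QuietFox1234", "Ok-Panda-4821", "livus", "user12345"]:
        print(name, looks_auto_generated(name))
```

A match is only a weak signal. As a reply further down points out, this is reportedly the shape of the names Reddit suggests at sign-up, so plenty of real users will match too.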
      - @[email protected]
        link
        fedilink
        9•7 months ago
        Why not all of the above? If you have a service, you want to sell it to as many customers as possible.
        
        @[email protected]
        link
        fedilink
        2•7 months ago
        Very good point.
      - @[email protected]
        link
        fedilink
        1•7 months ago
        I thought the names followed that format because that’s the format reddit used for suggestions when signing up.
        
        I think the accounts are kind of “warmed up” this way to make them harder for reddit to identify as bots when they’re used for vote manipulation.
        
        Like a bot that just voted in /r/politics threads would be easier to identify than one which comments here and there and gets a few upvotes itself.
- livus
  link
  fedilink
  88•7 months ago
  Reddit is going to poison LLMs sooner than I thought.
  - @[email protected]
    link
    fedilink
    25•7 months ago
    LMAO while AIs reading training data sets get stuck in infinite loops.
  - bjorney
    link
    fedilink
    3•7 months ago
    Reddit probably omits bot accounts when it sells its data to AI companies
    - @[email protected]
      link
      fedilink
      35•7 months ago
      I doubt Reddit is in charge of many of the existing bots on their site.
      - bjorney
        link
        fedilink
        4•7 months ago
        Reddit has access to its own data - they absolutely know which users are posting unique content and which users’ content is a 100% copy of data that exists elsewhere on their own platform
        
        @[email protected]
        link
        fedilink
        20•7 months ago
        I know they could be, I’m just not sure they’re that competent. These bots often aren’t single user or just copy paste either, there’s usually some effort to mix it up or change wording slightly. Reddit’s internal search function is infamously shit, but they “know” which users are unlabeled bots with some effort put behind them?
        
        @[email protected]
        link
        fedilink
        5•7 months ago
        Removed by mod
        
        bjorney
        link
        fedilink
        2•7 months ago
        I know everyone here likes to circle jerk over “le Reddit so incompetent” but at the end of the day they are a (multi) billion dollar company and it’s willfully ignorant to assume that there isn’t a single engineer at the company who knows how to measure string similarity between two comment trees (hint: import difflib in Python)
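For anyone curious what the difflib hint could look like in practice, here is a minimal sketch. It assumes each thread has already been flattened into a list of comment strings in display order and compares them position by position. The function names and sample comments are made up for illustration, and a reposter that reorders comments would need a smarter pairing step.

```python
from difflib import SequenceMatcher


def comment_similarity(a: str, b: str) -> float:
    """Similarity of two strings in [0, 1], using difflib's ratio()."""
    return SequenceMatcher(None, a, b).ratio()


def thread_similarity(old: list, new: list) -> float:
    """Average similarity of comments paired by position.

    Returns 0.0 when either thread is empty. Extra comments in the longer
    thread are ignored, which is a simplification.
    """
    pairs = list(zip(old, new))
    if not pairs:
        return 0.0
    return sum(comment_similarity(a, b) for a, b in pairs) / len(pairs)


# Made-up sample data, not taken from the screenshot.
old_thread = ["Great point.", "I disagree, here is why.", "Source?"]
reposted = ["Great point.", "I disagree, here is why.", "Source?"]
unrelated = ["Nice weather today.", "Anyone tried the new update?", "lol"]

if __name__ == "__main__":
    print(thread_similarity(old_thread, reposted))
    print(thread_similarity(old_thread, unrelated))
```

This only covers comparing two threads you already suspect are copies. Finding the candidate pairs across a whole site is the part the replies argue about.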
        
        @[email protected]
        link
        fedilink
        8•
        edit-2
        7 months ago
        
        To compare every comment on reddit to every other comment in reddit’s entire history would require an index, and if you want to find similar comments instead of exact matches, it becomes a lot harder to do that efficiently. ElasticSearch might be able to do it, but then you need to duplicate all of that data in a separate database and keep it in sync with your main database without affecting performance too much when people are leaving new comments, and that would probably be expensive.
        
        Comparing combinations of comments is probably impossible. Reddit has a massive number of comments to begin with, and the number of possible subtrees of those comments would just be absurd. If you only care about comparing entire threads and not subtrees, then this doesn’t apply, but I don’t know how useful that will be.
        
        Programmers just do what they’re told. If the managers don’t care about something, the programmers won’t work on it.
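On the indexing point in the comment above, one common way to avoid comparing every comment with every other comment is an inverted index over word shingles: look up only the comments that share at least one shingle with the query, then score those candidates. This is a small in-memory sketch under that assumption; `ShingleIndex`, its methods, and the default threshold are hypothetical, and a real deployment at Reddit's scale would look very different.

```python
from collections import defaultdict


def shingles(text: str, k: int = 3) -> set:
    """Set of k-word shingles (as tuples) from lowercased, whitespace-split text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


class ShingleIndex:
    """Toy inverted index: shingle -> ids of comments containing it."""

    def __init__(self, k: int = 3):
        self.k = k
        self.postings = defaultdict(set)  # shingle -> comment ids
        self.docs = {}                    # comment id -> shingle set

    def add(self, comment_id: str, text: str) -> None:
        doc_shingles = shingles(text, self.k)
        self.docs[comment_id] = doc_shingles
        for sh in doc_shingles:
            self.postings[sh].add(comment_id)

    def similar(self, text: str, threshold: float = 0.3) -> list:
        """Return (comment_id, jaccard) pairs at or above threshold, best first."""
        query = shingles(text, self.k)
        candidates = set()
        for sh in query:
            candidates |= self.postings.get(sh, set())
        results = []
        for cid in candidates:
            other = self.docs[cid]
            jaccard = len(query & other) / len(query | other)
            if jaccard >= threshold:
                results.append((cid, jaccard))
        return sorted(results, key=lambda r: r[1], reverse=True)
```

Changing a single word knocks out every shingle that contains it, so the score drops quickly as a bot rewrites more of the text. That is the "harder to do efficiently" trade-off in miniature.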
        
        bjorney
        link
        fedilink
        1•
        edit-2
        7 months ago
        
        > To compare every comment on reddit to every other comment in reddit’s entire history would require an index
        
        You think in Reddit’s 20 year history no one has thought of indexing comments for data science workloads? A cursory glance at their engineering blog indicates they perform much more computationally demanding tasks on comment data already for purposes of content filtering
        
        > you need to duplicate all of that data in a separate database and keep it in sync with your main database without affecting performance too much
        
        Analytics workflows are never run on the production database, always on read replicas which are taken asynchronously and built from the transaction logs so as not to affect production database read/write performance
        
        > Programmers just do what they’re told. If the managers don’t care about something, the programmers won’t work on it.
        
        Reddit’s entire monetization strategy is collecting user data and selling it to advertisers - It’s incredibly naive to think that they don’t have a vested interest in identifying organic engagement
    - livus
      link
      fedilink
      15•7 months ago
      Doubt it, they are interwoven into almost any conversation with more than 70 comments.
      - bjorney
        link
        fedilink
        6•7 months ago
        If you have access to the entire Reddit comment corpus it’s trivial to see which users are only reposting carbon copies of content that appears elsewhere on the site
        
        @[email protected]
        link
        fedilink
        11•7 months ago
        It’s probably not as easy as you imagine for reddit to identify and cleanse all bot content.
        
        livus
        link
        fedilink
        2•7 months ago
        Of course it’s not. Nor do they want to.
        
        I think the person you’re talking to thinks all bots are like the easy ones in this screenshot.
        
        bjorney
        link
        fedilink
        1•
        edit-2
        7 months ago
        Look at the picture above - this is trivially easy. We are talking about identifying repost bots, not seeing if users pass/fail the Turing test
        
        If 99% of a user’s posts can be found elsewhere, word for word, with the same parent comment, you are looking at a repost bot
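The rule of thumb above can be sketched as a per-user repost ratio: the share of a user's comments whose (parent text, comment text) pair was already posted earlier by someone else. This assumes comments are available as `(user, parent_text, text, timestamp)` tuples, which is a made-up shape for illustration, and it only catches word-for-word copies after whitespace and case normalization.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def repost_ratios(comments) -> dict:
    """Map each user to the fraction of their comments that copy an earlier one.

    comments: iterable of (user, parent_text, text, timestamp) tuples.
    A comment counts as a copy when the same (parent, text) pair was first
    posted by a different user at an earlier timestamp.
    """
    first_seen = {}            # (parent, text) -> user who posted it first
    totals = defaultdict(int)  # user -> number of comments
    copies = defaultdict(int)  # user -> number of copied comments
    for user, parent, text, _ts in sorted(comments, key=lambda c: c[3]):
        key = (normalize(parent), normalize(text))
        totals[user] += 1
        original_author = first_seen.setdefault(key, user)
        if original_author != user:
            copies[user] += 1
    return {user: copies[user] / totals[user] for user in totals}


# Made-up sample data.
sample = [
    ("alice", "Post title", "Great point", 1),
    ("bob", "Great point", "I disagree", 2),
    ("Word1234", "Post title", "Great point", 100),
    ("Word1234", "great point", "I  disagree", 101),
]

if __name__ == "__main__":
    ratios = repost_ratios(sample)
    # 0.99 mirrors the "99%" figure from the comment above.
    print([user for user, ratio in ratios.items() if ratio >= 0.99])
```

Bots that swap in synonyms or splice two comments together, as described later in the thread, would slip past an exact-match check like this one.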
        
        @[email protected]
        link
        fedilink
        5•7 months ago
        That’s easy in an isolated case like this, but the reality of the entire reddit comment base is much more complex.
        
        livus
        link
        fedilink
        3•
        edit-2
        7 months ago
        The low level bots in OP’s screenshot, sure, because it’s identical. Not the rest.
        
        I used to hunt bots on reddit for a hobby and give the results to Bot Defense.
        
        Some of them use rewrites of comments with key words or phrases changed to other words or phrases from a thesaurus to avoid detection. Some of them combine elements from 2 comments to avoid detection. Some of them post generic comments like 💯. Doubtless there are some using AI rewrites of comments now.
        
        My thinking is that if generic bots have been allowed to run so rampant that they fill entire threads, that’s an indication of how bad the more sophisticated bot problem has become.
        
        And I think @phdepressed is right, no one at reddit is going to hunt these sophisticated bots because they inflate numbers. Part of killing the API use was to kill bot detection after all.
        
        bjorney
        link
        fedilink
        1•
        edit-2
        7 months ago
        Reddit has way more data than you would have been exposed to via the API though - they can look at things like the user’s ASN (is the traffic coming from a datacenter), whether they were using a VPN, they track things like scroll position, cursor movements, read time before posting a comment, how long it takes to type that comment, etc.
        
        > no one at reddit is going to hunt these sophisticated bots because they inflate numbers
        
        You are conflating “don’t care about bots” with “don’t care about showing bot generated content to users”. If the latter increases activity and engagement there is no reason to put a stop to it, however, when it comes to building predictive models, A/B testing, and other internal decisions they have a vested financial interest in making sure they are focusing on organic users - how humans interact with humans and/or bots is meaningful data, how bots interact with other bots is not
- @[email protected]
  link
  fedilink
  71•7 months ago
  It’s account farming. They make fake accounts look legitimate so they can use them to influence opinions on the site.
  - livus
    link
    fedilink
    4•7 months ago
    They also use them in groups of 3 to lure people to malicious sites and scam sites. Especially fake merchandise sites.
- kubica
  link
  fedilink
  36•7 months ago
  Basically replaying a thread to make it look like there’s activity in the sub.
- @[email protected]
  link
  fedilink
  11•7 months ago
  deleted by creator
  - Anti-Face Weapon
    link
    fedilink
    2•6 months ago
    The left predates the right by 10 months
    - @[email protected]
      link
      fedilink
      1•6 months ago
      deleted by creator
