• livus
    15
    7 months ago

    Doubt it, they are interwoven into almost any conversation with more than 70 comments.

    • bjorney
      6
      7 months ago

      If you have access to the entire Reddit comment corpus it’s trivial to see which users are only reposting carbon copies of content that appears elsewhere on the site

      • @[email protected]
        11
        7 months ago

        It’s probably not as easy as you imagine for reddit to identify and cleanse all bot content.

        • livus
          2
          7 months ago

          Of course it’s not. Nor do they want to.

          I think the person you’re talking to thinks all bots are like the easy ones in this screenshot.

        • bjorney
          1
          7 months ago (edited)

          Look at the picture above - this is trivially easy. We are talking about identifying repost bots, not seeing if users pass/fail the Turing test

          If 99% of a user’s posts can be found elsewhere, word for word, with the same parent comment, you are looking at a repost bot
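A minimal sketch of the rule described above, assuming the comment corpus is available as simple records. The `Comment` layout, the `normalize` helper, and the `min_comments` cutoff are hypothetical choices for illustration, not anything Reddit is known to use; only the 99% threshold comes from the comment itself.

```python
from collections import defaultdict
from typing import NamedTuple


class Comment(NamedTuple):
    # Hypothetical record layout, not Reddit's actual schema.
    user: str
    parent_text: str  # text of the comment or post being replied to
    text: str
    created: int      # any sortable timestamp


def normalize(s: str) -> str:
    """Lowercase and collapse whitespace so trivial edits still match."""
    return " ".join(s.lower().split())


def repost_ratios(comments):
    """Map each user to (share of comments copied from someone else, total)."""
    # Remember who first posted each (parent, reply) pair.
    first_author = {}
    for c in sorted(comments, key=lambda c: c.created):
        key = (normalize(c.parent_text), normalize(c.text))
        first_author.setdefault(key, c.user)

    total = defaultdict(int)
    copied = defaultdict(int)
    for c in comments:
        key = (normalize(c.parent_text), normalize(c.text))
        total[c.user] += 1
        if first_author[key] != c.user:
            copied[c.user] += 1
    return {u: (copied[u] / total[u], total[u]) for u in total}


def likely_repost_bots(comments, threshold=0.99, min_comments=3):
    """Users whose comments are almost all word-for-word reposts."""
    ratios = repost_ratios(comments)
    return sorted(u for u, (ratio, n) in ratios.items()
                  if n >= min_comments and ratio >= threshold)
```

Calling `likely_repost_bots(comments)` on a list of `Comment` records returns the flagged usernames. The `min_comments` cutoff is there so a single coincidental match does not flag a new account.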

          • @[email protected]
            5
            7 months ago

            That’s easy in an isolated case like this, but the reality of the entire reddit comment base is much more complex.

      • livus
        3
        7 months ago (edited)

        The low-level bots in OP’s screenshot, sure, because their comments are identical copies. Not the rest.

        I used to hunt bots on reddit for a hobby and give the results to Bot Defense.

        Some of them use rewrites of comments with key words or phrases changed to other words or phrases from a thesaurus to avoid detection. Some of them combine elements from 2 comments to avoid detection. Some of them post generic comments like 💯. Doubtless there are some using AI rewrites of comments now.

        My thinking is that if generic bots have been allowed to run so rampant that they fill entire threads, that’s an indication of how bad the more sophisticated bot problem has become.

        And I think @phdepressed is right, no one at reddit is going to hunt these sophisticated bots because they inflate numbers. Part of killing the API use was to kill bot detection after all.
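To illustrate why thesaurus-style rewrites slip past exact matching but can still leave a trace, here is a rough sketch using word shingles and Jaccard similarity. This is a generic near-duplicate technique swapped in for illustration, not the method Bot Defense or Reddit used; the function names and the shingle size `k=3` are assumptions.

```python
def shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-word sequences from the text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Shared shingles divided by all shingles; 1.0 means identical."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "this is the best thing i have seen all week"
rewrite = "this is the greatest thing i have seen all week"

# An exact-match check fails on the rewrite...
print(original == rewrite)
# ...but the shingle overlap is still well above zero.
print(jaccard(original, rewrite))
```

A single swapped word removes three of the eight shingles here, so the score drops sharply but does not reach zero. Generic one-word comments and full AI rewrites would share few or no shingles with any source, which is consistent with the point that those bots are much harder to catch from text alone.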

        • bjorney
          1
          7 months ago (edited)

          Reddit has far more data than you would have been exposed to via the API, though. They can look at a user’s ASN (is the traffic coming from a datacenter?) and whether they were using a VPN, and they track behavioral signals like scroll position, cursor movements, read time before posting a comment, and how long it takes to type that comment.

          “no one at reddit is going to hunt these sophisticated bots because they inflate numbers”

          You are conflating “don’t care about bots” with “don’t care about showing bot-generated content to users”. If the latter increases activity and engagement, there is no reason to put a stop to it. However, when it comes to building predictive models, A/B testing, and other internal decisions, they have a vested financial interest in making sure they are focusing on organic users. How humans interact with humans and/or bots is meaningful data; how bots interact with other bots is not.