• Lvxferre
    English
    17
    9 months ago

    As I often mention when this subject pops up: while the current statistics-based generative models might see some application, I believe that they’ll eventually be replaced by better models that are actually aware of what they’re generating, instead of simply reproducing patterns, with the current models being seen as “that cute 20s toy”.

    In text generation (currently dominated by LLMs), for example, this means that the main “bulk” of the model would do three things:

    • convert input tokens into sememes (units of meaning)
    • perform logic operations with the sememes
    • convert sememes back into tokens for the output

    Because, as it stands, LLMs are only chaining tokens. They might do this in an incredibly complex way, but that’s it. That’s obvious when you look at what LLM-fuelled bots output as “hallucinations” - they aren’t the result of some internal error, they’re simply an undesired product of a model that sometimes outputs desirable stuff too.
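
    The three steps listed above could be sketched as a toy like the one below. It is only an illustration under heavy assumptions: `LEXICON`, `RULES`, `SURFACE` and every function name are made up for this sketch, the tables are hand-written, and nothing here claims to be how such a model would really be built.

```python
# Toy sketch of the pipeline: tokens -> sememes -> logic -> tokens.
# All names and tables below are hypothetical, hand-written for illustration.

# Step 1 data: map surface tokens to units of meaning ("sememes").
LEXICON = {
    "socrates": "SOCRATES",
    "human": "HUMAN",
    "man": "HUMAN",  # two different tokens, one sememe
    "mortal": "MORTAL",
}

# Step 2 data: rules over sememes, read as "anything that is A is also B".
RULES = {"HUMAN": "MORTAL"}

# Step 3 data: one output token per sememe.
SURFACE = {"SOCRATES": "Socrates", "HUMAN": "human", "MORTAL": "mortal"}


def tokens_to_sememes(tokens):
    """Step 1: keep only tokens with a known meaning, as sememes."""
    return [LEXICON[t.lower()] for t in tokens if t.lower() in LEXICON]


def infer(subject, predicate):
    """Step 2: follow RULES from the predicate and collect derived facts."""
    derived = []
    seen = {predicate}
    current = predicate
    while current in RULES and RULES[current] not in seen:
        current = RULES[current]
        seen.add(current)
        derived.append((subject, current))
    return derived


def sememes_to_tokens(fact):
    """Step 3: render a (subject, property) fact as output tokens."""
    subject, prop = fact
    return [SURFACE[subject], "is", SURFACE[prop]]


def answer(sentence):
    """Run the three steps on a whitespace-tokenised sentence."""
    sememes = tokens_to_sememes(sentence.split())
    if len(sememes) < 2:
        return []
    subject, predicate = sememes[0], sememes[1]
    return [" ".join(sememes_to_tokens(f)) for f in infer(subject, predicate)]


print(answer("Socrates is a man"))
```

    The point of the toy is only that the conclusion comes from a rule applied to meanings, so “man” and “human” lead to the same answer; the hard part, learning the lexicon and the rules instead of writing them by hand, is left out entirely.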

    Sub “tokens” and “sememes” with “pixels” and “objects” and this probably holds true for image-generating models, too. Probably.

    Now, am I some sort of genius for noticing this? Probably not; I’m just some nobody with a chimp avatar, rambling in the Fediverse. Odds are that people behind those tech giants already noticed the same ages ago, and at least some of them reached the same conclusion - that better gen models need more awareness. If they are not doing this already, it means that this shit would be painfully expensive to implement, so the “better models” that I mentioned at the start will probably not appear too soon.

    Most cracks will stay there; Google will hide them with an obnoxious band-aid, OpenAI will leave them in plain daylight, but the magic trick will still not be perfect, at least in the foreseeable future.

    And some might say “use MOAR processing power!”, or “input MOAR training data!”, in the hopes that the current approach will “magically” fix itself. For those, imagine yourself trying to drain the Atlantic with a bucket: does it really matter if you use more buckets, or larger buckets? Brute-forcing problems only goes so far.

    Just my two cents.

    • @[email protected]
      English
      7
      edit-2
      9 months ago

      I agree 100%, and I think Zuckerberg’s attempt at a massive AI built on 340,000 of Nvidia’s H100 GPUs and based on LLMs, with the aim of creating a general AI, sounds stupid. Unless there’s a lot more to their attempt, it’s doomed to fail.

      I suppose the idea is something about achieving critical mass, but it’s pretty obvious that that is far from the only factor missing to achieve general AI.

      I still think it’s impressive what they can do with LLMs. And it seems to be a pretty huge step forward. But it’s taken about 40 years to get here from when we had decent “pattern recognition”; the next step could take another 40 years?

      • Lvxferre
        English
        7
        9 months ago

        I think that Zuckerberg’s attempt is a mix of publicity stunt and “I want [you] to believe!”. Trying to reach AGI through a large enough LLM sounds silly, on the same level as “ants build, right? If we gather enough ants, they’ll build a skyscraper! Chrust me.”

        In fact I wonder if the opposite direction wouldn’t be a bit more feasible - start with some extremely primitive AGI, then “teach” it Language (as a skill) and a language (like Mandarin or English or whatever).

        I’m not sure how many years it’ll take for an AGI to pop up. 100 years perhaps, but I’m just guessing.

    • @[email protected]
      English
      1
      edit-2
      9 months ago

      That’s a huge oversimplification of the way LLMs work. They’re not statistical in the way a Markov chain is. They use neural networks, which are a decent analogy for the human brain. The way the synapses between neurons are wired is obviously different, and the way the neurons are triggered and the types of signals they can send to other neurons are obviously different. But overall, similar capabilities can in theory be achieved with either method. If you’re going to call neural networks statistics-based, you might as well call the human brain statistics-based as well.

      • Lvxferre
        English
        1
        9 months ago

        > That’s a huge oversimplification of the way LLMs work.

        I’m sticking to what matters for the sake of the argument. Anyone who wants to inform themself further has a plethora of online resources to do so.

        > They’re not statistical in the way a Markov chain is.

        Implied: “you’re suggesting that they work like Markov chains, they don’t.”

        At no point did I mention or even imply Markov chains. My usage of the verb “to chain” is clearly vaguer within that context; please do not put words in my mouth.

        > They use neural networks, which are a decent analogy for the human brain. The way the synapses between neurons are wired is obviously different, and the way the neurons are triggered and the types of signals they can send to other neurons is obviously different. But overall, similar capabilities can in theory be achieved with either method.

        I don’t disagree with the conclusion (i.e. I believe that neural networks can achieve human-like capabilities), but the argument itself is such fallacious babble (false equivalence) that I’m not bothering further with your comment.

        And it’s also an “ackshyually” given this context, dammit. I’m not talking about the bloody neural network, but about how it is used.

        • @[email protected]
          English
          1
          edit-2
          9 months ago

          No need to get offended. Maybe I misunderstood the intent behind your original message. I think you made a lot of good points.

          I brought up the Markov chain because a common misconception I’ve seen on the Internet and in real life is that LLMs work pretty much the same as Markov chains under the hood. And I saw no mention of neural networks in your original comment.
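
          For anyone unsure what that comparison refers to: a first-order, word-level Markov chain text generator fits in a few lines. This is only a sketch; `build_chain` and `generate` are names made up here, and it says nothing about how an LLM is built.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, seed=None):
    """Walk the chain: each next word depends only on the current word."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word was never followed by anything
            break
        word = rng.choice(followers)
        output.append(word)
    return output


chain = build_chain("the cat sat on the mat")
print(" ".join(generate(chain, "the", 6, seed=0)))
```

          The next word here is drawn from a lookup table keyed on the current word alone, with no memory of anything earlier in the sentence. That is the sense in which a Markov chain is “statistical”, and it is a much narrower mechanism than a neural network conditioned on the whole preceding context.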