• @[email protected]
    English
    215 hours ago

    Am I misunderstanding your comment or does it completely ignore context windows? Not that context windows are long-term, but it’s not zero.

    • @[email protected]
      4
      edit-2
      15 hours ago

      The context window is indeed the LLM’s memory.

      …But it’s also muddy.

      Many LLMs get ‘dumber’ and less attentive as their context windows fill up, and OpenAI’s models happen to be among them. They’re awful when close to the full 128K, even the full GPT-4. Mistral models are also really bad at long-context understanding while, conversely, I find that Google Gemini and Qwen 2.5 are really good close to their limits.

      There are attempts to measure this performance objectively, like RULER: https://github.com/NVIDIA/RULER
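
To make the "measure it objectively" idea concrete, here is a rough sketch of a needle-in-a-haystack style probe: hide one fact at a chosen depth in filler text, grow the filler, and check whether the model can still retrieve the fact. This is not RULER's code. The names `build_haystack_prompt` and `score`, the needle, and the filler are all made up for illustration, and word counts stand in for real token counts. Sending the prompts to a model is left out on purpose.

```python
def build_haystack_prompt(needle: str, filler: str, total_words: int, depth: float) -> str:
    """Bury `needle` inside roughly `total_words` words of repeated `filler`.

    `depth` is a fraction: 0.0 puts the needle at the start, 1.0 at the end.
    Word counts are only a crude proxy for tokens (assumption for this sketch).
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    filler_words = filler.split()
    if not filler_words:
        raise ValueError("filler must contain at least one word")
    # Repeat the filler until there is enough of it, then trim to size.
    repeats = total_words // len(filler_words) + 1
    words = (filler_words * repeats)[:total_words]
    pos = int(len(words) * depth)
    return " ".join(words[:pos] + [needle] + words[pos:])


def score(answer: str, expected: str) -> bool:
    """Count a reply as correct if it contains the expected string (case-insensitive)."""
    return expected.lower() in answer.lower()


# Hypothetical needle and filler, purely for illustration.
NEEDLE = "The secret code is 7421."
FILLER = "The grass is green. The sky is blue. The sun is yellow."

# One prompt per (context size, needle depth) pair. Each would be sent to the
# model with a question like "What is the secret code?" and graded with score().
prompts = {
    (size, depth): build_haystack_prompt(NEEDLE, FILLER, size, depth)
    for size in (100, 1000, 10000)
    for depth in (0.0, 0.5, 1.0)
}
```

Plotting pass/fail over size and depth would show where a model starts losing the needle. Real benchmarks go further than plain retrieval, so treat this as the simplest possible version of the idea.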