I’ve installed koboldcpp on a ThinkPad X1 with 32 GB RAM and an i7-1355U, no GPU. Sure, it’s only around 1 token/s, but for chat it’s still usable (about 15 s per reply). The setup was easier than expected.

  • KinkyThoughts
    5 months ago

    15 seconds per reply at just 1 token/s?! How short are your replies? And what’s the context size being processed? I get like 5 tokens per second on my GPU and still need 1–2 minutes per reply at 4k context.

    • raffaOP
      5 months ago

      Context size is the default 4096; replies are around 16 tokens or so.

      • KinkyThoughts
        5 months ago

        I mean the actual amount of context that has to be processed per message (chat history, character cards, world info, etc.). And which model are you running?
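
The numbers quoted in this thread can be sanity-checked with simple arithmetic: reply latency is roughly prompt-processing time plus generation time. A minimal sketch, where all figures (16-token replies, 1 and 5 tokens/s generation, and a hypothetical 10 tokens/s prompt-eval speed) are assumptions taken from or illustrating the posts above, not measurements:

```python
def reply_latency(reply_tokens, gen_tps, prompt_tokens=0, prompt_tps=None):
    """Estimated seconds per reply: optional prompt processing plus generation."""
    t = reply_tokens / gen_tps
    if prompt_tokens and prompt_tps:
        t += prompt_tokens / prompt_tps
    return t

# OP's case: ~16-token replies at ~1 token/s, prompt largely cached
# between turns, so generation dominates.
print(reply_latency(16, 1.0))  # ~16 s, matching the "about 15 s" claim

# If a full 4096-token context had to be reprocessed each turn at an
# assumed 10 tokens/s prompt eval, that step would dominate instead.
print(reply_latency(16, 5.0, prompt_tokens=4096, prompt_tps=10.0))  # ~413 s
```

This is why the follow-up question about the *actual* processed context matters: with prompt caching, short replies are cheap even at 1 token/s, but reprocessing a long context every turn would make replies far slower than 15 s.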