I’ve installed koboldcpp on a ThinkPad X1 with 32 GB RAM and an i7-1355U, no GPU. Sure, it’s only around 1 token/s, but for chat it’s still usable (about 15 s per reply). The setup was easier than expected.

  • @raffaOP
    23 days ago

    My first test was with Starcannon-Unleashed-12B-v1.0-f16, a 23 GB model. I did not expect that laptop to be usable at all.

    • @magn418M
      3 days ago

      I think doing the calculations at full precision (FP16) is a waste. You should try something between Q4_K_M and Q6_K (or at least Q8_0, which is supposed to be nearly the same quality as FP16). That way it should be considerably faster… at least twice as fast.

      (The GGUF page of that model has a list of recommended quantization levels.)

      • @raffaOP
        23 days ago

        thanks for the tips!
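To put rough numbers on the quantization advice above: CPU token generation is mostly memory-bandwidth bound, so speed scales roughly with the inverse of model size. A back-of-the-envelope sketch (the bits-per-weight figures are approximations of llama.cpp's quantization formats, not exact values for this model):

```python
# Rough GGUF file size and relative CPU decode speed for a 12B-parameter
# model at different quantization levels. Assumes decode speed is
# inversely proportional to bytes read per token (memory-bandwidth bound).
PARAMS = 12e9  # Starcannon-Unleashed-12B has ~12B parameters

# Approximate effective bits per weight (Q8_0 is exactly 8.5 by its
# block layout; the K-quant figures are rough averages).
BPW = {"F16": 16.0, "Q8_0": 8.5, "Q6_K": 6.56, "Q4_K_M": 4.85}

for name, bpw in BPW.items():
    size_gb = PARAMS * bpw / 8 / 1e9      # weights only, ignores overhead
    speedup = BPW["F16"] / bpw            # relative to the FP16 file
    print(f"{name:7s} ~{size_gb:5.1f} GB   ~{speedup:.1f}x vs F16")
```

So even Q8_0 reads roughly half the bytes per token of the FP16 file (which lines up with the ~23 GB file size mentioned above), and Q4_K_M about a third, which is where the "at least twice as fast" estimate comes from.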