CPU-only i7-1355U koboldcpp works surprisingly well

raffa · 5 months ago

NSFW

magn418 · 5 months ago (edited)

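I think running the model at full, unquantized FP16 precision is a waste. You should try something between Q4_K_M and Q6_K (or, if you want to stay closer to the original, at least Q8_0, which is supposed to be practically the same quality as FP16). The quantized weights are much smaller, and CPU inference is mostly limited by how fast the weights can be read from memory, so it should be considerably faster… At least twice as fast.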

(The GGUF page of that model has a list of recommended quantization levels.)
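A rough back-of-the-envelope sketch of why smaller quants help, assuming token generation on a CPU is purely memory-bandwidth bound (so speed scales inversely with the bytes read per token). The bits-per-weight figures come from the llama.cpp block layouts (Q8_0: 34 bytes per 32 weights; Q6_K: 210 bytes per 256; Q4_K: 144 bytes per 256). Real Q4_K_M files come out somewhat larger than plain Q4_K because some tensors are stored at higher precision. The 7B parameter count and the function names are hypothetical, chosen only for illustration.

```python
# Approximate storage cost per weight (bits) for a few GGUF / llama.cpp formats.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,     # 32 int8 weights + one fp16 scale per block
    "Q6_K": 6.5625,  # 210 bytes per 256-weight super-block
    "Q4_K": 4.5,     # 144 bytes per 256-weight super-block (Q4_K_M files are a bit larger)
}


def model_size_gb(n_params: float, fmt: str) -> float:
    """Approximate weight size in GB (10**9 bytes), ignoring metadata and KV cache."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


def relative_speed(fmt: str, baseline: str = "F16") -> float:
    """Estimated generation speedup vs. baseline, ASSUMING a purely memory-bound workload."""
    return BITS_PER_WEIGHT[baseline] / BITS_PER_WEIGHT[fmt]


if __name__ == "__main__":
    n_params = 7e9  # hypothetical 7B-parameter model
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt:5s} {model_size_gb(n_params, fmt):5.1f} GB  ~{relative_speed(fmt):.1f}x")
```

Under this simplified model, Q8_0 reads a bit more than half as many bytes as FP16 and Q4_K a little over a quarter, which is where the rough "twice as fast or better" expectation comes from. Actual speedups also depend on compute, thread count and the dequantization overhead of each format.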

raffa · 5 months ago

thanks for the tips!