I’m interested in hosting something like this, and I’d like to know experiences regarding this topic.

The main reason I want to host this is privacy, and also to integrate my own PKM data (mainly markdown files).

Feel free to recommend videos, articles, other Lemmy communities, etc.

  • exu
    3 · 11 months ago

    If you’re using llama.cpp, have a look at the GGUF models by TheBloke on Hugging Face. He lists the approximate RAM required for each quantisation level in the readme.

    From personal experience I’d estimate about 12 GB for 7B models, based on how full RAM was on a 16 GB machine. For Mixtral, at least 32 GB.
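    As a rough sanity check you can also estimate it yourself. This is my own back-of-the-envelope formula (not anything official from llama.cpp or TheBloke’s readmes): the weights take roughly parameter count × bits per weight ÷ 8 bytes, plus some overhead for the KV cache and runtime buffers.

    ```python
    # Back-of-the-envelope RAM estimate for a quantised GGUF model.
    # NOT an official llama.cpp formula -- just weights + a guessed overhead.

    def estimate_ram_gb(params_billions: float, bits_per_weight: float,
                        overhead_gb: float = 2.0) -> float:
        """Estimate RAM needed to run a quantised model, in GB."""
        # 1B parameters at 8 bits per weight is about 1 GB of weights.
        weights_gb = params_billions * bits_per_weight / 8
        return weights_gb + overhead_gb

    # A 7B model at Q4 (~4.5 effective bits) plus ~2 GB of overhead:
    print(round(estimate_ram_gb(7, 4.5), 1))  # -> 5.9
    # The same model at Q8 (~8.5 effective bits):
    print(round(estimate_ram_gb(7, 8.5), 1))  # -> 9.4
    ```

    The 2 GB overhead is a guess that grows with context length, so treat the numbers in the readmes as more reliable than this.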

    • @ReallyActuallyFrankenstein
      1 · 11 months ago

      Thanks, appreciate it (I’m new to local text CPU models, I know it was a stupid question).