There are so many out there, with varying benefits, risks, and ethics. I’d like to know what to recommend when asked, and also what I could use for myself.

Some areas that could be good for discussion:

  • Locally hosted models (both for low and high powered devices)
  • Open source models (also calling out “open source” models that aren’t actually open source)
  • Privacy friendly tools/frontends (ex. DuckDuckGo’s AI chat for anonymous use of some “free” models)
  • Unified interfaces for multiple models, or ‘pay as you go’ platforms instead of paying for individual subscriptions
  • @[email protected]
    link
    fedilink
    English
    247 months ago

    I run my models on my own hardware. In general, the larger quantized models run better when raw. They are more intuitive and approachable. Almost everything people complain about with AI is because they do not understand how it works in practice. There are many layers of function and capability beneath the surface. If all you use are the small models, like anything under around a 30B, you’re likely to find it hard to use. At these sizes the model lacks the comprehension to self diagnose many problems. These models tend to have multiple potential error sources that can occur at the same time. So that can be really frustrating too. If you understand most of the ways models respond in error, it becomes much easier to address issues with the smaller models. The smaller models can be useful and quite capable with specialized training.

    Think of it like this: the AI has a small available window of attention it can operate within. (There are multiple spaces where “Attention” has meanings that are different.) That window can view a small part of the surface of information available. You can move the window around to view any section on the surface relatively easy by using a basic prompt with good instructions. However, that is nowhere near what the model really knows. You need to build momentum in the space you’re interested in accessing within the model. This is only one of several factors. You also need to know how to talk to a model. This is very different than humans. For instance, my casual grammar and style in the last sentence is useless with AI. I must use proper nouns and think out what I am trying to say differently. Personally I have other methods where I establish who I am, my knowledge and expectations, and then I ask a series of leading questions where I know the answers and can let the AI build the prompt dialogue momentum for me. Then I can ask much deeper questions and get good/useful answers.

    The momentum factor is one of the largest differences between the bigger and smaller models. With the larger (30B+) models it is not very hard to build momentum in a space and get deeper into useful territory. With the smaller models you’re kinda stuck in the stupid entry level zone at first. It is like a dense underbrush at the edge of a forest where you’re in need of a brake to find your footpath to where you want to go. If you have the experience to spot the issues, you can walk right through that dense tangle and find the other side with only minor annoyances and a few thorns. If you want to use the small model for something specific, you can train it yourself on some little niche and this will be like a bridge over the dense thicket and get you into a useful space relative to the training.

    By contrast, the larger models have brakes and footpaths all across the edge of the forest. It is still easy to get lost, or on some kind of dead end, but the forest itself is far more self aware and, if asked well, it will be able to help you find your way more effectively and with far less momentum required to get you there.

    If the tool you’re using does not give you absolute and full control of everything the AI has in the prompt, you’re already in trouble. Your past prompts may be fed back into the model with each query. This I’d great for the stalkerware company trying to data mine, as it creates a better and more detailed profile of who you are as a person. However any unrelated information passed to the model at the same time ruins your momentum within the underlying tensor tables of the model.

    I use Oobabooga Textgen a lot (GitHub), and with the notebook tab interface. That is more like a text editor where I see everything in the entire prompt. I also have my own Python code that adds features to this interface.

    Models come from huggingface.co. I often use a Mixtral 8×7B or a Llama 70B on the large models side. I also use the newer Llama3 8B on the smaller side. The 8×7B is much quicker than the 70B and it is nearly as accurate. However, it lacks some of the advanced self awareness aspects and displays some issues that are common to smaller models.

    I’m extremely intuitive and function in abstract thought most of the time. My view of the world is largely that of relativism. I find accuracy to be subjective in all spaces and “facts” as foolish idealism in an absolute sense. I view everything models say as a casual water cooler conversation with an interesting non expert. Nothing said is a primary citation worthy source, but neither is anything said here, yet here we are.

    With Oobabooga Textgen, there is a chatGPT compatible API. If you launch Oobabooga from the command line, it only takes adding the “–listen” flag to make Oobabooga available on your home network, and/or “–api” to make the chatGPT API available as well. This works with most third party tools that connect to chatGPT, or so I’ve read.

    If you want to get into more technically capable setups, you need a RAG for document reference look up and retrieval. A couple of RAG options are Ollama and privateGPT, or if you want a basic code interface for Python, langchain and chroma db.