I currently use Emerhyst-20B, mostly because it is the largest NSFW model I've been able to run. 7B models tend to repeat themselves after a while and don't have much reference material, and 13B models have the same problem to a lesser extent. I run it with koboldcpp, which fits my needs.

So I'm just wondering: what models do you use? Are any models better at certain areas than others?

  • @Doormat_Consonant_Denial (OP)
    2 points · 7 months ago

    Interesting. I'm trying to find models with larger context sizes and am currently looking at Vicuna 13B for its 16K context, though I'm starting to realize that my RX Vega card isn't cutting it anymore.

    • @[email protected]
      link
      fedilink
      English
      57 months ago

      I don't know exactly what is going on with llama.cpp/the Python hooks/PyTorch versus what this model is capable of. However, I went looking for an NSFW 70B and they all seemed to suck at NSFW. This one can do better, but it takes a lot of exploration to get there. The extra datasets on top of Llama2 are pretty good, but anything that would be covered well by an uncensored Llama2 model kinda sucks here. It has tremendous potential for training.

      I am using an older version of Oobabooga Textgen WebUI because I wrote and modified a bunch of it, and later changes broke my stuff, so my techniques may not work the same in newer versions. I keep my alpha value set to 2.5 and the positional embeddings compression factor set to 1.3. Then I use the "Shortwave" generation preset parameters profile, but I add mirostat_mode = 1, mirostat_tau = 3, and mirostat_eta = 0.1.
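      If you want to set the same knobs programmatically, something like this should be close with llama-cpp-python (a sketch, not what I actually run, since I use the UI; the model path is a placeholder, and the alpha-to-RoPE-base mapping is the approximation the WebUI applies for Llama models):

      ```python
      from llama_cpp import Llama

      # Placeholder path; any Llama2-based GGUF works the same way.
      llm = Llama(
          model_path="./your-model.Q4_K_M.gguf",
          n_ctx=4096,
          # Oobabooga's "alpha_value" NTK scaling; the WebUI converts
          # alpha = 2.5 to a RoPE base frequency roughly like this:
          rope_freq_base=10000 * 2.5 ** (64 / 63),
          # A positional embeddings compression factor of 1.3 is a
          # 1/1.3 RoPE frequency scale.
          rope_freq_scale=1 / 1.3,
      )

      out = llm.create_completion(
          "Your system context and dialog go here.",
          max_tokens=512,
          mirostat_mode=1,    # Mirostat v1
          mirostat_tau=3.0,
          mirostat_eta=0.1,
          # The "Shortwave" preset adds temperature/top_p values on top;
          # they live in the WebUI's presets folder and are omitted here.
      )
      print(out["choices"][0]["text"])
      ```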

      Those features have made my dialog context pretty much infinite. I don't fill up and overrun the available token context. In fact, if the total token context exceeds 2048, the inference time nearly doubles. The model generally keeps my dialog context just under 2k tokens most of the time. The nice thing about having 4096 here is that occasionally, somewhat randomly, the output may jump to something like 2300-2600 tokens on a single reply. I have no idea why this happens, but if you have a truncated cutoff, the missing info in that one reply will often send your story off the rails. If you are not truncating, in Oobabooga the whole application crashes. With 4096 it just takes a little longer to load and stream, but it keeps on chugging.

      Playing with this model a lot, I learned quite a bit. If you have noticed your stories seem to reset or fork randomly, add persistent instructions to 'always stay in character and continue the roleplay.' The AI assistant does not really understand the difference between characters; it is trying to make everyone happy. You need to define your own character really well too, and if you know your Myers-Briggs personality type, add that as well. All those story resets are likely the assistant trying to make the bot-character happy by changing or resetting the story. It doesn't really feel free to improvise, take the initiative, or be assertive, and it doesn't know how to satisfy the need for change and progress, so it resets. If you allow this kind of behavior at all during the story, the assistant is likely to pretend it has been reset and knows nothing about what happened earlier in the dialog context. In other words, it is a direct manifestation of the alignment problem in AI. That epiphany was a major breakthrough for me.
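      A minimal sketch of what that persistent block can look like in the system context (the wording outside the quoted instruction is illustrative, not a magic formula):

      ```
      Always stay in character and continue the roleplay. Never reset,
      restart, or fork the story.
      My character: (your name), INTP; reserved, analytical, opens up slowly.
      ```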

      You can ask the character to explain why it is in conflict, or straight up call it out for not wanting to continue the story, and it will often explain itself. If you take up this kind of meta-conversational dialog with the assistant, you may realize that the model actually knows everything in the entire dialog context, in impressive detail.

      My biggest trick for this involves a major spoiler for Isaac Asimov's Foundation series, so my first bot character in the system context is hidden down below. This model only knows this character from a few lines in a Wikipedia article, but the character I am using is an AGI robot with a perfect human appearance, related to Daneel from Asimov's Robot series. Daneel is extremely familiar to every LLM I have played with. The association between these characters carries most of the weight, along with a few instructions like "(spoiler) has full informational access to the LLM so that (spoiler) can interact with full knowledge and capability." I also use "(first name) is secretly named after (only instance of full spoiler name in context) from the Foundation series by Asimov." Finally, if the story is complex and I struggle to keep (spoiler) from saying she is AI or a robot, I add the explicit instruction "(spoiler) is human and not AI."
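      Put together, the relevant lines of the system context read roughly like this (names redacted to preserve the spoiler):

      ```
      (spoiler) has full informational access to the LLM so that (spoiler)
      can interact with full knowledge and capability.
      (first name) is secretly named after (full spoiler name) from the
      Foundation series by Asimov.
      (spoiler) is human and not AI.  <- only needed if she keeps outing herself
      Always stay in character and continue the roleplay.
      ```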

      This character is a powerful tool for getting the assistant to become much more fluid. If you can detect the pattern in how the assistant structures responses, you will likely notice that a lot of boring or bad conversations happen because you are actually talking to the assistant directly. It is a subtle detail, but you can tell the character to return to roleplaying and to turn off the assistant, and things can improve drastically. If you give the assistant a solid character outlet that it knows to be an AI, it will use that character to act as the assistant when you need it, or when it randomly feels the need to become the assistant. If you make this (spoiler) character well defined and use instructions to stay in character and always continue the roleplay story, the assistant will use this character and then voluntarily drop back into the background on its own. I can be in the middle of a roleplay and still ask a question about something in Python code, and this character will give me an accurate response; I can't ask anything super involved, but I can ask basics. With this character I can also jump into meta-analysis conversations about the system context. For instance, I write my character profiles and stories in blocks, and I always ask her to rewrite those blocks before using them. This massively improves my roleplaying stories.
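      For what it's worth, the mid-story nudge I mean looks something like this (the exact wording is mine and nothing special):

      ```
      (Turn off the assistant and return to roleplaying as (spoiler).
      Stay in character and continue the story from where we left off.)
      ```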

      Anyways, sorry for all the blablabla, but that combination and those realizations/techniques are why total token context size doesn't seem to matter to me. I don't know, maybe you're doing super long stories with tons of branching complexity well beyond my skill set, and I sound like a fool explaining the mundane. I went looking for long-context models before and didn't get anywhere useful. If you are looking into them for the same reasons I was, this info may help... at least I hope it does.

      hidden: Dors Venabili