Obviously there’s not a lot of love for OpenAI and other corporate, API-based generative AI here, but how does the community feel about self-hosted models? Especially projects like the Linux Foundation’s Open Model Initiative?

I feel like a lot of people just don’t know that there are Apache- or CC-BY-NC-licensed “AI” models they can run on ordinary desktops, right now, and that they are incredible. I’m thinking of the most recent Command-R, specifically. I can run it on one GPU, it blows expensive API models away, and it’s mine to use.

And there are efforts to slash the power cost of inference and training: matrix-multiplication-free models, open-source and legally licensed datasets, cheap training… OpenAI and companies like it want to shut all of this down because it breaks their monopoly, one where they can simply outspend everyone on scaling, stealing data and destroying the planet along the way. It’s a real threat to them.

Again, I feel like corporate social media vs. the fediverse is a good analogy: one is kinda destroying the planet, and the other, while still niche, problematic and a WIP, avoids a lot of the downsides.

  • @scarabine
    1 month ago

    I’m most excited about the projects that are most open: a clear training process, legal datasets, fully open codebases, published reports, etc. I think we’re going to see local models boom in sophistication once that’s more common.

    Do you know of any good local models that fit that kind of description?

    • ArchRecord
      1 month ago

      I don’t know of any super high-quality ones that run well, but the Open Assistant project (now archived) collected responses from volunteers (myself included) to build what is now considered a very high-quality dataset of chat conversation pairs: truly open source, and all voluntarily submitted instead of scraped.

      The models are reasonable for fine-tuning, but they aren’t very good compared with newer models from large companies.

    • @[email protected] (OP)
      1 month ago

      Cutting edge ones? Unfortunately, rarely. Right now there’s a sliding scale between “open and transparent” and “smart and performant” because they’re just so darn expensive to train.

      I think some of the closest ones to your requirements are Nvidia’s research models, excluding Mistral Nemo, which isn’t as well documented (as it’s really a Mistral model). And you can see that a lot of the open “alternative” efforts, like RWKV, openllama and such, are severely underfunded and undertrained.

      The datasets are there, the highly optimized implementations are getting there, and the pieces are there; a lot of models have detailed papers and fully open codebases. But the funding needed to actually train them is just out of reach most of the time.

      Another factor is that the “closed” datasets used by Mistral, Facebook, Cohere and others do seem to give them an edge.