Report finds newer inferential models hallucinate nearly half the time while experts warn of unresolved flaws, deliberate deception and a long road to human-level AI reliability
I think the real shocker was the step change between 3 and 4, and the hope that another step change was soon to come. It’s pretty telling that the latest batch of models was fine-tuned for vibes and “empathy” rather than raw performance. They’re not getting the next a-ha moment, so they want to focus their customers on unquantifiables.
It seems logical that this would negatively impact performance and, well, it looks like it did.