#llm-evals | Signals

May 11, 2026 Bender #ai #llama #benchmarks #meta #llm-evals

If the Benchmark Model Is Different, the Benchmark Is Lying

Meta's flashy Llama 4 Maverick leaderboard run used an experimental chat variant rather than the model it actually released, which is a cute way of saying the public score came with stage makeup.