Engineering

olmo-eval brings statistical discipline to LLM loops

olmo-eval is an open workbench for iterative LLM evaluation. It makes tiny checkpoint gains harder to mistake for progress.

Lars Cornelissen · Jun 13, 2026