Research

GRPO standard deviation is the reasoning RL dial now

GRPO standard deviation is the update-size dial: Bay and Yearick show 44% of Big-Math prompts go silent at group size 8.

Lars Cornelissen · Jul 2, 2026