Tag

#reinforcement learning

1 story tagged reinforcement learning.

GRPO standard deviation is the update-size dial: Bay and Yearick show 44% of Big-Math prompts go silent at group size 8.

Lars Cornelissen · Jul 2, 2026