Limit of RLVR

5/28/2025 • limit-of-rlvr.github.io

Reasoning LLMs Are Just Efficient Samplers: RL Training Elicits No Transcending Capacity

C4AIL Commentary

[…] RLVR narrows the model’s exploration, favoring reasoning paths already known to earn high reward rather than discovering new reasoning strategies. Crucially, the correct solutions produced by RL-trained models already exist in the base model’s output distribution. This indicates that RLVR improves sampling efficiency rather than reasoning capacity, and that it inadvertently shrinks the solution space in the process.
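
The gap between sampling efficiency and reasoning capacity can be made concrete with the pass@k metric: the probability that at least one of k sampled answers is correct. The commentary above does not name a metric, so treat this as an illustrative sketch rather than the authors' evaluation code. It uses the standard unbiased pass@k estimator, and the per-problem counts are hypothetical numbers invented purely to show the shape of the effect: a model that is right more often per sample but covers fewer problems wins at small k and loses at large k.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that were correct, k: sampling budget.
    Returns the probability that k samples drawn without replacement
    from the n generated ones contain at least one correct sample.
    """
    if k > n:
        raise ValueError("k must not exceed n")
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k hits a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts, n, k):
    """Average pass@k over a set of problems."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# HYPOTHETICAL data, made up for illustration: correct samples out of N per problem.
N = 256
base_correct = [2, 1, 3, 1]    # base model: rarely correct, but solves every problem at least once
rl_correct = [120, 90, 0, 0]   # RL-tuned model: often correct on two problems, never on the others

for k in (1, 16, 256):
    base = mean_pass_at_k(base_correct, N, k)
    rl = mean_pass_at_k(rl_correct, N, k)
    print(f"k={k:>3}  base={base:.3f}  rl={rl:.3f}")
```

With these made-up counts, the RL-tuned model is far ahead at k = 1, while the base model reaches full coverage at k = 256 and the RL-tuned model plateaus at half the problems. That is the pattern the commentary describes as better sampling efficiency over a narrower solution space.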

Project: https://limit-of-rlvr.github.io/
Paper: https://arxiv.org/abs/2504.13837