LLM
AI Fundamentals
Limit of RLVR
5/28/2025 • limit-of-rlvr.github.io

Reasoning LLMs Are Just Efficient Samplers: RL Training Elicits No Transcending Capacity
C4AIL Commentary:
[…] RLVR narrows the model’s exploration, favoring known high-reward paths instead of discovering new reasoning strategies. Crucially, all correct solutions from RL-trained models already exist in the base model’s distribution, proving RLVR enhances sampling efficiency, not reasoning capacity, while inadvertently shrinking the solution space.
Project: https://limit-of-rlvr.github.io/
Paper: https://arxiv.org/abs/2504.13837
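
The sampling-efficiency claim is easiest to see through the pass@k comparison the paper builds on: at small k the RLVR-trained model solves more problems per sample, but as k grows the base model matches or overtakes it, because every correct solution the RL model produces was already reachable in the base model's distribution. Below is a minimal sketch using the standard unbiased pass@k estimator; the sample count and per-problem correct counts are hypothetical, chosen only to illustrate the crossover.

```python
# Unbiased pass@k estimator (Chen et al., 2021), commonly used to compare
# base vs. RLVR-trained models: with n samples per problem and c correct,
# pass@k = 1 - C(n-c, k) / C(n, k).
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples (drawn from n) is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical per-problem correct counts out of n = 256 samples.
base_correct = [3, 0, 12, 1]    # base model: low hit rate, but broader coverage
rlvr_correct = [40, 0, 90, 0]   # RLVR model: high hit rate on fewer problems

for k in (1, 256):
    base = sum(pass_at_k(256, c, k) for c in base_correct) / len(base_correct)
    rlvr = sum(pass_at_k(256, c, k) for c in rlvr_correct) / len(rlvr_correct)
    print(f"pass@{k}:  base={base:.3f}  rlvr={rlvr:.3f}")
```

With these toy numbers, the RLVR model wins at pass@1 (≈0.13 vs ≈0.02) but loses at pass@256 (0.50 vs 0.75), mirroring the qualitative finding: RLVR concentrates probability mass on problems the base model could already solve while shrinking the overall solution space it covers.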