Train Real-World Long-Horizon Agents via Reinforcement Learning

Most existing RL frameworks are optimized for tasks that involve stateless interactions over short horizons, such as search-augmented reasoning or simple code execution. In contrast, real-world tasks like those in SWE-Bench require long-horizon planning in stateful, dynamic environments. This presents new challenges in both infrastructure and training algorithms.
We introduce SkyRL, our RL training pipeline for multi-turn tool-use LLMs, built on top of VeRL and OpenHands and optimized for long-horizon, real-environment tasks like SWE-Bench.
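To make the infrastructure contrast concrete, the sketch below shows the kind of rollout loop such a pipeline has to support: the policy conditions on the full interaction history, each action mutates persistent environment state (e.g. a repository checkout in a sandbox), and the reward arrives only at the end of the episode. This is a minimal illustration; every name here (`Turn`, `Trajectory`, `rollout`, `env_step`) is hypothetical, not the actual SkyRL, VeRL, or OpenHands API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Turn:
    """One agent step: a model-issued action and the environment's response."""
    action: str       # e.g. a tool call or an edit command
    observation: str  # e.g. tool output, shell output, test results


@dataclass
class Trajectory:
    """A full multi-turn episode, rewarded only at the end (e.g. tests pass)."""
    turns: List[Turn] = field(default_factory=list)
    reward: float = 0.0


def rollout(
    policy: Callable[[str, List[Turn]], str],
    env_step: Callable[[str], Tuple[str, bool]],
    initial_obs: str,
    max_turns: int = 50,
) -> Trajectory:
    """Collect one long-horizon trajectory from a stateful environment.

    Unlike single-turn RL, every action changes persistent environment
    state, so the environment cannot be reset between model calls and the
    policy must see the entire history of turns so far.
    """
    traj = Trajectory()
    obs = initial_obs
    for _ in range(max_turns):
        action = policy(obs, traj.turns)      # generate the next tool call
        obs, done = env_step(action)          # execute it in the sandbox
        traj.turns.append(Turn(action=action, observation=obs))
        if done:                              # task finished or agent stopped
            break
    return traj
```

Trajectories collected this way, together with their terminal rewards, are what the RL trainer optimizes over.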
Using SkyRL, we achieve promising results on SWE-Bench-Verified across model families with only around 300 training samples!
- SkyRL-Agent-7B-v0 from OpenHands-7B-Agent: 11.0% → 14.6%
- SkyRL-Agent-8B-v0 from Qwen3-8B (thinking disabled): 3.6% → 9.4%
- SkyRL-Agent-14B-v0 from Qwen3-14B (thinking enabled): 18.0% → 21.6%
Contributors
Shiyi Cao, Sumanth Hegde, Dacheng Li, Tyler Griggs, Shu Liu, Eric Tang, Jiayi Pan, Xingyao Wang, Akshay Malik, Graham Neubig, Kourosh Hakhamaneshi, Richard Liaw, Philipp Moritz, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica