Sky Seminar: Yi Wu (Tsinghua) – Towards Building Efficient and Scalable Reinforcement Learning Systems for LLMs

Speaker: Yi Wu
Location: Soda Hall, 510
Date: April 17th, 2025
Time: 12 – 1 pm PT

Title: Towards Building Efficient and Scalable Reinforcement Learning Systems for LLMs

Abstract:
Reinforcement Learning (RL) has become a new engine for recent AI breakthroughs. Unlike canonical machine learning, where a model is trained on a static dataset, RL requires a much more complex process in which the policy iteratively interacts with an environment and self-evolves based on the collected interaction data. Moreover, when the policy becomes large, i.e., an LLM policy, designing an efficient and scalable system becomes even more challenging.
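The interaction loop described above can be sketched in a few lines. This is a minimal illustrative example (a two-armed bandit, not anything from the speaker's systems); all names and the simple value-update rule are assumptions chosen for brevity:

```python
import random

def environment(action):
    """Toy environment: a 2-armed bandit where arm 1 pays more on average."""
    return random.gauss(1.0 if action == 1 else 0.0, 0.1)

def train(iterations=200, lr=0.1, seed=0):
    random.seed(seed)
    values = [0.0, 0.0]  # the "policy": estimated value of each arm
    for _ in range(iterations):
        # 1. Interact: pick an action (epsilon-greedy exploration).
        if random.random() < 0.2:
            action = random.randrange(2)
        else:
            action = values.index(max(values))
        # 2. Collect interaction data.
        reward = environment(action)
        # 3. Self-evolve: update the policy from the collected data.
        values[action] += lr * (reward - values[action])
    return values

print(train())
```

The key structural point is that data collection and policy updates are interleaved in one loop, which is exactly what makes RL systems harder to build than static-dataset training pipelines, especially when step 1 means generating long sequences from an LLM.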

In this talk, we will share our work on developing large-scale RL systems for LLMs. First, we will introduce ReaLHF, an efficient RLHF system specialized for LLMs. It delivers 3x higher throughput than existing open-source solutions and an 81% performance improvement over heuristic approaches based on Megatron-LM. Then, we will present AReaL, our latest RL system under development. AReaL is built upon ReaLHF but specialized for large reasoning models. AReaL adopts SGLang for fast inference and employs various techniques to handle long chain-of-thought outputs. With AReaL and 256 H800 GPUs, we can finish training a SOTA 7B reasoning model within 2 days.

Our projects can be found at https://github.com/openpsi-project/ReaLHF and https://github.com/inclusionAI/AReaL.

Speaker Bio:
Yi Wu is an assistant professor at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University. He obtained his Ph.D. from UC Berkeley in 2019 and was a researcher at OpenAI. His research focuses on deep reinforcement learning and multi-agent learning. His representative works include the value iteration network, the MADDPG and MAPPO algorithms, and OpenAI's hide-and-seek project. He received the best paper award at NIPS 2016 and was a best demo award finalist at ICRA 2024.