Book · Advanced · 35+ hours

Advanced Reinforcement Learning

Volume II — Alignment, Multi-Agent, and Frontier

The frontier half of the RL curriculum: imitation and offline RL (CQL, IQL, Decision Transformer), hierarchical and meta-RL, multi-agent (MADDPG, QMIX, MAPPO, PSRO), RL for language models (RLHF, DPO, GRPO, DAPO, DeepSeek-R1), distributed engineering, and six capstone projects. Volume 2 of 2 — assumes the foundations covered in Volume 1.

19 Chapters

92 Sections

34h Reading

5 Parts

Part I · Imitation & Offline RL · 17 sections
Part II · Beyond Single-Agent · 15 sections
Part III · RL for Language Models · 21 sections
Part IV · Engineering & Applications · 14 sections
Part V · Capstone Projects · 25 sections

Part I · 3 chapters · 17 sections

Imitation & Offline RL: Learning from data, not interaction.

Imitation and Inverse RL

Learning from demonstrations and inferring rewards

5 sections · 106 min read

Imitation & Offline RL: Learning from data, not interaction.

Imitation and Inverse RL

Offline Reinforcement Learning

Sequence-Modeling Reinforcement Learning

Beyond Single-Agent: Hierarchy, meta-learning, and multi-agent.

Hierarchical and Goal-Conditioned RL

Meta-RL

Multi-Agent Reinforcement Learning

RL for Language Models: RLHF, DPO, GRPO, and reasoning.

RLHF Foundations

Direct Preference Optimization (DPO)

Reasoning RL: GRPO and Verifiable Rewards

RLAIF and Constitutional AI

Engineering & Applications: Scaling RL and shipping it.

RL Engineering at Scale

RL in the Real World

Evaluation and Benchmarking

Capstone Projects: Six end-to-end agents.

Capstone: Rainbow on Atari

Capstone: SAC on Humanoid

Capstone: AlphaZero on Connect Four

Capstone: DreamerV3 on Crafter

Capstone: RLHF + DPO on a Small Language Model

Capstone: GRPO for GSM8K Math Reasoning

Where the book lands in practice.