Book · Advanced · 45+ hours

Reinforcement Learning from Scratch with PyTorch

Volume I — Foundations and Deep RL

Master reinforcement learning, from multi-armed bandits to DreamerV3 and MuZero. Derive every algorithm, then implement it in PyTorch: tabular methods, DQN/Rainbow, PPO, SAC, TD3, model-based RL, MCTS, and AlphaZero. This is Volume I of two, covering the foundation from classical methods through the frontier.

23 chapters · 121 sections · 42 h reading · 8 parts

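Part I · Foundations · 21 sections
Part II · Tabular Methods · 21 sections
Part III · Function Approximation · 9 sections
Part IV · Value-Based Deep RL · 17 sections
Part V · Policy Gradient Methods · 20 sections
Part VI · Off-Policy Actor-Critic · 11 sections
Part VII · Exploration · 6 sections
Part VIII · Model-Based & Planning · 16 sections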

Part I · 4 chapters · 21 sections

Foundations: RL framing, bandits, and MDPs.

Development Environment

Tools and frameworks for hands-on RL

4 sections · 52 min read

Foundations: RL framing, bandits, and MDPs.

Development Environment

What Is Reinforcement Learning?

Multi-Armed Bandits

Markov Decision Processes

Tabular Methods: DP, Monte Carlo, TD, Dyna.

Dynamic Programming

Monte Carlo Methods

Temporal-Difference Learning

Planning and Learning with Dyna

Function Approximation: From tables to gradients.

From Tables to Function Approximation

The Policy Gradient Theorem

Value-Based Deep RL: DQN, Rainbow, distributional.

Deep Q-Networks (DQN)

DQN Improvements and Rainbow

Distributional Reinforcement Learning

Policy Gradient Methods: Actor-critic, TRPO, PPO, distributed.

Actor-Critic Methods

Trust-Region Methods

Proximal Policy Optimization (PPO)

Distributed On-Policy RL

Off-Policy Actor-Critic: DDPG, TD3, SAC.

DDPG and TD3

Soft Actor-Critic (SAC)

Exploration: Beyond ε-greedy (RND, ICM, NGU).

Exploration in Deep RL

Model-Based & Planning: Dreamer, MCTS, AlphaZero, MuZero.

Learned World Models

Planning with Search

MuZero and Beyond
