Reinforcement Learning
Temporal-difference learning, policy gradients, deep RL algorithms (DQN, PPO, GRPO), off-policy methods, and the theory underneath. The mathematical core that the rest of the division extends, and the substrate behind every applied piece the lab ships.
- TD · SARSA · Q-Learning
- DQN · PPO · GRPO
- Off-policy methods
