Reading Group on Natural and Artificial Reinforcement Learning
A 2025 reading group in which each session centered on a seminal reinforcement learning paper, spanning animal learning, cognitive neuroscience, and modern machine learning.
Associative learning foundations: showed that conditioning goes beyond simple stimulus-response pairings, emphasizing richer statistical and structural relationships in learning.
Policy-gradient foundations (REINFORCE): introduced unbiased gradient estimators for policy optimization and established a core model-free paradigm still used today.
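The score-function estimator at the heart of REINFORCE can be sketched with a softmax policy on a toy two-armed bandit (the bandit, learning rate, and reward values are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-armed bandit: arm 1 pays 1.0, arm 0 pays 0.0.
true_rewards = np.array([0.0, 1.0])
theta = np.zeros(2)          # policy logits
alpha = 0.1                  # learning rate

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    r = true_rewards[a]
    # Score-function gradient: grad log pi(a) = one_hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += alpha * r * grad_log_pi   # unbiased REINFORCE update

print(softmax(theta))  # probability mass concentrates on the rewarding arm
```

The update is unbiased because the reward only scales the log-likelihood gradient of the sampled action; no model of the environment is needed, which is what makes this the prototypical model-free method.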
Temporal-difference milestone: an early high-impact value-based RL success, demonstrating that self-play and TD updates could reach expert-level behavior in complex games.
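The bootstrapped TD update behind this line of work can be sketched in tabular form on an assumed three-state chain (chain, reward placement, and constants are illustrative):

```python
import numpy as np

# TD(0) on a toy chain s0 -> s1 -> terminal, with reward 1 on the final step.
V = np.zeros(3)                        # V[2] is the terminal state, fixed at 0
alpha, gamma = 0.1, 1.0
episode = [(0, 0.0, 1), (1, 1.0, 2)]   # (state, reward, next_state)

for _ in range(500):
    for s, r, s_next in episode:
        td_error = r + gamma * V[s_next] - V[s]  # bootstrapped TD error
        V[s] += alpha * td_error

print(V[:2])  # both predictions approach the true return of 1.0
```

The key property on display is bootstrapping: each state's estimate is updated toward a target built from the next state's current estimate, so credit propagates backward without waiting for episode outcomes to be averaged.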
Cortico-BG-PFC loops in human RL: a biologically grounded actor-critic account of working-memory gating and control, bridging neural circuits with computational RL.
Bayesian inverse planning in human social cognition: framed social reasoning as inverting a generative model of rational action, inferring latent goals from observed behavior.
DQN revolution: combined deep neural networks with Q-learning and replay/target stabilization to learn strong Atari control directly from pixels.
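The two stabilizers named above, experience replay and a periodically synced target network, can be sketched with a tabular Q-function standing in for the deep network (the toy MDP and all constants are assumptions for illustration):

```python
import random
from collections import deque

random.seed(0)

# Toy MDP: in state 0, action 1 reaches a terminal state with reward 1;
# action 0 loops in state 0 with reward 0.
def step(s, a):
    if s == 0 and a == 1:
        return 1, 1.0, True
    return 0, 0.0, False

gamma, alpha = 0.9, 0.2
Q = [[0.0, 0.0], [0.0, 0.0]]               # online values (the network in DQN)
Q_target = [row[:] for row in Q]           # frozen target copy
buffer = deque(maxlen=1000)                # experience replay buffer

for t in range(500):
    s, a = 0, random.randrange(2)          # fully random exploration for brevity
    s2, r, done = step(s, a)
    buffer.append((s, a, r, s2, done))
    # Minibatch sampled from replay, bootstrapped against the frozen target
    for bs, ba, br, bs2, bdone in random.sample(list(buffer), min(4, len(buffer))):
        boot = 0.0 if bdone else gamma * max(Q_target[bs2])
        Q[bs][ba] += alpha * (br + boot - Q[bs][ba])
    if t % 50 == 0:
        Q_target = [row[:] for row in Q]   # periodic target-network sync

print(Q[0])  # Q(0, action 1) approaches 1.0
```

Replay decorrelates consecutive updates and the frozen target keeps the bootstrap target from chasing its own moving estimate, the two tricks that made the deep version stable.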
Hierarchical RL with temporal abstraction: learned options end-to-end, including intra-option policies and termination, making hierarchy part of optimization.
Hippocampal replay and model-based RL: proposed a normative replay-prioritization rule linking planning utility to forward/backward replay patterns observed in biology.
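The prioritization rule can be caricatured as "replay what is both reachable and informative"; a toy sketch with entirely hypothetical numbers:

```python
import numpy as np

# Gain-times-need prioritization of candidate backups (values are made up):
need = np.array([0.8, 0.3, 0.1])   # expected future occupancy of each state
gain = np.array([0.1, 0.9, 0.2])   # policy improvement from a backup there
evb = gain * need                  # expected value of backup
order = np.argsort(-evb)           # replay the most useful experience first
print(order)
```

Under this rule, gain-driven prioritization after outcomes favors backward replay, while need-driven prioritization before action favors forward replay, matching the biological patterns the session discussed.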
Advanced model-based RL (DreamerV2): learned a world model and optimized behavior in imagined trajectories, showing strong Atari performance with latent-space planning.
Prospective and retrospective RL in brain circuits: highlighted successor- and predecessor-style representations to explain forward- and backward-looking credit assignment.
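A successor representation can itself be learned with a TD rule; a minimal sketch on an assumed deterministic three-state cycle (constants are illustrative):

```python
import numpy as np

# TD learning of a successor representation M on the cycle 0 -> 1 -> 2 -> 0.
n, gamma, alpha = 3, 0.9, 0.1
M = np.eye(n)                              # SR initialized to identity
transitions = [(0, 1), (1, 2), (2, 0)] * 2000

for s, s_next in transitions:
    one_hot = np.eye(n)[s]
    # TD error on expected discounted future state occupancies
    M[s] += alpha * (one_hot + gamma * M[s_next] - M[s])

# Values then factor as V = M @ r for any reward vector r
r = np.array([0.0, 0.0, 1.0])
print(M @ r)
```

Because M caches discounted future occupancies, values can be recomputed instantly when rewards change, giving prospective (forward-looking) credit assignment; predecessor-style representations apply the mirror-image logic backward in time.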
RL in large language models (GRPO): showed how group-relative policy optimization can improve mathematical reasoning while keeping training efficient.
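GRPO's key simplification, replacing a learned value baseline with group-relative reward normalization, can be sketched as follows (group size and reward values are illustrative):

```python
import numpy as np

# Group-relative advantage estimation in the spirit of GRPO:
# rewards for G = 4 sampled answers to one prompt (e.g. 1.0 = correct).
rewards = np.array([1.0, 0.0, 1.0, 0.0])
advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
print(advantages)  # positive for above-average answers, negative otherwise
```

Because the baseline is just the group mean, no separate value network is trained or queried, which is the source of the training-efficiency claim above.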