Reading Group on Natural and Artificial Reinforcement Learning

A reading-group course (2025) in which each session centered on a seminal paper in reinforcement learning, spanning animal learning, cognitive neuroscience, and modern machine learning.

Associative learning foundations: showed that conditioning goes beyond simple stimulus-response pairings, emphasizing richer statistical and structural relationships in learning.
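
The "beyond simple pairings" point can be made concrete with a prediction-error model. The session's paper is not named here, so this is a generic Rescorla-Wagner-style sketch (cue names, trial counts, and the learning rate are illustrative): a cue that already predicts the outcome "blocks" learning about a redundant new cue, showing that learning tracks predictiveness, not mere co-occurrence.

```python
import numpy as np

# Rescorla-Wagner-style delta rule: learning is driven by prediction error,
# so a pre-trained cue A blocks learning about a redundant new cue B.
alpha, lam = 0.3, 1.0
w = np.zeros(2)  # associative strengths for cues A and B

for _ in range(50):                # phase 1: A alone predicts the outcome
    x = np.array([1.0, 0.0])
    w += alpha * (lam - w @ x) * x

for _ in range(50):                # phase 2: A+B together predict the same outcome
    x = np.array([1.0, 1.0])
    w += alpha * (lam - w @ x) * x

# w[1] stays near zero: B adds no predictive value beyond A (blocking)
```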

Policy-gradient foundations (REINFORCE): introduced unbiased gradient estimators for policy optimization and established a core model-free paradigm still used today.
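
The estimator itself is compact. A minimal sketch on a two-armed bandit (the bandit setup, seed, and constants are illustrative, not from the paper): the gradient of expected return is estimated by reward times the grad-log-probability of the sampled action.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                 # logits of a softmax policy over two arms
true_means = np.array([0.0, 1.0])   # arm 1 pays more on average
alpha = 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(2000):
    p = softmax(theta)
    a = rng.choice(2, p=p)
    r = true_means[a] + rng.normal(0.0, 0.1)  # noisy reward
    grad_logp = -p
    grad_logp[a] += 1.0                       # d/dtheta log pi(a)
    theta += alpha * r * grad_logp            # REINFORCE: unbiased ascent step
```

In practice a baseline is subtracted from the reward to reduce variance without biasing the estimator.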

Temporal-difference milestone: an early, high-impact value-based RL success demonstrating that self-play and TD updates could reach expert-level play in complex games.
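
The core TD(0) update is easy to see on a toy problem (the five-state random walk below is a standard textbook example, not the game from the paper): each step moves V(s) toward the bootstrapped target r + gamma * V(s').

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5                       # states 0..4; terminate left (reward 0) or right (reward 1)
V = np.full(n, 0.5)
alpha, gamma = 0.05, 1.0

for _ in range(5000):
    s = 2                   # each episode starts in the middle
    while True:
        s2 = s + (1 if rng.random() < 0.5 else -1)
        if s2 < 0:
            r, done = 0.0, True
        elif s2 >= n:
            r, done = 1.0, True
        else:
            r, done = 0.0, False
        target = r + (0.0 if done else gamma * V[s2])
        V[s] += alpha * (target - V[s])       # TD(0) update
        if done:
            break
        s = s2

# V approaches the true values (s + 1) / 6 for states 0..4
```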

Cortico-BG-PFC loops in human RL: a biologically grounded actor-critic account of working-memory gating and control, bridging neural circuits with computational RL.

Hierarchical and neurosymbolic RL in humans: framed social cognition as Bayesian inverse planning, inferring latent goals from observed actions.
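
The inverse-planning idea can be sketched in a few lines (the 1-D grid, goal set, and rationality parameter are illustrative assumptions, not the paper's task): assume the agent chooses actions noisily rationally toward some latent goal, then invert that model with Bayes' rule to infer the goal from observed moves.

```python
import numpy as np

# Bayesian inverse planning in miniature: infer which goal an agent is pursuing
# from its observed moves on a 5-cell line, assuming noisily rational actions.
goals = [0, 4]
beta = 3.0  # rationality: higher -> more reliably goal-directed

def action_likelihood(pos, action, goal):
    # utility of each move = negative distance to the goal after moving
    moves = [-1, 1]
    utils = np.array([-abs(min(max(pos + m, 0), 4) - goal) for m in moves])
    p = np.exp(beta * utils)
    p /= p.sum()
    return p[moves.index(action)]

# observed trajectory: the agent at cell 2 moves right twice
obs = [(2, 1), (3, 1)]
posterior = np.ones(len(goals)) / len(goals)   # uniform prior over goals
for pos, a in obs:
    posterior = posterior * np.array([action_likelihood(pos, a, g) for g in goals])
posterior /= posterior.sum()
# posterior now strongly favors goal 4 over goal 0
```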

DQN revolution: combined deep neural networks with Q-learning and replay/target stabilization to learn strong Atari control directly from pixels.
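
Both stabilizers can be shown in tabular form (the four-state chain is a toy stand-in for Atari, and the exploration and learning rates are arbitrary): updates are drawn from a replay buffer rather than the latest transition, and targets bootstrap from a periodically synced frozen copy.

```python
import random
import numpy as np

# Toy chain MDP illustrating DQN's two stabilizers, tabularly:
# (1) sample updates from a replay buffer, (2) bootstrap from a frozen target table.
rng = random.Random(0)
nS, nA = 4, 2
Q = np.zeros((nS, nA))
Q_target = Q.copy()
buffer, alpha, gamma = [], 0.2, 0.9

def step(s, a):
    # action 1 moves right, action 0 stays; reaching state 3 pays 1 and ends
    s2 = min(s + a, nS - 1)
    return s2, (1.0 if s2 == nS - 1 else 0.0), s2 == nS - 1

for ep in range(300):
    s = 0
    for _ in range(200):  # step cap keeps early random episodes bounded
        a = rng.randrange(nA) if rng.random() < 0.2 else int(Q[s].argmax())
        s2, r, done = step(s, a)
        buffer.append((s, a, r, s2, done))
        # replay: update on a random past transition, not the current one
        bs, ba, br, bs2, bdone = rng.choice(buffer)
        target = br + (0.0 if bdone else gamma * Q_target[bs2].max())
        Q[bs, ba] += alpha * (target - Q[bs, ba])
        if done:
            break
        s = s2
    if ep % 10 == 0:
        Q_target = Q.copy()  # periodic target-network sync
```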

Hierarchical RL with temporal abstraction: learned options end-to-end, including intra-option policies and termination, making hierarchy part of optimization.
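
The call-and-return structure that option-critic makes learnable end-to-end looks like this (the policies and termination probabilities below are fixed placeholders rather than learned parameters, and all sizes are illustrative): commit to an option, act with its intra-option policy until its termination function fires, then reselect.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nO, nA = 6, 2, 2
pi_omega = np.full((nS, nO), 0.5)       # policy over options
pi_intra = np.full((nO, nS, nA), 0.5)   # intra-option action policies
beta = np.full((nO, nS), 0.3)           # per-state termination probabilities

def run_episode(T=20):
    s, omega = 0, None
    trace = []
    for _ in range(T):
        # terminate-and-reselect: sample a new option at the start or when beta fires
        if omega is None or rng.random() < beta[omega, s]:
            omega = rng.choice(nO, p=pi_omega[s])
        a = rng.choice(nA, p=pi_intra[omega, s])   # act with the option's policy
        trace.append((s, omega, a))
        s = min(s + a, nS - 1)                     # toy chain dynamics
    return trace

trace = run_episode()
```

Option-critic's contribution is that pi_intra, beta, and the policy over options are all trained by gradient descent rather than hand-specified as here.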

Hippocampal replay and model-based RL: proposed a normative replay-prioritization rule linking planning utility to forward/backward replay patterns observed in biology.
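
The prioritization rule scores each candidate backup by gain times need. A tabular sketch (the MDP, Q-values, and candidate list are invented for illustration, and `gain` here is a simplified version of the paper's decomposition): gain measures how much the backup would improve the agent's expected value at the replayed state, need measures how often that state will be visited.

```python
import numpy as np

# Utility-based replay prioritization: score candidate backups by gain x need.
nS, nA, gamma = 4, 2, 0.9
Q = np.array([[0.0, 0.1], [0.0, 0.5], [0.0, 0.9], [0.0, 0.0]])

def greedy_probs(q, eps=0.1):
    p = np.full(nA, eps / nA)
    p[q.argmax()] += 1 - eps
    return p

def gain(s, a, new_q):
    # simplified gain: change in the state's expected value under eps-greedy
    q_new = Q[s].copy()
    q_new[a] = new_q
    return greedy_probs(q_new) @ q_new - greedy_probs(Q[s]) @ Q[s]

# need: discounted expected occupancy of each state from the agent's current
# state, i.e. a row of the successor representation under an assumed model T
T = np.array([[0.1, 0.9, 0.0, 0.0],
              [0.0, 0.1, 0.9, 0.0],
              [0.0, 0.0, 0.1, 0.9],
              [0.0, 0.0, 0.0, 1.0]])
SR = np.linalg.inv(np.eye(nS) - gamma * T)
need = SR[0]  # agent currently at state 0

# score candidate backups (state, action, backed-up value) and replay the best
candidates = [(1, 1, 0.8), (2, 1, 1.2), (3, 0, 0.1)]
evb = [need[s] * gain(s, a, v) for s, a, v in candidates]
best = candidates[int(np.argmax(evb))]
```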

Advanced model-based RL (DreamerV2): learned a world model and optimized behavior in imagined trajectories, showing strong Atari performance with latent-space planning.
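
Learning behavior in imagination can be shown in miniature (a tabular Dyna-style sketch standing in for DreamerV2's latent world model; the chain MDP and update counts are illustrative): real transitions train a model, and most value updates then come from imagined transitions drawn from it.

```python
import random

# Dyna-lite: fit a model from real experience, then do many extra value
# updates on transitions imagined from that model.
rng = random.Random(0)
nS, nA, gamma, alpha = 4, 2, 0.9, 0.2
Q = [[0.0] * nA for _ in range(nS)]
model = {}  # (s, a) -> (r, s2): learned deterministic model

def step(s, a):
    s2 = min(s + a, nS - 1)          # action 1 moves right; state 3 pays 1
    return (1.0 if s2 == nS - 1 else 0.0), s2

for _ in range(200):
    s = rng.randrange(nS - 1)        # one real transition
    a = rng.randrange(nA)
    r, s2 = step(s, a)
    model[(s, a)] = (r, s2)
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
    for _ in range(10):              # ten imagined updates per real step
        ms, ma = rng.choice(list(model))
        mr, ms2 = model[(ms, ma)]
        Q[ms][ma] += alpha * (mr + gamma * max(Q[ms2]) - Q[ms][ma])
```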

Prospective and retrospective RL in brain circuits: highlighted successor- and predecessor-style representations to explain forward- and backward-looking credit assignment.
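
The successor representation in one formula: M = (I - gamma * T)^-1 caches expected discounted future state occupancies, so values factor as V = M @ r and a changed reward revalues states instantly (the three-state cycle below is illustrative). The predecessor view uses the transpose, propagating credit backward to likely causes.

```python
import numpy as np

# Successor representation: M[s, s'] = expected discounted future occupancy
# of s' starting from s; values are then just V = M @ r.
gamma = 0.95
T = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])    # deterministic 3-state cycle
M = np.linalg.inv(np.eye(3) - gamma * T)

r = np.array([0.0, 0.0, 1.0])     # reward only at state 2
V = M @ r                         # instant evaluation of the new reward vector
# M.T gives the predecessor view: which states tend to lead into each state
```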

RL in large language models (GRPO): showed how group-relative policy optimization can improve mathematical reasoning while keeping training efficient.
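
The group-relative trick replaces a learned value baseline with within-group normalization (the function name and reward values are illustrative): sample several responses per prompt, score them, and standardize the rewards inside the group to get advantages.

```python
import numpy as np

# GRPO-style advantage: normalize rewards within a group of G sampled
# responses to the same prompt, with no learned value network.
def group_relative_advantages(rewards, eps=1e-8):
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# e.g. four sampled answers, two graded correct (1.0) and two incorrect (0.0)
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# correct answers get positive advantage, incorrect ones negative
```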