Reading Group on Natural and Artificial Reinforcement Learning

A reading-group course (2025) in which each session centered on a seminal paper in reinforcement learning, spanning animal learning, cognitive neuroscience, and modern machine learning.

Associative learning foundations: showed that conditioning goes beyond simple stimulus-response pairings, emphasizing richer statistical and structural relationships in learning.
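
The "beyond simple pairings" point can be made concrete with a prediction-error model. The session's paper is not named here, so this is a generic Rescorla-Wagner-style sketch (cue names, trial counts, and the learning rate are illustrative): a cue that already predicts the outcome "blocks" learning about a redundant new cue, showing that learning tracks predictiveness, not mere co-occurrence.

```python
import numpy as np

# Rescorla-Wagner-style delta rule: learning is driven by prediction error,
# so a pre-trained cue A blocks learning about a redundant new cue B.
alpha, lam = 0.3, 1.0
w = np.zeros(2)  # associative strengths for cues A and B

for _ in range(50):                # phase 1: A alone predicts the outcome
    x = np.array([1.0, 0.0])
    w += alpha * (lam - w @ x) * x

for _ in range(50):                # phase 2: A+B together predict the same outcome
    x = np.array([1.0, 1.0])
    w += alpha * (lam - w @ x) * x

# w[1] stays near zero: B adds no predictive value beyond A (blocking)
```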

Policy-gradient foundations (REINFORCE): introduced unbiased gradient estimators for policy optimization and established a core model-free paradigm still used today.
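
The estimator itself is compact. A minimal sketch on a two-armed bandit (the bandit setup, seed, and constants are illustrative, not from the paper): the gradient of expected return is estimated by reward times the grad-log-probability of the sampled action.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                 # logits of a softmax policy over two arms
true_means = np.array([0.0, 1.0])   # arm 1 pays more on average
alpha = 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(2000):
    p = softmax(theta)
    a = rng.choice(2, p=p)
    r = true_means[a] + rng.normal(0.0, 0.1)  # noisy reward
    grad_logp = -p
    grad_logp[a] += 1.0                       # d/dtheta log pi(a)
    theta += alpha * r * grad_logp            # REINFORCE: unbiased ascent step
```

In practice a baseline is subtracted from the reward to reduce variance without biasing the estimator.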

Temporal-difference milestone: an early, high-impact value-based RL success demonstrating that self-play and TD updates could reach expert-level play in complex games.
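
The core TD(0) update is easy to see on a toy problem (the five-state random walk below is a standard textbook example, not the game from the paper): each step moves V(s) toward the bootstrapped target r + gamma * V(s').

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5                       # states 0..4; terminate left (reward 0) or right (reward 1)
V = np.full(n, 0.5)
alpha, gamma = 0.05, 1.0

for _ in range(5000):
    s = 2                   # each episode starts in the middle
    while True:
        s2 = s + (1 if rng.random() < 0.5 else -1)
        if s2 < 0:
            r, done = 0.0, True
        elif s2 >= n:
            r, done = 1.0, True
        else:
            r, done = 0.0, False
        target = r + (0.0 if done else gamma * V[s2])
        V[s] += alpha * (target - V[s])       # TD(0) update
        if done:
            break
        s = s2

# V approaches the true values (s + 1) / 6 for states 0..4
```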

Cortico-BG-PFC loops in human RL: a biologically grounded actor-critic account of working-memory gating and control, bridging neural circuits with computational RL.

Hierarchical and neurosymbolic RL in humans: framed social cognition as Bayesian inverse planning, inferring latent goals from observed actions.
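
The inverse-planning idea can be sketched in a few lines (the 1-D grid, goal set, and rationality parameter are illustrative assumptions, not the paper's task): assume the agent chooses actions noisily rationally toward some latent goal, then invert that model with Bayes' rule to infer the goal from observed moves.

```python
import numpy as np

# Bayesian inverse planning in miniature: infer which goal an agent is pursuing
# from its observed moves on a 5-cell line, assuming noisily rational actions.
goals = [0, 4]
beta = 3.0  # rationality: higher -> more reliably goal-directed

def action_likelihood(pos, action, goal):
    # utility of each move = negative distance to the goal after moving
    moves = [-1, 1]
    utils = np.array([-abs(min(max(pos + m, 0), 4) - goal) for m in moves])
    p = np.exp(beta * utils)
    p /= p.sum()
    return p[moves.index(action)]

# observed trajectory: the agent at cell 2 moves right twice
obs = [(2, 1), (3, 1)]
posterior = np.ones(len(goals)) / len(goals)   # uniform prior over goals
for pos, a in obs:
    posterior = posterior * np.array([action_likelihood(pos, a, g) for g in goals])
posterior /= posterior.sum()
# posterior now strongly favors goal 4 over goal 0
```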

DQN revolution: combined deep neural networks with Q-learning and replay/target stabilization to learn strong Atari control directly from pixels.
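
Both stabilizers can be shown in tabular form (the four-state chain is a toy stand-in for Atari, and the exploration and learning rates are arbitrary): updates are drawn from a replay buffer rather than the latest transition, and targets bootstrap from a periodically synced frozen copy.

```python
import random
import numpy as np

# Toy chain MDP illustrating DQN's two stabilizers, tabularly:
# (1) sample updates from a replay buffer, (2) bootstrap from a frozen target table.
rng = random.Random(0)
nS, nA = 4, 2
Q = np.zeros((nS, nA))
Q_target = Q.copy()
buffer, alpha, gamma = [], 0.2, 0.9

def step(s, a):
    # action 1 moves right, action 0 stays; reaching state 3 pays 1 and ends
    s2 = min(s + a, nS - 1)
    return s2, (1.0 if s2 == nS - 1 else 0.0), s2 == nS - 1

for ep in range(300):
    s = 0
    for _ in range(200):  # step cap keeps early random episodes bounded
        a = rng.randrange(nA) if rng.random() < 0.2 else int(Q[s].argmax())
        s2, r, done = step(s, a)
        buffer.append((s, a, r, s2, done))
        # replay: update on a random past transition, not the current one
        bs, ba, br, bs2, bdone = rng.choice(buffer)
        target = br + (0.0 if bdone else gamma * Q_target[bs2].max())
        Q[bs, ba] += alpha * (target - Q[bs, ba])
        if done:
            break
        s = s2
    if ep % 10 == 0:
        Q_target = Q.copy()  # periodic target-network sync
```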

Hierarchical RL with temporal abstraction: learned options end-to-end, including intra-option policies and termination, making hierarchy part of optimization.
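
The call-and-return structure that option-critic makes learnable end-to-end looks like this (the policies and termination probabilities below are fixed placeholders rather than learned parameters, and all sizes are illustrative): commit to an option, act with its intra-option policy until its termination function fires, then reselect.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nO, nA = 6, 2, 2
pi_omega = np.full((nS, nO), 0.5)       # policy over options
pi_intra = np.full((nO, nS, nA), 0.5)   # intra-option action policies
beta = np.full((nO, nS), 0.3)           # per-state termination probabilities

def run_episode(T=20):
    s, omega = 0, None
    trace = []
    for _ in range(T):
        # terminate-and-reselect: sample a new option at the start or when beta fires
        if omega is None or rng.random() < beta[omega, s]:
            omega = rng.choice(nO, p=pi_omega[s])
        a = rng.choice(nA, p=pi_intra[omega, s])   # act with the option's policy
        trace.append((s, omega, a))
        s = min(s + a, nS - 1)                     # toy chain dynamics
    return trace

trace = run_episode()
```

Option-critic's contribution is that pi_intra, beta, and the policy over options are all trained by gradient descent rather than hand-specified as here.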

Hippocampal replay and model-based RL: proposed a normative replay-prioritization rule linking planning utility to forward/backward replay patterns observed in biology.
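
The prioritization rule scores each candidate backup by gain times need. A tabular sketch (the MDP, Q-values, and candidate list are invented for illustration, and `gain` here is a simplified version of the paper's decomposition): gain measures how much the backup would improve the agent's expected value at the replayed state, need measures how often that state will be visited.

```python
import numpy as np

# Utility-based replay prioritization: score candidate backups by gain x need.
nS, nA, gamma = 4, 2, 0.9
Q = np.array([[0.0, 0.1], [0.0, 0.5], [0.0, 0.9], [0.0, 0.0]])

def greedy_probs(q, eps=0.1):
    p = np.full(nA, eps / nA)
    p[q.argmax()] += 1 - eps
    return p

def gain(s, a, new_q):
    # simplified gain: change in the state's expected value under eps-greedy
    q_new = Q[s].copy()
    q_new[a] = new_q
    return greedy_probs(q_new) @ q_new - greedy_probs(Q[s]) @ Q[s]

# need: discounted expected occupancy of each state from the agent's current
# state, i.e. a row of the successor representation under an assumed model T
T = np.array([[0.1, 0.9, 0.0, 0.0],
              [0.0, 0.1, 0.9, 0.0],
              [0.0, 0.0, 0.1, 0.9],
              [0.0, 0.0, 0.0, 1.0]])
SR = np.linalg.inv(np.eye(nS) - gamma * T)
need = SR[0]  # agent currently at state 0

# score candidate backups (state, action, backed-up value) and replay the best
candidates = [(1, 1, 0.8), (2, 1, 1.2), (3, 0, 0.1)]
evb = [need[s] * gain(s, a, v) for s, a, v in candidates]
best = candidates[int(np.argmax(evb))]
```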

Advanced model-based RL (DreamerV2): learned a world model and optimized behavior in imagined trajectories, showing strong Atari performance with latent-space planning.
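
Learning behavior in imagination can be shown in miniature (a tabular Dyna-style sketch standing in for DreamerV2's latent world model; the chain MDP and update counts are illustrative): real transitions train a model, and most value updates then come from imagined transitions drawn from it.

```python
import random

# Dyna-lite: fit a model from real experience, then do many extra value
# updates on transitions imagined from that model.
rng = random.Random(0)
nS, nA, gamma, alpha = 4, 2, 0.9, 0.2
Q = [[0.0] * nA for _ in range(nS)]
model = {}  # (s, a) -> (r, s2): learned deterministic model

def step(s, a):
    s2 = min(s + a, nS - 1)          # action 1 moves right; state 3 pays 1
    return (1.0 if s2 == nS - 1 else 0.0), s2

for _ in range(200):
    s = rng.randrange(nS - 1)        # one real transition
    a = rng.randrange(nA)
    r, s2 = step(s, a)
    model[(s, a)] = (r, s2)
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
    for _ in range(10):              # ten imagined updates per real step
        ms, ma = rng.choice(list(model))
        mr, ms2 = model[(ms, ma)]
        Q[ms][ma] += alpha * (mr + gamma * max(Q[ms2]) - Q[ms][ma])
```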

Prospective and retrospective RL in brain circuits: highlighted successor- and predecessor-style representations to explain forward- and backward-looking credit assignment.
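
The successor representation in one formula: M = (I - gamma * T)^-1 caches expected discounted future state occupancies, so values factor as V = M @ r and a changed reward revalues states instantly (the three-state cycle below is illustrative). The predecessor view uses the transpose, propagating credit backward to likely causes.

```python
import numpy as np

# Successor representation: M[s, s'] = expected discounted future occupancy
# of s' starting from s; values are then just V = M @ r.
gamma = 0.95
T = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])    # deterministic 3-state cycle
M = np.linalg.inv(np.eye(3) - gamma * T)

r = np.array([0.0, 0.0, 1.0])     # reward only at state 2
V = M @ r                         # instant evaluation of the new reward vector
# M.T gives the predecessor view: which states tend to lead into each state
```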

RL in large language models (GRPO): showed how group-relative policy optimization can improve mathematical reasoning while keeping training efficient.
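
The group-relative trick replaces a learned value baseline with within-group normalization (the function name and reward values are illustrative): sample several responses per prompt, score them, and standardize the rewards inside the group to get advantages.

```python
import numpy as np

# GRPO-style advantage: normalize rewards within a group of G sampled
# responses to the same prompt, with no learned value network.
def group_relative_advantages(rewards, eps=1e-8):
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# e.g. four sampled answers, two graded correct (1.0) and two incorrect (0.0)
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# correct answers get positive advantage, incorrect ones negative
```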