Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 6 - LLM Reasoning

For more information about Stanford’s graduate programs, visit: https://online.stanford.edu/graduate-education
November 7, 2025
This lecture covers:
• Reasoning models
• RL for reasoning
• GRPO
• Scaling
To follow along with the course schedule and syllabus, visit: https://cme295.stanford.edu/syllabus/
Chapters:
00:00:00 Introduction
00:12:43 Reasoning models
00:27:49 Benchmarks
00:32:04 Pass@k metric
00:48:07 Scaling with RL
00:57:44 GRPO
01:06:03 Comparison between GRPO and PPO
01:16:14 Length bias
01:25:00 DAPO, Dr. GRPO
01:29:38 DeepSeek R1 recipe
Afshine Amidi is an Adjunct Lecturer at Stanford University.
Shervine Amidi is an Adjunct Lecturer at Stanford University.
