Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 5 - Architectures

Learn more details about this course: https://online.stanford.edu/courses/cme296-diffusion-and-large-vision-models
To follow along with the course schedule and syllabus, visit: https://cme296.stanford.edu/syllabus/
Chapters:
00:00:00 Introduction
00:05:26 Objective
00:09:58 Convolutions, filters
00:14:44 Receptive field
00:17:14 Pooling
00:19:06 U-Net
00:27:52 Timestep representation
00:30:31 Class label representation
00:33:21 Timeline of U-Net models
00:35:43 Diffusion Transformer (DiT)
00:48:08 Adaptive layer normalization (adaLN)
01:02:30 DiT end-to-end example
01:12:57 Multimodal DiT (MM-DiT)
01:23:33 Qwen-Image, Z-Image, FLUX.1
01:24:27 Timeline of DiT models
01:25:25 Absolute position embeddings
01:38:48 Rotary position embeddings (RoPE)
01:39:59 2D RoPE variants
For more information about Stanford’s graduate programs, visit: https://online.stanford.edu/graduate-education
Afshine Amidi is an Adjunct Lecturer at Stanford University.
Shervine Amidi is an Adjunct Lecturer at Stanford University.
View the course playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNdy8rt2rZ4T2xM0OjADnfu
