Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 5 - Architectures
Learn more details about this course: https://online.stanford.edu/courses/cme296-diffusion-and-large-vision-models
To follow along with the course schedule and syllabus, visit: https://cme296.stanford.edu/syllabus/
Chapters:
00:00:00 Introduction
00:05:26 Objective
00:09:58 Convolutions, filters
00:14:44 Receptive field
00:17:14 Pooling
00:19:06 U-Net
00:27:52 Timestep representation
00:30:31 Class label representation
00:33:21 Timeline of U-Net models
00:35:43 Diffusion Transformer (DiT)
00:48:08 Adaptive layer normalization (adaLN)
01:02:30 DiT end-to-end example
01:12:57 Multimodal DiT (MM-DiT)
01:23:33 Qwen-Image, Z-Image, FLUX.1
01:24:27 Timeline of DiT models
01:25:25 Absolute position embeddings
01:38:48 Rotary position embeddings (RoPE)
01:39:59 2D RoPE variants
For more information about Stanford’s graduate programs, visit: https://online.stanford.edu/graduate-education
Afshine Amidi is an Adjunct Lecturer at Stanford University.
Shervine Amidi is an Adjunct Lecturer at Stanford University.
View the course playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNdy8rt2rZ4T2xM0OjADnfu