Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 2 - Transformer-Based Models & Tricks
For more information about Stanford’s graduate programs, visit: https://online.stanford.edu/graduate-education
October 3, 2025
This lecture covers:
• Attention approximation
• MHA, MQA, GQA (multi-head, multi-query, and grouped-query attention)
• Position embeddings (sinusoidal, learned)
• RoPE and applications
• Transformer-based architectures
• BERT and its derivatives
To follow along with the course schedule and syllabus, visit: https://cme295.stanford.edu/syllabus/
Chapters:
00:00:00 Introduction
00:01:30 Recap of Transformers
00:10:37 Overview of position embeddings
00:15:36 Sinusoidal embeddings
00:25:56 T5 bias, ALiBi
00:31:02 RoPE
00:43:42 Layer normalization
00:50:39 Sparse attention
00:55:38 Sharing attention heads
01:02:42 Transformer-based models
01:11:38 BERT deep dive
01:33:24 BERT finetuning
01:43:30 Extensions of BERT
Afshine Amidi and Shervine Amidi are Adjunct Lecturers at Stanford University.
Stanford Online
You can gain access to a world of education through Stanford Online, the Stanford School of Engineering’s portal for academic and professional education offered by schools and units throughout Stanford University. https://online.stanford.edu/