📚 Systematic Study (6 hours)
Master all core Transformer components
🎯 Learning goals
After 6 hours you will be able to:
- ✅ Understand all core Transformer components
- ✅ Explain design choices via controlled experiments
- ✅ Implement a simple Transformer from scratch
📋 Learning path
Stage 1: Foundation (5.5 hours)
Study the four core modules in order:
1. Normalization (1 hour)
What to do:
- 📖 Read teaching.md (30 min)
- 🔬 Run all experiments (20 min)
- 📝 Finish quiz.md (10 min)
Completion criteria:
- [ ] Explain vanishing and exploding gradients
- [ ] Implement RMSNorm from scratch
- [ ] Understand Pre-LN vs Post-LN
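For the RMSNorm criterion, here is a minimal NumPy sketch with LayerNorm next to it for contrast. The names `rms_norm` and `layer_norm` are hypothetical and do not come from the module's own code, and the `eps` values are assumed common defaults. RMSNorm drops LayerNorm's mean subtraction and bias. It only rescales each vector by its root mean square and then applies a learned per-dimension gain.

```python
import numpy as np


def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm over the last axis: x / sqrt(mean(x**2) + eps) * weight.

    No mean subtraction and no bias, unlike LayerNorm.
    """
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight


def layer_norm(x: np.ndarray, weight: np.ndarray, bias: np.ndarray,
               eps: float = 1e-5) -> np.ndarray:
    """LayerNorm for comparison: center each vector, scale to unit variance."""
    mean = np.mean(x, axis=-1, keepdims=True)
    var = np.var(x, axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * weight + bias


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(loc=3.0, scale=5.0, size=(2, 4, 8))  # (batch, seq, d_model)
    gain = np.ones(8)                                  # learned gain, initialized to 1
    y = rms_norm(x, gain)
    # Every vector now has root-mean-square ~1, whatever its original scale.
    print(np.sqrt(np.mean(y ** 2, axis=-1)))
```

The output depends only on the direction of `x`, not its scale, so `rms_norm(2 * x, w)` is essentially equal to `rms_norm(x, w)`. This scale invariance is part of why normalization keeps activation magnitudes stable from layer to layer.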
2. Position Encoding (1.5 hours)
What to do:
- 📖 Read teaching.md (40 min)
- 🔬 Run experiments 1-3 (40 min)
- 📝 Self-check (10 min)
Completion criteria:
- [ ] Understand why Attention alone is blind to token order (permutation invariance/equivariance)
- [ ] Explain the rotation idea behind RoPE
- [ ] Understand why RoPE uses multiple rotation frequencies
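The sketch below checks two of these ideas numerically. It assumes NumPy and the interleaved-pair convention of the original RoFormer paper (some libraries instead rotate the first and second halves of the vector). All function names are hypothetical. First, single-head attention with no position information is permutation-equivariant: shuffling the input tokens only shuffles the outputs, so word order is invisible to it. Second, RoPE rotates each dimension pair (2i, 2i+1) by `pos * theta_i` with `theta_i = base**(-2i/d)`, so the query-key score depends only on the relative offset between positions.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def self_attention(x, wq, wk, wv):
    """Single-head self-attention with NO position information."""
    q, k, v = x @ wq, x @ wk, x @ wv
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def rope(x, positions, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) of x (shape (n, d), d even)
    by angle positions * theta_i, where theta_i = base**(-2i/d)."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)  # (d/2,): one frequency per pair
    angles = np.asarray(positions, dtype=float)[:, None] * theta[None, :]  # (n, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin  # standard 2-D rotation of each pair
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    x = rng.normal(size=(5, d))
    wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))

    # 1) Without positions, shuffling tokens just shuffles the outputs.
    perm = rng.permutation(5)
    print(np.allclose(self_attention(x, wq, wk, wv)[perm],
                      self_attention(x[perm], wq, wk, wv)))  # True

    # 2) With RoPE, the q.k score depends only on the offset m - n.
    q, k = rng.normal(size=(1, d)), rng.normal(size=(1, d))
    s_near = (rope(q, [7]) @ rope(k, [3]).T).item()     # offset 4
    s_far = (rope(q, [104]) @ rope(k, [100]).T).item()  # offset 4
    print(np.isclose(s_near, s_far))  # True
```

Each pair rotates at its own frequency. The first pairs turn quickly and resolve nearby positions. The last pairs turn slowly, so they can still separate far-apart positions without wrapping around.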
3. Attention (2 hours)
What to do:
- 🔬 Run all experiments (1.5 hours)
- 💻 Read the source code (30 min)
Completion criteria:
- [ ] Understand the roles of Q, K, and V
- [ ] Understand the benefits of multi-head attention
- [ ] Understand GQA (Grouped Query Attention)
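Here is a hedged NumPy sketch of causal multi-head attention with grouped K/V heads. Q, K, and V are linear projections of the same input, and each head attends within its own `head_dim` subspace. With GQA, there are `n_kv_heads` key/value heads, and each one is shared by `n_heads // n_kv_heads` query heads. Setting `n_kv_heads == n_heads` gives standard MHA, and `n_kv_heads == 1` gives multi-query attention. The function name and weight layout are assumptions for illustration, not the module's source code, and no bias terms are used.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # stability; masked -inf entries stay -inf
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def grouped_query_attention(x, wq, wk, wv, wo, n_heads, n_kv_heads, causal=True):
    """Multi-head self-attention where groups of query heads share K/V heads.

    x: (seq, d_model)                          wq: (d_model, n_heads * head_dim)
    wk, wv: (d_model, n_kv_heads * head_dim)   wo: (n_heads * head_dim, d_model)
    n_kv_heads == n_heads -> standard MHA; n_kv_heads == 1 -> MQA.
    """
    seq = x.shape[0]
    head_dim = wq.shape[1] // n_heads
    group = n_heads // n_kv_heads  # query heads served by each K/V head

    # Project, then split the feature axis into heads: (heads, seq, head_dim).
    q = (x @ wq).reshape(seq, n_heads, head_dim).transpose(1, 0, 2)
    k = (x @ wk).reshape(seq, n_kv_heads, head_dim).transpose(1, 0, 2)
    v = (x @ wv).reshape(seq, n_kv_heads, head_dim).transpose(1, 0, 2)

    # Share each K/V head with `group` consecutive query heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)

    # Each query is scored against every key; softmax weights then average the values.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (heads, seq, seq)
    if causal:
        future = np.triu(np.ones((seq, seq), dtype=bool), k=1)  # True where j > i
        scores = np.where(future, -np.inf, scores)
    out = softmax(scores) @ v                                      # (heads, seq, head_dim)
    out = out.transpose(1, 0, 2).reshape(seq, n_heads * head_dim)  # concatenate heads
    return out @ wo


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, n_heads, n_kv_heads, head_dim, seq = 16, 4, 2, 4, 5
    x = rng.normal(size=(seq, d_model))
    wq = rng.normal(size=(d_model, n_heads * head_dim)) / np.sqrt(d_model)
    wk = rng.normal(size=(d_model, n_kv_heads * head_dim)) / np.sqrt(d_model)
    wv = rng.normal(size=(d_model, n_kv_heads * head_dim)) / np.sqrt(d_model)
    wo = rng.normal(size=(n_heads * head_dim, d_model)) / np.sqrt(d_model)
    y = grouped_query_attention(x, wq, wk, wv, wo, n_heads, n_kv_heads)
    print(y.shape)  # (5, 16)
```

`wk` and `wv` produce only `n_kv_heads * head_dim` columns. As a result, the keys and values that a KV cache must store shrink by a factor of `n_heads / n_kv_heads` compared with MHA.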
4. FeedForward (1 hour)
What to do:
- 🔬 Run the experiments (40 min)
- 💻 Understand the SwiGLU activation (20 min)
Completion criteria:
- [ ] Understand the expand-compress pattern in FFN
- [ ] Understand the division of labor between Attention (mixing information across tokens) and FFN (transforming each token independently)
- [ ] Implement SwiGLU from scratch
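A minimal NumPy sketch of a SwiGLU feed-forward layer follows. It assumes the bias-free, three-matrix form used in LLaMA-style models, and the weight names `w_gate`, `w_up`, and `w_down` are hypothetical. It shows the expand-compress pattern: project from `d_model` up to a wider `d_ff`, gate elementwise with SiLU, then project back down. Unlike attention, it processes every position independently.

```python
import numpy as np


def silu(z):
    """SiLU (Swish): z * sigmoid(z), with sigmoid written via tanh to avoid overflow."""
    return z * 0.5 * (1.0 + np.tanh(z / 2.0))


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU FFN: (silu(x @ w_gate) * (x @ w_up)) @ w_down.

    x: (seq, d_model); w_gate, w_up: (d_model, d_ff); w_down: (d_ff, d_model).
    Every row (token) is transformed independently: no mixing across positions.
    """
    hidden = silu(x @ w_gate) * (x @ w_up)  # expand: (seq, d_ff), gated elementwise
    return hidden @ w_down                  # compress: back to (seq, d_model)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model = 12
    d_ff = 32  # = 8/3 * d_model
    x = rng.normal(size=(5, d_model))
    w_gate = rng.normal(size=(d_model, d_ff)) / np.sqrt(d_model)
    w_up = rng.normal(size=(d_model, d_ff)) / np.sqrt(d_model)
    w_down = rng.normal(size=(d_ff, d_model)) / np.sqrt(d_ff)
    print(swiglu_ffn(x, w_gate, w_up, w_down).shape)  # (5, 12)
```

A common sizing choice is `d_ff` ≈ 8/3 · `d_model`. With that choice, the three matrices cost about as many parameters (3 · d · 8d/3 = 8d²) as a classic two-matrix FFN with `d_ff = 4 · d_model` (2 · d · 4d = 8d²).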
Stage 2: Architecture (0.5 hours)
What to do:
- 📖 Read Architecture README (30 min)
- Understand how components assemble into a Transformer block
Completion criteria:
- [ ] Draw the data flow of a Pre-LN Transformer block
- [ ] Understand the role of residual connections
- [ ] Implement a Transformer block from scratch
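The sketch below assembles the pieces into one Pre-LN block and, for contrast, one Post-LN block. It assumes NumPy, RMSNorm, single-head causal attention, and a SwiGLU FFN, and all helper and parameter names are hypothetical. The Pre-LN data flow is `x -> h = x + Attn(Norm(x)) -> h + FFN(Norm(h))`. Normalization sits on each branch, while the residual path carries `x` through unchanged. `post_ln_block` only moves the norm to after the residual addition. It keeps this sketch's RMSNorm and SwiGLU, so it is not an exact copy of the original 2017 layer, which used LayerNorm and a ReLU FFN.

```python
import numpy as np


def rms_norm(x, weight, eps=1e-6):
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight


def causal_attention(x, wq, wk, wv, wo):
    """Single-head causal self-attention (multi-head/GQA omitted for brevity)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    seq = x.shape[0]
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)  # block attention to later tokens
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return (weights @ v) @ wo


def swiglu_ffn(x, w_gate, w_up, w_down):
    g = x @ w_gate
    return (g * 0.5 * (1.0 + np.tanh(g / 2.0)) * (x @ w_up)) @ w_down  # silu(g) * up


def pre_ln_block(x, p):
    """Pre-LN: normalize each sublayer's input; the residual path is never normalized."""
    h = x + causal_attention(rms_norm(x, p["norm1"]), p["wq"], p["wk"], p["wv"], p["wo"])
    return h + swiglu_ffn(rms_norm(h, p["norm2"]), p["w_gate"], p["w_up"], p["w_down"])


def post_ln_block(x, p):
    """Post-LN: add the residual first, then normalize the sum."""
    h = rms_norm(x + causal_attention(x, p["wq"], p["wk"], p["wv"], p["wo"]), p["norm1"])
    return rms_norm(h + swiglu_ffn(h, p["w_gate"], p["w_up"], p["w_down"]), p["norm2"])


def init_params(rng, d_model, d_ff, scale=0.02):
    shapes = {"wq": (d_model, d_model), "wk": (d_model, d_model),
              "wv": (d_model, d_model), "wo": (d_model, d_model),
              "w_gate": (d_model, d_ff), "w_up": (d_model, d_ff),
              "w_down": (d_ff, d_model)}
    p = {name: rng.normal(scale=scale, size=shape) for name, shape in shapes.items()}
    p["norm1"] = np.ones(d_model)  # RMSNorm gains start at 1
    p["norm2"] = np.ones(d_model)
    return p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, d_ff, seq = 16, 48, 6
    x = rng.normal(size=(seq, d_model))
    for p in [init_params(rng, d_model, d_ff) for _ in range(4)]:  # a 4-block stack
        x = pre_ln_block(x, p)
    print(x.shape)  # (6, 16)
```

With near-zero weights, `pre_ln_block(x, p)` returns almost exactly `x`. This is the identity path that residual connections provide, and it is commonly cited as a reason deep Pre-LN stacks train stably. Post-LN instead renormalizes the residual stream after every sublayer. Pre-LN models such as GPT-2 and LLaMA also apply one final norm after the last block.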
🎯 Checklist
After finishing the Systematic Study, make sure you:
Foundation modules
- [ ] ✅ Complete Normalization
- [ ] ✅ Complete Position Encoding
- [ ] ✅ Complete Attention
- [ ] ✅ Complete FeedForward
Practical skills
- [ ] ✅ Implement a Transformer block from scratch
- [ ] ✅ Pass all module quizzes
- [ ] ✅ Explain each design choice
📚 Next steps
Want to go deeper?
- 🎓 Deep Mastery (30 hours) - train a full LLM from scratch
- 📝 Record notes - track your learning progress
- 🗺️ Full roadmap - view the complete learning path