
📚 Systematic Study (6 hours)

Master all core Transformer components

🎯 Learning goals

After 6 hours you will be able to:

  • ✅ Understand all core Transformer components
  • ✅ Explain design choices via controlled experiments
  • ✅ Implement a simple Transformer from scratch

📋 Learning path

Stage 1: Foundation (5.5 hours)

Study the four core modules in order:

1. Normalization (1 hour)

What to do:

  • 📖 Read teaching.md (30 min)
  • 🔬 Run all experiments (20 min)
  • 📝 Finish quiz.md (10 min)

Completion criteria:

  • [ ] Explain gradient vanishing/explosion
  • [ ] Implement RMSNorm from scratch (see the sketch after this list)
  • [ ] Understand Pre-LN vs Post-LN
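
If it helps to check the RMSNorm criterion against code, here is a minimal PyTorch sketch. It is illustrative only and assumes a standard `(batch, seq, dim)` input; it is not the module's reference implementation.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescale by the RMS over the last
    dimension with a learnable gain, and no mean subtraction (unlike LayerNorm)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable scale, no bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```

In a Pre-LN block this normalization sits before each sublayer; in Post-LN it sits after the residual addition. That placement is one of the design choices the experiments compare.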

Start learning →


2. Position Encoding (1.5 hours)

What to do:

  • 📖 Read teaching.md (40 min)
  • 🔬 Run experiments 1-3 (40 min)
  • 📝 Self-check (10 min)

Completion criteria:

  • [ ] Understand permutation invariance in Attention
  • [ ] Explain the rotation idea behind RoPE (see the sketch after this list)
  • [ ] Understand the role of multi-frequency components
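
As a companion to these criteria, here is a minimal RoPE sketch in PyTorch. It is illustrative, assumes an even head dimension, and uses the interleaved-pair convention, which may differ from the module's code.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary position embedding for x of shape (seq_len, head_dim)."""
    seq_len, head_dim = x.shape
    # Multi-frequency components: each pair of dimensions rotates at its own frequency.
    freqs = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), freqs)  # (seq_len, head_dim // 2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    # Rotate each (x1, x2) pair by an angle proportional to its position,
    # so attention scores end up depending on relative positions.
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Because plain attention is permutation-invariant, rotating Q and K this way is what injects order information into the scores.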

Start learning →


3. Attention (2 hours)

What to do:

  • 🔬 Run all experiments (1.5 hours)
  • 💻 Read the source code (30 min)

Completion criteria:

  • [ ] Understand the roles of Q, K, and V
  • [ ] Understand the benefits of multi-head attention
  • [ ] Understand GQA (Grouped Query Attention), as sketched after this list
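
If it helps, here is a minimal grouped-query attention sketch in PyTorch. Shapes and names are illustrative assumptions, not the course's reference code.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, n_kv_heads: int):
    """q: (batch, n_heads, seq, d); k, v: (batch, n_kv_heads, seq, d)."""
    batch, n_heads, seq, d = q.shape
    group = n_heads // n_kv_heads
    # GQA: each K/V head is shared by `group` query heads, shrinking the KV cache.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    # Q asks "what am I looking for"; K answers "what does each position contain".
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    weights = F.softmax(scores, dim=-1)
    # V carries the content that gets mixed according to the attention weights.
    return weights @ v

# n_kv_heads == n_heads recovers standard multi-head attention; n_kv_heads == 1 is MQA.
```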

Start learning →


4. FeedForward (1 hour)

What to do:

  • 🔬 Run the experiments (40 min)
  • 💻 Understand the SwiGLU activation (20 min)

Completion criteria:

  • [ ] Understand the expand-compress pattern in FFN
  • [ ] Understand the division of labor: Attention vs FFN
  • [ ] Implement SwiGLU from scratch (see the sketch after this list)
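
Here is a minimal SwiGLU feed-forward sketch in PyTorch. The hidden size and the lack of bias terms are illustrative assumptions, not the module's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFFN(nn.Module):
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)  # gating branch
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)    # expand
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)   # compress back to dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Expand-compress pattern: project up, gate with SiLU, project back down.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```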

Start learning →


Stage 2: Architecture (0.5 hours)

What to do:

  • 📖 Read Architecture README (30 min)
  • Understand how components assemble into a Transformer block

Completion criteria:

  • [ ] Draw the data flow of a Pre-LN Transformer block
  • [ ] Understand the role of residual connections
  • [ ] Implement a Transformer block from scratch (see the sketch after this list)
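
For the last criterion, here is a minimal Pre-LN block sketch in PyTorch. For brevity it uses `nn.LayerNorm` and a plain SiLU MLP; swapping in the RMSNorm and SwiGLU sketches above is the natural extension. None of this is the course's reference code.

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, dim: int, n_heads: int, hidden_dim: int):
        super().__init__()
        self.attn_norm = nn.LayerNorm(dim)   # Pre-LN: normalize before the sublayer
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ffn_norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, hidden_dim), nn.SiLU(), nn.Linear(hidden_dim, dim)
        )

    def forward(self, x):
        # Residual connections keep an identity path around each sublayer,
        # so gradients flow cleanly even through deep stacks.
        h = self.attn_norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.ffn(self.ffn_norm(x))
        return x
```

Data flow to be able to draw: x → norm → attention → add residual → norm → FFN → add residual.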

🎯 Checklist

After finishing Systematic Study, make sure you can:

Foundation modules

  • [ ] ✅ Complete Normalization
  • [ ] ✅ Complete Position Encoding
  • [ ] ✅ Complete Attention
  • [ ] ✅ Complete FeedForward

Practical skills

  • [ ] ✅ Implement a Transformer block from scratch
  • [ ] ✅ Pass all module quizzes
  • [ ] ✅ Explain each design choice

📚 Next steps

Want to go deeper?

Built on MiniMind for learning and experiments