
📚 Systematic Study (6 hours)

Master all core Transformer components

🎯 Learning goals

After 6 hours you will be able to:

  • ✅ Understand all core Transformer components
  • ✅ Explain design choices via controlled experiments
  • ✅ Implement a simple Transformer from scratch

📋 Learning path

Stage 1: Foundation (5.5 hours)

Study the four core modules in order:

1. Normalization (1 hour)

What to do:

  • 📖 Read teaching.md (30 min)
  • 🔬 Run all experiments (20 min)
  • 📝 Finish quiz.md (10 min)

Completion criteria:

  • [ ] Explain gradient vanishing/explosion
  • [ ] Implement RMSNorm from scratch (see the sketch after this list)
  • [ ] Understand Pre-LN vs Post-LN
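
If it helps to check the RMSNorm criterion against code, here is a minimal PyTorch sketch. It is illustrative only and assumes a standard `(batch, seq, dim)` input; it is not the module's reference implementation.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescale by the RMS over the last
    dimension with a learnable gain, and no mean subtraction (unlike LayerNorm)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable scale, no bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```

In a Pre-LN block this normalization sits before each sublayer; in Post-LN it sits after the residual addition. That placement is one of the design choices the experiments compare.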

Start learning →


2. Position Encoding (1.5 hours)

What to do:

  • 📖 Read teaching.md (40 min)
  • 🔬 Run experiments 1-3 (40 min)
  • 📝 Self-check (10 min)

Completion criteria:

  • [ ] Understand permutation invariance in Attention
  • [ ] Explain the rotation idea behind RoPE (see the sketch after this list)
  • [ ] Understand the role of multi-frequency components
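
As a companion to these criteria, here is a minimal RoPE sketch in PyTorch. It is illustrative, assumes an even head dimension, and uses the interleaved-pair convention, which may differ from the module's code.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary position embedding for x of shape (seq_len, head_dim)."""
    seq_len, head_dim = x.shape
    # Multi-frequency components: each pair of dimensions rotates at its own frequency.
    freqs = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), freqs)  # (seq_len, head_dim // 2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    # Rotate each (x1, x2) pair by an angle proportional to its position,
    # so attention scores end up depending on relative positions.
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Because plain attention is permutation-invariant, rotating Q and K this way is what injects order information into the scores.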

Start learning →


3. Attention (2 hours)

What to do:

  • 🔬 Run all experiments (1.5 hours)
  • 💻 Read the source code (30 min)

Completion criteria:

  • [ ] Understand the roles of Q, K, and V
  • [ ] Understand the benefits of multi-head attention
  • [ ] Understand GQA (Grouped Query Attention), as sketched after this list
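
If it helps, here is a minimal grouped-query attention sketch in PyTorch. Shapes and names are illustrative assumptions, not the course's reference code.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, n_kv_heads: int):
    """q: (batch, n_heads, seq, d); k, v: (batch, n_kv_heads, seq, d)."""
    batch, n_heads, seq, d = q.shape
    group = n_heads // n_kv_heads
    # GQA: each K/V head is shared by `group` query heads, shrinking the KV cache.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    # Q asks "what am I looking for"; K answers "what does each position contain".
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    weights = F.softmax(scores, dim=-1)
    # V carries the content that gets mixed according to the attention weights.
    return weights @ v

# n_kv_heads == n_heads recovers standard multi-head attention; n_kv_heads == 1 is MQA.
```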

Start learning →


4. FeedForward (1 hour)

What to do:

  • 🔬 Run the experiments (40 min)
  • 💻 Understand the SwiGLU activation (20 min)

Completion criteria:

  • [ ] Understand the expand-compress pattern in FFN
  • [ ] Understand the division of labor: Attention vs FFN
  • [ ] Implement SwiGLU from scratch (see the sketch after this list)
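
Here is a minimal SwiGLU feed-forward sketch in PyTorch. The hidden size and the lack of bias terms are illustrative assumptions, not the module's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFFN(nn.Module):
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)  # gating branch
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)    # expand
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)   # compress back to dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Expand-compress pattern: project up, gate with SiLU, project back down.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```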

Start learning →


Stage 2: Architecture (0.5 hours)

What to do:

  • 📖 Read Architecture README (30 min)
  • Understand how components assemble into a Transformer block

Completion criteria:

  • [ ] Draw the data flow of a Pre-LN Transformer block
  • [ ] Understand the role of residual connections
  • [ ] Implement a Transformer block from scratch (see the sketch after this list)
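
For the last criterion, here is a minimal Pre-LN block sketch in PyTorch. For brevity it uses `nn.LayerNorm` and a plain SiLU MLP; swapping in the RMSNorm and SwiGLU sketches above is the natural extension. None of this is the course's reference code.

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, dim: int, n_heads: int, hidden_dim: int):
        super().__init__()
        self.attn_norm = nn.LayerNorm(dim)   # Pre-LN: normalize before the sublayer
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ffn_norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, hidden_dim), nn.SiLU(), nn.Linear(hidden_dim, dim)
        )

    def forward(self, x):
        # Residual connections keep an identity path around each sublayer,
        # so gradients flow cleanly even through deep stacks.
        h = self.attn_norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.ffn(self.ffn_norm(x))
        return x
```

Data flow to be able to draw: x → norm → attention → add residual → norm → FFN → add residual.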

🎯 Checklist

After finishing Systematic Study, make sure you can:

Foundation modules

  • [ ] ✅ Complete Normalization
  • [ ] ✅ Complete Position Encoding
  • [ ] ✅ Complete Attention
  • [ ] ✅ Complete FeedForward

Practical skills

  • [ ] ✅ Implement a Transformer block from scratch
  • [ ] ✅ Pass all module quizzes
  • [ ] ✅ Explain each design choice

📚 Next steps

Want to go deeper?

Built on MiniMind for learning and experiments