minimind: Understand LLM training from scratch
No more black-box training — understand every design choice via controlled experiments
No black boxes — understand the tradeoffs behind every design decision.
Show, don’t tell — run experiments to see what breaks when a design choice is removed.
From normalization to Transformer — 6 independent modules, progressive and clear.
Tiny datasets run in minutes on CPU — verify ideas fast and cheaply.
Different paths for different needs — from a quick taste to deep mastery.
Use three experiments to grasp key LLM design choices. Great for first-timers.
Master all Transformer fundamentals with a complete, structured path.
Train a full LLM from scratch and go deep into architecture and training.
Modular learning path — each module is self-contained and can be learned in any order.
Core components — master the building blocks of the Transformer
Why is normalization necessary? How different are Pre-LN and Post-LN? (See the sketch after this list.)
Why did RoPE become the default? Is it really better?
What do Q/K/V really do? Are multiple attention heads necessary or overkill?
Why can the FFN store knowledge? Is the 4x expansion optimal?
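To make the first question concrete before you dive in, here is a minimal sketch (assuming PyTorch; simplified toy blocks for illustration, not this repo's experiment code). It stacks residual blocks and measures how much gradient reaches the input under each normalization placement:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """A toy residual block with a switchable normalization placement."""
    def __init__(self, d, pre_ln=True):
        super().__init__()
        self.pre_ln = pre_ln
        self.norm = nn.LayerNorm(d)
        self.ffn = nn.Linear(d, d)

    def forward(self, x):
        if self.pre_ln:
            return x + self.ffn(self.norm(x))  # Pre-LN: x + f(LN(x))
        return self.norm(x + self.ffn(x))      # Post-LN: LN(x + f(x))

def input_grad_norm(pre_ln, depth=32, d=64):
    torch.manual_seed(0)
    stack = nn.Sequential(*[Block(d, pre_ln) for _ in range(depth)])
    x = torch.randn(8, d, requires_grad=True)
    stack(x).sum().backward()
    return x.grad.norm().item()

# At initialization, Pre-LN typically lets gradients reach the input largely
# intact, while Post-LN gradients tend to shrink as depth grows.
print("Pre-LN :", input_grad_norm(pre_ln=True))
print("Post-LN:", input_grad_norm(pre_ln=False))
```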
Assembly — combine components into a full Transformer
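As a preview of what assembly looks like, here is a minimal Pre-LN Transformer block wired together from those components (a sketch assuming PyTorch and its built-in nn.MultiheadAttention; the repo's modules build these pieces up step by step instead):

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Pre-LN block: normalization, attention, and FFN joined by residuals."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(              # the conventional 4x expansion
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual around attention
        return x + self.ffn(self.norm2(x))                 # residual around FFN

x = torch.randn(2, 16, 64)          # (batch, seq_len, d_model)
print(TransformerBlock()(x).shape)  # torch.Size([2, 16, 64])
```

Stack several of these blocks, add token embeddings, position information, and an output head, and you have most of a decoder-only Transformer.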
30 minutes, three experiments — change how you understand LLM training
# 1. Clone the repo
git clone https://github.com/joyehuang/minimind-notes.git
cd minimind-notes
# 2. Activate your virtual environment (if any)
source venv/bin/activate
# 3. Experiment 1: Why normalization?
cd modules/01-foundation/01-normalization/experiments
python exp1_gradient_vanishing.py
# 4. Experiment 2: Why RoPE position encoding?
cd ../../02-position-encoding/experiments
python exp1_rope_basics.py
# 5. Experiment 3: How does attention work?
cd ../../03-attention/experiments
python exp1_attention_basics.py
Visualize gradient flow in deep networks
See the math behind rotary position embeddings
Visualize attention weight computation
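For a feel of what that last experiment visualizes, the core of the computation fits in a few lines (a self-contained sketch assuming PyTorch; the actual experiment script adds the visualization on top):

```python
import torch

torch.manual_seed(0)
seq_len, d = 4, 8
Q = torch.randn(seq_len, d)  # queries: what each position is looking for
K = torch.randn(seq_len, d)  # keys: what each position advertises
V = torch.randn(seq_len, d)  # values: the content that gets mixed together

scores = Q @ K.T / d ** 0.5              # similarity of every query to every key
weights = torch.softmax(scores, dim=-1)  # each row is a distribution over positions
output = weights @ V                     # weighted average of the values

print(weights)  # the (seq_len, seq_len) matrix the experiment visualizes
```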
🎯 No more “just make it run”
Have you ever followed a tutorial, gotten the code running, and still not known why it works? This tutorial uses controlled experiments to show what breaks without each design choice and why the alternatives fail.
🔬 Every design choice is backed by experiments
No more armchair theory — each module includes runnable comparison experiments so you can see real effects. Theory + practice, down to the details.
💻 Low barrier to entry for the learning experiments
Learning-stage experiments use TinyShakespeare (~1 MB) and similar micro datasets, runnable on a CPU in minutes. Full training is a different matter: to train a complete model from scratch you will need a GPU (the upstream MiniMind project reports about 2 hours on a single NVIDIA 3090).
📦 Upstream project: jingyaogong/minimind
🗺️ Learning roadmap: Full roadmap
💻 Code examples: Executable examples
📝 Learning notes: Learning log · Knowledge base