minimind: Understand LLM training from scratch
No more black-box training — understand every design choice via controlled experiments
No black boxes — understand the tradeoffs behind every design decision.
Show, don’t tell — run experiments to see what breaks when a design choice is removed.
From normalization to Transformer — 6 independent modules, progressive and clear.
Tiny datasets run in minutes on CPU — verify ideas fast and cheaply.
Different paths for different needs — from a quick taste to deep mastery.
Use three experiments to grasp key LLM design choices. Great for first-timers.
Master all Transformer fundamentals with a complete, structured path.
Train a full LLM from scratch and go deep into architecture and training.
Modular learning path — each module is self-contained and can be learned in any order.
Core components — master the building blocks of the Transformer
Why is normalization necessary? How different are Pre-LN and Post-LN? (See the sketch after this list.)
Why did RoPE become the default? Is it really better?
What do Q/K/V really do? Are multiple attention heads necessary or overkill?
Why can the FFN store knowledge? Is the 4x expansion optimal?
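To make the first question concrete before you dive in, here is a minimal sketch (assuming PyTorch; simplified toy blocks for illustration, not this repo's experiment code). It stacks residual blocks and measures how much gradient reaches the input under each normalization placement:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """A toy residual block with a switchable normalization placement."""
    def __init__(self, d, pre_ln=True):
        super().__init__()
        self.pre_ln = pre_ln
        self.norm = nn.LayerNorm(d)
        self.ffn = nn.Linear(d, d)

    def forward(self, x):
        if self.pre_ln:
            return x + self.ffn(self.norm(x))  # Pre-LN: x + f(LN(x))
        return self.norm(x + self.ffn(x))      # Post-LN: LN(x + f(x))

def input_grad_norm(pre_ln, depth=32, d=64):
    torch.manual_seed(0)
    stack = nn.Sequential(*[Block(d, pre_ln) for _ in range(depth)])
    x = torch.randn(8, d, requires_grad=True)
    stack(x).sum().backward()
    return x.grad.norm().item()

# At initialization, Pre-LN typically lets gradients reach the input largely
# intact, while Post-LN gradients tend to shrink as depth grows.
print("Pre-LN :", input_grad_norm(pre_ln=True))
print("Post-LN:", input_grad_norm(pre_ln=False))
```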
Assembly — combine components into a full Transformer
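As a preview of what assembly looks like, here is a minimal Pre-LN Transformer block wired together from those components (a sketch assuming PyTorch and its built-in nn.MultiheadAttention; the repo's modules build these pieces up step by step instead):

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Pre-LN block: normalization, attention, and FFN joined by residuals."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(              # the conventional 4x expansion
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual around attention
        return x + self.ffn(self.norm2(x))                 # residual around FFN

x = torch.randn(2, 16, 64)          # (batch, seq_len, d_model)
print(TransformerBlock()(x).shape)  # torch.Size([2, 16, 64])
```

Stack several of these blocks, add token embeddings, position information, and an output head, and you have most of a decoder-only Transformer.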
30 minutes, three experiments — change how you understand LLM training
# 1. Clone the repo
git clone https://github.com/joyehuang/minimind-notes.git
cd minimind-notes
# 2. Activate your virtual environment (if any)
source venv/bin/activate
# 3. Experiment 1: Why normalization?
cd modules/01-foundation/01-normalization/experiments
python exp1_gradient_vanishing.py
# 4. Experiment 2: Why RoPE position encoding?
cd ../../02-position-encoding/experiments
python exp1_rope_basics.py
# 5. Experiment 3: How does attention work?
cd ../../03-attention/experiments
python exp1_attention_basics.py
Visualize gradient flow in deep networks
See the math behind rotary position embeddings
Visualize attention weight computation
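For a feel of what that last experiment visualizes, the core of the computation fits in a few lines (a self-contained sketch assuming PyTorch; the actual experiment script adds the visualization on top):

```python
import torch

torch.manual_seed(0)
seq_len, d = 4, 8
Q = torch.randn(seq_len, d)  # queries: what each position is looking for
K = torch.randn(seq_len, d)  # keys: what each position advertises
V = torch.randn(seq_len, d)  # values: the content that gets mixed together

scores = Q @ K.T / d ** 0.5              # similarity of every query to every key
weights = torch.softmax(scores, dim=-1)  # each row is a distribution over positions
output = weights @ V                     # weighted average of the values

print(weights)  # the (seq_len, seq_len) matrix the experiment visualizes
```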
🎯 No more “just make it run”
Have you ever followed a tutorial, gotten the code running, and still not known why it works? This tutorial uses controlled experiments to show what breaks without each design choice and why the alternatives fail.
🔬 Every design choice is backed by experiments
No more armchair theory — each module includes runnable comparison experiments so you can see real effects. Theory + practice, down to the details.
💻 Low barrier to entry for the learning experiments
Learning-stage experiments use TinyShakespeare (~1 MB) and similar micro datasets, runnable on a CPU in minutes. Full training is a different matter: to train a complete model from scratch you will need a GPU (the upstream MiniMind project reports about 2 hours on a single NVIDIA 3090).
📦 Upstream project: jingyaogong/minimind
🗺️ Learning roadmap: Full roadmap
💻 Code examples: Executable examples
📝 Learning notes: Learning log · Knowledge base