
MiniMind Learning Guide

Principles + experiments + practice

Learning modules

Foundation

01 Normalization

Focus: Pre-LN vs Post-LN, why normalization matters | Time: 1 hour | Status: Complete

Start learning →
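A minimal PyTorch sketch of the two ideas this module contrasts: an RMSNorm layer, and where the norm sits relative to the residual (Pre-LN vs Post-LN). The module and tensor sizes below are illustrative assumptions, not the repo's actual code.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Scale each token vector by its root-mean-square, then apply a learned gain."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

# Pre-LN:  normalize the sublayer's input, add the residual afterwards:
#   y = x + sublayer(norm(x))
# Post-LN: apply the sublayer first, normalize after the residual add:
#   y = norm(x + sublayer(x))
norm = RMSNorm(16)
sublayer = nn.Linear(16, 16)   # stand-in for attention or the FFN
x = torch.randn(2, 4, 16)      # (batch, seq, dim)
pre_ln_out = x + sublayer(norm(x))
post_ln_out = norm(x + sublayer(x))
print(pre_ln_out.shape, post_ln_out.shape)
```

Pre-LN keeps the residual path an identity, which is one reason deep stacks tend to train more stably with it.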


02 Position Encoding

Focus: RoPE and position encoding choices | Time: 1.5 hours | Status: Complete

Start learning →
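As a preview, here is a compact RoPE sketch: each half of the channel dimension of q and k is rotated by an angle proportional to the token's position, so the q·k dot product ends up depending only on relative position. The half-split layout and base of 10000 follow a common convention and may differ from the repo's implementation.

```python
import torch

def rope(x, base=10000.0):
    """Apply rotary position embedding to x of shape (batch, seq, dim), dim even."""
    b, seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)           # per-pair frequencies
    angles = torch.arange(seq, dtype=torch.float32)[:, None] * freqs[None, :]   # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(1, 8, 16)
k = torch.randn(1, 8, 16)
# After rotation, q_i · k_j depends on (i - j), which is why RoPE extrapolates
# to longer sequences better than learned absolute position embeddings.
print(rope(q).shape, rope(k).shape)
```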


03 Attention

Focus: Q/K/V, multi-head attention | Time: 2 hours | Status: Complete

Start learning →
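A minimal multi-head attention sketch showing the Q/K/V projections, the split into heads, and the scaled dot-product step (delegated here to PyTorch 2.x's `F.scaled_dot_product_attention`). Sizes and the causal mask are illustrative assumptions, not the repo's exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    def __init__(self, dim, n_heads):
        super().__init__()
        assert dim % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.o_proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        b, seq, dim = x.shape
        # Project to Q/K/V and split into heads: (batch, heads, seq, head_dim)
        def split(t):
            return t.view(b, seq, self.n_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        # softmax(QK^T / sqrt(head_dim)) V, with a causal mask for language modeling
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, seq, dim)   # merge heads back
        return self.o_proj(out)

attn = MultiHeadAttention(dim=64, n_heads=4)
print(attn(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```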


04 FeedForward

Focus: FFN design and SwiGLU | Time: 1 hour | Status: Complete

Start learning →
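A small SwiGLU feed-forward sketch: one projection is passed through SiLU, multiplied elementwise with a second "up" projection, then projected back down. The hidden size (roughly 8/3 of the model dimension) is a common choice for gated FFNs, not necessarily the repo's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFFN(nn.Module):
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        # SwiGLU: silu(W_gate x) * (W_up x), then project back to the model dimension
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

ffn = SwiGLUFFN(dim=64, hidden_dim=176)  # hidden ≈ 8/3 × dim, rounded
print(ffn(torch.randn(2, 10, 64)).shape)
```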

Architecture

Transformer Block Assembly

Focus: assemble components into a Transformer block | Time: 2.5 hours | Status: In progress

Open architecture overview →
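To preview the assembly, here is a rough pre-LN decoder block: normalize, attend, add the residual, then normalize, apply the FFN, add the residual. It uses PyTorch stand-ins (LayerNorm, `nn.MultiheadAttention`, a plain SiLU FFN, no causal mask) rather than the RMSNorm/RoPE/SwiGLU components covered above, so treat it as a shape sketch, not the repo's block.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Pre-LN block: x + attn(norm(x)), then x + ffn(norm(x))."""
    def __init__(self, dim=64, n_heads=4, hidden_dim=256):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)   # stand-in; MiniMind-style models use RMSNorm
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(        # plain FFN stand-in for the SwiGLU variant
            nn.Linear(dim, hidden_dim), nn.SiLU(), nn.Linear(hidden_dim, dim)
        )

    def forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                  # residual around attention
        x = x + self.ffn(self.norm2(x))   # residual around the feed-forward
        return x

block = TransformerBlock()
print(block(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```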

Quick Start

Quick Start in 30 Minutes

Understand core design choices through three experiments

Each experiment runs in 5–10 minutes on a CPU, so you can quickly grasp the essentials behind LLM training.

Step 01
5 min

Why normalization?

Observe gradient vanishing and see how RMSNorm stabilizes training.

Normalization
Start Experiments
Step 02
10 min

Why RoPE?

Compare with absolute position encoding and learn why RoPE extrapolates better to unseen sequence lengths.

Position Encoding
Start Experiments
Step 03
5 min

Why residual connections?

See how gradients vanish in deep networks without skip connections, and how residuals fix it (a toy sketch follows this list).

Residual Connection
Start Experiments
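The idea behind Step 03 fits in a few lines: stack many plain layers and the gradient reaching the input shrinks toward zero; add identity skips and it stays usable. This is a toy sketch with made-up sizes, not the repo's experiment script.

```python
import torch
import torch.nn as nn

def input_grad_norm(use_residual, depth=30, dim=64):
    """Gradient norm at the input of a deep stack of small linear+tanh layers."""
    torch.manual_seed(0)
    layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))
    x = torch.randn(1, dim, requires_grad=True)
    h = x
    for layer in layers:
        out = torch.tanh(layer(h))
        h = h + out if use_residual else out   # identity skip vs plain stacking
    h.sum().backward()
    return x.grad.norm().item()

# Without skips the input gradient is typically orders of magnitude smaller.
print("plain   :", input_grad_norm(False))
print("residual:", input_grad_norm(True))
```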

Run the first experiment

```bash
git clone https://github.com/joyehuang/minimind-notes.git
cd minimind-notes
source venv/bin/activate
```

```bash
# Experiment 1: Why normalization?
cd modules/01-foundation/01-normalization/experiments
python exp1_gradient_vanishing.py

# What you will see:
# ❌ No normalization: activation std drops (vanishing gradients)
# ✅ RMSNorm: activation std stays stable
```

```bash
# Read the teaching notes for the why/what/how
cat modules/01-foundation/01-normalization/teaching.md
```

Learning principles

✅ Principles first

Run experiments first, then read theory. Focus on why each design choice exists.

🔬 Experiment-driven learning

Each module includes experiments that answer: “What breaks if we don’t do this?”

💻 Low barrier

TinyShakespeare (1MB) or TinyStories (10–50MB) run on CPU in minutes. GPU is optional for learning.

Resources

Upstream project: jingyaogong/minimind

Learning roadmap: Roadmap

Executable examples: Learning materials

Learning notes: Learning log · Knowledge base

Built on MiniMind for learning and experiments