Position Encoding Quiz
Answer the following questions to check your understanding.
🎮 Interactive Quiz (Recommended)
Why does Attention need positional encoding?
What is the core idea of RoPE?
Why does RoPE use multiple frequencies?
Where is RoPE applied in Attention?
What is the main advantage of RoPE over absolute positional embeddings?
🎯 Comprehensive Questions
Q6: Practical scenario
Assume you trained a RoPE model with max_seq_len=512. Now you need to process sequences of length 2048. What issues might occur, and how can you fix them?
Show reference answer
Possible issues:
Performance drop:
- Extrapolation is possible, but the model never saw such long sequences
- Long-range attention patterns may be inaccurate
Over-rotation:
- At position 2000, every dimension has rotated roughly 4× further than at the maximum training position
- High-frequency dimensions may wrap around many extra times and lose positional information
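To get a feel for the scale of this over-rotation, here is a small standalone sketch (assuming head_dim=64 and rope_base=10000, which may differ from your actual config) that compares how far the fastest and slowest dimension pairs have rotated at the last training position versus position 2000:

```python
import numpy as np

# Assumed values for illustration; use your model's actual config.
head_dim = 64
rope_base = 10000.0
inv_freq = rope_base ** (-np.arange(0, head_dim, 2) / head_dim)  # theta_i per dim pair

for pos in (511, 2000):  # last training position vs. extrapolated position
    angles = pos * inv_freq                 # total rotation angle per dimension pair
    turns = angles / (2 * np.pi)            # full revolutions of the fastest pair
    print(f"pos={pos}: fastest pair has made {turns[0]:.0f} full turns, "
          f"slowest pair has rotated {np.degrees(angles[-1]):.1f} degrees")
```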
Solutions:
Use YaRN (recommended):
```python
# enable in the MiniMind config
config.inference_rope_scaling = True
```

- rescales rotation frequencies (see the sketch after this list)
- smooths long-sequence rotations
Position Interpolation:
```python
# scale positions into the training range
pos_ids = pos_ids * (512 / 2048)
```

- simple but effective
- compresses the position range
Continue training:
- fine-tune with long-sequence data
- adapt the model to long-range attention
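To make "rescale rotation frequencies" concrete, below is a simplified, NTK-style base-rescaling sketch in the spirit of YaRN-family methods. It is not MiniMind's actual implementation, and all names and values here are illustrative assumptions:

```python
import numpy as np

def scaled_inv_freq(head_dim=64, rope_base=10000.0, train_len=512, target_len=2048):
    """Enlarge the RoPE base so low-frequency dimensions are effectively
    interpolated back into the trained range, while high-frequency
    dimensions are barely changed (simplified NTK-style rescaling)."""
    scale = target_len / train_len                                # 4x extension
    new_base = rope_base * scale ** (head_dim / (head_dim - 2))   # NTK-aware base
    return new_base ** (-np.arange(0, head_dim, 2) / head_dim)

# The rescaled frequencies are slightly slower, so long sequences rotate more smoothly.
print(scaled_inv_freq()[:4])
```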
Best practices:
- use a larger rope_base in training (e.g., 1e6)
- enable YaRN in inference
- validate with task-specific tests
Q7: Conceptual understanding
In your own words: why does the RoPE dot product depend only on relative position?
Show reference answer
Math intuition:
Assume:
- Query at position m: $q_m = R(m\theta)\,q$ (rotate $q$ by $m\theta$)
- Key at position n: $k_n = R(n\theta)\,k$ (rotate $k$ by $n\theta$)

Dot product:

$$q_m^\top k_n = (R(m\theta)q)^\top (R(n\theta)k) = q^\top R(m\theta)^\top R(n\theta)\,k = q^\top R\big((n-m)\theta\big)\,k$$

Key property: the transpose of a rotation matrix is its inverse, so $R(m\theta)^\top R(n\theta) = R((n-m)\theta)$.

Conclusion:
- the dot product depends only on the relative offset $(n-m)\theta$
- no absolute position $m$ or $n$ appears
Intuitive analogy:
- two people facing each other
- no matter where they stand in the room
- their relative angle stays the same
That’s why:
- given the same query and key content, positions (5, 8) and (100, 103) produce the same attention score
- the model learns “distance = 3” patterns
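You can verify this claim numerically with a standalone sketch (plain 2-D rotation matrices, not the MiniMind code):

```python
import numpy as np

def R(angle):
    """2-D rotation matrix."""
    return np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])

rng = np.random.default_rng(0)
q, k = rng.normal(size=2), rng.normal(size=2)
theta = 0.1

# Same relative distance (3), very different absolute positions.
score_near = (R(5 * theta) @ q) @ (R(8 * theta) @ k)
score_far = (R(100 * theta) @ q) @ (R(103 * theta) @ k)
print(np.isclose(score_near, score_far))  # True: only n - m matters
```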
✅ Completion check
After finishing all questions, check whether you can:
- [ ] Get Q1–Q5 all correct: solid basics
- [ ] Provide 2+ solutions in Q6: practical ability
- [ ] Explain Q7 clearly: deep conceptual understanding
If anything is unclear, return to teaching.md or rerun the experiments.
🎓 Advanced challenge
Want to go deeper? Try:
Modify experiment code:
- implement a simple absolute positional encoding
- compare length extrapolation vs RoPE
- test different rope_base values
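As a starting point for the rope_base comparison, this small sketch (assuming head_dim=64) prints how many positions the slowest dimension pair needs for one full rotation under different bases; a larger base means longer wavelengths and therefore more headroom for long sequences:

```python
import numpy as np

head_dim = 64  # assumed; match your model
for base in (1e4, 1e5, 1e6):
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    wavelength = 2 * np.pi / inv_freq        # positions per full rotation
    print(f"rope_base={base:.0e}: slowest pair repeats every {wavelength[-1]:,.0f} positions")
```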
Read papers:
- RoFormer: Enhanced Transformer with Rotary Position Embedding (the original RoPE paper)
- YaRN: Efficient Context Window Extension of Large Language Models
Implement variants:
- implement ALiBi (another positional encoding)
- compare RoPE vs ALiBi
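To get started on ALiBi, here is a minimal sketch of its core idea: instead of rotating q and k, it adds a per-head linear distance penalty to the attention scores. The slope schedule follows the geometric series from the ALiBi paper (for power-of-two head counts); everything else is an illustrative assumption:

```python
import numpy as np

def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
    """Per-head linear bias added to attention scores before softmax."""
    # Geometric slope schedule (valid for power-of-two head counts).
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    pos = np.arange(seq_len)
    rel = pos[None, :] - pos[:, None]             # j - i, negative in the causal region
    return slopes[:, None, None] * rel[None]      # (num_heads, seq_len, seq_len)

# Usage sketch: scores = q @ k.T / sqrt(d) + alibi_bias(8, T)[h]  (then causal mask + softmax)
print(alibi_bias(num_heads=8, seq_len=4)[0])
```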
Next: go to 03. Attention to learn attention mechanisms.