
Position Encoding Quiz

Answer the following questions to check your understanding.


Q1

Why does Attention need positional encoding?

A. To speed up computation
B. To reduce parameters
C. Because Attention is permutation-invariant and cannot distinguish order
D. To support longer sequences
Q2

What is the core idea of RoPE?

A. Assign a learnable vector to each position
B. Encode position by rotating vectors
C. Compute relative distance between tokens
D. Use sine functions to generate positional encodings
Q3

Why does RoPE use multiple frequencies?

A. To speed up computation
B. To reduce memory usage
C. High frequency encodes local position, low frequency encodes global position
D. To be compatible with different head_dim values
Q4

Where is RoPE applied in Attention?

A. Query only
B. Key only
C. Query and Key
D. Query, Key, and Value
Q5

What is the main advantage of RoPE over absolute positional embeddings?

A. Faster computation
B. Fewer parameters
C. Supports length extrapolation (train short, infer long)
D. Simpler implementation

🎯 Comprehensive Questions

Q6: Practical scenario

Assume you trained a RoPE model with max_seq_len=512. Now you need to process sequences of length 2048. What issues might occur, and how can you fix them?

Reference answer:

Possible issues:

  1. Performance drop:

    • Extrapolation is possible, but the model never saw such long sequences
    • Long-range attention patterns may be inaccurate
  2. Over-rotation:

    • Position 2000 rotates roughly 4× further than any position seen during training
    • High-frequency dimensions may wrap around many extra times and lose positional information (see the quick check after this list)
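
To make over-rotation concrete, here is a minimal sketch that assumes the standard RoPE frequency formula theta_i = base^(-2i/d) with base = 10000 and head_dim = 64 (illustrative values, not necessarily MiniMind's configuration):

```python
import torch

# Assumed standard RoPE frequencies: theta_i = base^(-2i/d)
head_dim, base = 64, 10000.0
inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

train_pos, infer_pos = 511, 2000                # last training position vs. new position
angles_train = train_pos * inv_freq             # per-dimension rotation angles at pos 511
angles_infer = infer_pos * inv_freq             # per-dimension rotation angles at pos 2000

print(infer_pos / train_pos)                    # ≈ 3.9: every angle grows by this factor
print(float(angles_infer[0] / (2 * torch.pi)))  # ≈ 318 full turns on the fastest dimension
print(float(angles_train[-1]), float(angles_infer[-1]))  # slowest dimension: ≈ 0.07 vs ≈ 0.27 rad,
                                                         # an angle range never reached in training
```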

Solutions:

  1. Use YaRN (recommended):

    ```python
    # enable in MiniMind
    config.inference_rope_scaling = True
    ```

    • rescales the rotation frequencies
    • smooths rotations over long sequences
  2. Position Interpolation:

    ```python
    # scale positions back into the training range
    pos_ids = pos_ids * (512 / 2048)
    ```

    • simple but effective (a fuller sketch follows after this list)
    • compresses the position range
  3. Continue training:

    • fine-tune with long-sequence data
    • adapt the model to long-range attention
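
The position-interpolation idea from solution 2, sketched end to end (this assumes the standard inv_freq formulation and is not MiniMind's actual inference_rope_scaling code path):

```python
import torch

def rope_angles(positions: torch.Tensor, head_dim: int = 64, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    """Rotation angles of shape [len(positions), head_dim // 2].

    scale < 1.0 implements position interpolation: positions are shrunk
    back into the range the model was trained on before angles are computed.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    return (positions.float() * scale).unsqueeze(-1) * inv_freq  # outer product: [seq, dim/2]

positions = torch.arange(2048)
plain  = rope_angles(positions)                    # extrapolation: angles exceed the training range
interp = rope_angles(positions, scale=512 / 2048)  # interpolation: angles stay inside it

print(float(plain[-1, -1]), float(interp[-1, -1]))  # slowest dimension, with and without interpolation
```

YaRN refines this idea: instead of compressing every position by one uniform factor, it rescales different frequency bands by different amounts, which preserves more of the high-frequency (local) detail.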

Best practices:

  • use a larger rope_base during training (e.g., 1e6; see the comparison below)
  • enable YaRN in inference
  • validate with task-specific tests
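
Why a larger rope_base helps, in rough numbers (same assumed frequency formula as above; the values are illustrative, not MiniMind-specific):

```python
import math

def slowest_wavelength(base: float, head_dim: int = 64) -> float:
    """Wavelength in tokens of the slowest-rotating RoPE dimension."""
    lowest_freq = base ** (-(head_dim - 2) / head_dim)
    return 2 * math.pi / lowest_freq

print(slowest_wavelength(1e4))  # ≈ 47k tokens
print(slowest_wavelength(1e6))  # ≈ 4M tokens: low-frequency dimensions rotate far more slowly,
                                # leaving much more headroom before long positions look "unseen"
```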

Q7: Conceptual understanding

In your own words: why does the RoPE dot product depend only on relative position?

Reference answer:

Math intuition:

Assume:

  • Query at position m: $q_m = R(m\theta)\,q$ (rotated by angle $m\theta$)
  • Key at position n: $k_n = R(n\theta)\,k$ (rotated by angle $n\theta$)

Dot product:

$$q_m^\top k_n = [R(m\theta)\,q]^\top [R(n\theta)\,k]$$

Key property: the transpose of a rotation matrix is its inverse, so $R(m\theta)^\top = R(-m\theta)$, and successive rotations add their angles:

$$= q^\top R(m\theta)^\top R(n\theta)\,k = q^\top R((n-m)\theta)\,k$$

Conclusion:

  • the dot product depends only on $(n-m)\theta$
  • the absolute positions m and n cancel out

Intuitive analogy:

  • two people facing each other
  • no matter where they stand in the room
  • their relative angle stays the same

That’s why:

  • position pairs (5, 8) and (100, 103) receive the same positional contribution to their attention scores
  • the model learns “distance = 3” patterns regardless of absolute position (verified numerically in the sketch below)
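
A quick numerical check of this property, as a self-contained sketch with a single 2-D rotation (one frequency only; not MiniMind's RoPE code):

```python
import torch

def rotate(vec: torch.Tensor, angle: torch.Tensor) -> torch.Tensor:
    """Apply the 2-D rotation matrix R(angle) to vec."""
    c, s = torch.cos(angle), torch.sin(angle)
    rot = torch.stack([torch.stack([c, -s]), torch.stack([s, c])])
    return rot @ vec

torch.manual_seed(0)
q, k = torch.randn(2), torch.randn(2)   # an arbitrary query/key pair
theta = torch.tensor(0.1)               # one RoPE frequency

# Same relative distance (3), different absolute positions
score_a = rotate(q, 5 * theta) @ rotate(k, 8 * theta)
score_b = rotate(q, 100 * theta) @ rotate(k, 103 * theta)
print(torch.allclose(score_a, score_b, atol=1e-5))  # True: only n - m matters
```

Full RoPE applies the same rotation independently to each 2-D pair of head dimensions, each with its own frequency, so the conclusion carries over unchanged.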

✅ Completion check

After finishing all questions, check whether you can:

  • [ ] Get Q1–Q5 all correct: solid basics
  • [ ] Provide 2+ solutions in Q6: practical ability
  • [ ] Explain Q7 clearly: deep conceptual understanding

If anything is unclear, return to teaching.md or rerun the experiments.


🎓 Advanced challenge

Want to go deeper? Try:

  1. Modify experiment code:

    • implement a simple absolute positional encoding
    • compare length extrapolation vs RoPE
    • test different rope_base values
  2. Read papers:

    • RoFormer: Enhanced Transformer with Rotary Position Embedding (the original RoPE paper)
    • YaRN: Efficient Context Window Extension of Large Language Models
  3. Implement variants:

    • implement ALiBi (another positional encoding)
    • compare RoPE vs ALiBi

Next: go to 03. Attention to learn attention mechanisms.

Built on MiniMind for learning and experiments