Attention Mechanism

How transformers understand language


Tokenization

Words become numbers

The first step is breaking text into tokens — small pieces the model can process, each mapped to a unique number.

"I ate a banana on Friday"
Key Insight

Each word gets a unique ID from the model's vocabulary. "banana" is always token #39127. Real tokenizers (like BPE) can split words into sub-pieces — "eating" might become "eat" + "ing".
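The idea above can be sketched in a few lines of Python. This is a toy greedy longest-match tokenizer with a made-up vocabulary, not a real BPE implementation; every token ID here (including reusing #39127 for "banana") is chosen for illustration only.

```python
# Toy vocabulary: word/sub-word pieces mapped to invented IDs.
vocab = {"I": 40, "ate": 8972, "a": 64, "banana": 39127, "on": 319,
         "Friday": 6150, "eat": 4483, "ing": 278}

def tokenize(text):
    """Greedy longest-match tokenization: try the whole word first,
    then fall back to the longest known prefix (a crude stand-in for BPE)."""
    ids = []
    for word in text.split():
        while word:
            for end in range(len(word), 0, -1):
                piece = word[:end]
                if piece in vocab:
                    ids.append(vocab[piece])
                    word = word[end:]
                    break
            else:
                raise KeyError(f"no token for {word!r}")
    return ids

print(tokenize("I ate a banana on Friday"))  # [40, 8972, 64, 39127, 319, 6150]
print(tokenize("eating"))                    # [4483, 278] -> "eat" + "ing"
```

Note how "eating" is not in the vocabulary, so the fallback splits it into the known pieces "eat" and "ing", mirroring how sub-word tokenizers handle unseen words.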
