When you type a message to an AI, the model doesn't read it word by word the way you do. It breaks your text into tokens — small chunks that might be a whole word, part of a word, or a punctuation mark. Think of tokens as the model's alphabet: the smallest pieces it can work with.
A rough rule of thumb: one token is about four characters in English, or about three-quarters of a word. So "hello" might be one token, while "unbelievable" could be two or three. Punctuation counts too — a period or comma is usually its own token. That's why a short prompt with lots of punctuation can use more tokens than you'd expect.
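The rule of thumb above can be turned into a quick back-of-the-envelope estimator. This is just the ~4-characters-per-token heuristic, not a real tokenizer — actual counts vary by model and require the model's own tokenizer library:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token heuristic.

    Real tokenizers split on learned subword boundaries, so actual counts
    will differ; this is only a ballpark figure.
    """
    return max(1, round(len(text) / 4))

print(estimate_tokens("hello"))         # short word: about 1 token
print(estimate_tokens("unbelievable"))  # longer word: about 3 tokens
```

For precise counts you would use the tokenizer that matches your model, but the heuristic is usually close enough for budgeting.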
Why does this matter? AI models have a context window — a maximum number of tokens they can consider at once. When you see "128K context" or "1 million tokens," that's the size of the model's working memory. Longer conversations, bigger documents, and more detailed instructions all consume tokens. Hit the limit, and a chat interface will typically drop older messages, while an API call will simply return an error.
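A simple budget check makes the context-window idea concrete. The window size and the amount reserved for the response here are illustrative numbers, not values from any particular model:

```python
CONTEXT_WINDOW = 128_000  # e.g. a "128K context" model (illustrative)

def fits_in_context(prompt_tokens: int,
                    history_tokens: int,
                    reserved_for_output: int = 1_024) -> bool:
    """Check whether a request fits in the model's context window,
    leaving headroom for the model's response tokens."""
    used = prompt_tokens + history_tokens + reserved_for_output
    return used <= CONTEXT_WINDOW

print(fits_in_context(600, 2_000))        # small request: True
print(fits_in_context(120_000, 10_000))   # over budget: False
```

Applications often track this running total so they can summarize or trim older turns before the limit is reached.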
Tokens also drive cost and speed. Most AI APIs charge per token (input and output separately). A 500-word article might be 600–700 tokens; a back-and-forth chat can quickly add up. Shorter, clearer prompts use fewer tokens and respond faster. For teams building AI applications, token usage is a key metric for both performance and budget.
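Because input and output tokens are priced separately, a per-request cost estimate is a short calculation. The per-million-token rates below are hypothetical placeholders — check your provider's pricing page for real numbers:

```python
def request_cost(input_tokens: int,
                 output_tokens: int,
                 input_price_per_million: float = 3.00,    # hypothetical rate
                 output_price_per_million: float = 15.00,  # hypothetical rate
                 ) -> float:
    """Estimate the USD cost of one API request.

    Input and output tokens are billed at different rates,
    quoted here per million tokens.
    """
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000

# A ~650-token article as input plus a ~300-token reply:
print(f"${request_cost(650, 300):.5f}")
```

Output tokens usually cost several times more than input tokens, which is one reason teams monitor response length as closely as prompt length.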