Definition
Tokens are the units of text that a machine learning model takes as input and produces as output. A token can be an individual character, a whole word, part of a word, or even a larger chunk of text. As a rule of thumb, one token corresponds to roughly 4 characters of common English text, or about ¾ of a word (so 100 tokens ≈ 75 words) [1]. AI tokens are not limited to text alone: they can take various data forms and play a key role in AI’s ability to understand and learn from them. For instance, in computer vision, a token may denote an image segment, such as a group of pixels or a single pixel; in audio processing, a token might be a snippet of sound [2].
To better illustrate tokens in AI, let's take a look at two short sentences and treat each token as a distinct unit of text that the AI processes. Comparing the token, word, and character counts of both sentences clearly demonstrates how the three metrics differ: token count is typically close to, but not exactly the same as, word count, while character count is substantially higher than both. This comparison also shows how AI companies calculate their pricing, as they typically charge based on token count rather than words or characters.
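The snippet below is a minimal sketch of such a comparison, assuming OpenAI's open-source `tiktoken` tokenizer; other models use different tokenizers, so the exact token counts will vary.

```python
# Minimal sketch: comparing token, word, and character counts for two sentences.
# Assumes OpenAI's open-source `tiktoken` tokenizer; other models use different
# tokenizers, so the exact token counts will vary.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

sentences = [
    "Tokens are the building blocks of language models.",
    "AI companies typically charge per token, not per word.",
]

for text in sentences:
    tokens = enc.encode(text)
    print(f"text:       {text}")
    print(f"tokens:     {len(tokens)}")
    print(f"words:      {len(text.split())}")
    print(f"characters: {len(text)}")
    print()
```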
Origin
The concept of tokens dates back to the early days of computer science, when researchers were exploring ways to represent and process human language. One of the earliest related formalisms was introduced in the 1950s by the linguist Noam Chomsky: Chomsky Normal Form, a standardized way of writing context-free grammars, was used to parse and analyze the structure of sentences.
In the 1980s, the development of the “bag-of-words” model revolutionized the field of natural language processing (NLP). The bag-of-words model represented text as a collection of individual words, without considering the order or context of the words. This model was widely used in early NLP applications, such as text classification and information retrieval [3].
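As a quick illustration of the idea, a bag-of-words representation can be sketched as a simple word-count dictionary; the `bag_of_words` helper below is illustrative, not a historical implementation.

```python
# Toy sketch of a bag-of-words representation: count each word, ignoring order
# and context entirely.
from collections import Counter

def bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())

print(bag_of_words("the cat sat on the mat"))
# Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})
```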
Context and Usage
Tokens play a key role in natural language processing (NLP) and generative AI, and understanding them is fundamental: they are the building blocks that models like ChatGPT, Gemini, Meta AI, and Claude use to process and generate language [4].
Why it Matters
Grasping how AI tokens work is necessary for using language models efficiently. The ability to count tokens is critical for optimizing costs and getting the best use out of tools like ChatGPT, and skillful token management lets you control spending and ensure that your inputs fit comfortably within the imposed limits [5].
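For example, one way to stay within a model's context window is to count tokens before sending a request and trim the input if it is too long. The sketch below assumes the `tiktoken` tokenizer and an illustrative 4,096-token limit; real limits depend on the specific model.

```python
# Minimal sketch: trimming a prompt so it fits within a model's token limit.
# Assumes `tiktoken`; the 4096-token limit is an illustrative value only.
import tiktoken

def truncate_to_token_limit(text: str, max_tokens: int = 4096) -> str:
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    if len(tokens) <= max_tokens:
        return text
    # Keep only the first max_tokens tokens and decode them back to text.
    return enc.decode(tokens[:max_tokens])

long_prompt = "word " * 10_000                     # far more than 4096 tokens
trimmed = truncate_to_token_limit(long_prompt)
print(len(tiktoken.get_encoding("cl100k_base").encode(trimmed)))  # roughly 4096
```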
Related Terms
- Word Tokens: Each word is treated as a separate token.
- Subword Tokens: Words are broken down into smaller meaningful units to handle out-of-vocabulary words better; e.g., "cats" can be broken down into "cat" and "s" (see the toy sketch after this list).
- Phrase Tokens: These consist of multiple words that are grouped together, such as "Benin City" or "Artificial Intelligence".
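The following toy sketch illustrates the idea of subword tokenization with a tiny hand-made vocabulary. Real tokenizers (such as the byte-pair encoding used by GPT models) learn their vocabularies from data, so actual splits will differ; `subword_tokenize` here is purely illustrative.

```python
# Toy sketch of subword tokenization using a tiny hand-made vocabulary.
# Real tokenizers learn their vocabularies from data, so actual splits differ;
# this only illustrates the idea of breaking words into known pieces.
VOCAB = {"cat", "s", "run", "ning", "un", "happy"}

def subword_tokenize(word: str) -> list[str]:
    """Greedily split a word into the longest known subwords, left to right."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest candidate first
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:                               # no known piece: emit the character
            pieces.append(word[i])
            i += 1
    return pieces

print(subword_tokenize("cats"))      # ['cat', 's']
print(subword_tokenize("running"))   # ['run', 'ning']
print(subword_tokenize("unhappy"))   # ['un', 'happy']
```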
In Practice
A real-life case study of tokens in AI can be seen in OpenAI's GPT models, which use a token system for processing language. When you use ChatGPT or the OpenAI API, your text is broken down into tokens (word fragments), and you are charged based on token usage. This token-based approach allows for precise measurement of the computational resources used.
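As a rough, non-authoritative sketch of how such billing can be estimated, the example below counts input tokens with `tiktoken` and multiplies by an assumed per-token price. The rates and the `estimate_cost` helper are placeholders, not OpenAI's actual pricing or API.

```python
# Hedged sketch: estimating the cost of an API call from token counts.
# The per-token prices below are placeholders, not actual OpenAI rates;
# always check the provider's current pricing page.
import tiktoken

PRICE_PER_1K_INPUT_TOKENS = 0.0005   # assumed example rate, in USD
PRICE_PER_1K_OUTPUT_TOKENS = 0.0015  # assumed example rate, in USD

def estimate_cost(prompt: str, expected_output_tokens: int) -> float:
    enc = tiktoken.get_encoding("cl100k_base")
    input_tokens = len(enc.encode(prompt))
    return (
        (input_tokens / 1000) * PRICE_PER_1K_INPUT_TOKENS
        + (expected_output_tokens / 1000) * PRICE_PER_1K_OUTPUT_TOKENS
    )

print(f"${estimate_cost('Summarize the history of tokenization in NLP.', 300):.6f}")
```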
References
- The Ministry of AI. (2023). Demystifying Tokens: A Beginner's Guide to Understanding AI Building Blocks.
- Miquido. (2025). What is an AI Token?
- Ikangai. (2025). A Brief History of Tokens.
- Beekman, J. (n.d.). Tokens 101: Understanding Tokens in Generative AI Models.
- The Story. (2025). AI tokens. What are AI tokens?