Tokens — Glossary | Axon Agentic

Tokens are the fundamental units that large language models use to process and generate text. Rather than working character-by-character or word-by-word, a tokenizer splits text into subword pieces called tokens before the model sees it, using algorithms such as Byte Pair Encoding (BPE).
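
To make the idea of subword merging concrete, here is a toy sketch of BPE training on a tiny corpus. It is an illustration only: it works on characters and whitespace-separated words, whereas production tokenizers typically operate on bytes and add pre-tokenization rules. The function names (most_frequent_pair, merge_pair, train_bpe) are hypothetical and are not taken from any tokenizer library.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-separated corpus."""
    # Each word starts as a tuple of single characters.
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:  # every word is already a single symbol
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words


if __name__ == "__main__":
    merges, words = train_bpe("low low low lower lowest", num_merges=2)
    print(merges)
    print(words)
```

After two merges on this corpus, the frequent word "low" has become a single token, while rarer words such as "lowest" are still split into several pieces. That is the behavior that makes common words cheap and unusual strings (for example gene identifiers) comparatively expensive in tokens.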

Rough Token Counts

Text	Approximate Tokens
1 word (English)	~1.3 tokens
1 sentence	~15–20 tokens
1 paragraph	~80–100 tokens
1 page (~500 words)	~650 tokens
This glossary entry	~300 tokens
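
The table's rule of thumb can be turned into a quick estimator. The sketch below assumes the ~1.3 tokens per English word ratio from the table; the function name estimate_tokens is hypothetical, and the result is only a rough guide. Actual counts depend on the model's tokenizer and on the text itself.

```python
# Heuristic from the table above: roughly 1.3 tokens per English word.
TOKENS_PER_WORD = 1.3


def estimate_tokens(text, tokens_per_word=TOKENS_PER_WORD):
    """Estimate the token count of `text` from its whitespace-separated word count."""
    word_count = len(text.split())
    return round(word_count * tokens_per_word)


if __name__ == "__main__":
    page = " ".join(["word"] * 500)  # stand-in for a ~500-word page
    print(estimate_tokens(page))
```

For exact numbers, count tokens with the tokenizer of the model you plan to call rather than relying on a word-based ratio.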

Why Tokens Matter for Biological Research

LLM API usage is priced per token, counting both input and output. For a system querying large biological databases:

A single HPA gene record: ~200–500 tokens
50 gene records passed to a model: ~10,000–25,000 tokens
Across a run of 12 benchmark tests: ~$0.09–$0.19 per query at GPT-5 pricing at the time of writing
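
The arithmetic behind figures like these can be sketched as follows. The per-record token range (200–500) comes from the list above. The prices in the usage example are placeholders chosen for illustration, not actual GPT-5 rates, and the function names are hypothetical. Substitute the provider's published prices before relying on the output.

```python
def record_token_range(n_records, low_per_record=200, high_per_record=500):
    """Return (low, high) total input tokens for `n_records` gene records."""
    return n_records * low_per_record, n_records * high_per_record


def estimate_cost_usd(input_tokens, output_tokens,
                      input_price_per_million, output_price_per_million):
    """Estimate the cost of one query, with prices given in USD per million tokens."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


if __name__ == "__main__":
    low, high = record_token_range(50)
    print(low, high)  # total input tokens for 50 records

    # Placeholder prices (assumptions, NOT real rates): USD per million tokens.
    INPUT_PRICE = 1.00
    OUTPUT_PRICE = 10.00
    ASSUMED_OUTPUT_TOKENS = 1_000  # assumed length of the model's answer

    for input_tokens in (low, high):
        cost = estimate_cost_usd(input_tokens, ASSUMED_OUTPUT_TOKENS,
                                 INPUT_PRICE, OUTPUT_PRICE)
        print(f"{input_tokens} input tokens: ${cost:.4f}")
```

Because input tokens scale linearly with the number of records, trimming each record to the fields a query needs reduces cost in direct proportion.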

Token efficiency — how much useful data you can pack into a context window — directly determines both the cost and the quality of AI-driven biological analysis.