Why short text can use too many tokens

Introduction

Chatbots do not measure memory in words, pages, or paragraphs. They measure it in tokens. This seemingly technical detail has practical consequences for anyone using artificial intelligence to analyse documents, write code, summarise reports, or maintain long conversations. A model’s available context window (its working memory during a conversation) is defined by the number of tokens it can process, and most commercial AI services also price usage by token counts. As a result, two texts of similar length can consume very different amounts of memory and cost. Understanding token counts helps explain why a chatbot may forget earlier information, reject a long document, or generate unexpectedly high API bills. [OpenAI Help Center]

Token Costs illustration 1

Why words and tokens do not scale evenly

Many users assume that a fixed number of words will always translate into a predictable amount of AI memory. In practice, tokenisation makes the relationship far less straightforward.

A token can be a whole word, part of a word, punctuation, or another frequently occurring text pattern. Common words often occupy a single token, while unusual technical terms, long identifiers, code snippets, URLs, and specialised vocabulary may be split into multiple tokens. OpenAI notes that token counts do not map directly to word counts and can vary significantly depending on the content. [OpenAI Help Center]

This means that two documents with the same word count may require different amounts of context space:

Plain conversational English is often relatively token-efficient.
Source code frequently consumes more tokens because identifiers, symbols, and formatting are fragmented.
Scientific terminology, legal citations, and structured data can increase token usage.
Some languages and writing systems may require more or fewer tokens than an equivalent English passage. [OpenAI Help Center]

The practical lesson is that visible length can be misleading. A document that looks compact to a human reader may occupy a surprisingly large share of a model’s available context window.
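As a rough illustration of why word count is a poor proxy, the sketch below compares two snippets with the same number of whitespace-separated words. It assumes the commonly quoted rule of thumb of about four characters per token for English text. Real tokenisers differ by model, so `estimate_tokens` is a hypothetical helper for ballpark comparison rather than an exact count, and it probably understates how heavily code is fragmented.

```python
import math

CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb for English text


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count alone."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


# Two made-up snippets with the same number of whitespace-separated words.
prose = "Please send the report by Friday"
code = ("records = client.fetch_user_records(account_id, "
        "include_archived=True, page_size=100, timeout_seconds=30)")

for label, text in (("prose", prose), ("code", code)):
    words = len(text.split())
    print(f"{label}: {words} words, ~{estimate_tokens(text)} tokens (estimate)")
```

Both snippets contain six "words", yet the code line is several times longer in characters, so its estimated token count is correspondingly higher. For exact figures, a provider's own tokeniser or token-counting endpoint would be the appropriate tool.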

How token counts limit long prompts and documents

The context window is the amount of information a model can consider at one time. Every token in the prompt, uploaded document, conversation history, and generated response must fit within that limit. [Claude]
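The shared budget can be pictured with simple arithmetic. The sketch below is illustrative only: the window size and token counts are made-up numbers, and `remaining_output_budget` is a hypothetical helper, not part of any provider's API.

```python
def remaining_output_budget(context_window: int, *input_token_counts: int) -> int:
    """Return how many tokens are left for the response after all inputs."""
    used = sum(input_token_counts)
    # A negative result would mean the inputs alone overflow the window.
    return max(context_window - used, 0)


# Hypothetical numbers for illustration only.
window = 8_000
prompt, document, history = 500, 6_000, 1_200

left = remaining_output_budget(window, prompt, document, history)
print(f"Tokens left for the answer: {left}")
```

With these numbers the inputs already use 7,700 of the 8,000 tokens, which leaves very little room for the generated answer.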

When a conversation grows, older content competes with newer content for space. If the token budget is exhausted, the system may:

Drop or compress earlier conversation history.
Truncate long documents.
Reduce the amount of detail retained from previous exchanges.
Restrict the maximum length of the generated answer. [Claude]

This is why a chatbot can appear to “forget” information from earlier in a long discussion. The issue is often not permanent memory loss but the fact that earlier tokens have fallen outside the active context window.
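One simple way a system might drop earlier history is a sliding window that keeps only the newest messages that fit. The sketch below is an assumption about how such trimming could work, not a description of any particular product. `trim_history` and the word-based counter are hypothetical stand-ins for a real tokeniser.

```python
from typing import Callable


def trim_history(messages: list[str], budget: int,
                 count_tokens: Callable[[str], int]) -> list[str]:
    """Keep the most recent messages whose combined token count fits the budget."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def count_words(message: str) -> int:
    """Stand-in counter: one 'token' per word (an assumption for the demo)."""
    return len(message.split())


history = [
    "my name is Ada",
    "what is a token",
    "explain context windows briefly",
]
print(trim_history(history, budget=8, count_tokens=count_words))
```

With a budget of eight "tokens", the oldest message (the one containing the user's name) no longer fits, which is exactly the kind of detail a chatbot then appears to have forgotten.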

The expansion of context windows has become a major area of AI development. Anthropic expanded Claude’s context window from 9K to 100K tokens in May 2023 and has since moved to much larger windows, and some enterprise systems now advertise capacities reaching one million tokens. These larger windows allow users to analyse hundreds of pages of text or entire code repositories in a single session. [Anthropic]

However, a larger context window does not eliminate token constraints. Every additional token still occupies memory space and computational resources.

Bigger windows are not the same as unlimited memory

A common misconception is that a larger context window automatically means perfect recall.

Research examining long-context performance suggests that practical effectiveness may decline before the theoretical maximum context length is reached. Some tasks remain accurate across very large contexts, while others show performance degradation as more information is added. The useful working memory of a model therefore depends not only on the advertised token limit but also on the type of reasoning being performed. [arXiv]

For users, this means that fitting a document into a context window is only the first requirement. The model must also be able to locate and use the relevant information efficiently.

Token Costs illustration 2

Why token counts directly affect cost

Most commercial AI providers charge according to token usage. Input tokens (the prompt and supplied documents) and output tokens (the generated response) are typically priced separately. [OpenAI Help Center]

Because billing is token-based, costs scale with:

Longer prompts.
Larger uploaded files.
More extensive conversation histories.
Longer generated answers.
Repeated processing of the same material. [OpenAI Help Center]

For individual users, the effect may be modest. For organisations processing millions of requests, token efficiency can become a major operational concern. Industry discussions increasingly focus on balancing AI usefulness against token expenditure, particularly in enterprise deployments where large-scale usage can produce substantial recurring costs. [Business Insider]

This creates a direct connection between wording and spending. An unnecessarily verbose prompt may consume more tokens without improving the answer, while a concise prompt can reduce costs and leave more room for relevant context.
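The link between wording and spending can be made concrete with a small calculation. The sketch below assumes separate per-million-token prices for input and output. The prices and token counts are invented for illustration and do not reflect any provider's actual rates.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Cost of one request when input and output tokens are priced separately."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical prices in currency units per million tokens.
INPUT_PRICE = 3.0
OUTPUT_PRICE = 15.0

verbose = request_cost(2_000, 500, INPUT_PRICE, OUTPUT_PRICE)
concise = request_cost(400, 500, INPUT_PRICE, OUTPUT_PRICE)

print(f"verbose prompt: {verbose:.4f} per request")
print(f"concise prompt: {concise:.4f} per request")
print(f"difference over one million requests: {(verbose - concise) * 1_000_000:,.0f}")
```

The per-request difference is tiny, but multiplied across a large request volume it becomes the recurring cost that organisations notice.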

Practical ways to spot token-heavy text

People often underestimate which kinds of content consume the most tokens. Several warning signs can help identify token-heavy material before it reaches a chatbot.

Dense code blocks: Programming code often tokenises less efficiently than ordinary prose because variable names, punctuation, and formatting are treated separately.

Long lists and tables: Structured data may contain many repeated separators, numbers, and identifiers that increase token counts.

Repeated instructions: Copying the same guidance into every prompt can steadily consume context space and increase costs.

Verbose formatting: Excessive markup, nested bullet structures, and long templates add tokens even when they provide little informational value.

Large conversation histories: Retaining every previous exchange may eventually use more context than the current task requires. [Claude]

A useful rule of thumb is to examine whether each section of text genuinely contributes information. Every unnecessary token occupies part of the model’s working memory and may contribute to billing.
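One crude way to flag material such as dense code or structured data before sending it is to measure how much of the text is punctuation and symbols. The sketch below is only a heuristic under stated assumptions: `symbol_ratio`, `looks_token_heavy`, and the 0.15 threshold are hypothetical choices, and a real tokeniser remains the only reliable measure.

```python
def symbol_ratio(text: str) -> float:
    """Fraction of non-space characters that are neither letters nor digits."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    symbols = sum(1 for ch in visible if not ch.isalnum())
    return symbols / len(visible)


def looks_token_heavy(text: str, threshold: float = 0.15) -> bool:
    """Flag text whose symbol density exceeds an arbitrary, assumed threshold."""
    return symbol_ratio(text) > threshold


prose_sample = "The quarterly report is ready for review."
data_sample = '{"id": 1042, "tags": ["a", "b"], "ok": true}'

for label, sample in (("prose", prose_sample), ("data", data_sample)):
    print(f"{label}: symbol ratio {symbol_ratio(sample):.2f}, "
          f"flagged={looks_token_heavy(sample)}")
```

The structured sample is dominated by quotes, brackets, and separators, so it is flagged, while the plain sentence is not.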

Token Costs illustration 3

Token-efficient habits that improve both memory and cost

Good prompt design is often less about writing more and more about using context strategically.

Several practices can improve efficiency:

Remove duplicated instructions.
Summarise earlier discussion instead of repeatedly pasting it.
Provide only the sections of a document relevant to the task.
Use concise wording when precision is not lost.
Break very large projects into focused stages rather than repeatedly sending entire archives. [Anthropic]

These habits create two benefits at once. They reduce token expenditure while also leaving more room in the context window for information that actually matters to the current request.
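The first habit, removing duplicated instructions, can be partly automated. The sketch below drops paragraphs that repeat an earlier one after normalising whitespace and letter case. `drop_repeated_paragraphs` is a hypothetical helper, and it assumes paragraphs are separated by blank lines.

```python
def drop_repeated_paragraphs(text: str) -> str:
    """Remove paragraphs that duplicate an earlier one, keeping first occurrences."""
    seen: set[str] = set()
    unique: list[str] = []
    for paragraph in text.split("\n\n"):
        key = " ".join(paragraph.split()).lower()  # normalise whitespace and case
        if not key or key in seen:
            continue  # skip empty chunks and repeats
        seen.add(key)
        unique.append(paragraph.strip())
    return "\n\n".join(unique)


prompt = (
    "Answer in British English.\n\n"
    "Summarise the report.\n\n"
    "answer in  British English."
)
print(drop_repeated_paragraphs(prompt))
```

Exact-match deduplication will not catch reworded repeats, so it complements rather than replaces a manual review of long templates.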

In practical terms, token counts act as both the memory budget and the spending budget of a chatbot. Understanding that relationship helps explain why AI systems sometimes forget details, why large documents can be difficult to process, and why efficient prompts often produce better results at lower cost.

Endnotes

Source: help.openai.com
Link: https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them
Source snippet
OpenAI Help Center: What are tokens and how to count them? Tokens are the building blocks of text that OpenAI models process. They can be as...
Source: platform.claude.com
Title: Context windows
Link: https://platform.claude.com/docs/en/build-with-claude/context-windows
Source snippet
Context windows - Claude API Docs: As conversations grow, you'll eventually approach context window limits. This guide explains how c...
Source: anthropic.com
Title: 100k context windows
Link: https://www.anthropic.com/news/100k-context-windows
Source snippet
Introducing 100K Context Windows (11 May 2023): We've expanded Claude's context window from 9K to 100K tokens, corresponding to ar...

Published: May 2023
Source: anthropic.com
Title: claude opus 4 6
Link: https://www.anthropic.com/news/claude-opus-4-6
Source snippet
Introducing Claude Opus 4.6 (5 Feb 2026): [1] The 1M token context window is currently available in beta on the Claude Developer Platform o...
Source: arxiv.org
Link: https://arxiv.org/abs/2509.21361
Source snippet
Context Is What You Need: The Maximum Effective Context Window for Real World Limits of LLMsSeptember 21, 2025...

Published: September 21, 2025
Source: arxiv.org
Link: https://arxiv.org/abs/2605.02173
Source: anthropic.com
Title: effective context engineering for ai agents
Link: https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
Source snippet
Effective context engineering for AI agents (29 Sept 2025): Context engineering refers to the set of strategies for curating and m...
Source: OpenAI
Link: https://openai.com/
Source snippet
OpenAI | Research & Deployment: We believe our research will eventually lead to artificial general intelligence, a system that can solve...
Source: community.openai.com
Title: 4096 response limit vs 128 000 context window
Link: https://community.openai.com/t/4096-response-limit-vs-128-000-context-window/656864
Source snippet
and it is shared for all language inference. The only thing confusing is that...
Source: community.openai.com
Title: assistant api what are context tokens in the billing calculation
Link: https://community.openai.com/t/assistant-api-what-are-context-tokens-in-the-billing-calculation/497675
Source snippet
API12 Nov 2023 — “Context” is OpenAI's new language for “prompt” or input. It's what is loaded into the AI model before it generates a la...
Source: anthropic.com
Title: prompting long context
Link: https://www.anthropic.com/news/prompting-long-context
Source snippet
Prompt engineering for Claude's long context window23 Sept 2023 — Claude's 100,000 token long context window enables the model to operate...
Source: platform.claude.com
Title: extended thinking
Link: https://platform.claude.com/docs/en/build-with-claude/extended-thinking
Source snippet
context window space visually, they still count toward your input token usage when cached; If thinking becomes disabled and you pass thin...
Source: businessinsider.com
Link: https://www.businessinsider.com/ai-spending-roi-concerns-tokenmaxxing-uber-coo-andrew-macdonald-reaction-2026-5
Source snippet
Uber COO Andrew Macdonald voiced doubts about AI driving meaningful productivity gains, echoing Uber CTO’s earlier comments about exhaust...
Source: github.com
Link: https://github.com/vercel/ai/issues/5205
Source snippet
anthropic count tokens api · Issue #5205 · vercel/ai - GitHubMarch 13, 2025 — I suspect that most LLM APIs will likely provide some way t...

Published: March 13, 2025
Source: reddit.com
Link: https://www.reddit.com/r/claude/comments/1s3vsm5/anthropic_broke_your_limits_with_the_1m_context/
Source snippet
Anthropic broke your limits with the 1M context updateI set my context limit to 666,666. Just put in CLAUDE.MD - your context window is 6...
Source: aws.amazon.com
Title: anthropic claude sonnet bedrock expanded context window
Link: https://aws.amazon.com/about-aws/whats-new/2025/08/anthropic-claude-sonnet-bedrock-expanded-context-window/
Source snippet
Anthropic's Claude Sonnet 4 in Amazon Bedrock... (12 Aug 2025): Anthropic's Claude Sonnet 4 in Amazon Bedrock is launching today...
Source: linkedin.com
Link: https://www.linkedin.com/company/openai
Source snippet
OpenAIOpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of...
Source: linkedin.com
Link: https://www.linkedin.com/posts/hirirngdots_1m-context-is-now-generally-available-for-activity-7439361919473405952-vMfi
Source snippet
enerally available for Claude Opus 4.6 and Sonnet 4.6...
Source: linkedin.com
Link: https://www.linkedin.com/posts/jerry-liu-64390071_anthropic-just-shipped-1m-token-context-windows-activity-7438620843561160704-Updj
Source snippet
ntire shelf of contracts, research reports...
Source: linkedin.com
Link: https://www.linkedin.com/company/anthropicresearch
Source: docs.rs
Title: Anthropic Token Counter in multi_llm
Link: https://docs.rs/multi-llm/latest/multi_llm/struct.AnthropicTokenCounter.html
Source snippet
AnthropicTokenCounter in multi_llm - Rust - Docs.rsToken counter for Anthropic Claude models. Uses cl100k_base tokenizer with a 1.1x appr...
Source: Wikipedia
Title: Open AI
Link: https://en.wikipedia.org/wiki/OpenAI
Source snippet
OpenAIOpenAI is an American artificial intelligence (AI) research organization headquartered in San Francisco, consisting of OpenAI Gr...
Source: Wikipedia
Link: https://en.wikipedia.org/wiki/Anthropic
Source snippet
AnthropicAnthropic PBC is an American artificial intelligence (AI) company headquartered in San Francisco, California. It has develope...
Source: github.com
Link: https://github.com/cline/cline/issues/4149
Source snippet
Anthropic models capped at 8192 `maxTokens` instead of...10 Jun 2025 — The current maxTokens limits for Anthropic models in the code are...
Source: mindstudio.ai
Title: claude 1m token context window agents
Link: https://www.mindstudio.ai/blog/claude-1m-token-context-window-agents/
Source snippet
Claude 1M Token Context Window: What It Means for Long...21 Mar 2026 — Anthropic recently expanded Claude Opus 4.5 and Claude Sonnet 4.5...
Source: itnews.com.au
Link: https://www.itnews.com.au/news/anthropic-opens-claude-mythos-preview-ai-program-to-australia-626399
Source snippet
Australia is now included. with up to 150 new organisations now...
Source: hexmos.com
Title: Anthropic Token Counter | Online Free Dev Tools by
Link: https://hexmos.com/freedevtools/t/anthropic-token-counter/
Source snippet
Leverage Claude's large context window for...
Source: youtube.com
Link: https://www.youtube.com/watch?v=Uv0mJ3AhqPw
Source snippet
Anthropic Gets a 1M Token Context WindowClaude Sonnet 4 now supports up to 1 million tokens in a single request. This is a 5x increase th...
Source: youtube.com
Link: https://www.youtube.com/%40anthropic-ai
Source snippet
AnthropicWe're an AI safety and research company. Talk to our AI assistant Claude on claude.com. Download Claude on desktop, iOS, or Andr...
