Why short text can use too many tokens

Because models measure context in tokens rather than words, wording choices can affect how much text fits and how expensive a request becomes.

Introduction

Chatbots do not measure memory in words, pages, or paragraphs. They measure it in tokens. This seemingly technical detail has practical consequences for anyone using artificial intelligence to analyse documents, write code, summarise reports, or maintain long conversations. A model’s available context window, its working memory during a conversation, is defined by the number of tokens it can process, and most commercial AI services also price usage by token counts. As a result, two texts of similar length can consume very different amounts of memory and cost. Understanding token counts helps explain why a chatbot may forget earlier information, reject a long document, or generate unexpectedly high API bills. [OpenAI Help Center]

Why words and tokens do not scale evenly

Many users assume that a fixed number of words will always translate into a predictable amount of AI memory. In practice, tokenisation makes the relationship far less straightforward.

A token can be a whole word, part of a word, punctuation, or another frequently occurring text pattern. Common words often occupy a single token, while unusual technical terms, long identifiers, code snippets, URLs, and specialised vocabulary may be split into multiple tokens. OpenAI notes that token counts do not map directly to word counts and can vary significantly depending on the content. [OpenAI Help Center]

This means that two documents with the same word count may require different amounts of context space:

  • Plain conversational English is often relatively token-efficient.
  • Source code frequently consumes more tokens because identifiers, symbols, and formatting are fragmented.
  • Scientific terminology, legal citations, and structured data can increase token usage.
  • Some languages and writing systems may require more or fewer tokens than an equivalent English passage. [OpenAI Help Center]

The practical lesson is that visible length can be misleading. A document that looks compact to a human reader may occupy a surprisingly large share of a model’s available context window.
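
The gap between word counts and token counts can be illustrated with a toy example. The sketch below is not any real model's tokenizer: the vocabulary `TOY_VOCAB` and the greedy longest-match rule are assumptions made purely for illustration, standing in for the learned subword vocabularies that production tokenizers use. It shows how familiar words can map to one token each while unfamiliar strings fragment into several.

```python
# Toy illustration only: a hypothetical vocabulary, not a real model's tokenizer.
TOY_VOCAB = {"the", "cat", "sat", "on", "mat", "token", "ize", "un", "believ", "able"}


def toy_tokenize(text: str, vocab: set[str] = TOY_VOCAB) -> list[str]:
    """Split each word greedily into the longest known pieces.

    Characters that match no vocabulary piece become one token each.
    """
    tokens = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Try the longest candidate piece starting at position i first.
            for j in range(len(word), i, -1):
                if word[i:j] in vocab:
                    tokens.append(word[i:j])
                    i = j
                    break
            else:
                # No known piece starts here: emit a single character.
                tokens.append(word[i])
                i += 1
    return tokens


plain = "the cat sat on the mat"
dense = "unbelievable tokenizer xq7"

print(len(plain.split()), len(toy_tokenize(plain)))  # 6 words, 6 tokens
print(len(dense.split()), len(toy_tokenize(dense)))  # 3 words, 9 tokens
```

In this toy setup the shorter, three-word string costs more tokens than the six-word sentence, which is the same effect, in miniature, that makes identifiers and rare terms expensive.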

How token counts limit long prompts and documents

The context window is the amount of information a model can consider at one time. Every token in the prompt, uploaded document, conversation history, and generated response must fit within that limit. [Claude]
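
As a rough sketch of that budget arithmetic, the function below subtracts the input components from the window to show what is left for the response. The function name and all of the numbers are hypothetical and chosen only for illustration; real limits depend on the model and provider.

```python
def remaining_output_budget(window: int, prompt: int, documents: int, history: int) -> int:
    """Return the tokens left for the generated response.

    A negative result means the input alone already exceeds the window.
    """
    used = prompt + documents + history
    return window - used


# Hypothetical token counts, not taken from any real model.
left = remaining_output_budget(window=8_000, prompt=300, documents=5_500, history=1_700)
print(left)  # 500 tokens remain for the answer

overflow = remaining_output_budget(window=8_000, prompt=300, documents=7_000, history=1_700)
print(overflow < 0)  # True: something must be dropped or shortened
```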

When a conversation grows, older content competes with newer content for space. If the token budget is exhausted, the system may:

  • Drop or compress earlier conversation history.
  • Truncate long documents.
  • Reduce the amount of detail retained from previous exchanges.
  • Restrict the maximum length of the generated answer. [Claude]

This is why a chatbot can appear to “forget” information from earlier in a long discussion. The issue is often not permanent memory loss but the fact that earlier tokens have fallen outside the active context window.
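
One simple way this can happen is a sliding window over the conversation. The sketch below assumes a hypothetical policy in which the newest messages are kept and older ones are dropped once a token budget is used up; actual products may instead summarise or compress history, and the per-message token counts here are invented.

```python
def trim_history(token_counts: list[int], budget: int) -> list[int]:
    """Return the indices of messages that still fit, in original order.

    Walks from the newest message backwards and stops at the first
    message that would push the running total over the budget.
    """
    kept = []
    used = 0
    for index in range(len(token_counts) - 1, -1, -1):
        if used + token_counts[index] > budget:
            break  # this message and everything older falls outside the window
        used += token_counts[index]
        kept.append(index)
    return sorted(kept)


# Hypothetical conversation: message 0 is the oldest.
counts = [400, 250, 300, 150]
print(trim_history(counts, budget=700))  # [1, 2, 3]
```

Message 0 is not deleted from anywhere permanent in this sketch; it simply no longer fits into what is sent to the model, which is why the model appears to have forgotten it.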

The expansion of context windows has become a major area of AI development. Anthropic expanded Claude’s context window from 9K to 100K tokens in May 2023 and has since moved to much larger windows, and some enterprise systems now advertise capacities reaching one million tokens. These larger windows allow users to analyse hundreds of pages of text or entire code repositories in a single session. [Anthropic]

However, a larger context window does not eliminate token constraints. Every additional token still occupies memory space and computational resources.

Bigger windows are not the same as unlimited memory

A common misconception is that a larger context window automatically means perfect recall.

Research examining long-context performance suggests that practical effectiveness may decline before the theoretical maximum context length is reached. Some tasks remain accurate across very large contexts, while others show performance degradation as more information is added. The useful working memory of a model therefore depends not only on the advertised token limit but also on the type of reasoning being performed. [arXiv]

For users, this means that fitting a document into a context window is only the first requirement. The model must also be able to locate and use the relevant information efficiently.

Why token counts directly affect cost

Most commercial AI providers charge according to token usage. Input tokens (the prompt and supplied documents) and output tokens (the generated response) are typically priced separately. [OpenAI Help Center]

Because billing is token-based, costs scale with:

  • Longer prompts.
  • Larger uploaded files.
  • More extensive conversation histories.
  • Longer generated answers.
  • Repeated processing of the same material. [OpenAI Help Center]
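
The arithmetic behind these factors can be sketched in a few lines. The prices below are hypothetical placeholders, not any provider's actual rates, and the second function assumes that the full conversation history is resent as input with every request, which is a common but not universal arrangement.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Cost of one request when input and output are priced separately."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


def cumulative_input_tokens(new_tokens_per_turn: list[int]) -> int:
    """Total input tokens billed when every turn resends the whole history."""
    total = 0
    history = 0
    for new_tokens in new_tokens_per_turn:
        history += new_tokens  # the history grows by this turn's new content
        total += history       # and the whole history is sent again
    return total


# Hypothetical prices per million tokens.
cost = request_cost(10_000, 1_000, input_price_per_million=2.00,
                    output_price_per_million=8.00)
print(f"{cost:.3f}")  # 0.028

# Four turns that each add 500 tokens of new material.
print(cumulative_input_tokens([500, 500, 500, 500]))  # 5000, not 2000
```

The second result shows why repeated processing matters: only 2,000 tokens of new material were written, but 5,000 input tokens were processed because earlier turns were sent again each time.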

For individual users, the effect may be modest. For organisations processing millions of requests, token efficiency can become a major operational concern. Industry discussions increasingly focus on balancing AI usefulness against token expenditure, particularly in enterprise deployments where large-scale usage can produce substantial recurring costs. [Business Insider]

This creates a direct connection between wording and spending. An unnecessarily verbose prompt may consume more tokens without improving the answer, while a concise prompt can reduce costs and leave more room for relevant context.

Practical ways to spot token-heavy text

People often underestimate which kinds of content consume the most tokens. Several warning signs can help identify token-heavy material before it reaches a chatbot.

Dense code blocks: Programming code often tokenises less efficiently than ordinary prose because variable names, punctuation, and formatting are frequently split into separate tokens.

Long lists and tables: Structured data may contain many repeated separators, numbers, and identifiers that increase token counts.

Repeated instructions: Copying the same guidance into every prompt can steadily consume context space and increase costs.

Verbose formatting: Excessive markup, nested bullet structures, and long templates add tokens even when they provide little informational value.

Large conversation histories: Retaining every previous exchange may eventually use more context than the current task requires. [Claude]

A useful rule of thumb is to examine whether each section of text genuinely contributes information. Every unnecessary token occupies part of the model’s working memory and may contribute to billing.
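
A quick screening heuristic can flag likely token-heavy text before it is sent. The sketch below is an assumption-laden approximation, not a substitute for a provider's own token counter: it uses the common rule of thumb of roughly four characters per token for English prose, and it treats a high share of symbols as a warning sign. The 0.15 threshold is an arbitrary illustrative cutoff.

```python
def rough_token_estimate(text: str) -> int:
    """Very rough estimate: about four characters per token, rounded up."""
    return -(-len(text) // 4)


def symbol_density(text: str) -> float:
    """Fraction of non-space characters that are neither letters nor digits."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    symbols = sum(1 for ch in visible if not ch.isalnum())
    return symbols / len(visible)


def looks_token_heavy(text: str, threshold: float = 0.15) -> bool:
    """Flag text whose symbol share meets a hypothetical threshold."""
    return symbol_density(text) >= threshold


prose = "Please summarise the attached report in three short paragraphs."
code = 'result = {"user_id": data["id"], "ok": x >= 10}'

print(looks_token_heavy(prose))  # False
print(looks_token_heavy(code))   # True
```

Text that trips this check is worth measuring with the actual tokenizer or token-counting tool for the model in use, since only that gives a real count.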

Token-efficient habits that improve both memory and cost

Good prompt design is often not about writing more but about using context strategically.

Several practices can improve efficiency:

  • Remove duplicated instructions.
  • Summarise earlier discussion instead of repeatedly pasting it.
  • Provide only the sections of a document relevant to the task.
  • Use concise wording when precision is not lost.
  • Break very large projects into focused stages rather than repeatedly sending entire archives. [Anthropic]

These habits create two benefits at once. They reduce token expenditure while also leaving more room in the context window for information that actually matters to the current request.
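
The habit of supplying only relevant sections can be sketched as a small selection step. Everything here is a simplifying assumption: relevance is scored by plain keyword counts, token cost is approximated as one token per four characters, and the section titles and texts are invented. A real pipeline would more likely use a proper tokenizer and a better relevance measure.

```python
def pick_relevant_sections(sections: list[tuple[str, str]],
                           keywords: set[str],
                           token_budget: int) -> list[str]:
    """Return titles of keyword-matching sections that fit the budget.

    Sections with more keyword hits are considered first; the result is
    returned in original document order.
    """
    scored = []
    for position, (title, body) in enumerate(sections):
        words = body.lower().split()
        score = sum(words.count(keyword) for keyword in keywords)
        if score > 0:
            scored.append((score, position, title, body))
    # Highest score first; ties keep document order.
    scored.sort(key=lambda item: (-item[0], item[1]))

    chosen = []
    used = 0
    for score, position, title, body in scored:
        cost = len(body) // 4 + 1  # rough token estimate, an assumption
        if used + cost <= token_budget:
            chosen.append((position, title))
            used += cost
    return [title for position, title in sorted(chosen)]


# Hypothetical document sections.
report = [
    ("Overview", "This report covers revenue and staffing."),
    ("Revenue", "Revenue grew while revenue per customer fell."),
    ("Staffing", "Headcount stayed flat across all teams."),
    ("Outlook", "Management expects revenue growth to continue."),
]

print(pick_relevant_sections(report, {"revenue"}, token_budget=25))
```

With this budget only two of the three matching sections are sent, and the unrelated staffing section is never included, which frees context for the material the question actually depends on.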

In practical terms, token counts act as both the memory budget and the spending budget of a chatbot. Understanding that relationship helps explain why AI systems sometimes forget details, why large documents can be difficult to process, and why efficient prompts often produce better results at lower cost.

Endnotes

  1. Source: help.openai.com
    Link: https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them
    Source snippet

    OpenAI Help CenterWhat are tokens and how to count them?Tokens are the building blocks of text that OpenAI models process. They can be as...

  2. Source: platform.claude.com
    Title: Context windows
    Link: https://platform.claude.com/docs/en/build-with-claude/context-windows
    Source snippet

    Context windows - Claude API DocsAs conversations grow, you'll eventually approach context window limits. This guide explains how c...

  3. Source: anthropic.com
    Title: 100k context windows
    Link: https://www.anthropic.com/news/100k-context-windows
    Source snippet

    Introducing 100K Context Windows11 May 2023 — We've expanded Claude's context window from 9K to 100K tokens, corresponding to ar...

    Published: May 2023

  4. Source: anthropic.com
    Title: claude opus 4 6
    Link: https://www.anthropic.com/news/claude-opus-4-6
    Source snippet

    Introducing Claude Opus 4.65 Feb 2026 — [1] The 1M token context window is currently available in beta on the Claude Developer Platform o...

  5. Source: arxiv.org
    Link: https://arxiv.org/abs/2509.21361
    Source snippet

    Context Is What You Need: The Maximum Effective Context Window for Real World Limits of LLMsSeptember 21, 2025...

    Published: September 21, 2025

  6. Source: arxiv.org
    Link: https://arxiv.org/abs/2605.02173

  7. Source: anthropic.com
    Title: effective context engineering for ai agents
    Link: https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
    Source snippet

    Effective context engineering for AI agents29 Sept 2025 — Context engineering refers to the set of strategies for curating and m...

  8. Source: OpenAI
    Link: https://openai.com/
    Source snippet

    comOpenAI | Research & DeploymentWe believe our research will eventually lead to artificial general intelligence, a system that can solve...

  9. Source: community.openai.com
    Title: 4096 response limit vs 128 000 context window
    Link: https://community.openai.com/t/4096-response-limit-vs-128-000-context-window/656864
    Source snippet

    and it is shared for all language inference. The only thing confusing is that...

  10. Source: community.openai.com
    Title: assistant api what are context tokens in the billing calculation
    Link: https://community.openai.com/t/assistant-api-what-are-context-tokens-in-the-billing-calculation/497675
    Source snippet

    API12 Nov 2023 — “Context” is OpenAI's new language for “prompt” or input. It's what is loaded into the AI model before it generates a la...

  11. Source: anthropic.com
    Title: prompting long context
    Link: https://www.anthropic.com/news/prompting-long-context
    Source snippet

    Prompt engineering for Claude's long context window23 Sept 2023 — Claude's 100,000 token long context window enables the model to operate...

  12. Source: platform.claude.com
    Title: extended thinking
    Link: https://platform.claude.com/docs/en/build-with-claude/extended-thinking
    Source snippet

    context window space visually, they still count toward your input token usage when cached; If thinking becomes disabled and you pass thin...

  13. Source: businessinsider.com
    Link: https://www.businessinsider.com/ai-spending-roi-concerns-tokenmaxxing-uber-coo-andrew-macdonald-reaction-2026-5
    Source snippet

    Uber COO Andrew Macdonald voiced doubts about AI driving meaningful productivity gains, echoing Uber CTO’s earlier comments about exhaust...

  14. Source: github.com
    Link: https://github.com/vercel/ai/issues/5205
    Source snippet

    anthropic count tokens api · Issue #5205 · vercel/ai - GitHubMarch 13, 2025 — I suspect that most LLM APIs will likely provide some way t...

    Published: March 13, 2025

  15. Source: reddit.com
    Link: https://www.reddit.com/r/claude/comments/1s3vsm5/anthropic_broke_your_limits_with_the_1m_context/
    Source snippet

    Anthropic broke your limits with the 1M context updateI set my context limit to 666,666. Just put in CLAUDE.MD - your context window is 6...

  16. Source: aws.amazon.com
    Title: anthropic claude sonnet bedrock expanded context window
    Link: https://aws.amazon.com/about-aws/whats-new/2025/08/anthropic-claude-sonnet-bedrock-expanded-context-window/
    Source snippet

    amazon.comAnthropic's Claude Sonnet 4 in Amazon Bedrock...12 Aug 2025 — Anthropic's Claude Sonnet 4 in Amazon Bedrock is launching today...

  17. Source: linkedin.com
    Link: https://www.linkedin.com/company/openai
    Source snippet

    OpenAIOpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of...

  18. Source: linkedin.com
    Link: https://www.linkedin.com/posts/hirirngdots_1m-context-is-now-generally-available-for-activity-7439361919473405952-vMfi
    Source snippet

    enerally available for Claude Opus 4.6 and Sonnet 4.6...

  19. Source: linkedin.com
    Link: https://www.linkedin.com/posts/jerry-liu-64390071_anthropic-just-shipped-1m-token-context-windows-activity-7438620843561160704-Updj
    Source snippet

    ntire shelf of contracts, research reports...

  20. Source: linkedin.com
    Link: https://www.linkedin.com/company/anthropicresearch

  21. Source: docs.rs
    Title: Anthropic Token Counter in multi_llm
    Link: https://docs.rs/multi-llm/latest/multi_llm/struct.AnthropicTokenCounter.html
    Source snippet

    AnthropicTokenCounter in multi_llm - Rust - Docs.rsToken counter for Anthropic Claude models. Uses cl100k_base tokenizer with a 1.1x appr...

  22. Source: Wikipedia
    Title: Open AI
    Link: https://en.wikipedia.org/wiki/OpenAI
    Source snippet

    OpenAIOpenAI is an American artificial intelligence (AI) research organization headquartered in San Francisco, consisting of OpenAI Gr...

  23. Source: Wikipedia
    Link: https://en.wikipedia.org/wiki/Anthropic
    Source snippet

    AnthropicAnthropic PBC is an American artificial intelligence (AI) company headquartered in San Francisco, California. It has develope...

  24. Source: github.com
    Link: https://github.com/cline/cline/issues/4149
    Source snippet

    Anthropic models capped at 8192 `maxTokens` instead of...10 Jun 2025 — The current maxTokens limits for Anthropic models in the code are...

  25. Source: mindstudio.ai
    Title: claude 1m token context window agents
    Link: https://www.mindstudio.ai/blog/claude-1m-token-context-window-agents/
    Source snippet

    Claude 1M Token Context Window: What It Means for Long...21 Mar 2026 — Anthropic recently expanded Claude Opus 4.5 and Claude Sonnet 4.5...

  26. Source: itnews.com.au
    Link: https://www.itnews.com.au/news/anthropic-opens-claude-mythos-preview-ai-program-to-australia-626399
    Source snippet

    Australia is now included. with up to 150 new organisations now...

  27. Source: hexmos.com
    Title: Anthropic Token Counter | Online Free Dev Tools by
    Link: https://hexmos.com/freedevtools/t/anthropic-token-counter/
    Source snippet

    Leverage Claude's large context window for...

  28. Source: youtube.com
    Link: https://www.youtube.com/watch?v=Uv0mJ3AhqPw
    Source snippet

    Anthropic Gets a 1M Token Context WindowClaude Sonnet 4 now supports up to 1 million tokens in a single request. This is a 5x increase th...

  29. Source: youtube.com
    Link: https://www.youtube.com/%40anthropic-ai
    Source snippet

    AnthropicWe're an AI safety and research company. Talk to our AI assistant Claude on claude.com. Download Claude on desktop, iOS, or Andr...

Additional References

  1. Source: hakia.com
    Link: https://hakia.com/tech-insights/context-windows-explained/
    Source snippet

    Context Windows Explained: Why Token Limits MatterOpenAI models: Range from 16K to 128K tokens depending on model tier; Anthropic Claude...

  2. Source: theverge.com
    Link: https://www.theverge.com/ai-artificial-intelligence/757998/anthropic-just-made-its-latest-move-in-the-ai-coding-wars
    Source snippet

    This leap enables the AI to handle vast amounts of data—including up to 2,500 pages of text or entire code bases of 75,000–110,000 lines—...

  3. Source: facebook.com
    Link: https://www.facebook.com/groups/aisaas/posts/3840456629607057/
    Source snippet

    Tokens, what are they and why they matterSpecifically, tokens are the segments of text that are fed into and generated by the machine lea...

  4. Source: aws.amazon.com
    Link: https://aws.amazon.com/about-aws/whats-new/2025/08/count-tokens-api-anthropics-claude-models-bedrock/
    Source snippet

    Tokens API supported for Anthropic's Claude models now in...August 22, 2025 — The Count Tokens API is now available in Amazon Bedrock, e...

    Published: August 22, 2025

  5. Source: reddit.com
    Link: https://www.reddit.com/r/OpenAI/comments/17pa3ho/what_does_the_128k_context_window_mean_for/
    Source snippet

    What does the 128k context window mean for ChatGPT...I am a ChatGPT plus user and I don’t understand how the newly announced context win...

  6. Source: blog.mlq.ai
    Link: https://blog.mlq.ai/tokens-context-window-llms/
    Source snippet

    Tokens & Context WindowsTokens are the basic building blocks for LLMs and represent the smallest unit of text the model can understand an...

  7. Source: blog.devgenius.io
    Title: deciphering llm costs pricing and context window comparison f67490360203
    Link: https://blog.devgenius.io/deciphering-llm-costs-pricing-and-context-window-comparison-f67490360203
    Source snippet

    LLM Costs: Pricing and Context Window...27 Feb 2024 — What are tokens? Tokens are units of data, usually words or subword units — parts...

  8. Source: stackoverflow.com
    Title: Best way to count tokens for Anthropic Claude Models using the API?
    Link: https://stackoverflow.com/questions/78767238/best-way-to-count-tokens-for-anthropic-claude-models-using-the-api
    Source snippet

    July 19, 2024 — I'm working with Anthropic's Claude models and need to accurately count the number of tokens in my prompts and responses...

    Published: July 19, 2024

  9. Source: dev.to
    Title: llm context windows managing tokens in production ai apps 11l
    Link: https://dev.to/whoffagents/llm-context-windows-managing-tokens-in-production-ai-apps-11l
    Source snippet

    LLM Context Windows: Managing Tokens in Production AI...7 Apr 2026 — LLM Context Windows: Managing Tokens in Production AI Apps · The To...

  10. Source: reddit.com
    Title: How do you count/estimate token input/outputs with Claude 3?
    Link: https://www.reddit.com/r/ClaudeAI/comments/1bgg5v0/how_do_you_countestimate_token_inputoutputs_with/
    Source snippet

    March 16, 2024 — I'm currently writing a translation application using calls to Claude 3's API, and I need a way to count the input token...

    Published: March 16, 2024
