Word & Token Counter
Count words, characters, sentences, paragraphs and reading time in real time. Estimate LLM token usage for GPT-4, Claude and Gemini — essential for prompt engineering, content planning and API cost estimation.
Token counts are estimates using the ~4 chars/token heuristic. Actual counts vary by model and tokenizer. Use OpenAI's tokenizer or Anthropic's token counter for exact values.
Cost if this text were sent as input/prompt to each API (as of 2025 pricing). Output tokens are billed separately.
What is a Word Counter?
A word counter analyses a block of text and provides statistics including word count, character count (with and without spaces), sentence count, paragraph count, and estimated reading time. For DevOps engineers and SREs, these metrics matter in contexts beyond creative writing: commit messages have length conventions, runbook entries have readability requirements, alert descriptions must be concise enough for on-call engineers to parse at 3 a.m., and technical documentation often has to meet length targets set by content or SEO guidelines.
This tool also estimates the approximate number of tokens in the text, which is critical when working with large language models (LLMs) like GPT-4 or Claude. Every LLM has a context window measured in tokens — exceeding it truncates your input silently or raises an error. Knowing the token count of a prompt, document, or code file helps you design retrieval-augmented generation (RAG) pipelines, manage prompt budgets, and avoid unexpected API cost spikes.
When to Use This Tool
- Writing commit messages and PR descriptions: Check that your commit message subject line follows the common convention of about 50 characters (and no more than 72), and that your PR description is detailed enough to give reviewers full context without being excessively long.
- Drafting runbooks and incident response docs: Measure the length of runbook steps to ensure they are concise enough to follow quickly under pressure, and that the total document is thorough enough to be useful without overwhelming on-call engineers.
- Estimating LLM token consumption: Before submitting a large document, code file, or multi-turn conversation to an LLM API, check the estimated token count to avoid hitting context window limits and to forecast API costs.
- Meeting SEO content length requirements: Verify that tool pages, blog posts, and documentation meet the minimum word counts set by your SEO guidelines and content policies before publishing.
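The commit message check from the list above can be sketched in a few lines of Python. This is an illustration only, not this tool's implementation: the function name `check_commit_subject` and the `soft_limit` and `hard_limit` parameters are hypothetical, and the 50/72 values are taken from the convention mentioned above.

```python
def check_commit_subject(message: str, soft_limit: int = 50, hard_limit: int = 72) -> str:
    """Classify the subject line (first line) of a commit message by length.

    The limits are assumptions based on the common 50/72 convention.
    """
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    length = len(subject)
    if length <= soft_limit:
        return "ok"        # within the preferred length
    if length <= hard_limit:
        return "long"      # acceptable, but worth shortening
    return "too long"      # exceeds the hard limit


if __name__ == "__main__":
    print(check_commit_subject("Fix race condition in job scheduler"))
```

Only the first line is measured, so a long commit body does not affect the result.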
How It Works
Word counting splits the input text on whitespace boundaries and discards empty strings; the number of remaining pieces is the word count. Character count is computed twice: once as the total length of the string (including spaces) and once after stripping all whitespace characters. Sentence detection uses a regex that matches terminal punctuation (., !, ?) followed by whitespace or end-of-string. The paragraph count is the number of non-empty blocks separated by one or more blank lines. Reading time is derived by dividing the word count by an assumed reading speed of 200 words per minute. Token estimation uses the widely cited heuristic of approximately 4 characters per token, which approximates the behaviour of tiktoken (used by OpenAI models) and similar byte-pair encoding tokenizers for English text.
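The counting steps described above can be sketched as follows. This is a minimal Python approximation, not the tool's actual source: the function name `text_stats` and the exact regular expressions are assumptions chosen to match the description.

```python
import re


def text_stats(text: str) -> dict:
    """Compute basic text statistics as described above (illustrative sketch)."""
    # str.split() with no arguments splits on whitespace and drops empty strings.
    words = text.split()
    # One or more terminal punctuation marks followed by whitespace or end of string.
    sentences = re.findall(r"[.!?]+(?=\s|$)", text)
    # Blocks separated by one or more blank lines; whitespace-only blocks are ignored.
    paragraphs = [block for block in re.split(r"\n\s*\n", text) if block.strip()]
    return {
        "words": len(words),
        "chars_with_spaces": len(text),
        "chars_without_spaces": len(re.sub(r"\s", "", text)),
        "sentences": len(sentences),
        "paragraphs": len(paragraphs),
    }


if __name__ == "__main__":
    sample = "Hello world. How are you?\n\nSecond paragraph!"
    print(text_stats(sample))
```

A punctuation-based sentence regex like this one miscounts abbreviations such as "e.g. this", so the sentence count is best read as an approximation.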
Frequently Asked Questions
How are LLM tokens estimated?
Token counts are estimated using the ~4 characters per token heuristic, which holds approximately for English prose with most modern tokenizers. The actual token count depends heavily on the specific model's tokenizer: GPT-4 uses OpenAI's cl100k_base encoding, Claude uses Anthropic's internal tokenizer, and other models use their own variants. Technical text with many punctuation characters, code with brackets and operators, and non-English languages can have significantly different character-to-token ratios. For exact token counts when building production LLM applications, use the model provider's official tooling: OpenAI's tiktoken for GPT models, or the Anthropic API's token counting endpoint for Claude.
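As a rough illustration of the heuristic, the Python sketch below divides the character count by 4 and compares the result against a context window. The names `estimate_tokens` and `fits_context` are hypothetical, rounding up is an assumption (the tool may round differently), and the window size is a caller-supplied value rather than any real model's limit.

```python
import math

CHARS_PER_TOKEN = 4  # the ~4 chars/token heuristic; real tokenizers differ


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the character count (rounding up is an assumption)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_context(text: str, context_window: int, reserved_for_output: int = 0) -> bool:
    """Return True if the estimated prompt tokens plus reserved output tokens fit the window."""
    return estimate_tokens(text) + reserved_for_output <= context_window


if __name__ == "__main__":
    prompt = "Summarise the incident timeline in three bullet points."
    print(estimate_tokens(prompt))
    # 8000 is an illustrative window size, not a real model limit.
    print(fits_context(prompt, context_window=8000, reserved_for_output=500))
```

Because this is an estimate, leaving some headroom below the real context limit is safer than relying on an exact fit.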
What is the difference between words and tokens?
Words are the human-intuitive units of text separated by spaces and punctuation: the count you would get by tallying a document by hand. Tokens are sub-word units that AI language models use internally, produced by byte-pair encoding (BPE) or similar tokenisation algorithms that learn the most frequent character sequences in a training corpus. Common, short English words like "the", "is", and "cat" are typically single tokens. Rare or long words, technical terms, and words in non-English languages are often split into multiple tokens; "Kubernetes" might tokenise as Kube + rnetes, for example. On average, one English word corresponds to approximately 1.3 tokens, which is why a 1,000-word document typically contains 1,200–1,400 tokens.
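The word-to-token ratio above gives a second quick estimate that needs only a word count. The Python sketch below is illustrative: `tokens_from_words` is a hypothetical name, and the 1.3 ratio is the average quoted above, not a property of any particular tokenizer.

```python
TOKENS_PER_WORD = 1.3  # average for English text, as quoted above


def tokens_from_words(word_count: int) -> int:
    """Estimate tokens from a word count using the ~1.3 tokens/word average."""
    return round(word_count * TOKENS_PER_WORD)


if __name__ == "__main__":
    # A 1,000-word document lands inside the 1,200-1,400 range mentioned above.
    print(tokens_from_words(1000))
```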
How is reading time calculated?
Reading time is estimated by dividing the word count by an average reading speed expressed in words per minute (WPM). This tool uses 200 WPM as the baseline, which sits at the lower end of the 200-300 WPM that studies suggest for adults reading prose. Highly technical content requiring close attention can slow readers to 100-150 WPM, so the estimate may be optimistic for very dense material. For a 500-word runbook entry, the estimate would be 500 / 200 = 2.5 minutes. Treat the displayed estimate as a planning guide for content strategy and readability assessment rather than a precise measurement, since individual reading speeds vary significantly with familiarity with the subject matter and the density of jargon in the text.
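The calculation can be sketched in Python as follows. The function names `reading_time_minutes` and `format_reading_time` and the output format are assumptions for illustration; the 200 WPM default comes from the text above.

```python
def reading_time_minutes(word_count: int, wpm: int = 200) -> float:
    """Estimate reading time in minutes by dividing the word count by a reading speed."""
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    return word_count / wpm


def format_reading_time(minutes: float) -> str:
    """Render a duration in minutes as 'M min SS s' (format chosen for illustration)."""
    total_seconds = round(minutes * 60)
    mins, secs = divmod(total_seconds, 60)
    return f"{mins} min {secs:02d} s"


if __name__ == "__main__":
    minutes = reading_time_minutes(500)  # the 500-word runbook entry from the text
    print(minutes, format_reading_time(minutes))
```

Passing a lower `wpm`, such as 150, gives a more conservative estimate for dense technical material.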