HTML Entity Encoder / Decoder

Convert special characters to HTML entities (& < > etc.) and back. Handles named entities, numeric entities and full document encoding.

📖 How to Use

Paste text with special characters or HTML entities

Choose Encode or Decode

Copy the result — safe to paste in HTML documents

📝 Examples

Encode

<div class="test">Hello & World</div>

Reaching for a Shell Instead of a Browser Tab

Every language you're likely to deploy with already ships an equivalent of what this page does. Python's html.escape(), PHP's htmlspecialchars(), the he package for Node, and template-engine auto-escaping all cover the same five-character substitution this tool runs client-side. From a raw shell, the three most common substitutions can be chained with sed: sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g'. The ordering matters: the ampersand must be encoded first. If it runs last, it re-encodes the ampersands inside the freshly produced &lt; and &gt;, turning them into &amp;lt; and &amp;gt;. That double-escaping is a classic bug in hand-rolled sed pipelines.
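The ordering pitfall can be sketched in Python with plain str.replace() calls. The two helper names below are hypothetical and exist only for illustration; html.escape() from the standard library is used as the reference result.

```python
import html


def escape_amp_first(text: str) -> str:
    # Ampersand is handled first, so the '&' characters introduced by the
    # later replacements are never touched again.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def escape_amp_last(text: str) -> str:
    # Buggy order: the final replace re-encodes the '&' inside the
    # '&lt;' and '&gt;' that the first two replacements just produced.
    return text.replace("<", "&lt;").replace(">", "&gt;").replace("&", "&amp;")


sample = "<b>Tom & Jerry</b>"
correct = escape_amp_first(sample)
broken = escape_amp_last(sample)

print(correct)  # &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
print(broken)   # &amp;lt;b&amp;gt;Tom &amp; Jerry&amp;lt;/b&amp;gt;
```

Decoding the buggy output once does not return the original string, which is usually how the double-escape gets noticed: the page shows a literal &lt; where a < was expected.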

This browser tool exists for the case the CLI doesn't cover well: a one-off paste from a support ticket, a Slack message, or a config value someone hands you mid-incident, where opening an editor and writing a script is slower than pasting into a text box. For anything that runs repeatedly — a build step, a CI job, a log pipeline — the library or CLI call is the right long-term home for the logic; this tool is for inspection and one-off conversions, not automation.

Encoding One String vs. Encoding a Firehose

A single paste-and-encode operation here completes in well under a millisecond for typical input, since the underlying String.prototype.replace() calls are simple linear scans. That performance profile does not extrapolate to bulk use. This tool processes one text block in one browser tab, and pasting a multi-megabyte log file will noticeably block the tab's main thread during encoding: JavaScript string replacement runs on that single thread, and the DOM update after each keystroke re-runs the whole pipeline. At genuine scale (a log-shipping pipeline encoding millions of lines per hour, or a web framework escaping every interpolated template variable on every request) the encoding work belongs in a server-side auto-escaping template engine (Django's autoescape, Rails' ERB, Go's html/template) rather than a manual per-string operation, both for throughput and because manual encoding is the kind of step a developer eventually forgets to call.
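For the log-pipeline case, one possible shape is a generator that escapes one line at a time instead of loading the whole input. This is a minimal sketch, not a replacement for an auto-escaping template engine; escape_lines is a hypothetical name and the in-memory io.StringIO stands in for a real log file.

```python
import html
import io
from typing import Iterable, Iterator


def escape_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line HTML-escaped, holding only one line in memory at a time."""
    for line in lines:
        yield html.escape(line, quote=True)


# Stand-in for a large log file opened with open(path, encoding="utf-8").
log = io.StringIO('GET /search?q=<script>&page=2\nuser="alice" status=200\n')
escaped = list(escape_lines(log))

for line in escaped:
    print(line, end="")
```

Because file objects iterate lazily, the same function can be pointed at a multi-gigabyte file without the memory spike that pasting it into a text box would cause.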

The Dashboard That Executed a Joke

A common way teams learn this lesson: an internal support tool renders customer ticket text directly into an admin dashboard so agents can read complaints at a glance. An engineer, testing whether the ticket form has any length limit, pastes <script>alert(document.cookie)</script> as a joke. If the dashboard writes ticket bodies into its server-rendered HTML without encoding, the browser doesn't display that string. It parses it as a script element and executes it, and the alert box pops for every agent who opens that ticket afterward. (Assigning the same string through innerHTML would not run a <script> element, but an event-handler payload such as <img src=x onerror=alert(document.cookie)> would.) In a real attack rather than a joke, the payload would silently send the viewing agent's session cookie to an external server instead of popping an alert. Stored XSS against internal tooling tends to get discovered exactly this way: by accident, from a harmless test string, well before anyone runs a real attack.
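The difference between the vulnerable and the safe dashboard comes down to one call. The sketch below assumes a server that builds the ticket markup by string formatting; both render functions and the "ticket" class name are hypothetical.

```python
import html


def render_ticket_unsafe(body: str) -> str:
    # Ticket text is dropped into the markup verbatim, so any tags in it
    # become part of the page structure.
    return f'<div class="ticket">{body}</div>'


def render_ticket_safe(body: str) -> str:
    # Ticket text is entity-encoded first, so the browser displays it as text.
    return f'<div class="ticket">{html.escape(body)}</div>'


payload = "<script>alert(document.cookie)</script>"
unsafe_html = render_ticket_unsafe(payload)
safe_html = render_ticket_safe(payload)

print(unsafe_html)
print(safe_html)
```

In the safe version the agent sees the literal test string on screen, which is also the quickest manual check that encoding is in place.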

Where the Five-Character Rule Comes From

HTML entities predate the modern web; SGML, the markup metalanguage HTML was originally derived from, already used named character references to escape reserved punctuation. HTML5, formalized by the WHATWG as the HTML Living Standard, defines an enormous table of named character references, well over 2,000 of them, covering everything from &copy; to obscure mathematical operators. Only five characters are load-bearing for security: &, <, >, ", and '. These five are special because the HTML parser itself uses them as syntax: < opens a tag, & starts an entity reference, and quotes delimit attribute values. Any of them appearing unescaped in a context the parser doesn't expect can change how the rest of the document is interpreted. XML's predefined entity set covers the same five characters for the same reason, which is why this encoding logic carries over to XML tooling with little or no modification.
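As a quick illustration, here is how Python's html.escape() maps the five characters. Note that this particular library emits the numeric reference &#x27; for the single quote rather than a named form; other libraries may choose differently.

```python
import html

# The five syntax-significant characters named above.
FIVE = "&<>\"'"

# Per-character mapping as produced by the standard library.
mapping = {ch: html.escape(ch, quote=True) for ch in FIVE}
for ch, entity in mapping.items():
    print(repr(ch), "->", entity)

encoded = html.escape(FIVE, quote=True)
decoded = html.unescape(encoded)
```

Decoding the encoded string returns the original five characters, so the substitution is lossless.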

Frequently Asked Questions

What are HTML entities?

HTML entities are special sequences of characters that represent reserved or hard-to-type characters in HTML markup. They begin with an ampersand (&) and end with a semicolon (;). Reserved characters like < and > must be encoded as &lt; and &gt; because the browser's HTML parser interprets bare angle brackets as tag delimiters. Similarly, the ampersand itself must be written as &amp; because the parser uses it as the entity escape character. Beyond the five critical characters, entities are also used to represent characters outside the ASCII range, special typographic symbols like &mdash; for an em dash, and mathematical and currency symbols that may not be reliably representable in all character encodings.

When should I encode HTML entities?

You should encode HTML entities any time you insert dynamic content into an HTML document, which in practice means whenever the value of a variable ends up between HTML tags or inside an HTML attribute. The five characters that must always be encoded are the ampersand (&), less-than sign (<), greater-than sign (>), double quote ("), and single quote ('). Failing to encode user-provided input before inserting it into HTML is the root cause of reflected and stored XSS vulnerabilities, which allow attackers to inject JavaScript that runs in other users' browsers. Modern frameworks like React, Vue, and Angular escape content by default, but when you build markup by hand, whether through innerHTML or through server-side templates that do not auto-escape, you must apply encoding explicitly.
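The attribute case is where the quote characters matter. The sketch below assumes a hand-built input tag; the markup and the sample value are hypothetical, and html.escape(..., quote=False) is used only to show what happens when quotes are left alone.

```python
import html

# A value that tries to close the attribute and add an event handler.
value = '" onmouseover="alert(1)'

# Quotes left unencoded: the value terminates the attribute early and
# the rest is parsed as a new attribute.
attr_unsafe = f'<input value="{html.escape(value, quote=False)}">'

# Quotes encoded (the default for html.escape): the value stays inside
# the attribute as plain text.
attr_safe = f'<input value="{html.escape(value)}">'

print(attr_unsafe)  # <input value="" onmouseover="alert(1)">
print(attr_safe)    # <input value="&quot; onmouseover=&quot;alert(1)">
```

Encoding only &, <, and > is enough between tags but not inside a quoted attribute, which is why all five characters are listed above.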

What is the difference between named entities and numeric entities in HTML?

Named entities use a descriptive human-readable name preceded by & and terminated by ;, such as &amp; for the ampersand character, &lt; for less-than, or &copy; for the copyright symbol. Numeric entities use the Unicode code point either in decimal form (&#169; for the copyright symbol) or in hexadecimal form (&#xA9;). Named entities exist only for a defined set of characters specified in the HTML standard, whereas numeric entities can represent any Unicode character regardless of whether a named form exists. Both representations produce identical results in the browser, so the choice between them is largely a matter of readability: named entities are more readable in source code, while numeric entities are useful for characters that have no named form.
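The equivalence of the three forms can be checked with Python's standard library. In this sketch, to_numeric is a hypothetical helper that rewrites non-ASCII characters as decimal numeric references; it does not touch the five reserved characters, so it is not a substitute for html.escape().

```python
import html
from html.entities import html5

# Named, decimal, and hexadecimal references for the copyright sign.
forms = ["&copy;", "&#169;", "&#xA9;"]
decoded = [html.unescape(form) for form in forms]


def to_numeric(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


numeric = to_numeric("caf\u00e9 \u00a9")
print(decoded)
print(numeric)  # caf&#233; &#169;

# html.entities.html5 maps HTML5 named references to their characters.
has_named_copy = html5.get("copy;") == "\u00a9"
```

All three references decode to the same character, and the numeric form works even for characters that have no entry in the named table.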