HTML Entity Encoder / Decoder
Convert special characters to HTML entities (&amp;, &lt;, &gt;, etc.) and back. Handles named entities, numeric entities, and full document encoding.
What is the HTML Entity Encoder / Decoder?
The HTML Entity Encoder converts special characters into their HTML entity equivalents (for example, transforming < into &lt; and & into &amp;) and reverses the process in decode mode. This is a critical operation in web development and security engineering because inserting raw user-generated content directly into HTML without encoding is a primary cause of Cross-Site Scripting (XSS) vulnerabilities, one of the most exploited classes of security bugs on the web.
HTML entity encoding is also essential when writing HTML documentation, embedding code samples in web pages, or working with template engines that generate HTML output. Knowing which characters must be encoded and when to encode them is a foundational web security skill for any developer building applications that display user input, parse third-party data, or render dynamic content in a browser.
When to Use This Tool
- XSS prevention testing: Encode user-supplied strings before pasting them into HTML templates to verify that your application would display the content safely without executing any embedded scripts.
- Embedding code samples in HTML: Encode angle brackets and ampersands in code examples so they display as literal text in a browser rather than being interpreted as HTML tags.
- Debugging template engine output: Decode entity-encoded strings from an API response or template output to see what the original content was before the encoding was applied.
- Email HTML authoring: Encode special characters in HTML email templates, which have stricter rendering environments than modern browsers and require proper entity encoding for consistent display across email clients.
How It Works
Encoding replaces the five HTML-critical characters, namely ampersand (&), less-than (<), greater-than (>), double quote ("), and single quote ('), with their named or numeric HTML entity equivalents using simple string replacement. Decoding takes the reverse path by assigning the entity-encoded string as the innerHTML of a temporary <textarea> element and reading back the value property, which the browser automatically decodes. This approach leverages the browser's own HTML parser for decoding, ensuring correct handling of all named and numeric entities without maintaining a full entity lookup table.
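The replacement step described above can be sketched in a few lines. This is a minimal illustration, not the tool's actual source: the names encode_entities and decode_entities are hypothetical, the choice of &#39; for the single quote is an assumption, and Python's standard html.unescape stands in for the browser's <textarea> parser, which is not available outside a browser.

```python
import html

# Ampersand must come first; otherwise the "&" introduced by the later
# replacements would itself be encoded a second time.
REPLACEMENTS = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("'", "&#39;"),  # assumed numeric form for the single quote
]


def encode_entities(text: str) -> str:
    """Encode the five HTML-critical characters by plain string replacement."""
    for char, entity in REPLACEMENTS:
        text = text.replace(char, entity)
    return text


def decode_entities(text: str) -> str:
    """Decode named and numeric entities (stand-in for the browser parser)."""
    return html.unescape(text)


if __name__ == "__main__":
    sample = '<a href="x">Tom & Jerry\'s</a>'
    encoded = encode_entities(sample)
    print(encoded)
    print(decode_entities(encoded) == sample)
```

The ordering of the replacement list matters more than anything else here: encoding the ampersand last would turn &lt; into &amp;lt;.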
Frequently Asked Questions
What are HTML entities?
HTML entities are special sequences of characters that represent reserved or non-printable characters in HTML markup. They begin with an ampersand (&) and end with a semicolon (;). Reserved characters like < and > must be encoded as &lt; and &gt; because the browser's HTML parser interprets bare angle brackets as tag delimiters. Similarly, the ampersand itself must be written as &amp; because the parser uses it as the entity escape character. Beyond the five critical characters, entities are also used to represent characters outside the ASCII range, special typographic symbols like &mdash; for an em dash, and mathematical and currency symbols that may not be reliably representable in all character encodings.
When should I encode HTML entities?
You should encode HTML entities any time you insert dynamic content into an HTML document, which in practice means whenever the value of a variable ends up between HTML tags or inside an HTML attribute. The five characters that must always be encoded are the ampersand (&), less-than sign (<), greater-than sign (>), double quote ("), and single quote ('). Failing to encode user-provided input before inserting it into HTML is the root cause of Reflected and Stored XSS vulnerabilities, which allow attackers to inject JavaScript that runs in other users' browsers. Modern frameworks like React, Vue, and Angular escape content by default, but when using raw HTML manipulation via innerHTML or server-side template engines, you must apply encoding explicitly.
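To make the two insertion points concrete (between tags and inside an attribute), here is a hedged sketch of explicit encoding on the server side. The function render_comment and its markup are hypothetical; html.escape with quote=True is Python's standard-library helper and encodes all five characters.

```python
import html


def render_comment(author: str, body: str) -> str:
    """Build a small HTML fragment from untrusted input (illustrative only)."""
    # quote=True also encodes " and ', which matters inside attribute values.
    safe_author = html.escape(author, quote=True)
    safe_body = html.escape(body, quote=True)
    return f'<p title="{safe_author}">{safe_body}</p>'


if __name__ == "__main__":
    # Both values try to break out of their context; after encoding they
    # are rendered as inert text.
    print(render_comment('x" onmouseover="alert(2)', "<script>alert(1)</script>"))
```

Without the encoding step, the first value would close the title attribute and add an event handler, and the second would introduce a script element.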
What is the difference between named entities and numeric entities in HTML?
Named entities use a descriptive human-readable name preceded by & and terminated by ;, such as &amp; for the ampersand character, &lt; for less-than, or &copy; for the copyright symbol. Numeric entities use the Unicode code point either in decimal form (&#169; for the copyright symbol) or in hexadecimal form (&#xA9;). Named entities exist only for a defined set of characters specified in the HTML standard, whereas numeric entities can represent any Unicode character regardless of whether a named form exists. Both representations produce identical results in the browser, so the choice between them is largely a matter of readability: named entities are more readable in source code, while numeric entities are useful for characters that have no named form.
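The equivalence of the three forms can be checked with a short sketch. The helpers to_numeric and to_named are hypothetical names; codepoint2name is the standard library's table of HTML 4 named entities, which is smaller than the full HTML5 set, so a missing entry there does not prove that no named form exists.

```python
import html
from html.entities import codepoint2name
from typing import Optional


def to_numeric(ch: str, hexadecimal: bool = False) -> str:
    """Return the decimal or hexadecimal numeric entity for one character."""
    code_point = ord(ch)
    return f"&#x{code_point:X};" if hexadecimal else f"&#{code_point};"


def to_named(ch: str) -> Optional[str]:
    """Return the named entity if the HTML 4 table has one, else None."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else None


if __name__ == "__main__":
    symbol = "\u00a9"  # copyright sign
    forms = [to_named(symbol), to_numeric(symbol), to_numeric(symbol, hexadecimal=True)]
    print(forms)
    # All three forms decode to the same character.
    print(all(html.unescape(form) == symbol for form in forms))
```

A character such as an emoji has no entry in this table, so to_named returns None for it and only the numeric forms are available.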