Unicode Normalizer
Normalize text and compare NFC, NFD, NFKC, and NFKD side by side. UTF-16, code point, UTF-8 byte, and grapheme counts with a per-code-point diff.
NFC (Canonical Composition)
Composes base letters and combining marks into precomposed code points where one exists. Leaves ASCII and most CJK input unchanged.
Use for HTML, JSON, URLs, database storage, and anywhere a byte-stable canonical form is needed. The W3C recommends NFC for text on the web.
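A minimal JavaScript sketch of why NFC matters for storage and comparison. It uses the same String.prototype.normalize() call the tool runs, and the sample strings are illustrative only.

```javascript
// "café" spelled two ways: precomposed U+00E9, or "e" followed by
// U+0301 COMBINING ACUTE ACCENT. Both render identically.
const precomposed = "caf\u00E9";
const decomposed = "cafe\u0301";

console.log(precomposed === decomposed); // false: different code points
console.log(precomposed.normalize("NFC") === decomposed.normalize("NFC")); // true

// NFC composes e + U+0301 into a single code point.
console.log(decomposed.length, decomposed.normalize("NFC").length); // 5 4

// U+212B ANGSTROM SIGN has a canonical singleton mapping,
// so even NFC rewrites it to U+00C5.
console.log("\u212B".normalize("NFC") === "\u00C5"); // true
```

Normalizing both sides to NFC before comparing or storing is what makes the two spellings byte-identical.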
NFD (Canonical Decomposition)
Decomposes precomposed characters into a base letter plus separate combining marks. Hangul syllables split into jamo. Compatibility characters are kept as-is.
Use as the first step of an ASCII fold or diacritic strip pipeline, and when running per-character analysis where each accent must be a separate code point.
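A short JavaScript sketch of what NFD does to a Hangul syllable, a precomposed accent, and a compatibility character. toHex is a hypothetical helper that prints U+ labels.

```javascript
// Hypothetical helper: list a string's code points as U+XXXX labels.
const toHex = (s) =>
  Array.from(s, (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")).join(" ");

// U+D55C HANGUL SYLLABLE HAN splits into three conjoining jamo
// (initial, vowel, final consonant).
const han = "\uD55C";
console.log(toHex(han.normalize("NFD"))); // U+1112 U+1161 U+11AB

// A precomposed é splits into the base letter plus U+0301 COMBINING ACUTE ACCENT.
console.log(toHex("\u00E9".normalize("NFD"))); // U+0065 U+0301

// Compatibility characters survive NFD: the fi ligature stays one code point.
console.log(toHex("\uFB01".normalize("NFD"))); // U+FB01

// NFC reverses the canonical decomposition.
console.log(han.normalize("NFD").normalize("NFC") === han); // true
```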
NFKC (Compatibility Composition)
Composes like NFC, but also folds compatibility characters: ligatures (ﬁ to fi), fullwidth forms (ＡＢＣ to ABC), superscripts, the micro sign to the Greek mu, and Roman numeral glyphs to letters.
Use for search indexes, slug generation, deduplication, and any pipeline where a fullwidth A and an ASCII A should match.
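You can check the folds listed above directly with String.prototype.normalize(). A minimal sketch: sameForSearch is a hypothetical helper, not part of the tool. It also shows that NFKC on its own does not fold case.

```javascript
// [input, expected NFKC output] for the compatibility folds described above.
const cases = [
  ["\uFB01", "fi"],              // U+FB01 LATIN SMALL LIGATURE FI
  ["\uFF21\uFF22\uFF23", "ABC"], // FULLWIDTH LATIN CAPITAL LETTERS A, B, C
  ["x\u00B2", "x2"],             // U+00B2 SUPERSCRIPT TWO
  ["\u00B5", "\u03BC"],          // U+00B5 MICRO SIGN to U+03BC GREEK SMALL LETTER MU
  ["\u2163", "IV"],              // U+2163 ROMAN NUMERAL FOUR
];

for (const [input, folded] of cases) {
  // NFC leaves each input untouched; NFKC folds it to the plain form.
  // Each line ends with: true true
  console.log(input, input.normalize("NFC") === input, input.normalize("NFKC") === folded);
}

// Hypothetical search/dedup helper: two strings match if their NFKC forms are equal.
const sameForSearch = (a, b) => a.normalize("NFKC") === b.normalize("NFKC");
console.log(sameForSearch("\uFF21BC", "ABC")); // true
console.log(sameForSearch("\uFF21BC", "abc")); // false: NFKC does not fold case
```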
NFKD (Compatibility Decomposition)
The most aggressive form. Folds compatibility characters like NFKC but does not recompose, so the output stays fully decomposed into base letters plus combining marks.
Use as the first step of an aggressive ASCII fold or slug normalizer. Run a diacritic strip pass after NFKD, then recompose to NFC for output.
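Below is a minimal sketch of the NFKD, strip, NFC pipeline described above. asciiFold and slugify are hypothetical helpers. The strip step assumes it is acceptable to remove every General_Category M code point, which also removes marks that carry meaning in scripts such as Devanagari or Hebrew.

```javascript
// Hypothetical ASCII-fold helper: NFKD, strip combining marks, recompose to NFC.
function asciiFold(text) {
  return text
    .normalize("NFKD")       // fold compatibility characters and split off accents
    .replace(/\p{M}/gu, "")  // drop every combining mark (General_Category M)
    .normalize("NFC");       // recompose whatever is left
}

// Hypothetical slug helper built on asciiFold.
function slugify(text) {
  return asciiFold(text)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // each run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, "");    // trim leading and trailing hyphens
}

// "Crème brûlée ﬁnale" with the fi ligature
console.log(asciiFold("Cr\u00E8me br\u00FBl\u00E9e \uFB01nale")); // Creme brulee finale
// "Ｃafé Ⅳ" with a fullwidth C and ROMAN NUMERAL FOUR
console.log(slugify("\uFF23af\u00E9 \u2163")); // cafe-iv
```

Letters that have no decomposition, such as ø, ł, or ß, pass through asciiFold unchanged, so slugify turns them into hyphens (smørbrød becomes sm-rbr-d). Handling them needs a separate transliteration step.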
How to use
- Paste any text into the input. Try a sample like Precomposed cafe, Korean Hangul, or Ligatures and fullwidth to see each form's distinct behavior.
- Read the four side-by-side panels. Each shows the form's output text, a Changed or Unchanged badge, and four counts: UTF-16 code units, code points, UTF-8 bytes, and graphemes.
- Click Compare on any panel (or pick a form in the toggle below) to open the per-code-point diff. Changed rows are highlighted with the source code point on the left and the target code points on the right.
- Scroll to the code point table to see every U+ value, the glyph, and the Unicode general category for the selected form's output.
- Use Copy on any form to grab just that output, or Copy full report at the top for a plain-text comparison report covering every form and every count.
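The Compare diff and the code point table from the steps above can be approximated in JavaScript. This is an assumption, not the tool's algorithm. A lone code point cannot be normalized on its own, because that would miss compositions such as e + U+0301 into é, which span two code points. The sketch therefore treats each grapheme cluster as an independent normalization unit. That holds for typical text, but UAX #15 does not guarantee it. JavaScript has no direct general-category lookup, so the hypothetical generalCategory helper tests each character against the \p{gc=...} property escapes.

```javascript
// Format a code point number as a U+XXXX label.
const hex = (cp) => "U+" + cp.toString(16).toUpperCase().padStart(4, "0");

// The 30 two-letter General_Category values; every code point has exactly one.
const GENERAL_CATEGORIES = [
  "Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Me", "Nd", "Nl", "No",
  "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po", "Sm", "Sc", "Sk", "So",
  "Zs", "Zl", "Zp", "Cc", "Cf", "Cs", "Co", "Cn",
];
const CATEGORY_PATTERNS = GENERAL_CATEGORIES.map(
  (gc) => [gc, new RegExp(`^\\p{gc=${gc}}$`, "u")]
);

// Hypothetical helper: find the general category of a single code point.
function generalCategory(ch) {
  const match = CATEGORY_PATTERNS.find(([, re]) => re.test(ch));
  return match ? match[0] : "Cn";
}

// Approximate diff: normalize each grapheme cluster on its own and
// report its input and output code points.
function diffByGrapheme(text, form) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), ({ segment }) => {
    const output = segment.normalize(form);
    return {
      from: Array.from(segment, (ch) => hex(ch.codePointAt(0))),
      to: Array.from(output, (ch) => hex(ch.codePointAt(0))),
      changed: output !== segment,
    };
  });
}

// Code point table for one form's output: U+ value, glyph, general category.
function codePointTable(text, form) {
  return Array.from(text.normalize(form), (ch) => ({
    codePoint: hex(ch.codePointAt(0)),
    glyph: ch,
    category: generalCategory(ch),
  }));
}

// Print only the clusters that NFKC rewrote.
for (const row of diffByGrapheme("cafe\u0301 \uFB01", "NFKC")) {
  if (row.changed) console.log(row.from.join(" "), "->", row.to.join(" "));
}
// U+0065 U+0301 -> U+00E9
// U+FB01 -> U+0066 U+0069

console.log(codePointTable("\u00E9", "NFD").map((r) => `${r.codePoint} ${r.category}`).join(", "));
// U+0065 Ll, U+0301 Mn
```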
About this tool
Unicode Normalizer is a side-by-side explorer for the four Unicode normalization forms defined by UAX #15: NFC (Canonical Composition), NFD (Canonical Decomposition), NFKC (Compatibility Composition), and NFKD (Compatibility Decomposition). Paste any text and the tool runs String.prototype.normalize() under every form and shows the four results in panels at once.
Each panel reports the output, a Changed or Unchanged badge that flags whether the form rewrote the input, and the counts a developer needs to make a decision: UTF-16 code units (the value JavaScript .length returns), Unicode code points (the value Array.from(text).length returns), UTF-8 bytes via TextEncoder (the size an HTTP body or a byte-limited storage field sees), and grapheme clusters via Intl.Segmenter (the number of characters a user counts on screen).
Pick any form as the Compare target and a unified per-code-point diff opens below: every input code point is mapped to the code points it produced under that form, and changed rows are highlighted. A code point table lists every U+ value, the glyph (with friendly labels for spaces, tabs, newlines, zero-width joiners, and the byte-order mark), and the Unicode general category for each row.
Built-in samples cover the canonical demonstrations: precomposed vs decomposed café, Korean Hangul syllables that split into two or three jamo under NFD, the fi ligature and fullwidth ABC that only NFKC and NFKD fold, U+00B5 MICRO SIGN vs U+03BC GREEK SMALL LETTER MU, U+212B ANGSTROM SIGN (which folds to U+00C5 even under canonical normalization), Vietnamese tone marks, Hebrew with nikud points, and an emoji ZWJ family sequence. The emoji sample is a control that shows normalization deliberately leaves emoji alone.
Useful for fixing combining-accent bugs in HTML and JSON, picking NFC for byte-stable storage (the W3C recommends NFC for web text), picking NFD as the first step of a diacritic strip or ASCII fold pipeline, picking NFKC for search indexes and deduplication where a fullwidth A and an ASCII A should match, picking NFKD as the most aggressive fold before falling back to ASCII, and explaining the difference between the four forms to a teammate during a code review. Copy any single form's output or the full plain-text report. Everything runs locally in your browser. The text you paste is never uploaded.
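The four counts and the Changed badge map directly onto the built-ins named above (.length, Array.from, TextEncoder, Intl.Segmenter). The sketch below assumes a runtime that provides Intl.Segmenter, such as current browsers and recent Node.js releases. countUnits and compareForms are hypothetical names, not the tool's source.

```javascript
// The four counts each panel shows, computed with the built-ins named above.
function countUnits(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    utf16: text.length,                                    // UTF-16 code units
    codePoints: Array.from(text).length,                   // Unicode code points
    utf8Bytes: new TextEncoder().encode(text).length,      // bytes once encoded as UTF-8
    graphemes: Array.from(segmenter.segment(text)).length, // user-perceived characters
  };
}

const FORMS = ["NFC", "NFD", "NFKC", "NFKD"];

// Run every form and flag whether it rewrote the input (the Changed badge).
function compareForms(text) {
  return FORMS.map((form) => {
    const output = text.normalize(form);
    return { form, changed: output !== text, ...countUnits(output) };
  });
}

console.table(compareForms("\uD55C\uAD6D\uC5B4")); // "한국어"
console.log(countUnits("\u{1F600}")); // { utf16: 2, codePoints: 1, utf8Bytes: 4, graphemes: 1 }
```

For 한국어, the NFC and NFKC rows report 3 code points and 9 UTF-8 bytes. The NFD and NFKD rows report 8 code points and 24 bytes, because the syllables split into jamo. All four rows still count 3 graphemes.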
Free to use. Works in your browser. No signup, no login.
Related tools
- Unicode Character Inspector: Per-character breakdown with code points, UTF-8/UTF-16 bytes, and hidden character detection.
- Diacritic Remover: Strip accents, transliterate special letters, and normalize Unicode text to ASCII.
- UTF-8 Byte Counter: UTF-8 bytes, UTF-16, code points, graphemes, plus SMS, cookie, and payload limits.
- Mojibake Fixer: Repair UTF-8 text misread as Windows-1252 or Latin-1, no signup.
- Invisible Character Detector: Find and remove zero-width spaces, BOM, bidi controls, and other hidden Unicode.
- Punycode Converter: Convert IDN domains between Unicode and ASCII Punycode with per-label breakdown.