Unicode Normalizer

Normalize text and compare NFC, NFD, NFKC, and NFKD side by side, with UTF-16 code unit, code point, UTF-8 byte, and grapheme counts for each form and a per-code-point diff.

NFC (Canonical Composition)

Composes a base letter and its combining marks into a single precomposed code point wherever Unicode defines one. ASCII is always unchanged, and so is most CJK text.

Use for HTML, JSON, URLs, database storage, and anywhere a byte-stable canonical form is needed. The W3C recommends NFC for text on the web.
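The NFC behavior described above can be reproduced with the built-in String.prototype.normalize() in any modern JavaScript runtime. The sample strings below are illustrative choices, not necessarily the tool's built-in samples.

```javascript
// NFC composes a base letter plus a combining mark into one precomposed code point.
const decomposed = "cafe\u0301";          // "e" followed by U+0301 COMBINING ACUTE ACCENT
const composed = decomposed.normalize("NFC");

console.log(decomposed === "caf\u00e9");  // false: a different code point sequence
console.log(composed === "caf\u00e9");    // true: U+00E9 LATIN SMALL LETTER E WITH ACUTE
console.log(decomposed.length, composed.length); // 5 4

// A canonical singleton: U+212B ANGSTROM SIGN becomes U+00C5 even under NFC.
console.log("\u212B".normalize("NFC") === "\u00C5"); // true

// ASCII never changes under any normalization form.
console.log("hello".normalize("NFC") === "hello"); // true
```

Because the decomposed and precomposed strings compare unequal with ===, normalizing both sides to NFC before comparing or storing is what makes the byte-stable claim hold.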

NFD (Canonical Decomposition)

Decomposes precomposed characters into a base letter plus separate combining marks. Hangul syllables split into jamo. Compatibility characters are kept as-is.

Use as the first step of an ASCII-folding or diacritic-stripping pipeline, and for per-character analysis where each accent must be a separate code point.
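A minimal sketch of NFD at work, again using only String.prototype.normalize(). The codePoints helper is a hypothetical name introduced here just to print U+ labels.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
const codePoints = (s) =>
  Array.from(s, (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));

// Precomposed é splits into "e" + U+0301 COMBINING ACUTE ACCENT.
console.log(codePoints("\u00e9".normalize("NFD"))); // [ 'U+0065', 'U+0301' ]

// The Hangul syllable 한 (U+D55C) splits into three conjoining jamo.
console.log(codePoints("\uD55C".normalize("NFD"))); // [ 'U+1112', 'U+1161', 'U+11AB' ]

// Compatibility characters such as the ﬁ ligature (U+FB01) are left alone by NFD.
console.log("\uFB01".normalize("NFD") === "\uFB01"); // true
```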

NFKC (Compatibility Composition)

Composes like NFC, but also folds compatibility characters: ligatures (ﬁ to fi), fullwidth forms (ＡＢＣ to ABC), superscripts (² to 2), the micro sign to the Greek mu, and Roman numeral glyphs to letters (Ⅻ to XII).

Use for search indexes, slug generation, deduplication, and any pipeline where a fullwidth A and an ASCII A should match.
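The compatibility folds listed above only happen under the K forms. This sketch compares NFC and NFKC on one illustrative sample of each kind; the samples object is hypothetical.

```javascript
// Compatibility characters that NFKC folds but NFC leaves untouched.
const samples = {
  ligature: "\uFB01",              // ﬁ LATIN SMALL LIGATURE FI
  fullwidth: "\uFF21\uFF22\uFF23", // ＡＢＣ fullwidth Latin capitals
  superscript: "x\u00B2",          // x² with U+00B2 SUPERSCRIPT TWO
  micro: "\u00B5",                 // µ MICRO SIGN
  roman: "\u216B",                 // Ⅻ ROMAN NUMERAL TWELVE
};

for (const [name, text] of Object.entries(samples)) {
  const nfc = text.normalize("NFC");
  const nfkc = text.normalize("NFKC");
  console.log(`${name}: NFC changed=${nfc !== text}, NFKC -> ${JSON.stringify(nfkc)}`);
}
// ligature: NFC changed=false, NFKC -> "fi"
// fullwidth: NFC changed=false, NFKC -> "ABC"
// superscript: NFC changed=false, NFKC -> "x2"
// micro: NFC changed=false, NFKC -> "μ"
// roman: NFC changed=false, NFKC -> "XII"
```

The superscript row shows why NFKC is a search and matching form rather than a storage form: x² and x2 become identical, so the original meaning is lost.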

NFKD (Compatibility Decomposition)

The most aggressive form. Applies the same compatibility folds as NFKC, but leaves the result fully decomposed into base letters plus combining marks.

Use as the first step of an aggressive ASCII fold or slug normalizer: run a diacritic-stripping pass after NFKD, then recompose to NFC for output.
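The NFKD, strip, NFC pipeline can be sketched as follows, assuming that the diacritic-stripping pass means removing nonspacing marks (general category Mn) with a Unicode property escape. foldDiacritics is a hypothetical helper, not part of this tool. Letters that have no decomposition, such as ø or đ, pass through unchanged, so a full ASCII fold needs an extra lookup table on top of this.

```javascript
// Hypothetical helper: aggressive fold following the NFKD -> strip marks -> NFC pipeline.
function foldDiacritics(text) {
  return text
    .normalize("NFKD")       // fold compatibility characters and split off accents
    .replace(/\p{Mn}/gu, "") // drop nonspacing combining marks (general category Mn)
    .normalize("NFC");       // recompose whatever remains for output
}

console.log(foldDiacritics("Cr\u00E8me Br\u00FBl\u00E9e \uFB01le")); // Creme Brulee file
console.log(foldDiacritics("Ti\u1EBFng Vi\u1EC7t"));                  // Tieng Viet
console.log(foldDiacritics("\u00F8"));                               // ø (no decomposition)
```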

How to use

Paste any text into the input. Try a sample like Precomposed café, Korean Hangul, or Ligatures and fullwidth to see how each form behaves differently.
Read the four side-by-side panels. Each shows the form's output text, a Changed or Unchanged badge, and four counts: UTF-16 code units, code points, UTF-8 bytes, and graphemes.
Click Compare on any panel (or pick a form in the toggle below) to open the per-code-point diff. Changed rows are highlighted with the source code point on the left and the target code points on the right.
Scroll to the code point table to see every U+ value, the glyph, and the Unicode general category for the selected form's output.
Use Copy on any form to grab just that output, or Copy full report at the top for a plain-text comparison report covering every form and every count.
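JavaScript has no built-in function that returns a code point's general category, but Unicode property escapes in regular expressions can test for each one. Below is a sketch of a code point table along those lines, assuming a regex-based lookup; the tool's actual implementation is not documented here, generalCategory and codePointTable are hypothetical names, and the friendly glyph labels are omitted.

```javascript
// The 30 two-letter Unicode general category values.
const CATEGORIES = [
  "Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Me", "Nd", "Nl", "No",
  "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po", "Sm", "Sc", "Sk", "So",
  "Zs", "Zl", "Zp", "Cc", "Cf", "Cs", "Co", "Cn",
];
// Precompile one property escape per category; the "u" flag enables \p{...}.
const CATEGORY_RES = CATEGORIES.map((gc) => [gc, new RegExp(`^\\p{gc=${gc}}$`, "u")]);

// Hypothetical helper: return the general category of a single code point.
function generalCategory(ch) {
  const hit = CATEGORY_RES.find(([, re]) => re.test(ch));
  return hit ? hit[0] : "Cn";
}

// Hypothetical helper: one row per code point of the normalized output.
function codePointTable(text, form) {
  return Array.from(text.normalize(form), (ch) => ({
    codePoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
    glyph: ch,
    category: generalCategory(ch),
  }));
}

console.table(codePointTable("\u00E9\u200D", "NFD"));
```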

About this tool

Unicode Normalizer is a side-by-side explorer for the four Unicode normalization forms defined by UAX #15: NFC (Canonical Composition), NFD (Canonical Decomposition), NFKC (Compatibility Composition), and NFKD (Compatibility Decomposition). Paste any text and the tool runs String.prototype.normalize() under every form and shows the results in four panels at once.

Each panel shows the output, a Changed or Unchanged badge that flags whether the form rewrote the input, and the four counts a developer needs to make a decision: UTF-16 code units (the value JavaScript .length returns), Unicode code points (the value Array.from(text).length returns), UTF-8 bytes via TextEncoder (the size an HTTP body or a byte-limited database column sees), and grapheme clusters via Intl.Segmenter (what a user counts when looking at the screen).

Pick any form as the Compare target and a unified per-code-point diff opens below: every input code point is mapped to the code points it produced under that form, and changed rows are highlighted. A code point table lists every U+ value, the glyph (with friendly labels for spaces, tabs, newlines, zero-width joiners, and the byte-order mark), and the Unicode general category for each row.

Built-in samples cover the canonical demonstrations: precomposed vs decomposed café, Korean Hangul syllables that split into two or three jamo under NFD, the ﬁ ligature and fullwidth ＡＢＣ that only NFKC and NFKD fold, U+00B5 MICRO SIGN vs U+03BC GREEK SMALL LETTER MU, U+212B ANGSTROM SIGN (which changes even under canonical normalization, because it maps canonically to U+00C5), Vietnamese tone marks, Hebrew with nikud points, and an emoji ZWJ family sequence used as a control to show that normalization does not rewrite emoji.
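The four counts and the Changed badge map directly onto the built-in APIs named above. A minimal sketch, assuming a runtime with Intl.Segmenter (recent browsers, Node.js 16 or later); counts and compareForms are hypothetical names, not the tool's code.

```javascript
const FORMS = ["NFC", "NFD", "NFKC", "NFKD"];
const encoder = new TextEncoder();
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: the four counts for one string.
function counts(text) {
  return {
    utf16: text.length,                                    // UTF-16 code units
    codePoints: Array.from(text).length,                   // Unicode code points
    utf8Bytes: encoder.encode(text).length,                // UTF-8 bytes
    graphemes: Array.from(segmenter.segment(text)).length, // grapheme clusters
  };
}

// Hypothetical helper: normalize under every form and flag whether it rewrote the input.
function compareForms(text) {
  return FORMS.map((form) => {
    const output = text.normalize(form);
    return { form, changed: output !== text, ...counts(output) };
  });
}

console.table(compareForms("\uD55C")); // Hangul syllable 한

// The emoji ZWJ family control: unchanged under every form.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
console.log(compareForms(family).every((row) => !row.changed)); // true
```

For 한, NFD and NFKD report 3 code points and 9 UTF-8 bytes but still 1 grapheme, which is why a length limit enforced on code points or bytes can disagree with what the user sees.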
It is useful for fixing combining-accent bugs in HTML and JSON, picking NFC for byte-stable storage (the W3C recommends NFC for web text), picking NFD as the first step of a diacritic-stripping or ASCII-folding pipeline, picking NFKC for search indexes and deduplication where a fullwidth A and an ASCII A should match, picking NFKD as the most aggressive fold before falling back to ASCII, and explaining the difference between the four forms to a teammate during a code review. Copy any single form's output or the full plain-text report. Everything runs locally in your browser; the text you paste is never uploaded.

Free to use. Works in your browser. No signup, no login.

Related tools

Unicode Character Inspector

Diacritic Remover

UTF-8 Byte Counter

Mojibake Fixer

Invisible Character Detector

Punycode Converter