Invisible Character Detector
Detect zero-width spaces, BOM, soft hyphens, bidi controls, and non-standard whitespace in text. Inspect by codepoint and strip them in your browser.
Inspection runs on your device, and the input is never uploaded to a server. Counts iterate by codepoint rather than by UTF-16 code unit, so an emoji or other supplementary character counts once instead of as two surrogate halves.
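JavaScript strings are sequences of UTF-16 code units, so `String.prototype.length` counts any character above U+FFFF twice. Here is a minimal TypeScript sketch of counting by codepoint instead, using only built-ins. `countCodepoints` is a hypothetical helper, not this tool's code:

```typescript
// Hypothetical helper: count Unicode codepoints instead of UTF-16 code units.
// Array.from iterates a string by codepoint, so a surrogate pair
// (any character above U+FFFF) stays together as one element.
function countCodepoints(text: string): number {
  return Array.from(text).length;
}

// "a", U+200B zero-width space, "b", then U+1F600 written as its surrogate pair.
const sample = "a\u200Bb\uD83D\uDE00";

console.log(sample.length); // 5: the emoji occupies two UTF-16 code units
console.log(countCodepoints(sample)); // 4: the emoji is a single codepoint
```

Grapheme clustering is a separate step. A ZWJ emoji sequence such as woman technologist (U+1F469 U+200D U+1F4BB) is one grapheme but three codepoints, so splitting text by grapheme needs `Intl.Segmenter` rather than `Array.from`.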
Removal categories
Each toggle controls a family of characters. Defaults strip the most common offenders and leave variation selectors and non-standard whitespace alone unless you opt in.
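One way to model family toggles is to keep a table of regex character classes and combine the enabled ones into a single pattern. The TypeScript sketch below uses its own family keys and ranges. `Family`, `FAMILY_CLASSES`, and `stripFamilies` are hypothetical names, and the defaults follow the behavior described above. Note that this version strips every zero-width joiner, including those inside emoji.

```typescript
// Hypothetical family keys mirroring the eight removal toggles; ranges are a sketch.
type Family =
  | "bom"
  | "zeroWidth"
  | "softHyphen"
  | "bidi"
  | "variationSelectors"
  | "tags"
  | "controls"
  | "nonStandardWhitespace";

// Regex character-class bodies for each family, interpreted with the "u" flag.
const FAMILY_CLASSES: Record<Family, string> = {
  bom: "\\uFEFF",
  zeroWidth: "\\u200B-\\u200D\\u2060\\u180E",
  softHyphen: "\\u00AD",
  bidi: "\\u200E\\u200F\\u202A-\\u202E\\u2066-\\u2069",
  variationSelectors: "\\uFE00-\\uFE0F\\u{E0100}-\\u{E01EF}",
  tags: "\\u{E0000}-\\u{E007F}",
  // C0 controls except tab, LF, and CR, then DEL and the C1 block.
  controls: "\\u0000-\\u0008\\u000B\\u000C\\u000E-\\u001F\\u007F-\\u009F",
  nonStandardWhitespace: "\\u00A0\\u1680\\u2000-\\u200A\\u2028\\u2029\\u202F\\u205F\\u3000",
};

// Assumed defaults: variation selectors and non-standard whitespace are opt-in.
const DEFAULT_FAMILIES: Family[] = ["bom", "zeroWidth", "softHyphen", "bidi", "tags", "controls"];

// Remove every codepoint that belongs to an enabled family.
function stripFamilies(text: string, enabled: Family[] = DEFAULT_FAMILIES): string {
  if (enabled.length === 0) return text;
  const body = enabled.map((family) => FAMILY_CLASSES[family]).join("");
  return text.replace(new RegExp(`[${body}]`, "gu"), "");
}
```

With the defaults, `stripFamilies("\uFEFFpri\u00ADce\u200B\u00A0list")` returns `"price\u00A0list"`. The BOM, soft hyphen, and zero-width space are removed, but the no-break space stays until the whitespace family is enabled.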
Inline preview
Every invisible codepoint is replaced with a small badge that names its codepoint. Turn this off with the toggle above to see the raw paste verbatim. The badges then disappear, and most of the characters they marked become invisible again.
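A plain-text approximation of the badge preview replaces each invisible codepoint with a bracketed label. This is a minimal TypeScript sketch. The character set is an abbreviated, assumed subset, `annotate` and `formatCodepoint` are hypothetical names, and the output is bracketed text rather than styled badges:

```typescript
// Abbreviated, assumed set of invisible codepoints (all in the BMP, so no "u" flag needed).
const INVISIBLE = /[\u00A0\u00AD\u180E\u200B-\u200F\u202A-\u202F\u2060\u2066-\u2069\u3000\uFEFF]/g;

// Format a codepoint as "U+XXXX" with at least four uppercase hex digits.
function formatCodepoint(cp: number): string {
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

// Replace each invisible codepoint with a visible label such as "[U+200B]".
function annotate(text: string): string {
  return text.replace(INVISIBLE, (ch) => `[${formatCodepoint(ch.codePointAt(0)!)}]`);
}
```

For example, `annotate("top\u200Bsecret")` returns `"top[U+200B]secret"`. Turning the preview off corresponds to showing the input unchanged instead of calling `annotate`.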
Why hidden characters appear in text
- Pasted from rich editors: Notion, Google Docs, Confluence, Slack, and email clients often insert U+200B zero-width spaces around link boundaries and as line-wrap hints.
- File encoding: Files saved on Windows often start with U+FEFF, the byte order mark. When the file is read and the BOM is treated as content, it becomes an invisible first character that breaks regex matches and CSV header parsing.
- Localized keyboards: Persian, Arabic, Urdu, and Hindi keyboards produce zero-width joiners and non-joiners that travel with copied text.
- CJK keyboards: U+3000 ideographic space is the full-width space that Chinese and Japanese input methods type in full-width mode. Pasted into an English form, it looks like a regular space but fails byte-level equality.
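- Adversarial input: U+202D and U+202E (the left-to-right and right-to-left overrides), along with the isolates U+2066 through U+2069, can reorder how text is displayed. A source code review can then show one logic while the file executes another. This is known as a Trojan Source attack.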
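For the Trojan Source case, a review-time check can simply flag every line that contains a bidirectional control. This minimal TypeScript sketch covers U+202A through U+202E, U+2066 through U+2069, and the left-to-right and right-to-left marks. `linesWithBidiControls` is a hypothetical helper. A stricter audit might also verify that each override or isolate is closed on the same line.

```typescript
// Bidi controls: LRM/RLM, embeddings and overrides (U+202A-U+202E), isolates (U+2066-U+2069).
// No "g" flag, so RegExp.prototype.test keeps no lastIndex state between calls.
const BIDI_CONTROL = /[\u200E\u200F\u202A-\u202E\u2066-\u2069]/;

// Hypothetical helper: return the 1-based numbers of lines that contain any bidi control.
function linesWithBidiControls(source: string): number[] {
  const hits: number[] = [];
  source.split(/\r\n|\r|\n/).forEach((line, index) => {
    if (BIDI_CONTROL.test(line)) hits.push(index + 1);
  });
  return hits;
}
```

For the input `"let ok = true;\n// \u202E reversed comment\nreturn ok;"`, the function returns `[2]`.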
Common places these break things
- Form validation: A field that looks correct fails because trailing zero-width characters change the byte length or break a regex.
- Billing and identity: Card number, IBAN, tax ID, and SKU fields reject input that contains NBSP or zero-width spaces.
- CSV imports: A column that should match a known set of values fails on lookup because the values contain ideographic or non-breaking spaces from a different system.
- Diffs and code review: Diffs that flag a changed line with no visible difference are often caused by soft hyphens, zero-width spaces, or bidi controls.
- Search and replace: Find-and-replace fails when the search term has a regular space and the document contains a narrow no-break space at the same position.
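Most of these failures go away if values are canonicalized before they are compared or looked up. This minimal TypeScript sketch assumes that zero-width characters and the BOM can be dropped and that non-standard spaces should become a single U+0020. `normalizeForCompare` is a hypothetical helper. Fields such as IBANs or card numbers would usually remove spaces entirely instead.

```typescript
// Zero-width characters and the BOM: removed before comparison.
const ZERO_WIDTH = /[\u200B-\u200D\u2060\uFEFF]/g;
// Non-standard spaces: mapped to a plain U+0020 space.
const SPACE_LIKE = /[\u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]/g;

// Hypothetical helper: canonicalize a field value before lookup or equality checks.
function normalizeForCompare(value: string): string {
  return value
    .replace(ZERO_WIDTH, "") // drop invisible characters entirely
    .replace(SPACE_LIKE, " ") // NBSP, ideographic space, etc. become U+0020
    .replace(/ {2,}/g, " ") // collapse runs of spaces
    .trim();
}
```

`"New\u00A0York" === "New York"` is `false`, while `normalizeForCompare("New\u00A0York") === "New York"` is `true`.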
How to use
- Paste any text into the input area on the left. The scanner runs instantly using your browser's Unicode-aware iterator.
- Read the per-character breakdown to see which codepoints are present in your input, sorted by frequency. Each row shows the codepoint, its official Unicode name, and the family it belongs to.
- Use the eight removal toggles to control which families of invisible characters get stripped. Defaults handle the common offenders; enable variation selectors and non-standard whitespace only when you want them gone.
- Use Preserve zero-width joiners inside emoji to keep ZWJ emoji sequences intact, such as family and profession emoji and their skin-tone variants. Use Replace non-standard whitespace with a regular space to normalize NBSP and ideographic space to U+0020 instead of removing them.
- Copy the cleaned output, or use Replace input with cleaned text to keep iterating on the result without juggling two panels.
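The two options above can be approximated with Unicode property escapes. A zero-width joiner is kept only when an emoji, a skin-tone modifier, or U+FE0F comes before it and an emoji follows it. This TypeScript sketch relies on those assumptions. `clean`, `CleanOptions`, and the option field names are hypothetical. The sketch also assumes that the zero-width and non-standard whitespace families are both enabled. The heuristic is narrower than the full emoji ZWJ sequence definitions in Unicode Technical Standard #51.

```typescript
// Hypothetical option names mirroring the two checkboxes.
interface CleanOptions {
  preserveEmojiZwj: boolean; // "Preserve zero-width joiners inside emoji"
  replaceOddSpaces: boolean; // "Replace non-standard whitespace with a regular space"
}

// A ZWJ that is NOT between emoji: the codepoint before it is not an emoji, skin-tone
// modifier, or U+FE0F, or the codepoint after it is not an emoji.
// Built with the RegExp constructor so \p{...} and lookbehind are parsed at runtime.
const STRAY_ZWJ = new RegExp(
  "(?<![\\p{Extended_Pictographic}\\p{Emoji_Modifier}\\uFE0F])\\u200D|\\u200D(?!\\p{Extended_Pictographic})",
  "gu",
);
const ANY_ZWJ = /\u200D/g;
// Other zero-width characters and the BOM, always removed in this sketch.
const OTHER_ZERO_WIDTH = /[\u180E\u200B\u200C\u2060\uFEFF]/g;
// Non-standard spaces: replaced with U+0020 or removed, depending on the option.
const ODD_SPACES = /[\u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]/g;

function clean(text: string, opts: CleanOptions): string {
  let out = text.replace(OTHER_ZERO_WIDTH, "");
  out = out.replace(opts.preserveEmojiZwj ? STRAY_ZWJ : ANY_ZWJ, "");
  out = out.replace(ODD_SPACES, opts.replaceOddSpaces ? " " : "");
  return out;
}
```

With `preserveEmojiZwj` set, woman technologist (U+1F469 U+200D U+1F4BB) passes through unchanged. A joiner between two letters is still removed.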
About this tool
Invisible Character Detector scans any text for codepoints that produce no visible glyph but still travel through your editor, your forms, your databases, and your APIs. It catches:
- Byte order mark: U+FEFF.
- Zero-width characters: U+200B zero-width space, U+200C zero-width non-joiner, U+200D zero-width joiner, U+2060 word joiner, and U+180E Mongolian vowel separator.
- Format characters: the soft hyphen (U+00AD) and other invisible format characters.
- Bidirectional controls: U+202A through U+202E, U+2066 through U+2069, and the left-to-right and right-to-left marks (U+200E, U+200F). These are the characters used in Trojan Source attacks.
- Variation selectors: U+FE00 through U+FE0F and the supplement at U+E0100 through U+E01EF.
- Tag characters: U+E0000 through U+E007F.
- Control characters: stray C0 and C1 controls. C1 controls typically appear when Windows-1252 text is decoded as ISO-8859-1.
- Non-standard whitespace: characters that look identical to a regular space but break equality checks. These include the no-break, narrow no-break, ideographic, en, em, thin, hair, figure, punctuation, three-per-em, four-per-em, six-per-em, and medium mathematical spaces, the line and paragraph separators, and the Ogham space mark.
Each occurrence is reported with its codepoint, official Unicode name, category, line, and column, so you can find it in the source. A per-character breakdown table groups identical codepoints with a count, so you can see at a glance which two or three offenders are causing the problem.
Eight removal toggles let you strip categories selectively. You can keep variation selectors so emoji presentation stays intact. You can keep the zero-width joiner inside emoji so family and profession sequences survive, skin tones included. You can also convert non-standard whitespace to regular spaces instead of removing it, so the layout still reads naturally. The defaults strip the common offenders (zero-width characters except inside emoji, the BOM, soft hyphens, bidi controls, tag characters, and unusual control characters) and leave the rest alone unless you opt in.
An inline preview replaces each invisible codepoint with a labeled badge, so you can see exactly where the offenders are before you decide what to do with them. The tool is useful for:
- cleaning text pasted from rich editors and SaaS forms (Notion, Confluence, Google Docs, Slack, email);
- preparing data for CSV import and database lookup;
- debugging form validation failures;
- auditing source files for Trojan Source attacks;
- normalizing names and addresses that arrived through localized keyboards;
- turning a copy that mysteriously fails string equality into one that just works.
Everything runs locally in your browser using Unicode-aware iteration. The text you inspect is never uploaded to a server.
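The per-occurrence report and the per-character breakdown can be sketched by walking the text one codepoint at a time. This TypeScript version is a minimal sketch under a few assumptions. `NAMES` is a hypothetical eight-entry subset of the Unicode name table, `scan` and `breakdown` are hypothetical names, lines split on LF only, and columns count codepoints rather than UTF-16 code units.

```typescript
// Hypothetical subset of official Unicode names; a full detector would carry the complete list.
const NAMES: Record<number, string> = {
  0x00a0: "NO-BREAK SPACE",
  0x00ad: "SOFT HYPHEN",
  0x200b: "ZERO WIDTH SPACE",
  0x200c: "ZERO WIDTH NON-JOINER",
  0x200d: "ZERO WIDTH JOINER",
  0x202e: "RIGHT-TO-LEFT OVERRIDE",
  0x3000: "IDEOGRAPHIC SPACE",
  0xfeff: "ZERO WIDTH NO-BREAK SPACE",
};

interface Occurrence {
  codepoint: number;
  name: string;
  line: number; // 1-based
  column: number; // 1-based, counted in codepoints
}

interface BreakdownRow {
  codepoint: number;
  name: string;
  count: number;
}

// Walk the text one codepoint at a time, tracking line and column.
function scan(text: string): Occurrence[] {
  const found: Occurrence[] = [];
  let line = 1;
  let column = 1;
  for (let i = 0; i < text.length; ) {
    const cp = text.codePointAt(i)!;
    i += cp > 0xffff ? 2 : 1; // skip both halves of a surrogate pair
    if (cp === 0x0a) {
      // LF starts a new line
      line += 1;
      column = 1;
      continue;
    }
    const name = NAMES[cp];
    if (name !== undefined) {
      found.push({ codepoint: cp, name, line, column });
    }
    column += 1;
  }
  return found;
}

// Group occurrences by codepoint, most frequent first.
function breakdown(found: Occurrence[]): BreakdownRow[] {
  const rows = new Map<number, BreakdownRow>();
  for (const o of found) {
    const row = rows.get(o.codepoint) ?? { codepoint: o.codepoint, name: o.name, count: 0 };
    row.count += 1;
    rows.set(o.codepoint, row);
  }
  return Array.from(rows.values()).sort((a, b) => b.count - a.count);
}
```

For `"ok\u200B\nx\u200B\u00A0y"`, `scan` reports the zero-width space at line 1, column 3, and again at line 2, column 2. It reports the no-break space at line 2, column 3. `breakdown` then puts ZERO WIDTH SPACE first with a count of 2.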
Free to use. Works in your browser. No signup, no login.
Related tools
- Text Cleaner: Remove duplicate lines, blank lines, extra spaces, tabs, and invisible characters.
- Unicode Character Inspector: Per-character breakdown with code points, UTF-8/UTF-16 bytes, and hidden character detection.
- Find and Replace: Find and replace text in plain or regex mode with live match highlighting.
- Character Counter: Detailed character, letter, number, space, and line counts.
- UTF-8 Byte Counter: UTF-8 bytes, UTF-16, code points, graphemes, plus SMS, cookie, and payload limits.
- Letter Frequency Counter: Per-character frequency table with percentages, bar chart, and English reference.