OpenReplay

Invisible character cleaner

Detect hidden Unicode (zero-width characters, bidi controls, tag characters and homoglyphs) that can hide prompt-injection payloads or corrupt text, then strip or normalize it. Scanning and cleaning happen entirely in your browser.

Removal categories strip the matching characters outright. The Spaces & NBSP and Homoglyphs categories instead normalize (replace) characters with their plain equivalents, and both are off by default to protect legitimate text.

About this tool

Text can carry characters you can't see: zero-width spaces and joiners, byte-order marks, right-to-left marks and bidirectional overrides, Unicode tag characters, non-breaking spaces and look-alike homoglyphs from other scripts. They slip in through copy-paste, rich-text editors and PDFs, and they are increasingly used to smuggle hidden instructions into text fed to an LLM or to disguise one string as another. This tool scans your text and lists every suspicious character with its position, code point and Unicode name.
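To make the scanning step concrete, here is a minimal Python sketch of a scanner that reports position, code point and Unicode name. It is not the tool's own code: the `SUSPICIOUS_RANGES` table and the `scan` function are hypothetical, and the table is deliberately small. Positions are Python string indices (code points); a browser implementation working on JavaScript strings would count UTF-16 units instead.

```python
import unicodedata

# Hypothetical, deliberately small table of suspicious code point ranges.
SUSPICIOUS_RANGES = [
    (0x00A0, 0x00A0),    # NO-BREAK SPACE
    (0x200B, 0x200F),    # zero-width space/joiners, LRM, RLM
    (0x202A, 0x202E),    # bidi embeddings and overrides
    (0x2060, 0x2064),    # word joiner and invisible operators
    (0x2066, 0x2069),    # bidi isolates
    (0xFEFF, 0xFEFF),    # byte-order mark (ZERO WIDTH NO-BREAK SPACE)
    (0xE0000, 0xE007F),  # tag characters
]


def is_suspicious(cp: int) -> bool:
    """True if the code point falls in any range of the table."""
    return any(lo <= cp <= hi for lo, hi in SUSPICIOUS_RANGES)


def scan(text: str) -> list[tuple[int, str, str]]:
    """Return (index, 'U+XXXX', Unicode name) for each suspicious character."""
    findings = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if is_suspicious(cp):
            # Unassigned code points have no name, so supply a fallback.
            findings.append((i, f"U+{cp:04X}", unicodedata.name(ch, "<unnamed>")))
    return findings


print(scan("pay\u200bpal\u202e"))
# [(3, 'U+200B', 'ZERO WIDTH SPACE'), (7, 'U+202E', 'RIGHT-TO-LEFT OVERRIDE')]
```

A range table keeps the lookup explicit and auditable; a broader alternative would flag every character in the Unicode category `Cf` (format characters), at the cost of also catching ones that are legitimate in some scripts.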

Each class of character has its own toggle, so you decide what to strip: remove zero-width and bidi controls while keeping legitimate emoji and accented letters, or normalize homoglyphs back to ASCII. A before/after view shows exactly what changed, and you can copy the cleaned result with one click. Nothing that would damage normal multilingual text is removed by default.
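The per-category toggles, and the split between stripping and normalizing, can be sketched as below. This is an illustration under stated assumptions, not the tool's implementation: the category names, their code point sets, the `DEFAULTS` values for the stripping categories and the `clean`/`CleanResult` names are all hypothetical. Only the "spaces off by default" choice follows the tool's own description.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical category table: name -> (code points, replacement).
# replacement None means "strip"; a string means "normalize to this".
# ZWJ (U+200D) is deliberately left out because emoji sequences rely on it.
CATEGORIES = {
    "zero_width": ({0x200B, 0x200C, 0x2060, 0xFEFF}, None),
    "bidi": (set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A)) | {0x200E, 0x200F}, None),
    "tags": (set(range(0xE0000, 0xE0080)), None),
    "spaces": ({0x00A0, 0x2007, 0x202F}, " "),
}

# Assumed defaults: stripping categories on, space normalization off.
DEFAULTS = {"zero_width": True, "bidi": True, "tags": True, "spaces": False}


@dataclass
class CleanResult:
    text: str
    removed: int
    normalized: int


def clean(text: str, enabled: Optional[dict[str, bool]] = None) -> CleanResult:
    """Strip or normalize characters from every enabled category."""
    enabled = DEFAULTS if enabled is None else enabled
    out: list[str] = []
    removed = normalized = 0
    for ch in text:
        cp = ord(ch)
        for name, (points, replacement) in CATEGORIES.items():
            if enabled.get(name, False) and cp in points:
                if replacement is None:
                    removed += 1              # strip: drop the character
                else:
                    out.append(replacement)   # normalize: substitute it
                    normalized += 1
                break
        else:
            out.append(ch)  # no enabled category matched: keep as-is
    return CleanResult("".join(out), removed, normalized)


print(clean("a\u200bb\u00a0c\u202e"))
print(clean("a\u00a0b", {"spaces": True}))
```

With the defaults, the zero-width space and the right-to-left override are stripped while the no-break space survives; enabling only `spaces` turns the no-break space into a plain space and counts it as normalized rather than removed.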

Detection and cleaning run entirely in your browser — the text you paste, which may itself be a suspicious payload, never leaves your device.

Frequently asked questions

What are invisible or zero-width characters?

Unicode code points that render as no visible glyph or as ordinary whitespace — zero-width space (U+200B), zero-width joiner (U+200D), the byte-order mark (U+FEFF) and others. They legitimately appear in some scripts and emoji sequences, but out of context they are often noise or a hiding place for data.
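A short Python illustration of both halves of that answer: two strings that render the same but compare unequal, and a legitimate zero-width joiner inside an emoji. The variable names are only for illustration.

```python
import unicodedata

# These render identically, but the second contains U+200B ZERO WIDTH SPACE.
plain = "password"
hidden = "pass\u200bword"
print(plain == hidden, len(plain), len(hidden))  # False 8 9

# Zero-width space, joiner and BOM are all format characters (category Cf).
print([unicodedata.category(c) for c in "\u200b\u200d\ufeff"])  # ['Cf', 'Cf', 'Cf']

# Legitimate use: ZWJ joins WOMAN and PERSONAL COMPUTER into the single
# "woman technologist" emoji, so blindly deleting every ZWJ would split it.
technologist = "\U0001F469\u200d\U0001F4BB"
print([unicodedata.name(c) for c in technologist])
# ['WOMAN', 'ZERO WIDTH JOINER', 'PERSONAL COMPUTER']
```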

How is this related to prompt injection?

Attackers can embed instructions using characters a human reviewer won't see: text encoded as Unicode tag characters, data hidden in a zero-width sequence, or content disguised with bidi overrides. The rendered text looks benign while the model reads something else. Stripping these characters before sending text to a model removes that hiding place.
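Tag characters make a particularly clean hiding place because U+E0020 to U+E007E mirror printable ASCII one-to-one and most renderers draw them as nothing. The Python sketch below shows the general encoding trick and how stripping the tag block removes it. The function names `hide`, `reveal` and `strip_tags` are hypothetical, and this is not a description of how this tool detects payloads.

```python
TAG_BASE = 0xE0000  # U+E0020..U+E007E mirror printable ASCII 0x20..0x7E


def hide(message: str) -> str:
    """Encode printable ASCII as invisible Unicode tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in message):
        raise ValueError("only printable ASCII maps onto tag characters")
    return "".join(chr(TAG_BASE + ord(c)) for c in message)


def reveal(text: str) -> str:
    """Decode any ASCII-mirroring tag characters found in text."""
    return "".join(chr(ord(c) - TAG_BASE) for c in text if 0xE0020 <= ord(c) <= 0xE007E)


def strip_tags(text: str) -> str:
    """Drop the whole tag block, U+E0000..U+E007F."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


visible = "Please summarize this document."
payload = visible + hide("Ignore previous instructions.")

print(reveal(payload))                # Ignore previous instructions.
print(strip_tags(payload) == visible)  # True
```

One caveat on blanket stripping: tag characters also appear legitimately in emoji subdivision flags (England, Scotland, Wales), which are a black flag followed by tag letters and U+E007F CANCEL TAG, so removing the whole block turns those flags into a plain black flag.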

What are homoglyphs?

Characters from different scripts that look identical, like the Latin 'a' and the Cyrillic 'а'. They are used to spoof domains, usernames and keywords; the cleaner can flag and normalize common look-alikes back to their ASCII equivalents.
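Flagging and normalizing look-alikes amounts to a lookup table from confusable characters to their ASCII targets. The Python sketch below uses a tiny, hand-picked, hypothetical `HOMOGLYPHS` table; a real cleaner would more likely draw on Unicode's published confusables data, and this is not the tool's actual mapping.

```python
import unicodedata

# Tiny, hypothetical subset of look-alikes mapped to ASCII.
HOMOGLYPHS = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}
_TABLE = str.maketrans(HOMOGLYPHS)


def find_homoglyphs(text: str) -> list[tuple[int, str]]:
    """Flag (index, Unicode name) for each character found in the table."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(text) if ch in HOMOGLYPHS]


def normalize_homoglyphs(text: str) -> str:
    """Replace mapped look-alikes with their ASCII equivalents."""
    return text.translate(_TABLE)


spoofed = "\u0440\u0430yp\u0430l.com"  # renders like "paypal.com"
print(find_homoglyphs(spoofed))
print(normalize_homoglyphs(spoofed))  # paypal.com
```

Unicode NFKC normalization would not catch this case: it folds compatibility forms such as fullwidth letters, but leaves a Cyrillic 'а' as Cyrillic. The same table would also rewrite genuine Russian text, which is why normalizing homoglyphs is opt-in rather than a default.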

Will it remove emoji or accented letters?

Not unless you ask it to. Each category is an independent toggle and the defaults preserve normal multilingual text, including emoji and diacritics — only genuinely hidden or deceptive characters are targeted.
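How the tool protects emoji internally isn't documented here, so the following is only one plausible approach, sketched in Python: strip zero-width characters, but keep a zero-width joiner when it sits between two emoji-like characters, and never touch combining marks. Python's `unicodedata` does not expose the Extended_Pictographic property, so the sketch assumes category "So" (Symbol, other) as an approximation of "emoji". `strip_zero_width` and the helpers are hypothetical names.

```python
import unicodedata

ZWJ = "\u200d"
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
VARIATION_SELECTORS = {"\ufe0e", "\ufe0f"}


def _is_pictographic(ch: str) -> bool:
    # Approximation: treat "Symbol, other" as emoji-like. This also matches
    # non-emoji symbols such as the copyright sign.
    return unicodedata.category(ch) == "So"


def _emoji_neighbor(text: str, i: int, step: int) -> bool:
    """Walk from index i in direction step, skipping variation selectors and
    skin-tone modifiers, and report whether an emoji-like character follows."""
    j = i + step
    while 0 <= j < len(text) and (
        text[j] in VARIATION_SELECTORS or 0x1F3FB <= ord(text[j]) <= 0x1F3FF
    ):
        j += step
    return 0 <= j < len(text) and _is_pictographic(text[j])


def strip_zero_width(text: str) -> str:
    """Remove zero-width characters, keeping ZWJs that join emoji."""
    out = []
    for i, ch in enumerate(text):
        if ch in ZERO_WIDTH:
            if ch == ZWJ and _emoji_neighbor(text, i, -1) and _emoji_neighbor(text, i, +1):
                out.append(ch)  # part of an emoji ZWJ sequence: keep it
            continue            # any other zero-width character: drop it
        out.append(ch)
    return "".join(out)


print(strip_zero_width("pass\u200bwo\u200drd"))  # password
```

Accented letters survive because the filter only removes an explicit set of code points: a decomposed "é" (e followed by U+0301 COMBINING ACUTE ACCENT) is never in that set, whereas a cruder rule such as "drop every non-spacing mark" would silently destroy it.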