UTF-8 Byte Counter

Count UTF-8 bytes, UTF-16 code units, code points, and graphemes for any string, with SMS segmentation and database, cookie, and payload limits.

Encoding comparison

How many bytes the same string takes in each encoding.

Encoding	Unit	Notes
UTF-8	bytes	Default for the web, JSON, Postgres, MySQL utf8mb4, S3 objects, and most HTTP bodies.
UTF-16 (LE)	code units	What JavaScript String.length, Windows APIs, and Java String count. Each unit is 2 bytes; code points above U+FFFF (most emoji) take a surrogate pair, so 2 units.
UTF-32	code points	Fixed 4 bytes per code point. Rare on the wire; common in string processing.
ASCII only	bytes	The UTF-8 bytes that fall in 0-127. The remaining bytes come from non-ASCII content.
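
The same four counts can be reproduced outside the browser. The following is a minimal Python sketch, not the tool's own code (which runs on the browser's TextEncoder); the function name `measure` and the dictionary keys are made up for illustration.

```python
def measure(text: str) -> dict[str, int]:
    """Return the size of `text` in the units from the table above."""
    utf8 = text.encode("utf-8")
    return {
        "utf8_bytes": len(utf8),
        # Every UTF-16 code unit is 2 bytes, so halve the encoded length.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        # Python strings are sequences of code points, so len() counts them.
        "code_points": len(text),
        # ASCII characters are exactly the UTF-8 bytes below 128.
        "ascii_bytes": sum(1 for b in utf8 if b < 128),
    }


print(measure("héllo 😀"))
# {'utf8_bytes': 11, 'utf16_units': 8, 'code_points': 7, 'ascii_bytes': 5}
```

The accented letter costs 2 bytes and the emoji 4, which is why 7 code points become 11 UTF-8 bytes and 8 UTF-16 units.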

SMS segmentation

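SMS providers split long messages into segments. A message that fits in one segment can hold 160 GSM-7 characters or 70 UCS-2 characters; longer messages are concatenated, and the header added to each part cuts its capacity to 153 or 67. One non-GSM character forces UCS-2 for the whole message.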

GSM-7 (Latin SMS)

Counted in septets: a single segment holds up to 160; concatenated segments hold 153 each.

UCS-2 fallback

Counted in UTF-16 units: a single segment holds up to 70; concatenated segments hold 67 each.
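
The segment arithmetic above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: the names `GSM7_BASIC`, `GSM7_EXT`, and `sms_segments` are hypothetical, the character sets are transcribed from the GSM 7-bit default alphabet and its extension table, and the segment count uses a plain ceiling division that ignores the edge case where a two-septet escape sequence or a surrogate pair would straddle a segment boundary.

```python
import math

# GSM 7-bit default alphabet: one septet per character.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: sent as escape + code, so two septets per character.
GSM7_EXT = set("^{}\\[~]|€\f")


def sms_segments(text: str) -> tuple[str, int, int]:
    """Return (encoding, units, segments) for `text`."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXT for ch in text):
        units = sum(2 if ch in GSM7_EXT else 1 for ch in text)
        encoding, single, multi = "GSM-7", 160, 153
    else:
        # A single non-GSM character switches the whole message to UCS-2.
        units = len(text.encode("utf-16-le")) // 2
        encoding, single, multi = "UCS-2", 70, 67
    if units == 0:
        return encoding, 0, 0
    segments = 1 if units <= single else math.ceil(units / multi)
    return encoding, units, segments


print(sms_segments("a" * 161))  # ('GSM-7', 161, 2)
print(sms_segments("hi 😀"))    # ('UCS-2', 5, 1)
```

Note how one extra character past 160 costs a full second segment, because both parts then drop to the 153-septet capacity.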

Real-world limits

How your input compares to common size caps for posts, messages, payloads, and storage.

Limit	Cap	Unit	Notes
X / Twitter post	280	code points	Standard 280-character post. The figure here is a raw code-point count; X's own counter weights each URL as 23 characters and each emoji or CJK character as 2, so treat it as an estimate.
Discord message	2,000	code points	Free tier message ceiling. Discord counts code points, not bytes.
Discord nickname	32	code points	
Slack message	40,000	characters	
GitHub bio	160	characters	
Meta description (Google)	160	characters (approx.)	Google truncates around 155-160 characters in most desktop results.
HTTP cookie value	4 KB	bytes	Most browsers enforce a 4 KB cap per cookie (name + value + attributes).
URL (safe maximum)	2 KB	bytes	Safe length across browsers and search-engine crawlers; longer URLs often work in practice.
DynamoDB item	400 KB	bytes	Whole item size, attribute names included. UTF-8 bytes are what AWS measures.
SQS / SNS message	256 KB	bytes	
MongoDB document	16 MB	bytes	
Lambda response (sync)	6 MB	bytes	Synchronous invocation payload limit. Async invocations cap at 256 KB.
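
A comparison like this reduces to picking the right unit for each cap. The sketch below is illustrative only: `LIMITS` and `check_limits` are hypothetical names, only four of the caps above are included, and it assumes 1 KB means 1,024 bytes.

```python
# (cap, unit) pairs taken from the limits listed above; 1 KB = 1024 bytes is assumed.
LIMITS = {
    "Discord message": (2000, "code_points"),
    "HTTP cookie value": (4 * 1024, "bytes"),
    "DynamoDB item": (400 * 1024, "bytes"),
    "SQS / SNS message": (256 * 1024, "bytes"),
}


def check_limits(text: str) -> dict[str, tuple[int, int, bool]]:
    """Map each limit name to (used, remaining, fits)."""
    sizes = {
        "bytes": len(text.encode("utf-8")),
        "code_points": len(text),
    }
    report = {}
    for name, (cap, unit) in LIMITS.items():
        used = sizes[unit]
        report[name] = (used, cap - used, used <= cap)
    return report


for name, (used, left, fits) in check_limits("é" * 2001).items():
    print(f"{name}: {used} used, {left} left, {'ok' if fits else 'OVER'}")
```

With 2,001 accented letters the string is one code point over the Discord cap while its 4,002 UTF-8 bytes still fit in a cookie, which shows why the unit matters as much as the number.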

Per-character byte breakdown

Which characters use the most bytes. Useful for trimming emoji-heavy strings to fit a database column.
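
A breakdown of this kind can be approximated by grouping code points and multiplying by their UTF-8 width. The Python sketch below is an assumption about how such a report could be built, not the tool's code; `byte_breakdown` is a hypothetical name, and it works per code point, so an emoji built from several code points shows up as separate rows.

```python
from collections import Counter


def byte_breakdown(text: str) -> list[tuple[str, int, int]]:
    """Return (character, occurrences, total UTF-8 bytes), heaviest first."""
    counts = Counter(text)
    rows = [(ch, n, n * len(ch.encode("utf-8"))) for ch, n in counts.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for ch, n, total in byte_breakdown("aa😀😀é"):
    print(f"{ch!r}: {n} x {total // n} bytes = {total}")
```

The emoji lands at the top of the list with 8 bytes, even though it appears no more often than the letter "a".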

How to use

Paste or type any string into the input area. The summary cards on the right update instantly with UTF-8 bytes, UTF-16 code units, code points, and grapheme clusters.
Read the encoding comparison table to see how many bytes the same string takes in UTF-8, UTF-16, UTF-32, and ASCII-only mode.
Check the SMS segmentation panel to see whether the message fits in a single GSM-7 segment or forces a UCS-2 fallback. One non-GSM character is enough to switch the whole message to UCS-2.
Scroll to the real-world limits section to compare the input against X/Twitter, Discord, DynamoDB, SQS, MongoDB, Lambda, and HTTP cookie size caps. A red bar means the input is over the cap.
Open the per-character byte breakdown to see which characters use the most bytes, then trim or replace emoji and CJK runs if you need to fit a strict column or payload limit.
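
When trimming to a byte limit in code rather than by hand, the cut has to land on a character boundary. One possible approach in Python is sketched below; `truncate_utf8` is a hypothetical helper, it assumes `max_bytes` is non-negative, and it can still split a multi-code-point emoji (for example a ZWJ sequence) between its parts.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut `text` so its UTF-8 encoding fits in `max_bytes` bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing may leave a partial multi-byte sequence at the end;
    # errors="ignore" drops those dangling bytes instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


print(truncate_utf8("ab😀", 5))   # ab
print(truncate_utf8("日本語", 4))  # 日
```

In the first call only 5 bytes are allowed, so the 4-byte emoji that would end at byte 6 is dropped whole rather than left half-encoded.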

About this tool

UTF-8 Byte Counter reports the size of a string in each of the units that real systems actually measure, not just the JavaScript .length value most counters report. Paste a message, a JSON payload, an emoji-heavy social post, or a multilingual paragraph, and the tool returns the UTF-8 byte count (what HTTP bodies, JSON, S3 objects, Postgres, and MySQL utf8mb4 measure), the UTF-16 code-unit count (what JavaScript, Windows APIs, and Java use), the code-point count (the number of Unicode scalar values, which is also the UTF-32 length), the grapheme cluster count (user-perceived characters, so an emoji sequence built with zero-width joiners or skin-tone modifiers, such as a family emoji, counts once), and the split between ASCII and non-ASCII bytes.

A separate UTF-8 byte-width panel breaks the input into 1-byte ASCII characters, 2-byte Latin and Cyrillic letters, 3-byte CJK and most other BMP scripts, and 4-byte emoji and supplementary-plane code points, so you can see why a 100-character message can take 300 bytes (for example, 100 CJK characters at 3 bytes each).

The SMS panel reports both GSM-7 segmentation (160 septets in a single segment, 153 per concatenated segment) and the UCS-2 fallback that kicks in the moment a non-GSM character appears (70 UTF-16 units in a single segment, 67 per concatenated segment). The GSM-7 alphabet and its extension-table characters (such as ^ { } [ ] ~ \ | and €, which cost two septets each) are encoded from the 3GPP TS 23.038 reference so the count matches what carriers bill.

A real-world limits section compares the input to X/Twitter posts, Discord messages, Slack messages, GitHub bios, Google meta descriptions, HTTP cookies, URLs, DynamoDB items, SQS and SNS messages, MongoDB documents, and Lambda response payloads, and a per-character byte breakdown lists which characters are pushing the byte count up. Everything runs locally with the browser's TextEncoder and Intl.Segmenter; the strings you measure never leave your device.
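
The byte-width classes described here follow directly from the UTF-8 encoding and can be tallied in a few lines. This Python sketch is illustrative only; `width_histogram` is a hypothetical name and not part of the tool.

```python
def width_histogram(text: str) -> dict[int, int]:
    """Count code points by how many UTF-8 bytes each one needs (1 to 4)."""
    hist = {1: 0, 2: 0, 3: 0, 4: 0}
    for ch in text:
        hist[len(ch.encode("utf-8"))] += 1
    return hist


hist = width_histogram("日" * 100)
total_bytes = sum(width * count for width, count in hist.items())
print(hist, total_bytes)  # {1: 0, 2: 0, 3: 100, 4: 0} 300
```

A hundred CJK characters all fall in the 3-byte class, which gives the 300-byte total for a 100-character message.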

Free to use. Works in your browser. No signup, no login.
