UTF-8 Byte Counter
Count UTF-8 bytes, UTF-16 code units, code points, and graphemes for any string, with SMS segmentation and database, cookie, and payload limits.
Encoding comparison
How many bytes the same string takes in each encoding.
| Encoding | Unit counted | Bytes per unit | Notes |
|---|---|---|---|
| UTF-8 | bytes | 1 | Default for the web, JSON, Postgres, MySQL utf8mb4, S3 objects, and most HTTP bodies. A code point takes 1 to 4 bytes. |
| UTF-16 (LE) | code units | 2 | JavaScript String.length, Windows APIs, Java String. Code points above U+FFFF, which include most emoji, use a surrogate pair (2 units, 4 bytes). |
| UTF-32 | code points | 4 | Fixed 4 bytes per code point. Rare on the wire; common in string processing. |
| ASCII only | bytes | 1 | The UTF-8 bytes that fall in 0-127. The tool also reports how many extra bytes the non-ASCII content adds. |
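The figures in this table can be derived from a single pass over the string. The sketch below is an illustration, not the tool's actual source: `measureEncodings` and `EncodingSizes` are hypothetical names, and the function walks UTF-16 code units by hand so that the surrogate-pair handling is visible. A lone surrogate is counted as 3 UTF-8 bytes, which is the size of the U+FFFD replacement character an encoder would emit for it.

```typescript
// Hypothetical helper: sizes of one string under each encoding in the table.
interface EncodingSizes {
  codePoints: number;
  utf8Bytes: number;
  utf16Units: number;
  utf16Bytes: number;
  utf32Bytes: number;
  asciiBytes: number;
}

function measureEncodings(text: string): EncodingSizes {
  let codePoints = 0;
  let utf8Bytes = 0;
  let asciiBytes = 0;

  for (let i = 0; i < text.length; i++) {
    let cp = text.charCodeAt(i);

    // A high surrogate followed by a low surrogate encodes one code point
    // above U+FFFF; combine the pair and skip the second unit.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < text.length) {
      const low = text.charCodeAt(i + 1);
      if (low >= 0xdc00 && low <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (low - 0xdc00);
        i++;
      }
    }

    codePoints++;
    if (cp < 0x80) {
      utf8Bytes += 1; // ASCII
      asciiBytes += 1;
    } else if (cp < 0x800) {
      utf8Bytes += 2; // e.g. Latin-1 supplement, Cyrillic
    } else if (cp < 0x10000) {
      utf8Bytes += 3; // rest of the BMP, including most CJK
    } else {
      utf8Bytes += 4; // supplementary planes, including most emoji
    }
  }

  return {
    codePoints,
    utf8Bytes,
    utf16Units: text.length,
    utf16Bytes: text.length * 2,
    utf32Bytes: codePoints * 4,
    asciiBytes,
  };
}
```

For the string "héllo 👋" this yields 7 code points, 11 UTF-8 bytes (5 of them ASCII), 8 UTF-16 code units (16 bytes), and 28 UTF-32 bytes.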
SMS segmentation
A single SMS holds 160 characters in GSM-7 or 70 in UCS-2. Longer messages are split into concatenated segments of 153 (GSM-7) or 67 (UCS-2) characters each. One non-GSM character forces UCS-2 for the whole message.
GSM-7 (Latin SMS)
Counted in septets: a single segment holds up to 160; concatenated segments hold 153 each.
UCS-2 fallback
Counted in UTF-16 units: a single segment holds up to 70; concatenated segments hold 67 each.
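The segmentation rules above reduce to a small calculation. The following is a minimal sketch under stated assumptions, not the tool's implementation: `estimateSms`, `segmentsFor`, and the two alphabet constants are hypothetical names, the alphabet strings were transcribed by hand from the GSM 7-bit default alphabet and its extension table and should be checked against 3GPP TS 23.038 before production use, and extension-table characters are assumed to cost two septets each.

```typescript
// GSM 7-bit default alphabet (hand-transcribed; verify against 3GPP TS 23.038).
const GSM7_BASIC =
  "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?" +
  "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà";
// Extension-table characters: sent as an escape plus a second septet.
const GSM7_EXTENSION = "^{}\\[~]|€";

interface SmsEstimate {
  encoding: "GSM-7" | "UCS-2";
  units: number; // septets for GSM-7, UTF-16 code units for UCS-2
  segments: number;
}

// One segment up to `single` units; otherwise `multi` units per segment.
function segmentsFor(units: number, single: number, multi: number): number {
  if (units === 0) return 0;
  return units <= single ? 1 : Math.ceil(units / multi);
}

function estimateSms(text: string): SmsEstimate {
  let septets = 0;
  let gsmOnly = true;

  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (GSM7_BASIC.indexOf(ch) !== -1) {
      septets += 1;
    } else if (GSM7_EXTENSION.indexOf(ch) !== -1) {
      septets += 2;
    } else {
      gsmOnly = false; // one non-GSM character switches the whole message
      break;
    }
  }

  if (gsmOnly) {
    return { encoding: "GSM-7", units: septets, segments: segmentsFor(septets, 160, 153) };
  }
  return { encoding: "UCS-2", units: text.length, segments: segmentsFor(text.length, 70, 67) };
}
```

This sketch simplifies one detail: a two-septet escape sequence cannot straddle a segment boundary, so a message with extension characters near a boundary can occasionally need one more segment than `Math.ceil(septets / 153)` suggests.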
Real-world limits
How your input compares to common size caps for posts, messages, payloads, and storage.
| Limit | Cap | Counted in | Notes |
|---|---|---|---|
| X / Twitter post | 280 | code points | Standard 280-character post. The tool counts every code point as 1; X's own counter weights URLs as 23 characters and most emoji and CJK characters as 2, so treat the result as an estimate. |
| Discord message | 2,000 | code points | Free tier message ceiling. Discord counts code points, not bytes. |
| Discord nickname | 32 | code points | |
| Slack message | 40,000 | characters | |
| GitHub bio | 160 | characters | |
| Meta description (Google) | 160 | characters (approx.) | Google truncates around 155-160 characters in most desktop results. |
| HTTP cookie value | 4 KB | bytes | Most browsers enforce a 4 KB cap per cookie (name + value + attributes). |
| URL (safe maximum) | 2 KB | bytes | Conservative length that works across browsers and search-engine crawlers; longer URLs often work in practice. |
| DynamoDB item | 400 KB | bytes | Whole item size, attribute names included. UTF-8 bytes are what AWS measures. |
| SQS / SNS message | 256 KB | bytes | |
| MongoDB document | 16 MB | bytes | |
| Lambda response (sync) | 6 MB | bytes | Synchronous invocation payload limit. Async invocations cap at 256 KB. |
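Comparing an input against these caps comes down to measuring it in the right unit and subtracting. The sketch below shows one way to do that for a few of the limits; `LIMITS`, `measure`, and `checkLimits` are hypothetical names, and it assumes 1 KB means 1,024 bytes, which the page does not state.

```typescript
type Unit = "bytes" | "codePoints";

interface Limit {
  name: string;
  cap: number;
  unit: Unit;
}

interface LimitStatus {
  name: string;
  used: number;
  remaining: number; // negative when the input is over the cap
  over: boolean;
}

const KB = 1024; // assumption: binary kilobytes

// A subset of the caps listed above.
const LIMITS: Limit[] = [
  { name: "X / Twitter post", cap: 280, unit: "codePoints" },
  { name: "Discord message", cap: 2000, unit: "codePoints" },
  { name: "HTTP cookie value", cap: 4 * KB, unit: "bytes" },
  { name: "DynamoDB item", cap: 400 * KB, unit: "bytes" },
  { name: "SQS / SNS message", cap: 256 * KB, unit: "bytes" },
];

// Count UTF-8 bytes and code points in one pass over the UTF-16 units.
function measure(text: string): Record<Unit, number> {
  let bytes = 0;
  let codePoints = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
    const isPair =
      unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff;
    if (isPair) {
      bytes += 4; // surrogate pair: one code point, 4 UTF-8 bytes
      i++;
    } else if (unit < 0x80) {
      bytes += 1;
    } else if (unit < 0x800) {
      bytes += 2;
    } else {
      bytes += 3;
    }
    codePoints++;
  }
  return { bytes, codePoints };
}

function checkLimits(text: string, limits: Limit[] = LIMITS): LimitStatus[] {
  const sizes = measure(text);
  return limits.map((limit) => {
    const used = sizes[limit.unit];
    return {
      name: limit.name,
      used,
      remaining: limit.cap - used,
      over: used > limit.cap,
    };
  });
}
```

Keeping the unit on each limit entry avoids the common mistake of checking a byte cap with a character count, or the reverse.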
Per-character byte breakdown
Which characters use the most bytes. Useful for trimming emoji-heavy strings to fit a database column.
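When a string has to fit a byte-limited column, the trimming step can also be automated. The function below is a sketch of one approach, with the hypothetical name `truncateToUtf8Bytes`: it keeps the longest prefix that fits in `maxBytes` UTF-8 bytes and never cuts through a surrogate pair. It stops at code-point boundaries only, so it can still split a multi-code-point grapheme such as a ZWJ emoji sequence.

```typescript
// Hypothetical helper: longest prefix of `text` that fits in `maxBytes`
// UTF-8 bytes without splitting a code point.
function truncateToUtf8Bytes(text: string, maxBytes: number): string {
  let bytes = 0;
  let end = 0; // index (in UTF-16 units) just past the last kept code point

  while (end < text.length) {
    const unit = text.charCodeAt(end);
    let width = 3; // default: BMP code point at or above U+0800
    let units = 1;

    if (unit < 0x80) {
      width = 1;
    } else if (unit < 0x800) {
      width = 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && end + 1 < text.length) {
      const low = text.charCodeAt(end + 1);
      if (low >= 0xdc00 && low <= 0xdfff) {
        width = 4; // surrogate pair: keep or drop both units together
        units = 2;
      }
    }

    if (bytes + width > maxBytes) break;
    bytes += width;
    end += units;
  }

  return text.slice(0, end);
}
```

For example, `truncateToUtf8Bytes("ab👋cd", 5)` returns "ab", because the 4-byte emoji would push the total to 6 bytes, while a budget of 6 returns "ab👋".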
How to use
- Paste or type any string into the input area. The summary cards on the right update instantly with UTF-8 bytes, UTF-16 code units, code points, and grapheme clusters.
- Read the encoding comparison table to see how many bytes the same string takes in UTF-8, UTF-16, UTF-32, and ASCII-only mode.
- Check the SMS segmentation panel to see whether the message fits in a single GSM-7 segment or forces a UCS-2 fallback. One non-GSM character is enough to switch the whole message to UCS-2.
- Scroll to the real-world limits section to compare the input against X/Twitter, Discord, DynamoDB, SQS, MongoDB, Lambda, and HTTP cookie size caps. A red bar means the input is over the cap.
- Open the per-character byte breakdown to see which characters use the most bytes, then trim or replace emoji and CJK runs if you need to fit a strict column or payload limit.
About this tool
UTF-8 Byte Counter reports the size of a string in every unit that production systems actually measure, not just the JavaScript .length value most counters report. Paste a message, a JSON payload, an emoji-heavy social post, or a multilingual paragraph, and the tool returns:
- the UTF-8 byte count (the size used by HTTP bodies, JSON, S3 objects, Postgres, and MySQL utf8mb4)
- the UTF-16 code-unit count (what JavaScript, Windows APIs, and Java report)
- the UTF-32 code-point count (the number of Unicode code points)
- the grapheme cluster count (user-perceived characters, including emoji sequences built with zero-width joiners or skin-tone modifiers, such as family emoji)
- the split between ASCII and non-ASCII bytes
A separate UTF-8 byte-width panel breaks the input into 1-byte ASCII characters, 2-byte Latin and Cyrillic letters, 3-byte CJK and most other BMP scripts, and 4-byte emoji and supplementary-plane code points, so you can see why a 100-character message can take 300 bytes.
The SMS panel reports both GSM-7 segmentation (160 septets in a single segment, 153 per concatenated segment) and the UCS-2 fallback that applies as soon as a non-GSM character appears (70 UTF-16 units in a single segment, 67 per concatenated segment). The GSM-7 alphabet and its extension-table characters (such as €, ^, {, }, [, ], ~, |, and \, which take two septets each) follow the 3GPP TS 23.038 reference so that the count is intended to match what carriers bill.
A real-world limits section compares the input to X/Twitter posts, Discord messages, Slack messages, GitHub bios, Google meta descriptions, HTTP cookies, URLs, DynamoDB items, SQS and SNS messages, MongoDB documents, and Lambda response payloads, and a per-character byte breakdown lists which characters are pushing the byte count up.
Everything runs locally with the browser's TextEncoder and Intl.Segmenter; the strings you measure never leave your device.
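The two browser APIs named above are enough to produce all four headline counts. The snippet below is a minimal sketch of how they might be combined, not the tool's actual code; `countAll` and `TextCounts` are hypothetical names. It assumes a runtime and TypeScript configuration recent enough to provide `Intl.Segmenter` (for example a current browser, or Node.js 16 and later with an ES2022 `lib` setting).

```typescript
interface TextCounts {
  utf8Bytes: number;
  utf16Units: number;
  codePoints: number;
  graphemes: number;
}

function countAll(text: string): TextCounts {
  // TextEncoder always encodes to UTF-8.
  const utf8Bytes = new TextEncoder().encode(text).length;

  // String.length counts UTF-16 code units.
  const utf16Units = text.length;

  // The string iterator walks code points, so surrogate pairs count once.
  const codePoints = Array.from(text).length;

  // Grapheme segmentation follows the Unicode text segmentation rules.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const graphemes = Array.from(segmenter.segment(text)).length;

  return { utf8Bytes, utf16Units, codePoints, graphemes };
}
```

For the family emoji 👨‍👩‍👧 (three emoji joined by two zero-width joiners) this gives 18 UTF-8 bytes, 8 UTF-16 code units, 5 code points, and 1 grapheme, which is why the four counts are reported separately.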
Free to use. Works in your browser. No signup, no login.