Zero Signup Tools: Free browser tools

String Similarity Checker

Compare two strings with Levenshtein, Damerau, Jaro-Winkler, Hamming, LCS, Dice, Jaccard, and cosine similarity. Instant scores in your browser.

Preprocessing

Choose how the inputs are normalized before comparison. Distances and similarities recompute as you toggle.

Similarity metrics

Higher is more similar. Each metric measures something different, so it is normal for them to disagree.

Levenshtein similarity

80.00% (0.8000), distance 4

1 minus edit distance divided by the longer string length. The most-cited general-purpose similarity.
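
As a rough illustration of the formula above, here is a minimal TypeScript sketch. It is not the tool's own source: the names `levenshteinDistance` and `levenshteinSimilarity` are chosen for this example, characters are compared as UTF-16 code units, and treating two empty strings as fully similar is an assumption.

```typescript
// Levenshtein distance with two rolling rows (O(n*m) time, O(m) memory).
// Compares UTF-16 code units; that choice is an assumption of this sketch.
function levenshteinDistance(a: string, b: string): number {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  let curr: number[] = new Array<number>(b.length + 1).fill(0);

  for (let i = 1; i <= a.length; i++) {
    curr[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    [prev, curr] = [curr, prev];
  }
  return prev[b.length];
}

// 1 minus the distance divided by the longer length.
// Two empty strings are scored 1 here (an assumption).
function levenshteinSimilarity(a: string, b: string): number {
  const longer = Math.max(a.length, b.length);
  return longer === 0 ? 1 : 1 - levenshteinDistance(a, b) / longer;
}
```

For example, `levenshteinDistance("kitten", "sitting")` is 3, so the similarity is 1 - 3/7.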

Damerau-Levenshtein similarity

90.00% (0.9000), distance 2

Levenshtein plus transposition as a single edit. Often closer to perceived similarity for typos.
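
The page does not say which Damerau-Levenshtein variant the tool uses. The sketch below implements the restricted form, usually called optimal string alignment (OSA), where an adjacent swap costs one edit but no substring is edited twice. The name `osaDistance` is made up for this example.

```typescript
// Optimal string alignment distance: Levenshtein plus adjacent transposition.
// This is the restricted Damerau-Levenshtein variant; whether the tool uses
// this or the unrestricted variant is not stated, so treat it as an assumption.
function osaDistance(a: string, b: string): number {
  const rows = a.length + 1;
  const cols = b.length + 1;
  const d: number[][] = Array.from({ length: rows }, () =>
    new Array<number>(cols).fill(0)
  );

  for (let i = 0; i < rows; i++) d[i][0] = i;
  for (let j = 0; j < cols; j++) d[0][j] = j;

  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      let best = Math.min(
        d[i - 1][j] + 1,        // deletion
        d[i][j - 1] + 1,        // insertion
        d[i - 1][j - 1] + cost  // substitution (or match)
      );
      // Adjacent transposition: "ab" -> "ba" counts as one edit.
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        best = Math.min(best, d[i - 2][j - 2] + 1);
      }
      d[i][j] = best;
    }
  }
  return d[a.length][b.length];
}
```

With this variant `osaDistance("ab", "ba")` is 1 where plain Levenshtein gives 2. The pair `"ca"` and `"abc"` scores 3 under OSA, whereas the unrestricted variant would give 2, which is one way to tell the two apart.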

Jaro

96.67% (0.9667)

Counts matching characters within a sliding window and out-of-order pairs. Built for short fields like names.

Jaro-Winkler

98.00% (0.9800)

Jaro with a bonus for shared prefixes. Strongly favors strings that begin the same.
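
Jaro and Jaro-Winkler can be sketched together, since the second builds on the first. The code below is an illustration, not the tool's implementation. The scaling factor `p = 0.1` and the four-character prefix cap are the conventional defaults; the sample scores on this page (0.9667 and 0.9800) are consistent with those values, but the tool's actual parameters are not stated.

```typescript
// Jaro similarity: matches within a sliding window, penalized for out-of-order pairs.
function jaro(a: string, b: string): number {
  if (a.length === 0 && b.length === 0) return 1; // assumption: empty equals empty
  if (a.length === 0 || b.length === 0) return 0;

  const window = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched: boolean[] = new Array<boolean>(a.length).fill(false);
  const bMatched: boolean[] = new Array<boolean>(b.length).fill(false);

  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const lo = Math.max(0, i - window);
    const hi = Math.min(b.length - 1, i + window);
    for (let j = lo; j <= hi; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Walk both match lists in order and count positions that disagree.
  let outOfOrder = 0;
  let k = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  const transpositions = outOfOrder / 2;

  return (
    (matches / a.length + matches / b.length + (matches - transpositions) / matches) / 3
  );
}

// Jaro-Winkler: Jaro plus a bonus for a shared prefix of up to maxPrefix characters.
function jaroWinkler(a: string, b: string, p = 0.1, maxPrefix = 4): number {
  const j = jaro(a, b);
  const limit = Math.min(maxPrefix, a.length, b.length);
  let prefix = 0;
  while (prefix < limit && a[prefix] === b[prefix]) prefix++;
  return j + prefix * p * (1 - j);
}
```

For the textbook pair `"MARTHA"` and `"MARHTA"`, all six characters match with one transposition, so Jaro is (1 + 1 + 5/6) / 3, and the three-character shared prefix lifts Jaro-Winkler slightly above it.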

Hamming similarity

80.00% (0.8000), distance 4

Only defined when the strings are the same length. Counts positions where the characters differ.
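
A minimal sketch of the position-by-position count follows. Returning `null` for unequal lengths is one way to represent "not defined" and is a choice made for this example; how the tool itself reports that case is not specified here.

```typescript
// Hamming distance: number of positions where the characters differ.
// Returns null when the lengths differ, since the metric is undefined there.
function hammingDistance(a: string, b: string): number | null {
  if (a.length !== b.length) return null;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) diff++;
  }
  return diff;
}

// Normalized form: 1 minus the share of differing positions.
// Two empty strings are scored 1 (an assumption).
function hammingSimilarity(a: string, b: string): number | null {
  const d = hammingDistance(a, b);
  if (d === null) return null;
  return a.length === 0 ? 1 : 1 - d / a.length;
}
```

For instance, `"karolin"` and `"kathrin"` differ in three of seven positions, while `hammingDistance("abc", "ab")` returns `null`.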

LCS ratio

90.00% (0.9000), LCS length 18

2 times the length of the longest common subsequence divided by the sum of lengths. The metric a diff tool uses.
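
The ratio described above could be computed along these lines. This is a sketch with hypothetical names (`lcsLength`, `lcsRatio`), and scoring two empty strings as 1 is an assumption.

```typescript
// Length of the longest common subsequence, using two rolling rows.
function lcsLength(a: string, b: string): number {
  let prev: number[] = new Array<number>(b.length + 1).fill(0);
  let curr: number[] = new Array<number>(b.length + 1).fill(0);

  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      curr[j] =
        a[i - 1] === b[j - 1]
          ? prev[j - 1] + 1                  // extend the common subsequence
          : Math.max(prev[j], curr[j - 1]);  // skip a char from a or from b
    }
    [prev, curr] = [curr, prev];
  }
  return prev[b.length];
}

// 2 * LCS length divided by the sum of both lengths.
function lcsRatio(a: string, b: string): number {
  const total = a.length + b.length;
  return total === 0 ? 1 : (2 * lcsLength(a, b)) / total;
}
```

For `"ABCBDAB"` and `"BDCABA"` the longest common subsequence has length 4, giving a ratio of 8/13.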

Dice coefficient (bigrams)

68.42% (0.6842)

Multiset bigram overlap. Robust for short strings; insensitive to small character swaps.

Dice coefficient (n=2)

68.42% (0.6842)

Dice on character n-grams of your chosen size.
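
Both Dice cards can be illustrated with one function that takes the n-gram size as a parameter. The sketch below counts n-grams as a multiset, as the bigram card describes. The names `diceNgramCounts` and `diceCoefficient` are invented for this example, and the handling of strings shorter than `n` is an assumption.

```typescript
// Count character n-grams (with repeats) in a string.
function diceNgramCounts(s: string, n: number): Map<string, number> {
  const counts = new Map<string, number>();
  for (let i = 0; i + n <= s.length; i++) {
    const gram = s.slice(i, i + n);
    counts.set(gram, (counts.get(gram) ?? 0) + 1);
  }
  return counts;
}

// Dice coefficient: 2 * multiset overlap / total number of n-grams on both sides.
function diceCoefficient(a: string, b: string, n = 2): number {
  const totalA = Math.max(0, a.length - n + 1);
  const totalB = Math.max(0, b.length - n + 1);
  // Assumption: when neither string yields an n-gram, fall back to exact equality.
  if (totalA + totalB === 0) return a === b ? 1 : 0;

  const countsA = diceNgramCounts(a, n);
  const countsB = diceNgramCounts(b, n);
  let overlap = 0;
  countsA.forEach((count, gram) => {
    overlap += Math.min(count, countsB.get(gram) ?? 0);
  });
  return (2 * overlap) / (totalA + totalB);
}
```

With `n = 2`, `"night"` and `"nacht"` share only the bigram `ht` out of four bigrams each, so the score is 2 * 1 / 8 = 0.25.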

Jaccard index (n=2)

54.17% (0.5417)

Intersection over union on character n-gram sets. Ignores repeats.
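
The set-based behaviour ("ignores repeats") is easiest to see in code. The following is a sketch under the same caveats as before: `jaccardNgramSet` and `jaccardIndex` are hypothetical names, and the empty-set fallback is an assumption.

```typescript
// Collect the distinct character n-grams of a string.
function jaccardNgramSet(s: string, n: number): Set<string> {
  const grams = new Set<string>();
  for (let i = 0; i + n <= s.length; i++) {
    grams.add(s.slice(i, i + n));
  }
  return grams;
}

// Jaccard index: |intersection| / |union| of the two n-gram sets.
function jaccardIndex(a: string, b: string, n = 2): number {
  const setA = jaccardNgramSet(a, n);
  const setB = jaccardNgramSet(b, n);
  // Assumption: with no n-grams on either side, fall back to exact equality.
  if (setA.size === 0 && setB.size === 0) return a === b ? 1 : 0;

  let intersection = 0;
  setA.forEach((gram) => {
    if (setB.has(gram)) intersection++;
  });
  const union = setA.size + setB.size - intersection;
  return intersection / union;
}
```

Because repeats collapse, `jaccardIndex("aaaa", "aa")` is 1: both strings reduce to the single bigram `aa`. A multiset metric such as Dice would score that pair lower.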

Cosine similarity (n=2)

70.09% (0.7009)

Cosine of the n-gram count vectors. Common in NLP and information retrieval.
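
As a sketch of what "cosine of the n-gram count vectors" means in practice, the code below builds a sparse count vector per string and divides the dot product by the product of the vector lengths. The names are illustrative, and the zero-vector fallback is an assumption.

```typescript
// Sparse n-gram count vector: n-gram -> number of occurrences.
function cosineNgramCounts(s: string, n: number): Map<string, number> {
  const counts = new Map<string, number>();
  for (let i = 0; i + n <= s.length; i++) {
    const gram = s.slice(i, i + n);
    counts.set(gram, (counts.get(gram) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity: dot(A, B) / (|A| * |B|) over n-gram counts.
function cosineSimilarity(a: string, b: string, n = 2): number {
  const countsA = cosineNgramCounts(a, n);
  const countsB = cosineNgramCounts(b, n);

  let dot = 0;
  let normA = 0;
  let normB = 0;
  countsA.forEach((count, gram) => {
    normA += count * count;
    dot += count * (countsB.get(gram) ?? 0);
  });
  countsB.forEach((count) => {
    normB += count * count;
  });

  // Assumption: if either vector is empty, fall back to exact equality.
  if (normA === 0 || normB === 0) return a === b ? 1 : 0;
  return dot / Math.sqrt(normA * normB);
}
```

Unlike Jaccard, counts matter here: `"abab"` has the bigram vector `{ab: 2, ba: 1}` and `"ab"` has `{ab: 1}`, so the score is 2 / sqrt(5) rather than a simple set ratio.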

How to use

  1. Paste one string into A and another into B. Try one of the sample pairs to see how typos and reworded text score across metrics.
  2. Set preprocessing: case insensitive, trim or collapse whitespace, and Unicode NFKC normalization. These are applied before any metric is calculated.
  3. Pick the n-gram size for Jaccard, Cosine, and Dice-n (bigrams are a sensible default; trigrams work well for longer text).
  4. Read the Overall similarity badge for a quick summary, then check the individual metric cards: each has a meter, a percent score, and a one-line explanation of when to use it.
  5. Use Swap A and B if you want to verify symmetry, then Copy report to grab a clean text summary of every metric for your notes, ticket, or dataset documentation.
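
The preprocessing in step 2 can be approximated with built-in string methods. This is a sketch only: the option names and the order in which the steps run are assumptions, not the tool's documented behaviour.

```typescript
// Hypothetical option names for this example; the tool's own settings may differ.
interface PreprocessOptions {
  caseInsensitive: boolean;
  trim: boolean;
  collapseWhitespace: boolean;
  nfkc: boolean;
}

// Normalize one input before any metric runs.
// The order (NFKC, then case, then whitespace) is an assumption.
function preprocess(input: string, options: PreprocessOptions): string {
  let out = input;
  if (options.nfkc) out = out.normalize("NFKC");          // folds ligatures and full-width forms
  if (options.caseInsensitive) out = out.toLowerCase();    // simple lowercase comparison
  if (options.collapseWhitespace) out = out.replace(/\s+/g, " ");
  if (options.trim) out = out.trim();
  return out;
}

const allOn: PreprocessOptions = {
  caseInsensitive: true,
  trim: true,
  collapseWhitespace: true,
  nfkc: true,
};

// "\uFB01" is the single-character "fi" ligature.
const cleaned = preprocess("  \uFB01le   NAME ", allOn);
```

Here `cleaned` ends up as `"file name"`, so it would compare as identical to a plainly typed `file name` under every metric.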

About this tool

String Similarity Checker compares two strings with every common similarity and distance metric at the same time, so you can pick the right one for your job without bouncing between calculators.

Paste a string into A and another into B. For edit-based comparison, the tool reports the Levenshtein edit distance and its normalized similarity; the Damerau-Levenshtein distance, which counts an adjacent-letter transposition as a single edit and is better for typo correction; the Hamming distance, which is defined only when the two strings have the same length and is useful for fixed-length codes, hashed fingerprints, and DNA k-mers; and the longest common subsequence length with the LCS ratio used by diff tools. For short fields it reports the Jaro similarity and the Jaro-Winkler similarity (Jaro plus a prefix-match bonus, originally built at the US Census Bureau for matching personal names). For n-gram overlap it reports the Dice / Sørensen coefficient on character bigrams, the Dice coefficient on n-grams of your chosen size, the Jaccard index on n-gram sets (intersection over union), and the cosine similarity on n-gram count vectors (the standard for vector-space retrieval and many NLP pipelines). A single composite Overall similarity score blends the four most-cited metrics for a quick at-a-glance read.

Preprocessing options apply before any metric runs, so you can compare strings while ignoring case, trimming or collapsing whitespace, or applying Unicode NFKC normalization, which makes the 'ﬁ' ligature (U+FB01) equal to the two letters 'fi' and folds full-width characters. The n-gram size for Jaccard, Cosine, and Dice-n is configurable between 1 and 8, so you can tune for short names (bigrams) or longer text fields (trigrams or 4-grams). Every algorithm is implemented from scratch in TypeScript and runs in your browser; there is no library dependency to audit and no server round trip.

Useful for record-linkage and fuzzy dedupe pipelines, spell-checkers and typo-tolerant search, name and address matching, plagiarism and paraphrase checks, fuzzy SKU and EAN matching, manual review of database join keys, building or sanity-checking a similarity threshold, learning what each metric actually measures, and homework or interview prep on edit-distance algorithms. Input is capped at 5,000 characters per side, which keeps the O(n*m) algorithms responsive on a laptop while still covering paragraph-length text and any realistic dedup key. Nothing is uploaded.

Free to use. Works in your browser. No signup, no login.
