PDF Text Extractor

Extract text from any PDF in your browser. Page-by-page view, copy, and download as .txt. The file is never uploaded.

How to use

Drop a .pdf onto the upload area or click Choose file. Up to 200 MB is supported.
The extractor decodes the page content streams in your browser and renders the combined text along with stats: pages, words, characters, and how many pages contain text.
Switch to the By page view to step through individual pages, jump to a specific page, and copy that page on its own.
Use Copy all text or Copy page text to copy to the clipboard, or Download .txt to save the extracted text as a UTF-8 file.
Pages without extractable text are almost always scanned images without an OCR layer; the tool reports the count so you know which pages need an OCR pass.
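The stats described in these steps can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the names computeStats, emptyPages, and ExtractionStats are hypothetical, and it assumes the extractor has already produced one string of text per page.

```typescript
// Hypothetical result shape; the real tool's internal types are not documented.
interface ExtractionStats {
  pages: number;         // total pages in the document
  pagesWithText: number; // pages that yielded any non-whitespace text
  words: number;         // whitespace-separated tokens
  characters: number;    // UTF-16 code units on pages that have text
}

// Assumes pageTexts holds one extracted string per page, in page order.
function computeStats(pageTexts: string[]): ExtractionStats {
  let pagesWithText = 0;
  let words = 0;
  let characters = 0;
  for (const text of pageTexts) {
    const trimmed = text.trim();
    if (trimmed.length === 0) continue; // probably a scanned page with no text layer
    pagesWithText += 1;
    words += trimmed.split(/\s+/).length;
    characters += text.length;
  }
  return { pages: pageTexts.length, pagesWithText, words, characters };
}

// 1-based numbers of the pages that came back empty and may need an OCR pass.
function emptyPages(pageTexts: string[]): number[] {
  const result: number[] = [];
  pageTexts.forEach((text, index) => {
    if (text.trim().length === 0) result.push(index + 1);
  });
  return result;
}
```

How characters are counted (code units, code points, with or without whitespace) is a design choice; the version above is only one plausible option.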

About this tool

PDF Text Extractor pulls the readable text out of a PDF without uploading the file. Drop a .pdf and the tool indexes every PDF object in the file, locates each page dictionary, decodes the page content streams (FlateDecode is decompressed natively in the browser using the Compression Streams API, and ASCIIHexDecode and ASCII85Decode are handled too), tokenizes the standard text-showing operators (Tj, TJ, ' and "), and returns the extracted text alongside word and character totals, the page count, and a per-page view you can step through.

Strings are decoded with full PDF rules: parenthesized literals with backslash escapes (including octal forms), angle-bracket hex strings, UTF-16 BE strings with the FE FF byte order mark used by Word and Acrobat, UTF-8 BOM literals used by some newer toolchains, and PDFDocEncoding for everything else (the small handful of bytes that differ from ASCII are mapped to their proper Unicode equivalents). Line breaks are inferred from Td, TD, T* and the line-show operators ' and ", and large negative kerning adjustments in TJ arrays are treated as word breaks the way Adobe's own extractor does.

Every step runs inside the tab. The file is read with file.arrayBuffer() into a typed-array view, kept in memory only for the time it takes to decode, and never transmitted to a server. That makes the tool safe for contracts, signed agreements, invoices, transcripts, medical records, school forms, and anything else you would rather not hand to a third-party SaaS.

Pages without a text layer (scanned image-only PDFs) are reported as empty so you know why nothing came out; this tool does not perform OCR. Encrypted PDFs are detected and flagged so you can remove the password first. Use Copy all text, Copy page text, or Download .txt to send the result wherever you need it.
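To make the string-decoding and TJ word-break rules above more concrete, here is a minimal sketch. It is not the tool's source: the function names (unescapeLiteral, decodeHex, bytesToText, joinTJ) and the -200 gap threshold are assumptions chosen for illustration, and the fallback treats bytes as Latin-1 rather than using a full PDFDocEncoding table or handling the UTF-8 BOM case.

```typescript
// Escapes defined for PDF literal strings, mapped to byte values.
const SIMPLE_ESCAPES = new Map<string, number>([
  ["n", 10], ["r", 13], ["t", 9], ["b", 8], ["f", 12],
  ["(", 40], [")", 41], ["\\", 92],
]);

const isOctal = (ch: string): boolean => ch >= "0" && ch <= "7";

// Turns the body of a (literal) string, without the outer parentheses, into bytes.
function unescapeLiteral(body: string): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < body.length; i++) {
    const ch = body.charAt(i);
    if (ch !== "\\") { bytes.push(body.charCodeAt(i) & 0xff); continue; }
    if (i + 1 >= body.length) break;           // trailing backslash: nothing to escape
    const next = body.charAt(++i);
    const simple = SIMPLE_ESCAPES.get(next);
    if (isOctal(next)) {
      let oct = next;                           // one to three octal digits
      while (oct.length < 3 && isOctal(body.charAt(i + 1))) oct += body.charAt(++i);
      bytes.push(parseInt(oct, 8) & 0xff);
    } else if (next === "\r" || next === "\n") {
      if (next === "\r" && body.charAt(i + 1) === "\n") i++; // line continuation
    } else if (simple !== undefined) {
      bytes.push(simple);
    } else {
      bytes.push(next.charCodeAt(0) & 0xff);    // unknown escape: drop the backslash
    }
  }
  return bytes;
}

// Turns the body of a <hex> string into bytes; an odd final digit is padded with 0.
function decodeHex(body: string): number[] {
  const digits = body.replace(/[^0-9A-Fa-f]/g, "");
  const padded = digits.length % 2 === 1 ? digits + "0" : digits;
  const bytes: number[] = [];
  for (let i = 0; i < padded.length; i += 2) bytes.push(parseInt(padded.slice(i, i + 2), 16));
  return bytes;
}

// FE FF prefix means UTF-16 BE; otherwise bytes are treated as Latin-1 (a simplification).
function bytesToText(bytes: number[]): string {
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    const units: number[] = [];
    for (let i = 2; i + 1 < bytes.length; i += 2) units.push((bytes[i] << 8) | bytes[i + 1]);
    return String.fromCharCode(...units);
  }
  return String.fromCharCode(...bytes);
}

// TJ items: decoded strings, or kerning numbers in thousandths of a text-space unit.
// A large negative number opens a gap, which is read here as a word break.
function joinTJ(items: Array<string | number>, gapThreshold: number = -200): string {
  let out = "";
  for (const item of items) {
    if (typeof item === "string") out += item;
    else if (item <= gapThreshold && out.length > 0 && out.charAt(out.length - 1) !== " ") out += " ";
  }
  return out;
}
```

For example, joinTJ(["Hel", -20, "lo", -300, "world"]) ignores the small -20 adjustment and inserts a space at -300. A real extractor would probably scale the threshold by font size and glyph widths rather than use a fixed number.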

Free to use. Works in your browser. No signup, no login.
