OCR · 6 min

OCR for PDFs: turn scans into searchable text

What OCR is, how it works in the browser, and when on-device beats sending files to the cloud.

What is OCR?

Optical Character Recognition (OCR) turns the pixels of a scanned page into real, selectable, searchable text. Without OCR, a scanned PDF is essentially a flipbook of photos: you can't search it with Ctrl+F, you can't copy a paragraph, and screen readers find no text to read.

How browser OCR works

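The OCR PDF tool uses Tesseract.js, a WebAssembly port of the well-loved open-source Tesseract engine (long sponsored by Google). We render each PDF page to a high-DPI canvas, hand the pixels to Tesseract, and place the recognised text in an invisible layer behind the page image, so what you select and search lines up with what you see.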
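The geometry behind that pipeline is simple enough to sketch. PDF page sizes are measured in points, at 72 points per inch, so the canvas size follows from the DPI you pick, and each recognised box has to be mapped back from canvas pixels to page points before the text layer can be placed. The snippet below is a minimal illustration, not the tool's actual code: the function names are ours, the 300 DPI figure is only an example, and the box shape assumes x0/y0/x1/y1 pixel coordinates with a top-left origin, like the boxes Tesseract.js reports.

```typescript
// PDF user space is measured in points: 72 points per inch.
const POINTS_PER_INCH = 72;

interface PageSizePt { widthPt: number; heightPt: number }
// Pixel box with a top-left origin (assumed to match the OCR engine's output).
interface PixelBox { x0: number; y0: number; x1: number; y1: number }
// Rectangle in PDF points with a bottom-left origin.
interface PdfRect { x: number; y: number; width: number; height: number }

// Canvas size in pixels needed to render a page at the given DPI.
function canvasSize(page: PageSizePt, dpi: number): { width: number; height: number } {
  return {
    width: Math.round((page.widthPt * dpi) / POINTS_PER_INCH),
    height: Math.round((page.heightPt * dpi) / POINTS_PER_INCH),
  };
}

// Map a recognised box from canvas pixels back to PDF points.
// The y axis flips: canvas y grows downward, PDF y grows upward.
function pixelBoxToPdfRect(box: PixelBox, page: PageSizePt, dpi: number): PdfRect {
  const toPt = (px: number): number => (px * POINTS_PER_INCH) / dpi;
  return {
    x: toPt(box.x0),
    y: page.heightPt - toPt(box.y1),
    width: toPt(box.x1 - box.x0),
    height: toPt(box.y1 - box.y0),
  };
}

// Example: a US Letter page (612 x 792 pt) rendered at 300 DPI.
const letter: PageSizePt = { widthPt: 612, heightPt: 792 };
const size = canvasSize(letter, 300); // 2550 x 3300 pixels
const rect = pixelBoxToPdfRect({ x0: 300, y0: 600, x1: 900, y1: 660 }, letter, 300);
console.log(size, rect);
```

A higher DPI gives the engine more pixels per glyph but makes the canvas larger and slower to process, so the value is a trade-off rather than a fixed constant.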

Because everything happens client-side:

  • Your scans never leave the device.
  • You can OCR sensitive material (medical, legal, financial) without the compliance overhead of uploading it to a third-party server.
  • It works offline once the language model is cached.

Supported languages

PDFMaster AI ships with English by default and loads French, Spanish and Arabic only on demand.
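On-demand loading can be sketched as a small cache that fetches each language at most once and reuses the result afterwards. This is an illustration under assumptions, not PDFMaster AI's implementation: createLanguageCache and the FetchLanguage callback are hypothetical names, and the three-letter codes follow the convention Tesseract uses for its language data.

```typescript
// Tesseract-style codes for the four languages mentioned above.
type LanguageCode = "eng" | "fra" | "spa" | "ara";

// Hypothetical callback that downloads the data for one language.
type FetchLanguage = (code: LanguageCode) => Promise<Uint8Array>;

// Returns a loader that fetches each language once and reuses the result.
function createLanguageCache(
  fetchLanguage: FetchLanguage,
): (code: LanguageCode) => Promise<Uint8Array> {
  const cache = new Map<LanguageCode, Promise<Uint8Array>>();

  return (code: LanguageCode): Promise<Uint8Array> => {
    const cached = cache.get(code);
    if (cached !== undefined) return cached;

    const pending = fetchLanguage(code);
    cache.set(code, pending);
    // Forget failed downloads so a later call can retry.
    pending.catch(() => {
      if (cache.get(code) === pending) cache.delete(code);
    });
    return pending;
  };
}
```

Caching the promise rather than the finished bytes means two pages that ask for the same language at the same moment share one download. A cache like this lives in memory only; keeping the data available offline across visits would need persistent browser storage on top.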

Speed expectations

Expect roughly 1–4 seconds per page on a modern laptop, and longer on a phone. A 50-page scan takes about one to three minutes on a laptop, so plan for a coffee.
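If you want a number rather than a coffee, the arithmetic is easy to script. The helper below is a rough sketch with names of our own choosing; its defaults are the 1–4 seconds per page quoted for a laptop, and it assumes pages are processed one after another.

```typescript
interface TimeRange { minSeconds: number; maxSeconds: number }

// Estimate total OCR time for a document, processing pages sequentially.
// Defaults reflect the 1-4 s per page laptop figure; adjust for slower devices.
function estimateOcrTime(pages: number, minPerPage = 1, maxPerPage = 4): TimeRange {
  if (!Number.isInteger(pages) || pages < 0) {
    throw new RangeError("pages must be a non-negative integer");
  }
  return { minSeconds: pages * minPerPage, maxSeconds: pages * maxPerPage };
}

// Format a duration in seconds as "M min S s", or "S s" under a minute.
function formatDuration(seconds: number): string {
  const total = Math.round(seconds);
  const minutes = Math.floor(total / 60);
  const rest = total % 60;
  return minutes > 0 ? `${minutes} min ${rest} s` : `${rest} s`;
}

const estimate = estimateOcrTime(50);
console.log(
  `${formatDuration(estimate.minSeconds)} to ${formatDuration(estimate.maxSeconds)}`,
); // prints "50 s to 3 min 20 s"
```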