OCR for PDFs: turn scans into searchable text
What OCR is, how it works in the browser, and when on-device beats sending files to the cloud.
What is OCR?
Optical Character Recognition turns the pixels of a scanned page into real, selectable, searchable text. Without OCR, a scanned PDF is essentially a flipbook of photos: you can't Ctrl+F, you can't copy a paragraph, and screen readers ignore it.
How browser OCR works
The OCR PDF tool uses Tesseract.js, a WebAssembly port of the well-loved Google Tesseract engine. We render each PDF page to a high-DPI canvas, hand the pixels to Tesseract, and place the recognised text in an invisible layer behind the page image, so the scan looks unchanged but its text can be selected, copied and searched.
Because everything happens client-side:
- Your scans never leave the device.
- You can OCR sensitive material (medical, legal, financial) without the compliance overhead of uploading it to a third-party service.
- It works offline once the OCR engine and the language data are cached.
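The "high-DPI canvas" step above comes down to a scale factor. PDF page sizes are measured in points (1/72 inch), so rendering at a given DPI means multiplying by DPI / 72. The sketch below is illustrative only: the function name `canvasSizeForDpi` and the 300 DPI default are assumptions, not values taken from the tool.

```typescript
// PDF page sizes are expressed in points; one point is 1/72 inch.
const POINTS_PER_INCH = 72;

interface CanvasSize {
  scale: number;  // multiplier from PDF points to canvas pixels
  width: number;  // canvas width in pixels
  height: number; // canvas height in pixels
}

// Hypothetical helper. targetDpi = 300 is an assumed default,
// not a setting confirmed by the tool.
function canvasSizeForDpi(
  pageWidthPt: number,
  pageHeightPt: number,
  targetDpi: number = 300,
): CanvasSize {
  if (pageWidthPt <= 0 || pageHeightPt <= 0 || targetDpi <= 0) {
    throw new RangeError("page size and DPI must be positive");
  }
  return {
    scale: targetDpi / POINTS_PER_INCH,
    // Multiply before dividing so common page sizes stay exact.
    width: Math.round((pageWidthPt * targetDpi) / POINTS_PER_INCH),
    height: Math.round((pageHeightPt * targetDpi) / POINTS_PER_INCH),
  };
}

// US Letter is 612 x 792 points (8.5 x 11 inches).
const letter = canvasSizeForDpi(612, 792);
console.log(letter.width, letter.height); // 2550 3300
```

If the renderer is something like PDF.js, `scale` is the kind of value you would pass when asking for a page viewport. Higher DPI generally helps recognition of small print but costs memory and time, so the right number depends on the scans.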
Supported languages
PDFMaster AI ships English by default and lazy-loads French, Spanish and Arabic on demand.
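Lazy-loading on demand can be sketched as a small cache of in-flight downloads: the first request for a language starts the fetch, and later requests reuse it. This is a minimal sketch, not the tool's actual code. `LazyLanguageStore` and the `LanguageLoader` signature are hypothetical; the codes `eng`, `fra`, `spa` and `ara` are the names Tesseract uses for its language data.

```typescript
type LanguageCode = "eng" | "fra" | "spa" | "ara";

// Hypothetical loader: fetches the recognition data for one language.
type LanguageLoader = (code: LanguageCode) => Promise<Uint8Array>;

class LazyLanguageStore {
  private readonly pending = new Map<LanguageCode, Promise<Uint8Array>>();
  private readonly load: LanguageLoader;

  constructor(load: LanguageLoader) {
    this.load = load;
  }

  // Returns the same promise for repeated requests, so each
  // language is fetched at most once while the store is alive.
  get(code: LanguageCode): Promise<Uint8Array> {
    let entry = this.pending.get(code);
    if (entry === undefined) {
      entry = this.load(code);
      this.pending.set(code, entry);
      // Forget failed downloads so a later call can retry.
      entry.catch(() => this.pending.delete(code));
    }
    return entry;
  }
}

// Usage with a stand-in loader that counts how often it is called.
let fetches = 0;
const fakeLoader: LanguageLoader = (code) => {
  fetches += 1;
  return Promise.resolve(new Uint8Array([code.length]));
};
const store = new LazyLanguageStore(fakeLoader);
const first = store.get("fra");
const second = store.get("fra");
console.log(first === second, fetches); // true 1
```

A store like this only lives for the page session. Working offline on a later visit would additionally need persistent storage in the browser, which this sketch leaves out.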
Speed expectations
Roughly 1–4 seconds per page on a modern laptop, longer on a phone. For a 50-page scan that is about 50 to 200 seconds on a laptop, so plan for a coffee.
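For other page counts, the same back-of-the-envelope arithmetic can be written as a tiny helper. It simply multiplies the page count by the 1 and 4 second bounds quoted above; the name `estimateOcrTime` is illustrative, and real timings will vary with the device and the scan quality.

```typescript
interface TimeRange {
  minSeconds: number;
  maxSeconds: number;
}

// Defaults mirror the rough 1-4 seconds per page quoted for a modern laptop.
function estimateOcrTime(
  pages: number,
  fastSecondsPerPage: number = 1,
  slowSecondsPerPage: number = 4,
): TimeRange {
  if (!Number.isInteger(pages) || pages < 0) {
    throw new RangeError("pages must be a non-negative integer");
  }
  return {
    minSeconds: pages * fastSecondsPerPage,
    maxSeconds: pages * slowSecondsPerPage,
  };
}

const fiftyPages = estimateOcrTime(50);
console.log(fiftyPages); // { minSeconds: 50, maxSeconds: 200 }
```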