Automatic OCR

Text extraction from scanned pages, at no cost. Runs during normal upload processing, on every plan.

Updated May 23, 2026

When you upload a document with scanned or image-based pages, docAnalyzer runs OCR (optical character recognition) automatically during processing. The extracted text is indexed and searchable like any other content. You don't have to ask for it; it doesn't cost credits.

When automatic OCR runs

It runs as part of upload processing whenever the document has pages that don't already contain selectable text:

Scanned PDFs (image-only).
Photos of pages.
Mixed PDFs where some pages are text and some are scanned.

If every page of the document already has selectable text, automatic OCR doesn't run; the text is extracted directly.
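To make the rule above concrete, here is a minimal sketch of how you might check for yourself which pages of a document lack selectable text. It is an illustration only: the function name `pages_needing_ocr`, the `min_chars` threshold, and the sample data are all hypothetical, and docAnalyzer's own detection logic is not described here and may differ.

```python
from typing import List


def pages_needing_ocr(page_texts: List[str], min_chars: int = 1) -> List[int]:
    """Return 1-based page numbers whose extracted text is effectively empty.

    page_texts holds the text layer of each page, in page order.
    min_chars is an assumed threshold, not a documented docAnalyzer setting.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars
    ]


# Hypothetical mixed PDF: pages 1 and 3 have a text layer, page 2 is a scan.
mixed_pdf = ["Quarterly report", "", "Appendix A"]
print(pages_needing_ocr(mixed_pdf))  # [2]
```

If you want to run this against a real PDF, one option is to build `page_texts` with a PDF library such as pypdf, for example `[page.extract_text() or "" for page in PdfReader(path).pages]`. An empty result means every page already has a text layer.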

What you get

Searchable text for every scanned page.
Citations from chat answers point back to the OCR'd page (the viewer still shows the scanned image).
Everything downstream (semantic search, retrieval, citations) works the same as for documents with native text.

Language support

Automatic OCR supports 40+ languages out of the box, including:

English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Polish, Turkish, Swedish.
Chinese (simplified and traditional), Japanese, Korean.
Arabic, Hebrew, Hindi, Bengali, Tamil, Thai, Vietnamese.

Mixed-language documents work; the OCR detects language per region.

Where it works less well

Automatic OCR is best-effort. Quality is adequate for scans of typed text in common fonts, and drops on:

Handwriting: handwriting varies widely and is hard for automatic OCR to read.
Very low-quality scans: heavy compression artefacts, fax-quality input.
Complex layouts: multi-column with figures, tables, side notes, marginalia.
Unusual fonts: decorative, historical, or stylized typefaces.

If automatic OCR produces poor results for one of these reasons, see Enhanced OCR.

Cost

Free, on every plan. Automatic OCR runs as part of upload and doesn't consume credits.

What's next

Enhanced OCR: premium OCR pass for the harder cases.
How docAnalyzer reads your documents: what happens after OCR finishes.
My document didn't process: when OCR isn't the issue.