Mistral OCR 4 Ships Bounding Boxes for Document AI

Mistral's June 23 release page for OCR 4 names three additions to its document extraction API: bounding boxes, block labels, and confidence scores that travel with extracted text. The same release lists 170-language support, a single-container option for enterprise customers, and availability through Mistral's API, Mistral Studio, Amazon SageMaker, and Microsoft Foundry. Snowflake Parse Document is listed as a later channel.

The benchmark figures also come from Mistral. The company said independent annotators preferred OCR 4 over competing OCR and document-AI systems at an average 72% win rate across more than 600 documents in more than 12 languages. It also reported scores of 85.20 on OlmOCRBench, 93.07 on OmniDocBench, and 0.98 on its Crawl Multilingual evaluation.

Aidan Donohue, an AI engineer at Rogo, supplied the customer benchmark in the release. “We benchmarked Mistral OCR 4 against the leading agentic document parsers across a chart and figure dense financial QA dataset and reached equivalent accuracy at roughly 8x lower cost and 17x lower latency,” Donohue said. “For production use cases at scale, that delta compounds fast.”

Key Takeaways

Mistral released OCR 4 with bounding boxes, block labels, and page- or word-level confidence scores.
The model supports 170 languages and posted a 72% average win rate in Mistral's human evaluation.
OCR 4 returns document regions with coordinates so search and review systems can preserve page structure.
OCR 4 costs $4 per 1,000 pages, or $2 through Batch API; Document AI costs $5.

AI-generated summary, reviewed by an editor. More on our AI guidelines.

The OCR response now carries page structure

Mistral's OCR Processor docs show how the new output works. A developer submits a PDF or image by URL, file upload, or base64-encoded data to a single OCR endpoint. The response can include markdown text, page dimensions, detected images, extracted tables, hyperlinks, headers, footers, and confidence data.
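For illustration, the request side can be sketched in Python with only the standard library. The endpoint, the `document_url` and `image_url` document shapes, the `include_image_base64` flag, and the model ID `mistral-ocr-latest` reflect Mistral's earlier OCR API as we understand it; treat all of them as assumptions, because OCR 4 may use a different model identifier, and the options that switch on blocks and confidence scores are not shown here.

```python
import base64
import json

# Assumption: endpoint and field names mirror Mistral's earlier OCR API;
# verify against the current OCR Processor docs before relying on them.
OCR_ENDPOINT = "https://api.mistral.ai/v1/ocr"
MODEL = "mistral-ocr-latest"  # hypothetical choice; OCR 4 may use another ID


def document_from_url(url: str) -> dict:
    """Reference a PDF hosted at a public URL."""
    return {"type": "document_url", "document_url": url}


def image_from_url(url: str) -> dict:
    """Reference an image hosted at a public URL."""
    return {"type": "image_url", "image_url": url}


def document_from_bytes(pdf_bytes: bytes) -> dict:
    """Embed a local PDF as a base64 data URL."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "type": "document_url",
        "document_url": f"data:application/pdf;base64,{encoded}",
    }


def build_ocr_request(document: dict, include_images: bool = False) -> str:
    """Serialize the JSON body that would be POSTed to OCR_ENDPOINT."""
    body = {
        "model": MODEL,
        "document": document,
        "include_image_base64": include_images,
    }
    return json.dumps(body)
```

A caller would POST the returned string to `OCR_ENDPOINT` with an `Authorization: Bearer` header. File uploads, the third input path the docs mention, are left out of this sketch.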

OCR 4 adds a block view to that response. When block extraction is enabled, each page can include entries for document regions such as body text, titles, lists, tables, images, equations, captions, code, references, side notes, headers, footers, and signatures. Each entry carries coordinates for the region and the content Mistral extracted from it.
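A minimal sketch of consuming that block view, assuming a hypothetical response shape in which each page carries `dimensions` and a `blocks` list whose entries have `type`, `bbox` (pixel coordinates `[x0, y0, x1, y1]`), and `content`. None of those field names is confirmed by Mistral's docs; the point is the pattern of grouping regions by label and normalizing their coordinates for overlays.

```python
from collections import defaultdict

# Hypothetical page shape for illustration only; Mistral's real OCR 4
# block field names may differ.
sample_page = {
    "index": 0,
    "dimensions": {"width": 1000, "height": 1400},
    "blocks": [
        {"type": "title", "bbox": [80, 60, 920, 120], "content": "Annual Report"},
        {"type": "text", "bbox": [80, 150, 920, 600], "content": "Revenue grew..."},
        {"type": "table", "bbox": [80, 640, 920, 1000], "content": "| Q | Rev |"},
        {"type": "signature", "bbox": [600, 1250, 900, 1340], "content": ""},
    ],
}


def blocks_by_label(page: dict) -> dict:
    """Group a page's blocks by their structural label."""
    grouped = defaultdict(list)
    for block in page["blocks"]:
        grouped[block["type"]].append(block)
    return dict(grouped)


def normalized_bbox(page: dict, block: dict) -> tuple:
    """Scale pixel coordinates to 0..1 so overlays work at any render size."""
    width = page["dimensions"]["width"]
    height = page["dimensions"]["height"]
    x0, y0, x1, y1 = block["bbox"]
    return (x0 / width, y0 / height, x1 / width, y1 / height)
```

Normalizing against page dimensions lets a review tool draw, say, the signature box on a thumbnail or a full-resolution render without recomputing coordinates.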

Rogo and Anaqua are the two customer use cases Mistral names for the feature. Rogo's example is financial QA over charts and figures. Anaqua's is high-volume docketing, where Anaqua engineer Ivan Mihailov said per-page speed mattered compared with an incumbent provider.

Mistral's docs also offer confidence scores at the page or word level. The cookbook shows those scores beside extracted words, so reviewers can start with the terms the model flagged as uncertain.
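The review-first workflow can be sketched as a threshold and a sort. The word list, the `confidence` field name, the 0.0 to 1.0 score scale, and the 0.8 cutoff are all assumptions for illustration, not values from Mistral's docs.

```python
# Hypothetical word-level output; field names and the 0.0-1.0 scale
# are assumptions, not Mistral's documented format.
words = [
    {"text": "Revenue", "confidence": 0.99},
    {"text": "EBITDA", "confidence": 0.62},
    {"text": "4.7%", "confidence": 0.41},
    {"text": "quarter", "confidence": 0.97},
]


def review_queue(words: list, threshold: float = 0.8) -> list:
    """Return words below the threshold, weakest first, for human review."""
    flagged = [w for w in words if w["confidence"] < threshold]
    return sorted(flagged, key=lambda w: w["confidence"])
```

With the sample data, `review_queue(words)` puts the `4.7%` entry first, then `EBITDA`. In practice the cutoff would be tuned against a labeled sample of documents.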


Document AI adds schema output

Mistral's documentation separates OCR 4 from Document AI by output. OCR 4 returns text and page structure. Document AI takes a schema or bounding-box annotation request and returns JSON fields or image descriptions.
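To make the schema idea concrete, here is a hypothetical invoice schema plus a lightweight check on the JSON a schema-driven extractor might return. The `json_schema` wrapper mirrors a common response-format convention and is an assumption about what Document AI accepts; the checker covers only the JSON types it uses and is not a full JSON Schema validator.

```python
import json

# Hypothetical invoice schema. Whether Document AI expects exactly this
# wrapper shape is an assumption to check against Mistral's docs.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["invoice_number", "total", "currency"],
}

annotation_format = {
    "type": "json_schema",
    "json_schema": {"name": "invoice", "schema": INVOICE_SCHEMA, "strict": True},
}

# Minimal mapping for the JSON types used above (not a full validator).
_PY_TYPES = {"string": str, "number": (int, float), "object": dict}


def missing_or_mistyped(raw_json: str, schema: dict) -> list:
    """List required fields that are absent or have the wrong JSON type."""
    data = json.loads(raw_json)
    problems = []
    for field in schema["required"]:
        expected = _PY_TYPES[schema["properties"][field]["type"]]
        value = data.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if value is None or isinstance(value, bool) or not isinstance(value, expected):
            problems.append(field)
    return problems
```

A client-side check like this catches fields the model omitted or returned as strings before they reach a downstream system.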

Mistral said document annotation feeds OCR output to mistral-small-2603. For image, chart, or signature regions, the bounding-box flow can route the request to a vision-language model.
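One way to picture that split is a per-region router. In this sketch, only `mistral-small-2603` comes from Mistral's release; the vision model name, the label strings, and routing each block individually are hypothetical simplifications of a flow Mistral describes only at a high level.

```python
# Illustrative routing only. TEXT_MODEL comes from Mistral's release;
# VISION_MODEL and VISUAL_LABELS are hypothetical placeholders.
TEXT_MODEL = "mistral-small-2603"
VISION_MODEL = "hypothetical-vlm"
VISUAL_LABELS = {"image", "chart", "signature"}


def route_block(block: dict) -> str:
    """Pick a model for a region: visual regions go to the vision model."""
    return VISION_MODEL if block["type"] in VISUAL_LABELS else TEXT_MODEL


def plan_annotations(blocks: list) -> dict:
    """Group block indices by the model that would annotate them."""
    plan = {TEXT_MODEL: [], VISION_MODEL: []}
    for i, block in enumerate(blocks):
        plan[route_block(block)].append(i)
    return plan
```

Keeping text regions on the cheaper text path and sending only visual regions to a vision model is the design trade-off this kind of split usually aims at.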


Mistral's pricing table lists OCR 4 at $4 per 1,000 pages, or $2 per 1,000 pages through the Batch API. Document AI is listed at $5 per 1,000 pages. The standard rate is four times that of the March 2025 OCR API, which Implicator covered at 1,000 pages per dollar, with a batch discount.
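The pricing arithmetic is easy to script. The rates below are the list prices from Mistral's table, plus the March 2025 rate of 1,000 pages per dollar for comparison; the helper itself is a hypothetical convenience that ignores any volume or enterprise discounts.

```python
# List prices in USD per 1,000 pages. The first three come from Mistral's
# pricing table; the last is the March 2025 OCR API rate, for comparison.
PRICE_PER_1K = {
    "ocr": 4.00,
    "ocr_batch": 2.00,
    "document_ai": 5.00,
    "ocr_march_2025": 1.00,
}


def cost_usd(pages: int, tier: str) -> float:
    """Estimate list-price cost for a page count (no discounts applied)."""
    return pages / 1000 * PRICE_PER_1K[tier]
```

For 250,000 pages, that works out to $1,000 at the standard rate, $500 through the Batch API, and $1,250 for Document AI.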

Mistral ties OCR 4 to search pipelines

Mistral is also plugging OCR 4 into Search Toolkit, the open-source retrieval framework it released in public preview in May. The toolkit's docs describe OCR as the extraction step for PDF, DOCX, PPTX, and OpenDocument files, ahead of chunking and embedding.
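The step between extraction and embedding can be sketched as packing OCR blocks into chunks that remember where they came from. The block fields and the 1,200-character budget are hypothetical, and this is not Search Toolkit's own chunker; it only illustrates why page coordinates are useful downstream, since a search hit can cite the page and region it came from.

```python
# A sketch of the OCR-to-embedding handoff. Block fields and the character
# budget are hypothetical; Search Toolkit's own chunker may work differently.
def chunk_blocks(pages: list, max_chars: int = 1200) -> list:
    """Pack consecutive blocks into chunks, keeping page/bbox provenance."""
    chunks, current, size = [], [], 0
    for page in pages:
        for block in page["blocks"]:
            text = block["content"].strip()
            if not text:
                continue  # skip empty regions such as bare signatures
            # Close the current chunk if this block would overflow it.
            # A single block longer than max_chars still becomes its own chunk.
            if current and size + len(text) > max_chars:
                chunks.append(current)
                current, size = [], 0
            current.append({"page": page["index"], "bbox": block["bbox"], "text": text})
            size += len(text)
    if current:
        chunks.append(current)
    # Each chunk keeps its sources so a search hit can point back to the page.
    return [
        {
            "text": "\n\n".join(part["text"] for part in chunk),
            "sources": [(part["page"], part["bbox"]) for part in chunk],
        }
        for chunk in chunks
    ]
```

Never splitting a block keeps tables and captions intact inside one chunk, at the cost of occasionally uneven chunk sizes.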

Ivan Mihailov, an engineer at Anaqua, said in Mistral's release that Mistral OCR was roughly 4 times faster per page than Anaqua's incumbent provider for high-volume docketing workflows. Microsoft vice president Kimmi Grewal said Mistral Document AI with OCR 4 in Microsoft Foundry would bring structured document understanding into enterprise workflows.

Mistral also draws limits around the system. The company said OCR 4 is a document-understanding model, not a decision-maker, and is not intended for medical diagnosis, legal judgment, high-stakes financial decisions, safety-critical systems, real-time processing, or audio and video inputs. Mistral has scheduled an OCR 4 production webinar for July 7.

Frequently Asked Questions

What is new in Mistral OCR 4?

OCR 4 adds paragraph-level bounding boxes, structural block labels, and confidence scores to extracted text. The output can show where a table, title, signature, or equation appears on the page.

How does OCR 4 work?

A developer sends a PDF or image to Mistral's OCR endpoint and can request markdown, tables, images, page metadata, block coordinates, and confidence scores. The response can then feed search, review, or data pipelines.

How is OCR 4 different from Document AI?

OCR 4 returns the document structure and extracted text. Document AI adds schemas and annotations, mapping OCR output into JSON fields or image annotations for business workflows.

How much does Mistral OCR 4 cost?

Mistral lists OCR 4 at $4 per 1,000 pages. Batch API processing cuts that to $2 per 1,000 pages. Document AI is priced at $5 per 1,000 pages.

Where is OCR 4 available?

Mistral says OCR 4 is available through its API, Mistral Studio, Amazon SageMaker, and Microsoft Foundry. Support through Snowflake Parse Document is planned for later. Enterprise customers can self-host it in a single container.



Marcus Schuler

San Francisco

Editor-in-Chief and founder of Implicator.ai. Former ARD correspondent and senior broadcast journalist with 10+ years covering tech. Writes daily briefings on policy and market developments. Based in San Francisco. E-mail: editor@implicator.ai
