Jun 16, 2026

WWDC25: Read documents using the Vision framework

Speaker: Megan Williams, Vision Framework Engineer
Date: WWDC 2025

RecognizeDocumentsRequest

New API extending text recognition to understand document structure. Supports 26 languages.

Detects and groups:

Tables (rows, columns, cells, supports merged cells)
Lists (individual items)
Paragraphs (lines grouped logically)
Machine-readable codes (QR codes, barcodes)
Key data types: email addresses, phone numbers, URLs, dates, measurements, currency, flight numbers, payment identifiers, tracking numbers, postal addresses

DocumentObservation is hierarchical: Document -> text/tables/lists/barcodes -> cells/items -> content (text, other tables)
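The hierarchy above can be walked with nested loops. The following is a minimal sketch, not a definitive implementation. It assumes the property names used in the session's sample code (`document.tables`, `table.rows`, `cell.content.text.transcript`) and an SDK that ships the new request. `tableRows(in:)` and `csv(from:)` are hypothetical helpers introduced here for illustration, and the availability annotation is an assumption.

```swift
import Foundation

/// Hypothetical helper: joins rows of cell text into CSV, quoting any
/// field that contains a comma, a double quote, or a newline.
func csv(from rows: [[String]]) -> String {
    rows.map { row in
        row.map { field -> String in
            let needsQuotes = field.contains(",") || field.contains("\"") || field.contains("\n")
            let escaped = field.replacingOccurrences(of: "\"", with: "\"\"")
            return needsQuotes ? "\"\(escaped)\"" : field
        }.joined(separator: ",")
    }.joined(separator: "\n")
}

#if canImport(Vision)
import Vision

/// Runs document recognition on encoded image data and returns the text of
/// every cell in the first detected table, row by row.
/// Returns an empty array when no document or table is found.
/// The OS versions below are an assumption about where the API is available.
@available(iOS 26.0, macOS 26.0, tvOS 26.0, visionOS 26.0, *)
func tableRows(in imageData: Data) async throws -> [[String]] {
    let request = RecognizeDocumentsRequest()
    let observations = try await request.perform(on: imageData)

    // Document -> tables -> rows -> cells -> content -> text
    guard let table = observations.first?.document.tables.first else {
        return []
    }
    return table.rows.map { row in
        row.map { cell in cell.content.text.transcript }
    }
}
#endif
```

Keeping the CSV formatting separate from the Vision call means the formatting logic can be exercised without an image or an Apple platform.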

Text views available: transcript, lines, paragraphs, words (not available for CJK or Thai text), detectedData (via the DataDetection framework)
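As an illustration of the `transcript` and `detectedData` views, the sketch below turns table rows into contacts, roughly following the pattern shown in the session. The `Contact` type, `makeContact`, and `contacts(in:)` are hypothetical names. The `match.details` cases are assumed to come from the DataDetection framework as presented in the session, and the availability annotation is likewise an assumption.

```swift
import Foundation

/// Hypothetical model type for one table row.
struct Contact: Equatable {
    let name: String
    let email: String
    let phoneNumber: String?
}

/// Hypothetical rule: a row becomes a contact only if an email was detected.
func makeContact(name: String, email: String?, phoneNumber: String?) -> Contact? {
    guard let email else { return nil }
    return Contact(name: name, email: email, phoneNumber: phoneNumber)
}

#if canImport(Vision) && canImport(DataDetection)
import Vision
import DataDetection

/// Reads the name from the first cell of each row (transcript view) and
/// looks for an email address and phone number in the remaining cells
/// (detectedData view).
@available(iOS 26.0, macOS 26.0, tvOS 26.0, visionOS 26.0, *)
func contacts(in table: DocumentObservation.Container.Table) -> [Contact] {
    var result: [Contact] = []
    for row in table.rows {
        guard let firstCell = row.first else { continue }
        let name = firstCell.content.text.transcript
        var email: String?
        var phone: String?
        for cell in row.dropFirst() {
            for data in cell.content.text.detectedData {
                switch data.match.details {
                case .emailAddress(let match):
                    email = match.emailAddress
                case .phoneNumber(let match):
                    phone = match.phoneNumber
                default:
                    break
                }
            }
        }
        if let contact = makeContact(name: name, email: email, phoneNumber: phone) {
            result.append(contact)
        }
    }
    return result
}
#endif
```

Using detected data rather than pattern-matching the transcript leaves the parsing of emails and phone numbers to the framework.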

All processing on-device.

DetectLensSmudgeRequest

New API that helps reject low-quality images caused by smudged lenses. Returns a SmudgeObservation with a confidence score from 0 to 1; a higher score means the lens is more likely smudged.
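A minimal sketch of gating on the smudge score before running further analysis. It assumes that a higher confidence means a smudge is more likely and that `perform(on:)` returns a single `SmudgeObservation`. The function names and the 0.5 default threshold are hypothetical and should be tuned per app; the availability annotation is an assumption.

```swift
import Foundation

/// Hypothetical policy: accept an image only when its smudge confidence
/// is strictly below the threshold. The 0.5 default is an arbitrary
/// starting point, not a recommended value.
func isAcceptable(smudgeConfidence: Float, threshold: Float = 0.5) -> Bool {
    smudgeConfidence < threshold
}

#if canImport(Vision)
import Vision

/// Runs smudge detection on encoded image data and applies the policy above.
@available(iOS 26.0, macOS 26.0, tvOS 26.0, visionOS 26.0, *)
func isSmudgeFree(_ imageData: Data, threshold: Float = 0.5) async throws -> Bool {
    let request = DetectLensSmudgeRequest()
    // Assumed: the request yields one SmudgeObservation per image.
    let observation = try await request.perform(on: imageData)
    return isAcceptable(smudgeConfidence: observation.confidence, threshold: threshold)
}
#endif
```

An app could call `isSmudgeFree` first and ask the user to clean the lens and retake the photo when it returns false.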

Hand Pose Detection

Updated model for hand pose detection.
