WWDC25: Read documents using the Vision framework
Speaker: Megan Williams, Vision Framework Engineer
Date: WWDC 2025
RecognizeDocumentsRequest
New API extending text recognition to understand document structure. Supports 26 languages.
Detects and groups:
- Tables (rows, columns, cells, supports merged cells)
- Lists (individual items)
- Paragraphs (lines grouped logically)
- Machine-readable codes (QR codes, barcodes)
- Key data types: email addresses, phone numbers, URLs, dates, measurements, currency, flight numbers, payment identifiers, tracking numbers, postal addresses
DocumentObservation is hierarchical: Document -> text/tables/lists/barcodes -> cells/items -> content (text, other tables)
Text views available: transcript, lines, paragraphs, words (not available for Chinese, Japanese, Korean, or Thai), detectedData (matches from the DataDetection framework)
All processing on-device.
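A minimal Swift sketch of walking the hierarchy and text views described above. It assumes an SDK that ships RecognizeDocumentsRequest. The accessor names (document, tables, rows, content.text.transcript, detectedData, match.details) follow the session's sample code as best recalled and should be checked against the current SDK. csvField, firstTableAsCSV, and emailAddresses are hypothetical helpers, not Vision API.

```swift
import Foundation
import Vision

/// Quote one cell for CSV output (hypothetical helper, not part of Vision).
func csvField(_ text: String) -> String {
    let needsQuotes = text.contains(",") || text.contains("\"") || text.contains("\n")
    guard needsQuotes else { return text }
    return "\"" + text.replacingOccurrences(of: "\"", with: "\"\"") + "\""
}

/// Recognize a document and return its first table as CSV lines.
func firstTableAsCSV(from imageData: Data) async throws -> [String] {
    let request = RecognizeDocumentsRequest()
    let observations = try await request.perform(on: imageData)

    // Walk the hierarchy: document -> tables -> rows -> cells -> content -> text.
    guard let table = observations.first?.document.tables.first else {
        return []  // no document or no table detected
    }
    return table.rows.map { row in
        row.map { cell in csvField(cell.content.text.transcript) }
            .joined(separator: ",")
    }
}

/// Collect email addresses that data detection found inside table cells.
func emailAddresses(from imageData: Data) async throws -> [String] {
    let request = RecognizeDocumentsRequest()
    let observations = try await request.perform(on: imageData)
    guard let document = observations.first?.document else { return [] }

    var emails: [String] = []
    for table in document.tables {
        for row in table.rows {
            for cell in row {
                for data in cell.content.text.detectedData {
                    // Other cases (phone numbers, URLs, ...) are ignored here.
                    if case .emailAddress(let email) = data.match.details {
                        emails.append(email.emailAddress)
                    }
                }
            }
        }
    }
    return emails
}
```

Reading transcript gives each cell's text as a single string, while detectedData yields typed matches, so an email address or phone number can be pulled out without custom parsing.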
DetectLensSmudgeRequest
New API that helps reject low-quality images caused by a smudged lens. Returns a SmudgeObservation with a confidence score from 0 to 1.
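A hedged Swift sketch of using the confidence score as a filter. It assumes that a higher confidence means the image is more likely smudged, and that perform(on:) returns a single SmudgeObservation whose confidence is a Float; both should be verified against the SDK. The 0.5 threshold and the names isTooSmudged and shouldReject are hypothetical, and a real app would tune the cutoff on its own images.

```swift
import Foundation
import Vision

/// Hypothetical policy: reject when the smudge confidence exceeds the threshold.
/// Assumes higher confidence means the lens was more likely smudged.
func isTooSmudged(confidence: Float, threshold: Float = 0.5) -> Bool {
    confidence > threshold
}

/// Run smudge detection on image data and apply the policy above.
func shouldReject(imageData: Data, threshold: Float = 0.5) async throws -> Bool {
    let request = DetectLensSmudgeRequest()
    // Assumption: the request yields one observation for the whole image.
    let observation = try await request.perform(on: imageData)
    return isTooSmudged(confidence: observation.confidence, threshold: threshold)
}
```

Keeping the threshold check in its own function lets the cutoff be adjusted and unit-tested without running the Vision request.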
Hand Pose Detection
Updated model for hand pose detection.