WWDC25: Read documents using the Vision framework

Speaker: Megan Williams, Vision Framework Engineer
Date: WWDC 2025

RecognizeDocumentsRequest

New API extending text recognition to understand document structure. Supports 26 languages.

Detects and groups:

  • Tables (rows, columns, cells, supports merged cells)
  • Lists (individual items)
  • Paragraphs (lines grouped logically)
  • Machine-readable codes (QR codes, barcodes)
  • Key data types: email addresses, phone numbers, URLs, dates, measurements, currency, flight numbers, payment identifiers, tracking numbers, postal addresses

DocumentObservation is hierarchical: Document -> text/tables/lists/barcodes -> cells/items -> content (text, other tables)

Text views available: transcript, lines, paragraphs, words (not CJK/Thai), detectedData (DataDetection framework)

All processing on-device.
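
A minimal Swift sketch of walking the hierarchy above (Document -> tables -> rows -> cells -> content -> text). The Vision calls follow the shape shown in the session as best these notes recall it and should be checked against the current SDK. The function names `firstTableRows` and `tabSeparated` are hypothetical helpers for illustration, not part of the framework, and the OS 26 availability annotation is an assumption.

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

/// Hypothetical helper: joins already-extracted cell strings into
/// tab-separated lines, one line per table row.
func tabSeparated(_ rows: [[String]]) -> String {
    rows.map { $0.joined(separator: "\t") }.joined(separator: "\n")
}

#if canImport(Vision)
/// Hypothetical helper: returns the text of every cell in the first table
/// found in the image, row by row. Returns an empty array when no document
/// or no table is detected.
@available(iOS 26.0, macOS 26.0, tvOS 26.0, visionOS 26.0, *)
func firstTableRows(in imageData: Data) async throws -> [[String]] {
    // Assumption: the request needs no configuration for basic use.
    let request = RecognizeDocumentsRequest()
    let observations = try await request.perform(on: imageData)

    // Document -> tables -> rows -> cells -> content -> text
    guard let table = observations.first?.document.tables.first else {
        return []
    }
    return table.rows.map { row in
        row.map { cell in cell.content.text.transcript }
    }
}
#endif
```

Usage would look like `let rows = try await firstTableRows(in: data)` followed by `print(tabSeparated(rows))`. Keeping the formatting step separate from the Vision call makes it easy to exercise without an image.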

DetectLensSmudgeRequest

New API that helps reject low-quality images caused by smudged lenses. Returns a SmudgeObservation with a confidence score from 0 to 1.
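
A minimal Swift sketch of gating on the smudge score. It assumes that a higher confidence means the lens is more likely smudged, and that performing the request yields a single observation with a `confidence` property; both should be verified against the SDK. The 0.5 threshold and the names `isClearEnough` and `shouldProcess` are hypothetical choices for illustration.

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

/// Hypothetical policy: accept an image when its smudge confidence is
/// strictly below the threshold. Assumes higher confidence = more smudged.
func isClearEnough(smudgeConfidence: Float, threshold: Float = 0.5) -> Bool {
    smudgeConfidence < threshold
}

#if canImport(Vision)
/// Hypothetical helper: runs smudge detection and applies the policy above.
@available(iOS 26.0, macOS 26.0, tvOS 26.0, visionOS 26.0, *)
func shouldProcess(_ imageData: Data, threshold: Float = 0.5) async throws -> Bool {
    let request = DetectLensSmudgeRequest()
    // Assumption: the result is a single SmudgeObservation, not an array.
    let observation = try await request.perform(on: imageData)
    return isClearEnough(smudgeConfidence: observation.confidence,
                         threshold: threshold)
}
#endif
```

The right threshold depends on the app: a stricter (lower) value rejects more borderline images, which may suit document scanning better than casual capture.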

Hand Pose Detection

Updated model for hand pose detection.