Apple Vision Framework

Apple’s native on-device machine learning framework for computer vision tasks. Available on iOS, macOS, iPadOS, tvOS, and visionOS. All processing runs entirely on-device for privacy and performance.

Why It Matters for Project Aries

Vision provides the primary on-device OCR and document scanning capability available to an iOS app. It is free, deeply integrated with the OS, and has expanded with each WWDC: WWDC25 added structured document reading, which directly supports receipt scanning.

Key Capabilities

Text Recognition (VNRecognizeTextRequest)

  • Fast mode (recognitionLevel = .fast): character detection plus a small ML model, similar to traditional OCR
  • Accurate mode (recognitionLevel = .accurate): a neural network finds strings and lines first, then words, closer to how a person reads
  • Multi-language support with language correction via NLP
  • Custom word lists for domain-specific jargon
  • Confidence score (0 to 1) for each recognized candidate string
  • Bounding boxes per recognized line, and per word or character range via boundingBox(for:)
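
A minimal sketch of how the options above map onto the API, assuming a CGImage input and English-language documents. The RecognizedLine struct, the recognizeText function name, and the 0.5 confidence cutoff are illustrative choices, not part of Vision.

```swift
import Foundation
import Vision

/// Illustrative container (not a Vision type) for one recognized line.
struct RecognizedLine {
    let text: String
    let confidence: Float   // 0...1, as reported by Vision
    let boundingBox: CGRect // normalized to 0...1, origin at the bottom-left
}

/// Runs accurate-mode text recognition synchronously on one image.
/// `customWords` and `minimumConfidence` are assumptions for this sketch.
func recognizeText(in image: CGImage,
                   customWords: [String] = [],
                   minimumConfidence: Float = 0.5) throws -> [RecognizedLine] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate      // use .fast for the lighter pipeline
    request.usesLanguageCorrection = true     // NLP-based correction
    request.recognitionLanguages = ["en-US"]  // assumed language for this sketch
    request.customWords = customWords         // domain-specific vocabulary

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    // Keep the best candidate per observation and drop low-confidence lines.
    return (request.results ?? []).compactMap { observation in
        guard let best = observation.topCandidates(1).first,
              best.confidence >= minimumConfidence else { return nil }
        return RecognizedLine(text: best.string,
                              confidence: best.confidence,
                              boundingBox: observation.boundingBox)
    }
}
```

Call this off the main thread, since perform(_:) blocks until recognition finishes. Bounding boxes are normalized with a bottom-left origin, so they need flipping before being drawn in UIKit coordinates.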

Document Scanning (VisionKit)

  • VNDocumentCameraViewController: built-in document scanning UI since iOS 13
  • Automatic edge detection and perspective correction
  • Multi-page scanning; the captured pages are delivered through delegate callbacks when the user finishes
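
The scanning flow can be sketched as follows, assuming a UIKit app. ScanCoordinator and its onPages callback are hypothetical names introduced for illustration; only the VisionKit types and delegate methods come from the framework.

```swift
#if os(iOS)
import UIKit
import VisionKit

/// Hypothetical helper that presents the system scanner and collects pages.
/// The caller must keep a strong reference, because the scanner's delegate is weak.
final class ScanCoordinator: NSObject, VNDocumentCameraViewControllerDelegate {
    /// Called with the perspective-corrected page images once the user saves.
    var onPages: (([UIImage]) -> Void)?

    func present(from presenter: UIViewController) {
        // The scanner is unavailable on some devices and in some environments.
        guard VNDocumentCameraViewController.isSupported else { return }
        let scanner = VNDocumentCameraViewController()
        scanner.delegate = self
        presenter.present(scanner, animated: true)
    }

    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFinishWith scan: VNDocumentCameraScan) {
        // Pull every captured page out of the scan result.
        let pages = (0..<scan.pageCount).map { scan.imageOfPage(at: $0) }
        controller.dismiss(animated: true) { [weak self] in
            self?.onPages?(pages)
        }
    }

    func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController) {
        controller.dismiss(animated: true)
    }

    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFailWithError error: Error) {
        controller.dismiss(animated: true)
    }
}
#endif
```

The resulting UIImage pages can then be passed to a text recognition request. The app also needs a camera usage description in its Info.plist.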

Document Structure (RecognizeDocumentsRequest — WWDC25)

See the RecognizeDocumentsRequest note for full details.

  • Tables, lists, paragraphs, barcodes
  • Data Detection: email, phone, URLs, dates, currency, addresses
  • 26 languages, all on-device
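
A rough sketch of reading the first table from a scanned page. The property path (document, tables, rows, content.text.transcript) follows the WWDC25 session material as best recalled and should be checked against the current SDK before use. The firstTableText and tsv functions are hypothetical helpers, and the code assumes an SDK that includes RecognizeDocumentsRequest.

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

/// Hypothetical helper: flattens rows of cell text into tab-separated lines.
func tsv(from rows: [[String]]) -> String {
    rows.map { $0.joined(separator: "\t") }.joined(separator: "\n")
}

#if canImport(Vision)
/// Returns the cell text of the first table found in the image, row by row.
/// API names are assumptions based on the WWDC25 session; verify before relying on them.
@available(iOS 26.0, macOS 26.0, tvOS 26.0, visionOS 26.0, *)
func firstTableText(in imageData: Data) async throws -> [[String]] {
    let request = RecognizeDocumentsRequest()
    let observations = try await request.perform(on: imageData)

    // Each observation wraps one detected document; take the first table, if any.
    guard let table = observations.first?.document.tables.first else { return [] }

    return table.rows.map { row in
        row.map { cell in cell.content.text.transcript }
    }
}
#endif
```

For a receipt, each row would typically hold an item description and a price, which tsv(from:) turns into a line that is easy to log or diff during testing.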

Other

  • 31+ existing APIs: face/body/hand detection, pose tracking, trajectory analysis
  • DetectLensSmudgeRequest (WWDC25): image quality rejection
  • Rectangle detection for custom camera experiences
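
For the custom camera case, rectangle detection can be sketched like this, assuming a single still frame as a CGImage. The function name and the threshold values are illustrative assumptions that would need tuning against real receipts.

```swift
import Foundation
import Vision

/// Finds the most prominent rectangle (for example a receipt) in one frame.
/// Threshold values are illustrative, not recommended defaults.
func detectDocumentRectangle(in image: CGImage) throws -> VNRectangleObservation? {
    let request = VNDetectRectanglesRequest()
    request.maximumObservations = 1     // only the best candidate
    request.minimumConfidence = 0.8     // reject weak detections
    request.minimumAspectRatio = 0.2    // allow long, narrow receipts
    request.quadratureTolerance = 20    // degrees of deviation from 90° corners

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
    return request.results?.first
}
```

The returned observation exposes topLeft, topRight, bottomLeft, and bottomRight in normalized coordinates, which is enough to draw an overlay or drive a perspective correction step.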

Trade-offs vs ML Kit

  • Vision is roughly 6x slower per image (~0.31 s vs ~0.05 s) but offers a richer API
  • Slightly better on rotated text (>20°)
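  • Apple platforms only (ML Kit is cross-platform across iOS and Android)
  • Supports Chinese out of the box (ML Kit's base model is Latin-only; its v2 API adds Chinese, Devanagari, Japanese, and Korean through separate script models)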

Platform

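iOS 11+ (core Vision framework), iOS 13+ (VNRecognizeTextRequest and VisionKit document scanning), iOS 26+ (RecognizeDocumentsRequest). Apple moved to year-based version numbers at WWDC25, so the release following iOS 18 is iOS 26; there is no iOS 19.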
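
One way to handle the version split at runtime is an availability check. This sketch assumes RecognizeDocumentsRequest ships in the year-numbered releases announced at WWDC25 (iOS 26, macOS 26); the OCRPath enum and preferredOCRPath function are hypothetical names.

```swift
import Foundation

/// Hypothetical switch between the two recognition paths described above.
enum OCRPath {
    case structuredDocuments // RecognizeDocumentsRequest
    case plainText           // VNRecognizeTextRequest
}

/// Picks the structured path when the OS supports it, otherwise falls back.
func preferredOCRPath() -> OCRPath {
    if #available(iOS 26.0, macOS 26.0, *) {
        return .structuredDocuments
    }
    return .plainText
}
```

Keeping the fallback path means receipts still scan on older devices, just without table and data-detection structure.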