Recognizing Text in Images — Apple Vision Framework

Overview

The Vision framework detects and recognizes multilanguage text in images. All processing happens on-device.

Two processing paths:

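Fast: character detection finds individual characters, then a small ML model recognizes characters and words; similar to traditional OCR
Accurate: a neural network finds text as strings and lines, then further analysis finds words and sentences; closer to how humans read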

Optional language-correction phase (NLP-based) reduces misreadings.
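
A minimal sketch of choosing between the two paths, assuming the choice is made through the request's recognitionLevel property (.fast or .accurate). The helper name makeTextRequest is hypothetical and exists only for illustration.

```swift
import Vision

// Hypothetical helper (not part of Vision): builds a text-recognition
// request configured for the chosen processing path.
func makeTextRequest(level: VNRequestTextRecognitionLevel) -> VNRecognizeTextRequest {
    let request = VNRecognizeTextRequest()
    // .fast favors speed (for example, live capture);
    // .accurate favors recognition quality (for example, scanned documents).
    request.recognitionLevel = level
    return request
}

let liveRequest = makeTextRequest(level: .fast)
let documentRequest = makeTextRequest(level: .accurate)
```

Which level fits depends on the workload; measuring both on representative images is usually the safest way to decide.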

API

Create a VNRecognizeTextRequest and perform it with a VNImageRequestHandler. The request's results are an array of VNRecognizedTextObservation objects. Call topCandidates(1).first?.string on each observation to get its best candidate string.
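
The request flow above can be sketched roughly as follows. This assumes a recent SDK in which the request's results property is already typed as an array of VNRecognizedTextObservation, and an input that is already a CGImage. The function name recognizeText(in:) is hypothetical.

```swift
import Vision
import CoreGraphics

// Hypothetical helper: returns the best candidate string for each
// text observation Vision finds in `image`.
func recognizeText(in image: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()
    let handler = VNImageRequestHandler(cgImage: image, options: [:])

    // perform(_:) runs synchronously, so an app would normally call
    // this function off the main thread.
    try handler.perform([request])

    // Assumption: `results` is typed [VNRecognizedTextObservation]?
    // (true for recent SDKs; older SDKs need a cast).
    let observations = request.results ?? []

    // Keep only the top candidate of each observation.
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```

Asking for more than one candidate, for example topCandidates(3), returns lower-ranked alternatives that can be useful when validating against an expected pattern.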

Language Settings

Recognition is biased toward English by default. Override this with the recognitionLanguages array, listed in priority order. Set usesLanguageCorrection = true to enable NLP correction. Use customWords to supply domain-specific jargon, which takes precedence during correction.
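
A short sketch of these settings applied to a request. The language codes and custom words are example values chosen for illustration, not recommendations from the source.

```swift
import Vision

let languageRequest = VNRecognizeTextRequest()

// Example languages, in priority order (illustrative values).
languageRequest.recognitionLanguages = ["fr-FR", "en-US"]

// Enable the NLP-based correction phase.
languageRequest.usesLanguageCorrection = true

// Example domain jargon (illustrative values). Custom words are
// ignored when usesLanguageCorrection is false.
languageRequest.customWords = ["VNRequest", "CoreML"]
```

Language correction can hurt when the text is not natural language, such as serial numbers or codes, so turning it off is worth trying for that kind of input.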

Bounding Boxes

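Each observation provides a normalized bounding rectangle (coordinates in the range 0 to 1). Convert it to image coordinates with VNImageRectForNormalizedRect. On the fast path, boxes are character-based. On the accurate path, boxes follow whitespace tokenization, so languages written without spaces, such as Chinese, may yield line fragments rather than words.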
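
The conversion can be sketched as below. Vision's normalized rectangles use a lower-left origin, so this sketch also flips the result vertically for drawing in a top-left-origin coordinate space; that flip is an assumption about how the rectangle will be used. The helper name imageRect(forNormalizedRect:imageWidth:imageHeight:) is hypothetical.

```swift
import Vision
import CoreGraphics

// Hypothetical helper: maps a normalized bounding box to pixel
// coordinates with a top-left origin.
func imageRect(forNormalizedRect box: CGRect, imageWidth: Int, imageHeight: Int) -> CGRect {
    // Scale from the unit square to pixel dimensions
    // (the result still has a lower-left origin).
    var rect = VNImageRectForNormalizedRect(box, imageWidth, imageHeight)

    // Flip vertically so the origin is at the top-left.
    rect.origin.y = CGFloat(imageHeight) - rect.maxY
    return rect
}

// Usage with an illustrative box on a 400 x 200 pixel image.
let pixelRect = imageRect(
    forNormalizedRect: CGRect(x: 0.25, y: 0.5, width: 0.5, height: 0.25),
    imageWidth: 400,
    imageHeight: 200
)
```

If the drawing context already uses a lower-left origin (as in AppKit by default), the flip step should be skipped.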

Related Samples

Structuring recognized text on a document: business card/receipt text structuring
Extracting phone numbers from text in images
