Recognizing Text in Images — Apple Vision Framework
Overview
The Vision framework detects and recognizes multi-language text in images. All processing runs on-device.
Two processing paths:
- Fast: character detection plus a small ML model; similar to traditional OCR
- Accurate: a neural network finds strings and lines, then individual words and sentences; closer to how humans read
An optional language-correction phase (NLP-based) reduces misreadings.
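In code, the path is selected through the recognitionLevel property of VNRecognizeTextRequest. The sketch below is a minimal illustration; the helper name makeRequest(fast:) is hypothetical and not part of Vision.

```swift
import Vision

/// Returns a text-recognition request for the chosen processing path.
/// NOTE: `makeRequest(fast:)` is a hypothetical helper, not a Vision API.
func makeRequest(fast: Bool) -> VNRecognizeTextRequest {
    let request = VNRecognizeTextRequest()
    // .fast favors speed; .accurate favors recognition quality.
    request.recognitionLevel = fast ? .fast : .accurate
    return request
}
```

The fast path is usually the better fit for live camera frames, and the accurate path for still images where latency matters less.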
API
Create a VNRecognizeTextRequest and perform it with a VNImageRequestHandler. Results arrive as an array of VNRecognizedTextObservation. Use topCandidates(1).first?.string to get the best candidate string for each observation.
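A minimal sketch of that flow, assuming the input is already available as a CGImage. The function name recognizeText(in:) is made up for illustration; everything else is the Vision API named above.

```swift
import Vision
import CoreGraphics

/// Runs text recognition on an image and returns the best candidate
/// string of each observation.
/// NOTE: `recognizeText(in:)` is an illustrative helper, not a Vision API.
func recognizeText(in image: CGImage) throws -> [String] {
    var lines: [String] = []

    let request = VNRecognizeTextRequest { request, error in
        // Leave `lines` empty if the request failed or returned another result type.
        guard error == nil,
              let observations = request.results as? [VNRecognizedTextObservation] else { return }
        // Keep only the top candidate of each observation.
        lines = observations.compactMap { $0.topCandidates(1).first?.string }
    }

    // perform(_:) is synchronous, so the completion handler has already
    // run by the time it returns.
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
    return lines
}
```

Because perform(_:) blocks until recognition finishes, a real app would normally call this helper off the main thread.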
Language Settings
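Recognition is biased toward English by default. Override with the recognitionLanguages array; the order of the array sets language priority. Set usesLanguageCorrection = true to apply NLP-based correction. customWords supplies domain-specific jargon that takes precedence over the standard lexicon during correction; it is ignored when usesLanguageCorrection is false.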
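These settings could be applied as in the sketch below. The function name, the language identifiers and the custom word are placeholder assumptions; which languages are actually supported depends on the OS version and the recognition level.

```swift
import Vision

/// Builds a request with explicit language settings.
/// NOTE: the function name and default values are illustrative assumptions.
func makeLanguageAwareRequest(languages: [String] = ["en-US", "fr-FR"],
                              customWords: [String] = ["VNImageRequestHandler"]) -> VNRecognizeTextRequest {
    let request = VNRecognizeTextRequest()
    // Earlier entries take priority during recognition.
    request.recognitionLanguages = languages
    // Enable the NLP-based correction phase.
    request.usesLanguageCorrection = true
    // Domain-specific terms; only consulted when language correction is on.
    request.customWords = customWords
    return request
}
```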
Bounding Boxes
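Each observation provides a normalized bounding rectangle (coordinates in 0...1, origin at the lower-left corner). Convert to image coordinates with VNImageRectForNormalizedRect. Fast path: boxes are computed per character. Accurate path: boxes are computed per whitespace-delimited token, so Chinese text, which does not separate words with spaces, may yield line fragments.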
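A short sketch of the conversion. The helper imageRect(forNormalized:...) and its flipToTopLeft option are assumptions added for illustration; the flip is only needed when drawing in a top-left-origin coordinate system such as UIKit.

```swift
import Vision
import CoreGraphics

/// Converts a normalized bounding box to pixel coordinates.
/// NOTE: this helper and its `flipToTopLeft` option are illustrative, not Vision API.
func imageRect(forNormalized box: CGRect,
               imageWidth: Int,
               imageHeight: Int,
               flipToTopLeft: Bool = false) -> CGRect {
    // Scale the normalized rect to pixels; the origin stays at the lower left.
    var rect = VNImageRectForNormalizedRect(box, imageWidth, imageHeight)
    if flipToTopLeft {
        // Mirror vertically for top-left-origin coordinate systems.
        rect.origin.y = CGFloat(imageHeight) - rect.maxY
    }
    return rect
}
```

For an observation, this would be called as imageRect(forNormalized: observation.boundingBox, imageWidth: image.width, imageHeight: image.height).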
Related Samples
- Structuring recognized text on a document — business card/receipt text structuring
- Extracting phone numbers from text in images