Expensify Receipt Pipeline
Expensify’s production-proven multi-layer architecture for receipt scanning serves as a benchmark for what it takes to reach 99% accuracy at scale. The key insight: OCR alone tops out at roughly 85% accuracy on real-world receipts, so additional layers are needed to reach production reliability.
The 6-Layer Architecture
            Receipt Image
                  │
                  ▼
┌────────────────────────────────────┐
│ Layer 1: Mobile Capture App        │  ← Built around scanning from signup
│ - Camera UX optimized for          │    Designed as core workflow
│   receipt photography              │
└─────────────────┬──────────────────┘
                  ▼
┌────────────────────────────────────┐
│ Layer 2: Proprietary OCR           │  ← ~85% accuracy ceiling
│ - Trained on millions of           │
│   scanned receipts                 │
│ - Best-in-class OCR scouted        │
│   globally                         │
└─────────────────┬──────────────────┘
                  ▼
┌────────────────────────────────────┐
│ Layer 3: Template Parsers          │  ← Frequent vendors
│ - Recognizes known formats         │
│ - Home Depot, Amazon, Delta        │
│ - Seen thousands of times/month    │
└─────────────────┬──────────────────┘
                  ▼
┌────────────────────────────────────┐
│ Layer 4: Human Verification        │  ← "Secret sauce"
│ - Thousands of people 24/7         │
│ - Low-confidence items flagged     │
│ - Load balancing for month-end     │
│   spikes (hardest to replicate)    │
└─────────────────┬──────────────────┘
                  ▼
┌────────────────────────────────────┐
│ Layer 5: Bank Feed Matching        │  ← Reconciliation
│ - Matches to personal/corporate    │
│   card feeds                       │
│ - Spend insights                   │
└─────────────────┬──────────────────┘
                  ▼
┌────────────────────────────────────┐
│ Layer 6: AI Receipt Auditing       │  ← Corporate compliance
│ - Flags manual data changes        │
│ - Admin visibility into            │
│   alterations by employees         │
└────────────────────────────────────┘
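As a rough illustration of the reconciliation step in Layer 5, the sketch below matches one scanned receipt against a card feed by exact amount and a date window. It is not Expensify's implementation: the `Receipt` and `CardTransaction` types, the `match_receipt` function, and the three-day posting window are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Receipt:
    """Hypothetical output of the scanning layers."""
    merchant: str
    total: Decimal
    purchased_on: date


@dataclass(frozen=True)
class CardTransaction:
    """Hypothetical row from a personal or corporate card feed."""
    description: str
    amount: Decimal
    posted_on: date


def match_receipt(
    receipt: Receipt,
    feed: list[CardTransaction],
    max_days_apart: int = 3,  # assumed posting delay, not a documented value
) -> Optional[CardTransaction]:
    """Return the feed transaction that best matches the receipt, or None."""
    # Keep only transactions with the same amount inside the date window.
    candidates = [
        tx
        for tx in feed
        if tx.amount == receipt.total
        and abs((tx.posted_on - receipt.purchased_on).days) <= max_days_apart
    ]
    if not candidates:
        return None

    def rank(tx: CardTransaction) -> tuple[bool, int]:
        # Prefer descriptions that mention the merchant, then the
        # transaction posted closest to the purchase date.
        name_hit = receipt.merchant.lower() in tx.description.lower()
        days = abs((tx.posted_on - receipt.purchased_on).days)
        return (not name_hit, days)

    return min(candidates, key=rank)
```

Amounts are held as `Decimal` so that equality checks on currency values are exact. A real feed would likely need fuzzier merchant matching than the substring test used here.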
Key Insight: Why OCR Alone Fails
- Easy receipts (emailed receipts, Home Depot receipts without tips): handled by template parsers
- Hard receipts (crumpled paper, handwritten tips): OCR fails, so they need human review
- Trust factor: “All it takes is getting one receipt wrong to lose all trust in the receipt scanning reliability”
The human verification network is the layer that competitors haven’t replicated — “this is our not-so-secret, but never copied, sauce.”
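The split between easy and hard receipts can be sketched as a confidence gate in front of the human review queue. This is a minimal illustration under stated assumptions: the `ExtractedField` type, the per-field confidence scores, and the 0.90 threshold are hypothetical and are not taken from Expensify.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.90  # assumed cutoff, not an Expensify figure


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical OCR engine


def low_confidence_fields(
    fields: list[ExtractedField], threshold: float = REVIEW_THRESHOLD
) -> list[str]:
    """Return the names of fields whose confidence falls below the threshold."""
    return [f.name for f in fields if f.confidence < threshold]


def route_receipt(
    fields: list[ExtractedField], threshold: float = REVIEW_THRESHOLD
) -> str:
    """Return "auto_accept" or "human_review" for one scanned receipt."""
    # One doubtful field is enough to send the whole receipt to a person,
    # because a single wrong value can undermine trust in the scanner.
    # A receipt with no extracted fields is treated as unreadable.
    if not fields or low_confidence_fields(fields, threshold):
        return "human_review"
    return "auto_accept"
```

For example, a receipt whose merchant and total both score above the threshold would be accepted automatically, while one with a handwritten tip read at 0.41 confidence would be flagged for review.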
Relevance to Project Aries
This architecture provides a template for:
- MVP layer: OCR + template parsers for known formats (Layers 2-3)
- Production layer: Add LLM-based extraction for unknown formats (partially replaces Layer 4 with AI)
- Enterprise layer: Human review queue + bank matching (Layers 4-6) if accuracy demands it
For a startup receipt scanner, Layers 1-3 plus LLM-based extraction (see well-ai-invoice-extractor) can stand in for the human review layer at an accuracy that is acceptable for an early product. A dedicated human review layer becomes necessary mainly at Expensify's scale and accuracy requirements.
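The tiering described above might look roughly like the following fallback chain: try template parsers for known vendors first, then an LLM-based extractor, and only then hand the receipt to a review queue. Everything here is a sketch under assumptions. The `home_depot_template` parser, its regular expression, the `extract` function, and the `llm_extractor` callable are hypothetical, and no particular LLM API is implied.

```python
import re
from typing import Callable, Optional

Extraction = dict[str, str]
Extractor = Callable[[str], Optional[Extraction]]


def home_depot_template(ocr_text: str) -> Optional[Extraction]:
    """Hypothetical template parser for one known vendor format."""
    if "HOME DEPOT" not in ocr_text.upper():
        return None
    # \b keeps the pattern from matching the TOTAL inside SUBTOTAL.
    match = re.search(r"\bTOTAL\s+\$?(\d+\.\d{2})", ocr_text, re.IGNORECASE)
    if match is None:
        return None
    return {"merchant": "Home Depot", "total": match.group(1)}


def extract(
    ocr_text: str,
    templates: list[Extractor],
    llm_extractor: Optional[Extractor] = None,
) -> tuple[str, Extraction]:
    """Return (source, fields), where source names the tier that succeeded."""
    # MVP tier: template parsers for known formats.
    for template in templates:
        fields = template(ocr_text)
        if fields is not None:
            return "template", fields
    # Production tier: optional LLM fallback for unknown formats.
    if llm_extractor is not None:
        fields = llm_extractor(ocr_text)
        if fields is not None:
            return "llm", fields
    # Enterprise tier: nothing succeeded, so queue the receipt for a person.
    return "needs_review", {}
```

Passing the LLM step in as a plain callable keeps the chain testable with a stub and lets the human review tier be added later without changing the earlier tiers.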
Related Pages
- ios-receipt-scanning — Comprehensive receipt scanning guide
- well-ai-invoice-extractor — LLM-based structured extraction (replaces Layer 4 partially)
- receipt-parsing-strategies — Comparison of extraction approaches