Expensify Receipt Pipeline

Expensify’s receipt-scanning pipeline is a production-proven, multi-layer architecture and a useful benchmark for what it takes to reach 99% accuracy at scale. The key insight: OCR alone tops out at roughly 85% accuracy on real-world receipts, so additional layers are needed to reach production reliability.

The 6-Layer Architecture

Receipt Image
    │
    ▼
┌────────────────────────────────┐
│ Layer 1: Mobile Capture App     │  ← Built around scanning from signup
│  - Camera UX optimised for      │     Designed as core workflow
│    receipt photography           │
└──────────────┬─────────────────┘
               ▼
┌────────────────────────────────┐
│ Layer 2: Proprietary OCR        │  ← ~85% accuracy ceiling
│  - Trained on millions of        │
│    scanned receipts              │
│  - Best-in-class OCR scouted     │
│    globally                      │
└──────────────┬─────────────────┘
               ▼
┌────────────────────────────────┐
│ Layer 3: Template Parsers       │  ← Frequent vendors
│  - Recognizes known formats      │
│  - Home Depot, Amazon, Delta     │
│  - Seen thousands of times/month │
└──────────────┬─────────────────┘
               ▼
┌────────────────────────────────┐
│ Layer 4: Human Verification     │  ← "Secret sauce"
│  - Thousands of people 24/7      │
│  - Low-confidence items flagged  │
│  - Load balancing for month-end  │
│    spikes (hardest to replicate) │
└──────────────┬─────────────────┘
               ▼
┌────────────────────────────────┐
│ Layer 5: Bank Feed Matching     │  ← Reconciliation
│  - Matches to personal/corporate │
│    card feeds                    │
│  - Spend insights                │
└──────────────┬─────────────────┘
               ▼
┌────────────────────────────────┐
│ Layer 6: AI Receipt Auditing    │  ← Corporate compliance
│  - Flags manual data changes     │
│  - Admin visibility into         │
│    alterations by employees      │
└────────────────────────────────┘
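Layer 5 in the diagram reconciles scanned receipts against card feed transactions. The source does not describe how Expensify performs this match, so the following is only a minimal sketch under stated assumptions: a receipt matches a transaction when the amounts are equal and the posting date falls within a few days of the purchase date. The Receipt and CardTransaction types, the match_receipt function, and the 3-day window are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Receipt:
    merchant: str
    total: Decimal
    purchased_on: date


@dataclass(frozen=True)
class CardTransaction:
    description: str
    amount: Decimal
    posted_on: date


def match_receipt(
    receipt: Receipt,
    feed: list[CardTransaction],
    max_days: int = 3,  # assumed tolerance between purchase and posting dates
) -> Optional[CardTransaction]:
    """Return the same-amount transaction closest in date, or None if no match."""
    window = timedelta(days=max_days)
    candidates = [
        tx
        for tx in feed
        if tx.amount == receipt.total
        and abs(tx.posted_on - receipt.purchased_on) <= window
    ]
    if not candidates:
        return None
    # Prefer the transaction posted nearest to the purchase date.
    return min(candidates, key=lambda tx: abs(tx.posted_on - receipt.purchased_on))
```

Decimal is used for amounts so that equality comparison is exact. A real matcher would likely also need to tolerate tips, currency conversion, and merchant-name differences, none of which are modeled here.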

Key Insight: Why OCR Alone Fails

  • Easy receipts (emailed, Home Depot without tips): handled by template parsers
  • Hard receipts (crumpled, handwritten tips): OCR fails, so these need human review
  • Trust factor: “All it takes is getting one receipt wrong to lose all trust in the receipt scanning reliability”

The human verification network is the layer that competitors haven’t replicated: “this is our not-so-secret, but never copied, sauce.”
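The split between easy and hard receipts can be pictured as a routing decision made after OCR. The sketch below is an assumption-laden illustration, not Expensify's implementation: it assumes the OCR engine reports a confidence score between 0 and 1, and it uses a hypothetical REVIEW_THRESHOLD of 0.90. The vendor set comes from the diagram; Route, OcrResult, and route_receipt are made-up names.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    TEMPLATE_PARSER = "template_parser"  # Layer 3
    AUTO_ACCEPT = "auto_accept"          # OCR output trusted as-is
    HUMAN_REVIEW = "human_review"        # Layer 4


# Vendors named in the diagram; a real list would be far longer.
KNOWN_VENDORS = {"home depot", "amazon", "delta"}

# Hypothetical cut-off; the source does not state a threshold.
REVIEW_THRESHOLD = 0.90


@dataclass
class OcrResult:
    vendor: str
    total: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def route_receipt(result: OcrResult, threshold: float = REVIEW_THRESHOLD) -> Route:
    """Decide which layer should handle an OCR result."""
    # Low-confidence items are flagged for people first, whatever the vendor.
    if result.confidence < threshold:
        return Route.HUMAN_REVIEW
    # Confident reads from frequent vendors go to a format-specific parser.
    if result.vendor.strip().lower() in KNOWN_VENDORS:
        return Route.TEMPLATE_PARSER
    return Route.AUTO_ACCEPT
```

Checking confidence before vendor means a crumpled Home Depot receipt still reaches a reviewer, which fits the point that a single wrong receipt is enough to lose user trust.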

Relevance to Project Aries

This architecture provides a template for:

  1. MVP layer: OCR + template parsers for known formats (Layers 2-3)
  2. Production layer: Add LLM-based extraction for unknown formats (partially replacing Layer 4 with AI)
  3. Enterprise layer: Human review queue, bank matching, and receipt auditing (Layers 4-6) if accuracy demands it

For a startup receipt scanner, Layers 1-3 plus LLM-based extraction (see well-ai-invoice-extractor) can substitute for the human review layer at reasonable accuracy. Human review becomes necessary only at Expensify’s scale and with its accuracy requirements.
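One way to structure this tiering for Project Aries is an ordered chain of extractors, where the first one that returns fields wins and anything left over is marked for review. This is a rough sketch only: template_extractor handles a single made-up text layout, llm_extractor_stub is a placeholder that always returns None (it does not call well-ai-invoice-extractor or any real model), and all names are hypothetical.

```python
from typing import Callable, Optional

Fields = dict[str, str]
Extractor = Callable[[str], Optional[Fields]]


def template_extractor(text: str) -> Optional[Fields]:
    """Hypothetical Layer 3 parser for one known format (assumed layout)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if lines and lines[0].upper() == "HOME DEPOT":
        for line in lines:
            if line.upper().startswith("TOTAL"):
                return {"vendor": "Home Depot", "total": line.split()[-1]}
    return None


def llm_extractor_stub(text: str) -> Optional[Fields]:
    """Placeholder for LLM-based extraction; replace with a real model call."""
    return None


def extract(text: str, extractors: list[Extractor]) -> tuple[Optional[Fields], bool]:
    """Try each extractor in order; return (fields, needs_review)."""
    for extractor in extractors:
        fields = extractor(text)
        if fields is not None:
            return fields, False
    # Nothing could parse the receipt, so flag it for a review queue.
    return None, True


if __name__ == "__main__":
    chain = [template_extractor, llm_extractor_stub]
    print(extract("HOME DEPOT\nTOTAL 42.10", chain))
    print(extract("crumpled receipt text", chain))
```

Because the chain is just a list, the MVP can ship with template parsers alone, and the LLM extractor or a human review queue can be added later without changing the callers.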