AWS Textract

Amazon’s cloud-based document text and data extraction service. The Analyze Expense feature is purpose-built for receipts and invoices, providing the highest line-item accuracy in third-party benchmarks.

Why It Matters for Project Aries

Textract is the accuracy leader for receipt parsing among the cloud APIs tested. If Project Aries’ receipt scanning backend runs on AWS infrastructure, it is the natural choice. Its structured output is also the most detailed, providing labeled fields and line items out of the box.

Key Facts

  • Provider: Amazon Web Services
  • Pricing: $0.008/page (1M+ pages); 1,000 pages free for the first 3 months
  • Accuracy: 93% field-level, 89% line-item across 100 tested receipts
  • Features: Tables, forms, key-value pairs, Analyze Expense (receipt/invoice specific)
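A minimal sketch of calling Analyze Expense from Python, assuming the boto3 SDK is installed and AWS credentials are already configured. The helper name analyze_receipt and the file name receipt.jpg are illustrative, not part of Project Aries.

```python
def analyze_receipt(image_bytes, client=None):
    """Send one receipt image to Textract's AnalyzeExpense API and return the raw response."""
    if client is None:
        # Imported lazily so the helper can be exercised with a stub client
        # on machines without boto3 or AWS credentials.
        import boto3
        client = boto3.client("textract")
    # Document={"Bytes": ...} sends the image inline; an S3Object reference
    # can be passed instead when the receipt already lives in S3.
    return client.analyze_expense(Document={"Bytes": image_bytes})


# Hypothetical usage (requires AWS credentials and a local image file):
# with open("receipt.jpg", "rb") as f:
#     response = analyze_receipt(f.read())
```

Passing the client in as a parameter keeps the call easy to stub in tests and leaves region and credential handling to the caller.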

Receipt-Specific Performance

Receipt Type     Field Accuracy   Line-Item Accuracy
Restaurant       96%              93%
Fuel station     95%              91%
Pharmacy         93%              88%
Supermarket      91%              86%
Hardware store   92%              87%
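As a quick sanity check, the per-type figures can be compared with the overall numbers under Key Facts. The sketch below takes an unweighted mean, which assumes each receipt type contributed equally to the 100-receipt sample; the note does not state the actual split.

```python
from statistics import mean

# (field accuracy %, line-item accuracy %) per receipt type, copied from the table above
results = {
    "Restaurant": (96, 93),
    "Fuel station": (95, 91),
    "Pharmacy": (93, 88),
    "Supermarket": (91, 86),
    "Hardware store": (92, 87),
}

# Unweighted means: an assumption, since per-type receipt counts are not given
field_avg = mean(field for field, _ in results.values())
line_avg = mean(line for _, line in results.values())

print(f"{field_avg:.1f}% field-level, {line_avg:.1f}% line-item")
```

Under that assumption the means come out at 93.4% and 89.0%, in line with the 93% and 89% reported overall.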

Strengths

  • Highest overall line-item accuracy tested
  • Best structured output with labeled field types (VENDOR_NAME, TOTAL, INVOICE_RECEIPT_DATE, ITEM, PRICE)
  • Native AWS integration (S3, Lambda, DynamoDB)
  • Handles complex layouts (multi-column, price modifiers)
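To illustrate the labeled field types, here is a rough sketch that picks header-level values out of an Analyze Expense response by label. The function name summary_fields and the sample response are hand-written for illustration (not real Textract output); the sample keeps only the keys the helper reads.

```python
def summary_fields(response, wanted=("VENDOR_NAME", "TOTAL", "INVOICE_RECEIPT_DATE")):
    """Return {label: text} for the first occurrence of each wanted summary field."""
    found = {}
    for doc in response.get("ExpenseDocuments", []):
        for field in doc.get("SummaryFields", []):
            label = field.get("Type", {}).get("Text")
            value = field.get("ValueDetection", {}).get("Text")
            if label in wanted and label not in found:
                found[label] = value
    return found


# Hand-written sample in the shape of an AnalyzeExpense response (illustrative values)
sample = {
    "ExpenseDocuments": [{
        "SummaryFields": [
            {"Type": {"Text": "VENDOR_NAME"}, "ValueDetection": {"Text": "Example Cafe"}},
            {"Type": {"Text": "TOTAL"}, "ValueDetection": {"Text": "12.50"}},
            {"Type": {"Text": "OTHER"}, "ValueDetection": {"Text": "Thank you"}},
        ]
    }]
}

print(summary_fields(sample))
```

Fields whose label is not in the wanted list are skipped, so the receipt model downstream only sees the values it asked for.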

Weaknesses

  • Requires AWS account/IAM setup (steeper learning curve than REST APIs)
  • No permanent receipt-specific free tier (3 months only)
  • Verbose response format that needs SDK helpers or custom parsing
  • Cloud-only (no on-device option)
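On the verbose response format: line items sit several levels deep (expense document, line-item group, line item, field). A small custom parser along these lines may be enough to flatten them; the name line_items and the sample data are hypothetical, and the sample omits confidence and geometry details that a real response also carries.

```python
def line_items(response):
    """Flatten every line item in an AnalyzeExpense response into a {label: text} dict."""
    rows = []
    for doc in response.get("ExpenseDocuments", []):
        for group in doc.get("LineItemGroups", []):
            for item in group.get("LineItems", []):
                row = {}
                for field in item.get("LineItemExpenseFields", []):
                    label = field.get("Type", {}).get("Text")
                    row[label] = field.get("ValueDetection", {}).get("Text")
                rows.append(row)
    return rows


# Hand-written sample in the shape of an AnalyzeExpense response (illustrative values)
sample_response = {
    "ExpenseDocuments": [{
        "LineItemGroups": [{
            "LineItems": [
                {"LineItemExpenseFields": [
                    {"Type": {"Text": "ITEM"}, "ValueDetection": {"Text": "Coffee"}},
                    {"Type": {"Text": "PRICE"}, "ValueDetection": {"Text": "3.50"}},
                ]},
                {"LineItemExpenseFields": [
                    {"Type": {"Text": "ITEM"}, "ValueDetection": {"Text": "Bagel"}},
                    {"Type": {"Text": "PRICE"}, "ValueDetection": {"Text": "2.25"}},
                ]},
            ]
        }]
    }]
}

for row in line_items(sample_response):
    print(row)
```

Values come back as strings, so prices still need to be parsed and validated before any arithmetic.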