Extraction · intermediate

Invoice / receipt line-item extractor

Convert an OCR'd invoice into structured JSON: vendor, totals, line items.

Use case: accounts-payable (AP) automation. Invoices arrive as PDFs and are OCR'd into messy text, from which you need the vendor name, dates, currency, totals, and per-line-item details.

The prompt

Copy this verbatim. Replace the {{ … }} placeholders with your values.

<instructions>
Extract invoice data from the OCR text in <invoice>. Return JSON inside <result> tags.

Rules:
- If a number isn't clearly stated, use null. Do not estimate.
- Currency must be an ISO-4217 code (USD, EUR, GBP, JPY, ...).
- Dates must be ISO-8601 (YYYY-MM-DD). Convert if the source uses another format. If the due date is given only as payment terms (e.g. "net 30"), compute it from the issue date.
- Line item quantity is the one exception to the null rule: default it to 1 if not stated.
- Subtotal + tax should equal total within rounding; if not, flag with "totals_inconsistent": true at the top level.
</instructions>

<format>
{
  "vendor_name": "string or null",
  "vendor_tax_id": "string or null",
  "invoice_number": "string or null",
  "issue_date": "YYYY-MM-DD or null",
  "due_date": "YYYY-MM-DD or null",
  "currency": "ISO-4217 code",
  "subtotal": 0.0,
  "tax": 0.0,
  "total": 0.0,
  "totals_inconsistent": false,
  "line_items": [
    { "description": "string", "quantity": 1, "unit_price": 0.0, "amount": 0.0 }
  ]
}
</format>

<invoice>{{ invoice_ocr_text }}</invoice>

Return inside <result> tags.
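
Filling the placeholder can be as simple as a string substitution. The sketch below assumes the prompt is held in a Python string; `fill_placeholders` is a hypothetical helper name, not part of any library, and the short `template` stands in for the full prompt above.

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_placeholders(template: str, values: dict) -> str:
    """Replace each {{ name }} with values[name]; fail loudly on a missing name."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name]

    # A function replacement keeps backslashes and "$" in the OCR text literal.
    return PLACEHOLDER.sub(substitute, template)


# Stand-in for the full prompt; only the placeholder line is shown here.
template = "<invoice>{{ invoice_ocr_text }}</invoice>"
ocr_text = "ACME WIDGETS INC\nInvoice #INV-2026-0512"
print(fill_placeholders(template, {"invoice_ocr_text": ocr_text}))
```

Raising on a missing name is a deliberate choice here: a prompt sent with a literal {{ … }} still in it tends to fail quietly.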

Sample input

ACME WIDGETS INC
Invoice #INV-2026-0512
Date: May 12, 2026  Due: 30 days net
Widget A   x2   $50.00   $100.00
Shipping            $12.00
Subtotal $112.00  Tax $9.80  Total $121.80
Thank you.

Expected output

<result>
{
  "vendor_name": "ACME WIDGETS INC",
  "vendor_tax_id": null,
  "invoice_number": "INV-2026-0512",
  "issue_date": "2026-05-12",
  "due_date": "2026-06-11",
  "currency": "USD",
  "subtotal": 112.00,
  "tax": 9.80,
  "total": 121.80,
  "totals_inconsistent": false,
  "line_items": [
    { "description": "Widget A", "quantity": 2, "unit_price": 50.00, "amount": 100.00 },
    { "description": "Shipping", "quantity": 1, "unit_price": 12.00, "amount": 12.00 }
  ]
}
</result>
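
On the receiving side, the JSON has to be pulled out of the <result> tags before it can be used. A minimal sketch, assuming the model's reply is available as a plain string; `parse_result` is a hypothetical helper, and the response below is an abbreviated copy of the expected output.

```python
import json
import re
from decimal import Decimal

# Non-greedy, so only the text between the first <result> pair is captured.
RESULT_BLOCK = re.compile(r"<result>(.*?)</result>", re.DOTALL)


def parse_result(response_text: str) -> dict:
    """Pull the JSON object out of the <result> tags and decode it."""
    match = RESULT_BLOCK.search(response_text)
    if match is None:
        raise ValueError("no <result>...</result> block in the response")
    # parse_float=Decimal keeps amounts such as 121.80 exact rather than binary floats.
    return json.loads(match.group(1), parse_float=Decimal)


# Abbreviated stand-in for a model response.
response_text = """<result>
{
  "vendor_name": "ACME WIDGETS INC",
  "vendor_tax_id": null,
  "issue_date": "2026-05-12",
  "total": 121.80,
  "line_items": [
    { "description": "Widget A", "quantity": 2, "unit_price": 50.00, "amount": 100.00 }
  ]
}
</result>"""

invoice = parse_result(response_text)
print(invoice["vendor_name"], invoice["total"], len(invoice["line_items"]))
```

Decoding amounts as Decimal is a choice made for this sketch; plain floats also work if downstream comparisons allow for a tolerance.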

Notes & tuning tips

  • Date conversion is the most common failure. If dates come back unconverted, restate the YYYY-MM-DD format requirement.
  • Add a server-side sanity check: verifying that subtotal + tax ≈ total (within $0.05) catches OCR errors.
  • For multi-page invoices, send each page as a separate tagged block and ask for a single combined extraction.
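
The first two tips can be sketched as server-side checks. This assumes the extraction has already been decoded into a dict shaped like the <format> block; `totals_consistent` and `is_iso_date` are hypothetical helper names, and returning None when an amount is null is a choice made for this sketch.

```python
import re
from datetime import date
from decimal import Decimal
from typing import Optional

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def totals_consistent(invoice: dict, tolerance: Decimal = Decimal("0.05")) -> Optional[bool]:
    """True/False if subtotal + tax is within tolerance of total; None if any is null."""
    values = [invoice.get(key) for key in ("subtotal", "tax", "total")]
    if any(value is None for value in values):
        return None  # nothing to compare; leave the decision to a reviewer
    # Going through str() avoids carrying binary float error into the Decimal.
    subtotal, tax, total = (Decimal(str(value)) for value in values)
    return abs(subtotal + tax - total) <= tolerance


def is_iso_date(value: Optional[str]) -> bool:
    """Accept null, or a real calendar date written as YYYY-MM-DD."""
    if value is None:
        return True
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False  # right shape, impossible date (e.g. 2026-02-30)
    return True


good = {"subtotal": 112.00, "tax": 9.80, "total": 121.80, "issue_date": "2026-05-12"}
# Hypothetical bad extraction: a misread digit in the total and an unconverted date.
misread = {"subtotal": 112.00, "tax": 9.80, "total": 127.80, "issue_date": "May 12, 2026"}

for invoice in (good, misread):
    print(totals_consistent(invoice), is_iso_date(invoice["issue_date"]))
```

Running the check server-side, rather than trusting the model's own "totals_inconsistent" flag, means an OCR misread is caught even when the model reports the totals as consistent.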

What this example uses

Tags: <instructions> <format>

Patterns: structured output

Cite this page
Invoice / receipt line-item extractor. claudexml.com. https://claudexml.com/examples/invoice-extractor/