n8n PDF Invoice Parser Template (GPT-4o)

Free PDF Invoice Parser → Structured JSON

A webhook-triggered n8n workflow that sends a PDF invoice to GPT-4o and returns clean, validated JSON. It accepts either a PDF URL or a base64-encoded file and takes about 4–8 seconds per invoice.

Tool: n8n (cloud or self-hosted)
Requires: OpenAI API key (GPT-4o access)
Minimum n8n plan: Starter (cloud) or any self-hosted
Nodes: 10 nodes · ~4–8s per invoice


What this workflow does

POST a PDF invoice (as a URL or base64 string) to the webhook endpoint. The workflow fetches and encodes the file if needed, sends it to GPT-4o with a structured extraction prompt, parses and validates the response, then returns a JSON object containing the extracted invoice fields.

The output includes vendor details, bill-to fields, line items with quantities and unit prices, tax, discount, totals, bank details, and payment terms. A _meta object is appended to every response with timestamp and model version for auditing.

If extraction fails (an empty invoice_number is the canary), the workflow returns a 422 with a plain-English error rather than silently passing bad data downstream.
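The parse-and-check behaviour described above can be sketched outside n8n. This is a minimal standalone Python illustration, not the code inside the template's Code node: the function name `handle_model_output`, the error wording, and the `raw_output` key are assumptions, and the sketch returns a 422 on a JSON parse failure where the actual node throws.

```python
import json
import re
from datetime import datetime, timezone
from typing import Any, Dict, Tuple

# Built at runtime so this example contains no literal fence inside a string.
FENCE = "`" * 3
FENCE_RE = re.compile(r"^\s*" + FENCE + r"(?:json)?\s*|\s*" + FENCE + r"\s*$", re.IGNORECASE)


def handle_model_output(raw: str, model: str = "gpt-4o") -> Tuple[int, Dict[str, Any]]:
    """Turn raw model text into (http_status, response_body)."""
    # Models often wrap JSON in a markdown fence even when asked for raw JSON.
    text = FENCE_RE.sub("", raw).strip()
    try:
        invoice = json.loads(text)
    except json.JSONDecodeError as exc:
        return 422, {"error": f"Model output was not valid JSON: {exc.msg}", "raw_output": raw}

    # An empty or missing invoice_number is treated as a failed extraction.
    number = invoice.get("invoice_number") if isinstance(invoice, dict) else None
    if not str(number or "").strip():
        return 422, {"error": "No invoice number could be extracted from this PDF.", "raw_output": raw}

    # Audit metadata, mirroring the _meta object described above.
    invoice["_meta"] = {
        "processed_at": datetime.now(timezone.utc).isoformat(),
        "model": model,
    }
    return 200, invoice
```

Returning the raw model text alongside the error keeps failed extractions debuggable without re-running the request.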

Workflow nodes

# | Node | What it does
1 | Webhook | Receives POST request with pdf_url or pdf_base64 in body
2 | Has PDF URL? (IF) | Routes to fetch flow if URL provided, skips to Prepare if base64
3 | Fetch PDF from URL | HTTP GET to download PDF as binary
4 | Encode PDF to Base64 | Converts binary to base64 string for OpenAI API
5 | Prepare PDF Data (Code) | Merges URL and base64 paths, throws if neither present
6 | GPT-4o Extract Invoice Data | Sends PDF + extraction prompt to OpenAI, requests raw JSON output
7 | Parse & Validate JSON (Code) | Strips markdown fences, parses JSON, appends _meta, throws on parse failure
8 | Extraction Successful? (IF) | Checks invoice_number is non-empty
9 | Return Success | 200 response with full JSON payload
10 | Return Error | 422 response with error message and raw GPT output for debugging
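The input routing in nodes 2–5 comes down to "use the URL if present, otherwise the base64 string, otherwise fail". A rough Python sketch of that logic follows. The function name `prepare_pdf_base64` and the injected `fetch` callable are hypothetical; in the workflow itself the download is done by the HTTP Request node.

```python
import base64
from typing import Callable, Mapping


def prepare_pdf_base64(body: Mapping[str, str], fetch: Callable[[str], bytes]) -> str:
    """Return the invoice PDF as a base64 string from either accepted input."""
    pdf_url = body.get("pdf_url")
    pdf_base64 = body.get("pdf_base64")

    if pdf_url:
        # URL path: download the binary, then encode it.
        return base64.b64encode(fetch(pdf_url)).decode("ascii")
    if pdf_base64:
        # Base64 path: the caller already did the encoding.
        return pdf_base64
    # Neither field present: fail loudly instead of calling the model with nothing.
    raise ValueError("Request body must contain pdf_url or pdf_base64")
```

Passing the downloader in as a parameter keeps the sketch testable without network access.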

How to install

  1. Download the .json file above.
  2. In n8n, go to Workflows → Import from File and select the downloaded file.
  3. Open the GPT-4o Extract Invoice Data node and connect your OpenAI credential (HTTP Header Auth, header name Authorization, value Bearer sk-...). Or swap to n8n’s native OpenAI node if you prefer.
  4. Click Activate on the workflow to get your live webhook URL.
  5. Test with a POST request (see the example request below).

Example request

# Using a PDF URL
curl -X POST https://your-n8n-instance.com/webhook/parse-invoice \
  -H "Content-Type: application/json" \
  -d '{"pdf_url": "https://example.com/invoice-2024-001.pdf"}'

# Using base64-encoded PDF
curl -X POST https://your-n8n-instance.com/webhook/parse-invoice \
  -H "Content-Type: application/json" \
  -d '{"pdf_base64": "JVBERi0xLjQK..."}'
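For a local file, the base64 variant is easier to script than to type. The following is a minimal Python sketch using only the standard library. The webhook URL is the same placeholder as in the curl examples, and `build_payload` and `parse_invoice` are illustrative names, not part of the template.

```python
import base64
import json
import urllib.request

# Placeholder host from the curl examples; replace with your own webhook URL.
WEBHOOK_URL = "https://your-n8n-instance.com/webhook/parse-invoice"


def build_payload(pdf_bytes: bytes) -> bytes:
    """Encode raw PDF bytes into the JSON body the webhook expects."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return json.dumps({"pdf_base64": encoded}).encode("utf-8")


def parse_invoice(pdf_path: str, url: str = WEBHOOK_URL, timeout: float = 60.0) -> dict:
    """POST a local PDF to the webhook and return the parsed JSON response."""
    with open(pdf_path, "rb") as fh:
        payload = build_payload(fh.read())
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for non-2xx statuses, so a 422 from the workflow surfaces as an exception whose body can be read with `.read()`.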

Example output

{
  "invoice_number": "INV-2024-0891",
  "invoice_date": "2024-03-15",
  "due_date": "2024-04-14",
  "vendor": {
    "name": "Acme Software GmbH",
    "address": "Berliner Str. 42, 10115 Berlin, Germany",
    "email": "billing@acmesoftware.de",
    "phone": "+49 30 1234567",
    "vat_number": "DE123456789"
  },
  "bill_to": {
    "name": "Otzar Sheli s.r.o.",
    "address": "Prague, Czech Republic",
    "email": "accounts@example.com"
  },
  "line_items": [
    {
      "description": "Annual SaaS License - Pro Plan",
      "quantity": 1,
      "unit_price": 1200.00,
      "total": 1200.00
    },
    {
      "description": "Onboarding & Setup Fee",
      "quantity": 1,
      "unit_price": 350.00,
      "total": 350.00
    }
  ],
  "subtotal": 1550.00,
  "tax_rate": 21,
  "tax_amount": 325.50,
  "discount": 0,
  "total_amount": 1875.50,
  "currency": "EUR",
  "payment_terms": "Net 30",
  "bank_details": {
    "bank_name": "Deutsche Bank",
    "account_number": "1234567890",
    "iban": "DE89 3704 0044 0532 0130 00",
    "swift": "DEUTDEDB"
  },
  "notes": "Please reference invoice number in payment.",
  "_meta": {
    "processed_at": "2026-04-09T10:22:31.000Z",
    "model": "gpt-4o",
    "source": "triumphoid-invoice-parser-v1"
  }
}
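Because the model can misread a digit, it may be worth checking the arithmetic of a response before trusting it. The sketch below is one way to do that in Python. It assumes tax is computed on the subtotal and the discount is subtracted after tax, which matches the example output but is not confirmed by the template for non-zero discounts. `check_totals` is a hypothetical helper.

```python
from decimal import Decimal
from typing import Any, Dict, List


def _d(value: Any) -> Decimal:
    """Convert a JSON number to Decimal via str to avoid float artefacts."""
    return Decimal(str(value))


def check_totals(invoice: Dict[str, Any], tolerance: Decimal = Decimal("0.01")) -> List[str]:
    """Return a list of human-readable inconsistencies (empty if none found)."""
    problems: List[str] = []
    subtotal = _d(invoice["subtotal"])

    # Each line should satisfy quantity * unit_price == total.
    for item in invoice["line_items"]:
        expected = _d(item["quantity"]) * _d(item["unit_price"])
        if abs(expected - _d(item["total"])) > tolerance:
            problems.append(f"line total mismatch: {item['description']}")

    # Line totals should add up to the subtotal.
    line_sum = sum((_d(item["total"]) for item in invoice["line_items"]), Decimal("0"))
    if abs(line_sum - subtotal) > tolerance:
        problems.append("line items do not sum to subtotal")

    # Assumption: tax_rate is a percentage applied to the subtotal.
    expected_tax = subtotal * _d(invoice["tax_rate"]) / 100
    if abs(expected_tax - _d(invoice["tax_amount"])) > tolerance:
        problems.append("tax_amount does not match tax_rate")

    # Assumption: total = subtotal + tax - discount.
    expected_total = subtotal + _d(invoice["tax_amount"]) - _d(invoice["discount"])
    if abs(expected_total - _d(invoice["total_amount"])) > tolerance:
        problems.append("total_amount does not match subtotal + tax - discount")

    return problems
```

For the example output, the line items sum to 1550.00, 21% of that is 325.50, and 1550.00 + 325.50 - 0 gives 1875.50, so the list comes back empty.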

Known limitations

  • Scanned PDFs with poor resolution will produce incomplete output. GPT-4o needs legible text — 150 DPI minimum, 300 DPI recommended.
  • Multi-page invoices with tables split across pages sometimes lose line items on the second page. Test your specific invoice format before running at volume.
  • Cost: GPT-4o charges per token. A typical single-page invoice runs $0.01–0.03. At 84,000 invoices/month (the case from our GPT-4o vs Docparser comparison), that works out to roughly $840–2,520 in API spend, and Docparser is cheaper. Under ~5,000/month, GPT-4o wins on flexibility.
  • OpenAI rate limits will cause failures at high concurrency. Add retry logic or see our exponential backoff guide for the pattern to implement.
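As a rough illustration of the retry logic mentioned in the last bullet, here is a generic exponential backoff wrapper in Python with full jitter. It is a sketch of the general pattern, not the code from the linked guide. `RateLimited`, `with_backoff`, and the default delays are assumptions, and the caller is expected to raise `RateLimited` when the API answers 429.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Raised by the wrapped call when the API responds with HTTP 429."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on RateLimited with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            # Cap doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time in [0, cap] to spread out retries.
            sleep(random.uniform(0, cap))
    raise RuntimeError("max_attempts must be at least 1")
```

The jitter matters at high concurrency: without it, many workers that were throttled at the same moment retry at the same moment too.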

Extend this workflow

Common extensions teams add after the Return Success node:

  • Write to Google Sheets or Airtable for a simple invoice ledger
  • POST to your accounting API (Xero, QuickBooks, FreeAgent)
  • Push total_amount and due_date to a Notion database for AP tracking
  • Slack notification when total_amount exceeds a threshold
  • Route by currency field to different downstream systems
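The threshold and currency-routing ideas above amount to a small decision function on the parsed invoice. A hedged Python sketch follows. The destination names, the 5000 threshold, and the `plan_actions` helper are all made up for illustration; in n8n the same logic would typically live in IF or Switch nodes.

```python
from decimal import Decimal
from typing import Any, Dict, List, Optional


def plan_actions(
    invoice: Dict[str, Any],
    alert_threshold: Decimal = Decimal("5000"),
    currency_routes: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Decide which downstream steps to run for a parsed invoice."""
    # Hypothetical mapping from currency code to downstream system.
    routes = currency_routes or {"EUR": "eu-ledger", "USD": "us-ledger"}
    actions: List[str] = []

    # Unknown or missing currencies go to a manual queue rather than being guessed.
    destination = routes.get(invoice.get("currency"), "manual-review")
    actions.append(f"route:{destination}")

    # Large invoices additionally trigger a notification.
    if Decimal(str(invoice.get("total_amount", 0))) > alert_threshold:
        actions.append("notify:slack")
    return actions
```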

Related

The full writeup behind this template — including the cost comparison between GPT-4o and Docparser at different invoice volumes — is in Parsing PDF Invoices to JSON: The GPT-4o vs. Docparser Cost Showdown.

If you’re running this at scale and hitting 429 errors, read Bypassing “429 Too Many Requests”: Implementing Exponential Backoff. The pattern applies directly to n8n HTTP Request nodes too.