n8n Template · PDF Processing · GPT-4o
Free PDF Invoice Parser → Structured JSON
A webhook-triggered n8n workflow that sends a PDF invoice to GPT-4o and returns clean, validated JSON. Accepts either a PDF URL or base64-encoded file. Takes about 4–8 seconds per invoice.
What this workflow does
POST a PDF invoice (as a URL or base64 string) to the webhook endpoint. The workflow fetches and encodes the file if needed, sends it to GPT-4o with a structured extraction prompt, parses and validates the response, then returns a JSON object with every invoice field populated.
The output includes vendor details, bill-to fields, line items with quantities and unit prices, tax, discount, totals, bank details, and payment terms. A _meta object is appended to every response with timestamp and model version for auditing.
If extraction fails — empty invoice_number is the canary — the workflow returns a 422 with a plain-English error rather than silently passing bad data downstream.
Workflow nodes
| # | Node | What it does |
|---|---|---|
| 1 | Webhook | Receives POST request with pdf_url or pdf_base64 in body |
| 2 | Has PDF URL? (IF) | Routes to fetch flow if URL provided, skips to Prepare if base64 |
| 3 | Fetch PDF from URL | HTTP GET to download PDF as binary |
| 4 | Encode PDF to Base64 | Converts binary to base64 string for OpenAI API |
| 5 | Prepare PDF Data (Code) | Merges URL and base64 paths, throws if neither present |
| 6 | GPT-4o Extract Invoice Data | Sends PDF + extraction prompt to OpenAI, requests raw JSON output |
| 7 | Parse & Validate JSON (Code) | Strips markdown fences, parses JSON, appends _meta, throws on parse failure |
| 8 | Extraction Successful? (IF) | Checks invoice_number is non-empty |
| 9 | Return Success | 200 response with full JSON payload |
| 10 | Return Error | 422 response with error message and raw GPT output for debugging |
How to install
- Download the
.jsonfile above. - In n8n, go to Workflows → Import from File and select the downloaded file.
- Open the GPT-4o Extract Invoice Data node and connect your OpenAI credential (HTTP Header Auth with your
Bearer sk-...key). Or swap to n8n’s native OpenAI node if you prefer. - Click Activate on the workflow to get your live webhook URL.
- Test with a POST request — see the example below.
Example request
# Using a PDF URL
curl -X POST https://your-n8n-instance.com/webhook/parse-invoice \
-H "Content-Type: application/json" \
-d '{"pdf_url": "https://example.com/invoice-2024-001.pdf"}'
# Using base64-encoded PDF
curl -X POST https://your-n8n-instance.com/webhook/parse-invoice \
-H "Content-Type: application/json" \
-d '{"pdf_base64": "JVBERi0xLjQK..."}'
Example output
{
"invoice_number": "INV-2024-0891",
"invoice_date": "2024-03-15",
"due_date": "2024-04-14",
"vendor": {
"name": "Acme Software GmbH",
"address": "Berliner Str. 42, 10115 Berlin, Germany",
"email": "billing@acmesoftware.de",
"phone": "+49 30 1234567",
"vat_number": "DE123456789"
},
"bill_to": {
"name": "Otzar Sheli s.r.o.",
"address": "Prague, Czech Republic",
"email": "accounts@example.com"
},
"line_items": [
{
"description": "Annual SaaS License - Pro Plan",
"quantity": 1,
"unit_price": 1200.00,
"total": 1200.00
},
{
"description": "Onboarding & Setup Fee",
"quantity": 1,
"unit_price": 350.00,
"total": 350.00
}
],
"subtotal": 1550.00,
"tax_rate": 21,
"tax_amount": 325.50,
"discount": 0,
"total_amount": 1875.50,
"currency": "EUR",
"payment_terms": "Net 30",
"bank_details": {
"bank_name": "Deutsche Bank",
"account_number": "1234567890",
"iban": "DE89 3704 0044 0532 0130 00",
"swift": "DEUTDEDB"
},
"notes": "Please reference invoice number in payment.",
"_meta": {
"processed_at": "2026-04-09T10:22:31.000Z",
"model": "gpt-4o",
"source": "triumphoid-invoice-parser-v1"
}
}
Known limitations
- Scanned PDFs with poor resolution will produce incomplete output. GPT-4o needs legible text — 150 DPI minimum, 300 DPI recommended.
- Multi-page invoices with tables split across pages sometimes lose line items on the second page. Test your specific invoice format before running at volume.
- Cost: GPT-4o charges per token. A typical single-page invoice runs $0.01–0.03. At 84,000 invoices/month (the case from our GPT-4o vs Docparser comparison), Docparser is cheaper. Under ~5,000/month, GPT-4o wins on flexibility.
- OpenAI rate limits will cause failures at high concurrency. Add retry logic or see our exponential backoff guide for the pattern to implement.
Extend this workflow
Common extensions teams add after the Return Success node:
- Write to Google Sheets or Airtable for a simple invoice ledger
- POST to your accounting API (Xero, QuickBooks, FreeAgent)
- Push
total_amountanddue_dateto a Notion database for AP tracking - Slack notification when
total_amountexceeds a threshold - Route by
currencyfield to different downstream systems
Related
The full writeup behind this template — including the cost comparison between GPT-4o and Docparser at different invoice volumes — is in Parsing PDF Invoices to JSON: The GPT-4o vs. Docparser Cost Showdown.
If you’re running this at scale and hitting 429 errors, read Bypassing “429 Too Many Requests”: Implementing Exponential Backoff. The pattern applies directly to n8n HTTP Request nodes too.