n8n Template · PDF Processing · GPT-4o
A webhook-triggered n8n workflow that sends a PDF invoice to GPT-4o and returns clean, validated JSON. It accepts either a PDF URL or a base64-encoded file, and takes about 4–8 seconds per invoice.
POST a PDF invoice (as a URL or base64 string) to the webhook endpoint. The workflow fetches and encodes the file if needed, sends it to GPT-4o with a structured extraction prompt, parses and validates the response, then returns a JSON object with every invoice field populated.
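The input handling described above can be sketched as a plain JavaScript function, roughly the job a Code node such as Prepare PDF Data would do. Only the `pdf_url` and `pdf_base64` body fields come from the request format. The function name `preparePdfData`, the `fetchedBase64` parameter, the `source` field, and the PDF-header check are illustrative assumptions, not the template's actual code.

```javascript
// Hypothetical helper: collapse the two accepted input shapes into one base64 string.
// `fetchedBase64` stands in for the output of the fetch-and-encode branch (URL path).
function preparePdfData(body, fetchedBase64) {
  const fromUrl = typeof fetchedBase64 === "string" ? fetchedBase64.trim() : "";
  const fromBody =
    body && typeof body.pdf_base64 === "string" ? body.pdf_base64.trim() : "";

  const pdfBase64 = fromUrl || fromBody;
  if (!pdfBase64) {
    throw new Error("Request must include either pdf_url or pdf_base64");
  }

  // Drop an optional data-URI prefix so only the raw base64 payload remains.
  const cleaned = pdfBase64.replace(/^data:application\/pdf;base64,/, "");

  // Optional sanity check (an assumption, not part of the template):
  // a "%PDF-1.x" header begins with "JVBERi0" once base64-encoded.
  if (!cleaned.startsWith("JVBERi0")) {
    throw new Error("Payload does not look like a base64-encoded PDF");
  }

  return { pdf_base64: cleaned, source: fromUrl ? "url" : "base64" };
}
```

Failing early here keeps a malformed request from ever reaching the paid OpenAI call.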
The output includes vendor details, bill-to fields, line items with quantities and unit prices, tax, discount, totals, bank details, and payment terms. A _meta object is appended to every response with timestamp and model version for auditing.
If extraction fails (an empty invoice_number is the canary), the workflow returns a 422 with a plain-English error rather than silently passing bad data downstream.
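The failure path amounts to a small branch on `invoice_number`. Here is a minimal sketch of that check, assuming a hypothetical `buildResponse` helper. The `error` and `raw_output` field names and the error wording are illustrative, not taken from the template.

```javascript
// Hypothetical helper mirroring the success/failure branch described above.
function buildResponse(invoice, rawOutput) {
  // Treat missing, null, or whitespace-only invoice numbers as a failed extraction.
  const invoiceNumber =
    invoice && invoice.invoice_number != null
      ? String(invoice.invoice_number).trim()
      : "";

  if (invoiceNumber === "") {
    return {
      status: 422,
      body: {
        error: "Could not extract an invoice number from the PDF.",
        raw_output: rawOutput, // kept so the caller can debug the model reply
      },
    };
  }

  return { status: 200, body: invoice };
}
```

A single required field is a cheap proxy for "the model read the document". Stricter checks, such as requiring `total_amount`, could be added in the same place.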
| # | Node | What it does |
|---|---|---|
| 1 | Webhook | Receives POST request with pdf_url or pdf_base64 in body |
| 2 | Has PDF URL? (IF) | Routes to fetch flow if URL provided, skips to Prepare if base64 |
| 3 | Fetch PDF from URL | HTTP GET to download PDF as binary |
| 4 | Encode PDF to Base64 | Converts binary to base64 string for OpenAI API |
| 5 | Prepare PDF Data (Code) | Merges URL and base64 paths, throws if neither present |
| 6 | GPT-4o Extract Invoice Data | Sends PDF + extraction prompt to OpenAI, requests raw JSON output |
| 7 | Parse & Validate JSON (Code) | Strips markdown fences, parses JSON, appends _meta, throws on parse failure |
| 8 | Extraction Successful? (IF) | Checks invoice_number is non-empty |
| 9 | Return Success | 200 response with full JSON payload |
| 10 | Return Error | 422 response with error message and raw GPT output for debugging |
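Row 7 of the table (strip markdown fences, parse, append `_meta`, throw on failure) can be sketched as below. This approximates what such a Code node might contain and is not the template's actual source. The function name `parseModelOutput` and the injectable `now` parameter are assumptions, and the `_meta` values are copied from the sample response.

```javascript
// Three backticks, built programmatically so the example stays easy to embed.
const FENCE = "`".repeat(3);
const OPENING_FENCE = new RegExp("^" + FENCE + "(?:json)?\\s*", "i");
const CLOSING_FENCE = new RegExp("\\s*" + FENCE + "$");

// Hypothetical stand-in for the "Parse & Validate JSON" Code node.
function parseModelOutput(raw, now = new Date()) {
  // Models often wrap JSON in a markdown fence even when asked for raw JSON.
  const stripped = String(raw)
    .trim()
    .replace(OPENING_FENCE, "")
    .replace(CLOSING_FENCE, "")
    .trim();

  let data;
  try {
    data = JSON.parse(stripped);
  } catch (err) {
    throw new Error("Model output is not valid JSON: " + err.message);
  }
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    throw new Error("Model output is JSON but not an object");
  }

  // Append audit metadata; the values mirror the sample response.
  return {
    ...data,
    _meta: {
      processed_at: now.toISOString(),
      model: "gpt-4o",
      source: "triumphoid-invoice-parser-v1",
    },
  };
}
```

Passing `now` as a parameter keeps the function deterministic in tests. Inside a workflow the default of the current time would be used.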
.json file above.
Bearer sk-... key). Or swap to n8n’s native OpenAI node if you prefer.

# Using a PDF URL
curl -X POST https://your-n8n-instance.com/webhook/parse-invoice \
-H "Content-Type: application/json" \
-d '{"pdf_url": "https://example.com/invoice-2024-001.pdf"}'
# Using base64-encoded PDF
curl -X POST https://your-n8n-instance.com/webhook/parse-invoice \
-H "Content-Type: application/json" \
-d '{"pdf_base64": "JVBERi0xLjQK..."}'

{
"invoice_number": "INV-2024-0891",
"invoice_date": "2024-03-15",
"due_date": "2024-04-14",
"vendor": {
"name": "Acme Software GmbH",
"address": "Berliner Str. 42, 10115 Berlin, Germany",
"email": "billing@acmesoftware.de",
"phone": "+49 30 1234567",
"vat_number": "DE123456789"
},
"bill_to": {
"name": "Otzar Sheli s.r.o.",
"address": "Prague, Czech Republic",
"email": "accounts@example.com"
},
"line_items": [
{
"description": "Annual SaaS License - Pro Plan",
"quantity": 1,
"unit_price": 1200.00,
"total": 1200.00
},
{
"description": "Onboarding & Setup Fee",
"quantity": 1,
"unit_price": 350.00,
"total": 350.00
}
],
"subtotal": 1550.00,
"tax_rate": 21,
"tax_amount": 325.50,
"discount": 0,
"total_amount": 1875.50,
"currency": "EUR",
"payment_terms": "Net 30",
"bank_details": {
"bank_name": "Deutsche Bank",
"account_number": "1234567890",
"iban": "DE89 3704 0044 0532 0130 00",
"swift": "DEUTDEDB"
},
"notes": "Please reference invoice number in payment.",
"_meta": {
"processed_at": "2026-04-09T10:22:31.000Z",
"model": "gpt-4o",
"source": "triumphoid-invoice-parser-v1"
}
}

Common extensions teams add after the Return Success node:
- total_amount and due_date to a Notion database for AP tracking
- total_amount exceeds a threshold
- currency field to different downstream systems

The full writeup behind this template, including the cost comparison between GPT-4o and Docparser at different invoice volumes, is in Parsing PDF Invoices to JSON: The GPT-4o vs. Docparser Cost Showdown.
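Extensions built on `total_amount`, `due_date`, and `currency` usually reduce to a small function placed after the success response. The sketch below shows one way that could look. The `routeInvoice` name, the threshold of 1000, and the destination names are hypothetical placeholders. Only the three invoice fields come from the sample output.

```javascript
// Hypothetical post-processing step; threshold and destinations are placeholders.
const REVIEW_THRESHOLD = 1000;
const DESTINATION_BY_CURRENCY = new Map([
  ["EUR", "eu-ledger"],
  ["USD", "us-ledger"],
]);

function routeInvoice(invoice) {
  return {
    // Fields a tracking database row might store.
    tracking: {
      total_amount: invoice.total_amount,
      due_date: invoice.due_date,
    },
    // Flag invoices whose total exceeds the configured threshold.
    needs_review: Number(invoice.total_amount) > REVIEW_THRESHOLD,
    // Unknown currencies fall back to a manual queue rather than being dropped.
    destination: DESTINATION_BY_CURRENCY.get(invoice.currency) ?? "manual-review",
  };
}
```

With the sample invoice above (1875.50 EUR), this would flag the invoice for review and route it to the placeholder `eu-ledger` destination.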
If you’re running this at scale and hitting 429 errors, read Bypassing “429 Too Many Requests”: Implementing Exponential Backoff. The pattern applies directly to n8n HTTP Request nodes too.
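For readers who have not seen the pattern, here is a generic exponential backoff sketch. It is not the implementation from the linked article. The names `withBackoff` and `backoffCeiling`, the default delays, and the assumption that `request` resolves to an object with a numeric `status` are all illustrative.

```javascript
// Upper bound for the wait before retry number `attempt` (0-based): doubles each time, capped.
function backoffCeiling(attempt, baseDelayMs, maxDelayMs) {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical retry wrapper: `request` is any async function returning { status, ... }.
// Makes the first call plus up to `maxRetries` retries while the status is 429.
async function withBackoff(
  request,
  { maxRetries = 5, baseDelayMs = 500, maxDelayMs = 30000, sleep = defaultSleep } = {}
) {
  for (let attempt = 0; ; attempt++) {
    const response = await request();
    if (response.status !== 429 || attempt >= maxRetries) {
      return response;
    }
    // Random jitter below the ceiling keeps parallel workers from retrying in lockstep.
    await sleep(Math.random() * backoffCeiling(attempt, baseDelayMs, maxDelayMs));
  }
}
```

The `sleep` option exists so the wait can be replaced in tests. In production the default timer-based sleep applies.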