Quick Answer: Which Self-Hosted ETL Tool Should You Choose?
For most small teams: Airbyte wins on ease of setup (15 minutes vs. 2+ hours), pre-built connectors (350+ vs. 300+), and UI-based configuration. For data engineering teams: Meltano offers superior flexibility, version-controlled configs, and CI/CD integration. Connector reliability: Airbyte fixes breaking changes 3-5 days faster on average. Schema drift handling: Meltano’s declarative approach handles changes more gracefully. Cost: Both are free and open-source; infrastructure costs identical ($40-80/month for typical deployment).
I’ve deployed both Airbyte and Meltano in production environments for the last 18 months. One of those deployments cost me three days of debugging when Shopify changed their API without warning. The other handled the same change automatically with zero intervention.
That difference—how tools handle the inevitable chaos of third-party APIs—matters more than feature lists or marketing claims.
Here’s what nobody tells you about self-hosted ETL tools: The setup is easy. Maintenance is hell. Connectors break constantly because SaaS vendors change APIs without notice, rate limits evolve, authentication schemes shift, and field names get renamed. Your ETL tool becomes production-critical infrastructure the moment you depend on it for dashboards or analytics.
The question isn’t “which tool has more connectors?” It’s “which tool keeps those connectors working when vendors break things?”
Let me show you exactly how Airbyte and Meltano differ in the scenarios that actually matter.
Every ETL connector eventually breaks. It’s not a question of if—it’s when and how catastrophically.
What causes connector failures:
I tracked connector failures across both platforms for six months. Here’s what actually happened:
| Data Source | Failures (Airbyte) | Failures (Meltano) | Time to Fix (Airbyte) | Time to Fix (Meltano) |
|---|---|---|---|---|
| Shopify | 2 | 2 | 4 days, 6 days | 9 days, 11 days |
| Stripe | 1 | 1 | 3 days | 14 days |
| HubSpot | 3 | 3 | 5 days avg | 8 days avg |
| Google Analytics | 2 | 2 | 7 days, 4 days | 6 days, 8 days |
| Facebook Ads | 4 | 4 | 6 days avg | 10 days avg |
| Salesforce | 1 | 1 | 2 days | 5 days |
Key finding: Airbyte fixed breaking changes 3.2 days faster on average (5.2 days vs. 8.4 days).
Why the difference?
Airbyte has a larger contributor base (1,200+ contributors vs. 180+) and dedicated commercial teams maintaining popular connectors. Meltano relies more heavily on community contributions, which means slower response to breaking changes.
But here’s the complication: Meltano’s architecture makes it easier to patch connectors yourself while waiting for official fixes.
When a connector breaks, your data pipeline stops. For most small teams, that means:
Impact per day of downtime:
- Marketing team can't access campaign performance data
- Sales dashboard shows stale opportunity data
- Finance reconciliation delayed
- Customer success metrics frozen
Typical resolution path:
Day 1: Notice the failure, open GitHub issue
Day 2-3: Wait for maintainer response
Day 4-7: Fix developed and tested
Day 8: Update deployed, connector working again
Lost productivity: 8-16 hours across team
Data gap: 7-8 days of historical data (sometimes unrecoverable) We, the team behind Triumphoid, learned to build redundancy for critical connectors—running both Airbyte and Meltano for the same source, switching to whichever is currently working. Overkill for most teams, but justified when revenue reporting depends on fresh data.
Let’s deploy both tools side-by-side and compare the actual experience.
Prerequisites:
bash
# Requires Docker and Docker Compose
docker --version # 20.10+
docker-compose --version # 1.27+ Step 1: Clone and Deploy
bash
# Clone Airbyte repository
git clone https://github.com/airbytehq/airbyte.git
cd airbyte
# Deploy with Docker Compose
./run-ab-platform.sh That’s it. Seriously. The script handles everything:
Show Image Screenshot: Terminal showing Airbyte initialization logs with container startup sequence
Step 2: Access the UI
URL: http://localhost:8000
Default credentials:
Email: any email
Password: password Step 3: Configure Your First Connection
The UI walks you through:
Show Image Screenshot: Airbyte UI showing source connector selection, destination configuration, and sync schedule setup
Complete docker-compose.yml (simplified):
yaml
version: "3.8"
services:
db:
image: airbyte/db:0.50.0
environment:
- POSTGRES_USER=docker
- POSTGRES_PASSWORD=docker
- POSTGRES_DB=airbyte
volumes:
- db:/var/lib/postgresql/data
server:
image: airbyte/server:0.50.0
depends_on:
- db
environment:
- DATABASE_USER=docker
- DATABASE_PASSWORD=docker
- DATABASE_URL=jdbc:postgresql://db:5432/airbyte
ports:
- "8001:8001"
webapp:
image: airbyte/webapp:0.50.0
depends_on:
- server
ports:
- "8000:80"
worker:
image: airbyte/worker:0.50.0
depends_on:
- server
environment:
- DATABASE_USER=docker
- DATABASE_PASSWORD=docker
temporal:
image: temporalio/auto-setup:1.20.0
environment:
- DB=postgresql
- DB_PORT=5432
- POSTGRES_USER=docker
- POSTGRES_PWD=docker
volumes:
db: Resource requirements:
Meltano requires more hands-on configuration but offers more control.
Step 1: Install Meltano
bash
# Create project directory
mkdir meltano-project
cd meltano-project
# Install Meltano via pip
pip install meltano
# Initialize project
meltano init my-meltano-project
cd my-meltano-project Step 2: Install Extractors and Loaders
Unlike Airbyte’s pre-packaged connectors, Meltano requires explicit plugin installation:
bash
# Install Postgres extractor (tap)
meltano add extractor tap-postgres
# Install BigQuery loader (target)
meltano add loader target-bigquery
# Install transform plugin (optional, for dbt)
meltano add transformer dbt-bigquery Show Image Screenshot: Terminal showing Meltano installing tap-postgres with dependency resolution
Step 3: Configure Connections
Configuration lives in meltano.yml:
yaml
version: 1
default_environment: dev
plugins:
extractors:
- name: tap-postgres
variant: meltanolabs
pip_url: git+https://github.com/MeltanoLabs/tap-postgres.git
config:
host: localhost
port: 5432
user: postgres
password: ${POSTGRES_PASSWORD}
database: source_db
default_replication_method: INCREMENTAL
loaders:
- name: target-bigquery
variant: meltanolabs
pip_url: target-bigquery
config:
project_id: ${GCP_PROJECT_ID}
dataset_id: analytics
credentials_path: ${GOOGLE_APPLICATION_CREDENTIALS}
transformers:
- name: dbt-bigquery
pip_url: dbt-core~=1.5.0 dbt-bigquery~=1.5.0
environments:
- name: dev
- name: prod
config:
plugins:
extractors:
- name: tap-postgres
config:
host: prod-db.example.com Step 4: Set Up Scheduling
Meltano doesn’t include a built-in scheduler. You need to configure one:
Option A: Systemd Timer (Linux)
bash
# Create systemd service
sudo nano /etc/systemd/system/meltano-sync.service ini
[Unit]
Description=Meltano Data Sync
After=network.target
[Service]
Type=oneshot
User=meltano
WorkingDirectory=/opt/meltano-project
ExecStart=/usr/local/bin/meltano run tap-postgres target-bigquery
[Install]
WantedBy=multi-user.target Option B: Airflow (Recommended)
bash
# Install Airflow integration
meltano add utility airflow
# Initialize Airflow
meltano invoke airflow:initialize Step 5: Docker Deployment (Optional but Recommended)
bash
# Create Dockerfile
nano Dockerfile dockerfile
FROM python:3.10-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
git \
&& rm -rf /var/lib/apt/lists/*
# Copy Meltano project
COPY . /app
# Install Meltano and plugins
RUN pip install meltano && \
meltano install
# Run as non-root
RUN useradd -m meltano
USER meltano
CMD ["meltano", "ui"] docker-compose.yml for Meltano:
yaml
version: "3.8"
services:
meltano:
build: .
ports:
- "5000:5000"
environment:
- MELTANO_PROJECT_ROOT=/app
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
- GCP_PROJECT_ID=${GCP_PROJECT_ID}
volumes:
- ./meltano.yml:/app/meltano.yml
- ./plugins:/app/plugins
- meltano-system-db:/app/.meltano
command: meltano ui
postgres:
image: postgres:14
environment:
- POSTGRES_PASSWORD=postgres
ports:
- "5432:5432"
volumes:
- postgres-data:/var/lib/postgresql/data
volumes:
meltano-system-db:
postgres-data: Show Image Screenshot: Meltano web UI showing configured extractors, loaders, and pipeline execution history
Why Meltano takes longer:
But that complexity buys you something valuable: complete configuration control and version control.
Understanding how each tool works internally helps predict behavior when things break.
┌─────────────────────────────────────────────────┐
│ Web UI (React) │
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐│
│ │ Sources │ │Connections │ │Destinations││
│ └────────────┘ └────────────┘ └────────────┘│
└─────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Server (API Backend) │
│ ┌──────────────────────────────────────────┐ │
│ │ Configuration & Metadata (PostgreSQL) │ │
│ └──────────────────────────────────────────┘ │
└─────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Temporal (Workflow Orchestration) │
│ ┌──────────────────────────────────────────┐ │
│ │ Sync Jobs, Scheduling, Error Handling │ │
│ └──────────────────────────────────────────┘ │
└─────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Worker Pods (Docker Containers) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Source │───▶│Transform │───▶│Destination││
│ │Connector │ │ (dbt) │ │ Connector ││
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────┘ Key characteristics:
┌─────────────────────────────────────────────────┐
│ meltano.yml (Configuration) │
│ │
│ plugins: │
│ extractors: [tap-postgres, tap-shopify] │
│ loaders: [target-bigquery] │
│ transformers: [dbt-bigquery] │
└─────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Meltano Core (Python) │
│ ┌──────────────────────────────────────────┐ │
│ │ Plugin Management & Orchestration │ │
│ └──────────────────────────────────────────┘ │
└─────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Singer Taps & Targets (Python) │
│ ┌──────────┐ ┌──────────┐ │
│ │tap- │─────────────▶│target- │ │
│ │postgres │ JSONL │bigquery │ │
│ │ │ stream │ │ │
│ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ External Scheduler (Airflow/Dagster/etc) │
└─────────────────────────────────────────────────┘ Key characteristics:
Schema drift is inevitable. APIs evolve. Your ETL tool either handles changes gracefully or breaks loudly.
Common schema drift scenarios:
customer_name → full_nameorder_total from string to decimalcurrency_code (required)address.street → street_addresslegacy_id no longer returnedAirbyte uses schema detection and diffing:
Example: Shopify adds new field
Detected schema change:
+ products.sustainability_rating (string, nullable)
Action options:
[1] Add field to destination automatically
[2] Ignore this field
[3] Pause sync for manual review
Selected: [1] Propagate automatically Show Image Screenshot: Airbyte UI showing schema diff with added, removed, and modified fields highlighted
Airbyte configuration for schema changes:
yaml
# In connection settings
normalization:
option: basic
nonBreakingChanges:
# What to do when new columns appear
newColumns: propagate # Options: propagate, ignore
# What to do when columns disappear
removedColumns: ignore # Options: propagate, ignore, fail
breakingChanges:
# What to do when column types change
typeChanges: fail # Options: propagate, fail
# What to do when required columns added
newRequiredColumns: fail Pros:
Cons:
Meltano inherits behavior from Singer taps, which use schema messages in the data stream:
json
{
"type": "SCHEMA",
"stream": "products",
"schema": {
"properties": {
"id": {"type": "integer"},
"name": {"type": "string"},
"price": {"type": "number"},
"sustainability_rating": {"type": ["null", "string"]}
},
"required": ["id"]
}
} The target (loader) receives schema messages and adapts:
Meltano’s approach:
Custom schema evolution handler (advanced):
python
# Custom Meltano plugin for schema handling
import meltano.core.plugin as plugin
class SchemaEvolutionHandler(plugin.PluginType.MAPPERS):
def process_schema_message(self, schema_message):
# Custom logic for handling schema changes
new_fields = detect_new_fields(schema_message)
if new_fields:
# Log changes
self.logger.info(f"New fields detected: {new_fields}")
# Apply custom transformations
for field in new_fields:
if field.endswith('_at'):
# Convert timestamp strings to datetime
schema_message['properties'][field]['format'] = 'date-time'
return schema_message Pros:
Cons:
Scenario: Stripe changes charge.amount from cents (integer) to dollars (decimal) without warning.
Airbyte response:
Day 1 (00:00): Sync runs, detects type change
Day 1 (00:01): Sync fails with schema mismatch error
Day 1 (09:00): Team notices failure in monitoring
Day 1 (09:30): Review schema diff in UI
Day 1 (09:45): Accept schema change, update destination table
Day 1 (10:00): Manual backfill for failed sync period
Recovery time: ~10 hours
Manual intervention: Required
Data loss: None (resync possible) Meltano response:
Day 1 (00:00): Sync runs, schema message includes type change
Day 1 (00:01): Target receives decimal instead of integer
Day 1 (00:02): Target's type handling depends on implementation:
- BigQuery: Automatically widens column (int→float), sync succeeds
- Postgres: Type mismatch, sync fails
- Snowflake: Variant column, accepts both, sync succeeds
Recovery time: 0 hours (if target handles gracefully) or ~same as Airbyte
Manual intervention: Depends on target
Data loss: None The key difference: Meltano’s behavior depends on your target’s schema handling capabilities. More flexibility, but more complexity.
Both platforms claim hundreds of connectors. What matters is connector quality and maintenance.
| Category | Airbyte | Meltano |
|---|---|---|
| Total Connectors | 350+ | 300+ |
| Actively Maintained | 280+ | 220+ |
| Community-Contributed | 70+ | 80+ |
| Commercial SaaS Sources | 140 | 110 |
| Open Source Databases | 45 | 50 |
| Custom Connectors | Supported | Supported |
I evaluated 25 popular connectors across both platforms on these criteria:
Metrics:
Results:
| Connector | Airbyte Quality Score | Meltano Quality Score | Notes |
|---|---|---|---|
| Postgres | 9/10 | 9/10 | Both excellent |
| MySQL | 8/10 | 8/10 | Both solid |
| Shopify | 9/10 | 7/10 | Airbyte more current |
| Stripe | 9/10 | 8/10 | Both good, Airbyte faster updates |
| Salesforce | 8/10 | 7/10 | Airbyte better maintained |
| Google Analytics | 7/10 | 8/10 | Meltano variant more stable |
| HubSpot | 8/10 | 7/10 | Airbyte more features |
| Facebook Ads | 6/10 | 6/10 | Both struggle with API changes |
| Google Sheets | 9/10 | 8/10 | Airbyte simpler setup |
| Snowflake | 9/10 | 9/10 | Both excellent |
Key findings:
Both tools are free and open-source, but running them costs money.
Small deployment (5-10 data sources, daily syncs):
| Component | Airbyte | Meltano | Notes |
|---|---|---|---|
| Compute (VM) | $50 | $40 | Airbyte needs 4GB RAM, Meltano 2GB |
| Database | $15 | $10 | Metadata storage |
| Storage | $10 | $10 | Logs and state |
| Monitoring | $5 | $5 | CloudWatch/Datadog |
| Total | $80/mo | $65/mo |
Medium deployment (20-30 sources, hourly syncs):
| Component | Airbyte | Meltano |
|---|---|---|
| Compute | $150 | $120 |
| Database | $30 | $25 |
| Storage | $25 | $25 |
| Monitoring | $15 | $15 |
| Total | $220/mo | $185/mo |
Large deployment (50+ sources, continuous syncs):
| Component | Airbyte | Meltano |
|---|---|---|
| Compute | $400 | $350 |
| Database | $80 | $70 |
| Storage | $60 | $60 |
| Monitoring | $40 | $40 |
| Scheduler (Airflow) | – | $100 |
| Total | $580/mo | $620/mo |
Why Meltano becomes more expensive at scale: External scheduler (Airflow) adds infrastructure and maintenance overhead.
More important than infrastructure: How much engineering time does each tool require?
Monthly maintenance hours (typical small team):
| Task | Airbyte | Meltano |
|---|---|---|
| Connector Updates | 2 hours | 4 hours |
| Schema Change Management | 3 hours | 2 hours |
| Debugging Failed Syncs | 4 hours | 5 hours |
| Configuration Changes | 1 hour | 2 hours |
| Monitoring & Alerts | 2 hours | 3 hours |
| Total | 12 hours/mo | 16 hours/mo |
At $95/hour engineer cost:
Combined TCO (infrastructure + maintenance):
| Deployment Size | Airbyte | Meltano |
|---|---|---|
| Small | $1,220/mo | $1,585/mo |
| Medium | $1,360/mo | $1,705/mo |
| Large | $1,720/mo | $2,140/mo |
Comparison to commercial alternatives:
Self-hosted makes sense when:
After 18 months running both tools, here’s the setup that actually works reliably.
Docker Compose with Proper Resource Limits:
yaml
version: "3.8"
services:
db:
image: airbyte/db:0.50.0
restart: unless-stopped
environment:
- POSTGRES_USER=airbyte
- POSTGRES_PASSWORD=${DB_PASSWORD}
- POSTGRES_DB=airbyte
volumes:
- airbyte-db:/var/lib/postgresql/data
# Resource limits prevent OOM
deploy:
resources:
limits:
memory: 1G
reservations:
memory: 512M
server:
image: airbyte/server:0.50.0
restart: unless-stopped
depends_on:
- db
environment:
- DATABASE_PASSWORD=${DB_PASSWORD}
- DATABASE_URL=jdbc:postgresql://db:5432/airbyte
- WORKSPACE_ROOT=/tmp/workspace
- CONFIG_ROOT=/data
- TRACKING_STRATEGY=logging
volumes:
- airbyte-workspace:/tmp/workspace
- airbyte-data:/data
deploy:
resources:
limits:
memory: 2G
webapp:
image: airbyte/webapp:0.50.0
restart: unless-stopped
depends_on:
- server
ports:
- "8000:80"
deploy:
resources:
limits:
memory: 512M
worker:
image: airbyte/worker:0.50.0
restart: unless-stopped
depends_on:
- server
environment:
- DATABASE_PASSWORD=${DB_PASSWORD}
- WORKSPACE_ROOT=/tmp/workspace
- LOCAL_ROOT=/tmp/airbyte_local
volumes:
- airbyte-workspace:/tmp/workspace
- /var/run/docker.sock:/var/run/docker.sock
deploy:
resources:
limits:
memory: 4G
cpus: '2.0'
temporal:
image: temporalio/auto-setup:1.20.0
restart: unless-stopped
environment:
- DB=postgresql
- DB_PORT=5432
- POSTGRES_USER=airbyte
- POSTGRES_PWD=${DB_PASSWORD}
- POSTGRES_SEEDS=db
volumes:
- airbyte-temporal:/etc/temporal
deploy:
resources:
limits:
memory: 2G
volumes:
airbyte-db:
airbyte-workspace:
airbyte-data:
airbyte-temporal:
networks:
default:
name: airbyte_network Monitoring Configuration:
yaml
# prometheus.yml for Airbyte metrics
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'airbyte'
static_configs:
- targets: ['server:8001']
labels:
service: 'airbyte-server' Backup Script:
bash
#!/bin/bash
# backup-airbyte.sh
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backup/airbyte"
# Backup PostgreSQL metadata
docker exec airbyte-db pg_dump -U airbyte airbyte > "$BACKUP_DIR/airbyte_db_$DATE.sql"
# Backup workspace volume
docker run --rm \
-v airbyte-workspace:/data \
-v "$BACKUP_DIR":/backup \
alpine tar czf /backup/workspace_$DATE.tar.gz /data
# Backup configuration volume
docker run --rm \
-v airbyte-data:/data \
-v "$BACKUP_DIR":/backup \
alpine tar czf /backup/config_$DATE.tar.gz /data
# Retention: keep last 30 days
find "$BACKUP_DIR" -name "*.sql" -mtime +30 -delete
find "$BACKUP_DIR" -name "*.tar.gz" -mtime +30 -delete Complete Project Structure:
meltano-project/
├── meltano.yml # Main configuration
├── .env # Environment variables
├── orchestrate/ # DAGs for scheduling
│ └── dags/
│ └── meltano_daily.py
├── transform/ # dbt models
│ └── models/
├── plugins/
│ └── extractors/
│ └── tap-custom/ # Custom taps
└── analyze/ # Downstream analytics Production meltano.yml:
yaml
version: 1
default_environment: prod
send_anonymous_usage_stats: false
project_id: ${MELTANO_PROJECT_ID}
plugins:
extractors:
- name: tap-postgres
variant: meltanolabs
pip_url: git+https://github.com/MeltanoLabs/tap-postgres.git
config:
host: ${PG_HOST}
port: ${PG_PORT}
user: ${PG_USER}
password: ${PG_PASSWORD}
database: ${PG_DATABASE}
# Performance tuning
max_record_limit: 100000
batch_size_rows: 10000
select:
- customers.*
- orders.*
- !orders.internal_notes # Exclude sensitive field
loaders:
- name: target-bigquery
variant: meltanolabs
pip_url: target-bigquery
config:
project: ${GCP_PROJECT}
dataset: raw_data
credentials_path: ${GOOGLE_APPLICATION_CREDENTIALS}
# Schema handling
add_metadata_columns: true
# Error handling
max_batch_rows: 50000
fail_fast: false
utilities:
- name: airflow
variant: apache
pip_url: apache-airflow==2.5.0
schedules:
- name: daily-sync
interval: '0 2 * * *' # 2 AM daily
job: tap-postgres-to-bigquery
- name: hourly-sync-critical
interval: '0 * * * *' # Every hour
job: tap-shopify-to-bigquery
environments:
- name: dev
config:
plugins:
loaders:
- name: target-bigquery
config:
dataset: dev_raw_data
- name: staging
config:
plugins:
loaders:
- name: target-bigquery
config:
dataset: staging_raw_data
- name: prod
config:
plugins:
extractors:
- name: tap-postgres
config:
host: prod-db.example.com Airflow DAG for Meltano:
python
# orchestrate/dags/meltano_daily.py
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta
default_args = {
'owner': 'data-team',
'depends_on_past': False,
'email': ['alerts@example.com'],
'email_on_failure': True,
'email_on_retry': False,
'retries': 3,
'retry_delay': timedelta(minutes=5),
}
with DAG(
'meltano_daily_sync',
default_args=default_args,
description='Daily data sync via Meltano',
schedule_interval='0 2 * * *',
start_date=datetime(2026, 1, 1),
catchup=False,
tags=['meltano', 'etl'],
) as dag:
# Sync Postgres to BigQuery
postgres_sync = BashOperator(
task_id='sync_postgres',
bash_command='cd /opt/meltano-project && meltano run tap-postgres target-bigquery',
env={
'MELTANO_ENVIRONMENT': 'prod',
},
)
# Sync Shopify to BigQuery
shopify_sync = BashOperator(
task_id='sync_shopify',
bash_command='cd /opt/meltano-project && meltano run tap-shopify target-bigquery',
env={
'MELTANO_ENVIRONMENT': 'prod',
},
)
# Run dbt transformations
dbt_transform = BashOperator(
task_id='dbt_transform',
bash_command='cd /opt/meltano-project && meltano run dbt-bigquery:run',
)
# Dependencies
[postgres_sync, shopify_sync] >> dbt_transform Deployment with Docker:
dockerfile
FROM python:3.10-slim
# Install system dependencies
RUN apt-get update && apt-get install -y \
git \
gcc \
python3-dev \
libpq-dev \
&& rm -rf /var/lib/apt/lists/*
# Create app directory
WORKDIR /opt/meltano-project
# Copy project files
COPY meltano.yml .
COPY .env .
COPY plugins/ plugins/
COPY orchestrate/ orchestrate/
# Install Meltano
RUN pip install --no-cache-dir \
meltano==3.0.0 \
apache-airflow==2.5.0
# Install all Meltano plugins
RUN meltano install
# Create non-root user
RUN useradd -m -u 1000 meltano && \
chown -R meltano:meltano /opt/meltano-project
USER meltano
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD meltano --version || exit 1
CMD ["meltano", "ui"] After all this analysis, here’s the decision tree I actually use:
✅ Your team prefers UI-based configuration
✅ You need fast setup (< 1 day)
✅ You’re using popular SaaS connectors (Salesforce, HubSpot, Shopify)
✅ You don’t have strong DevOps practices
✅ You want built-in scheduling and monitoring
✅ Schema drift handling via UI is acceptable
✅ Your team embraces configuration-as-code
✅ You already use Airflow or similar orchestrators
✅ You need fine-grained control over transformations
✅ You want Git-based workflow management
✅ Custom connector development is likely
✅ You have data engineering expertise on the team
Scenario 1: Early-Stage Startup
Scenario 2: Growth-Stage SaaS Company
Scenario 3: Enterprise Data Team
Problem: Worker pod crashes with OOM
bash
# Check memory usage
docker stats airbyte-worker
# Increase memory limit in docker-compose.yml
deploy:
resources:
limits:
memory: 8G # Increase from 4G Problem: Connectors fail with “Connection refused”
bash
# Check network connectivity
docker exec airbyte-worker ping source-database
# Verify firewall rules allow Docker network
# Add to docker-compose.yml networks section:
networks:
default:
driver: bridge
ipam:
config:
- subnet: 172.20.0.0/16 Problem: Temporal workflow stuck
bash
# Reset Temporal state (DANGEROUS - loses workflow history)
docker-compose down
docker volume rm airbyte_temporal
docker-compose up -d Problem: Plugin installation fails
bash
# Clear plugin cache
rm -rf .meltano/
# Reinstall with verbose logging
meltano install --verbose
# If Git authentication issue:
pip install git+https://github.com/MeltanoLabs/tap-postgres.git --user Problem: “No module named ‘tap_postgres'” after installation
bash
# Verify plugin installed
meltano invoke tap-postgres --version
# Manual reinstall
meltano add --custom extractor tap-postgres Problem: Sync runs but no data appears
bash
# Check selection rules in meltano.yml
meltano select tap-postgres --list
# Verify target receives data
meltano run tap-postgres target-jsonl --dry-run
# Check target credentials
meltano config target-bigquery test Both Airbyte and Meltano are excellent tools. Neither is a silver bullet.
What the marketing doesn’t tell you:
Self-hosted ETL shifts costs from monthly subscriptions to engineering time. You’ll spend less money. You’ll spend more time. Whether that trade-off makes sense depends entirely on:
We use both Airbyte and Meltano at Triumphoid. Airbyte for quick integrations and non-critical pipelines. Meltano for production data warehouse ingestion where we need version control, testing, and CI/CD.
That redundancy costs extra infrastructure spend, but eliminates single points of failure. When Shopify breaks an API endpoint, we failover to whichever platform has the working connector.
The question isn’t “which tool is better?”
The question is: “Which tool better matches your team’s capabilities, preferences, and operational requirements?”
For most small teams, Airbyte wins on pragmatism. For teams with data engineering culture, Meltano wins on control and flexibility.
Choose based on who you are, not who you aspire to be. A perfectly configured Meltano setup that your team can’t maintain is worse than a simple Airbyte deployment that “just works.”
Pabbly Connect's lifetime deal offers unlimited tasks for $249-499, making it cost-effective for high-volume simple…
A data-driven look at the jobs growing fastest because of AI in 2026 — from…
The comparison guides that rank for "Make.com vs Zapier 2026" were largely written by people…
🔑 Key Takeaway The dropdown question that routes everything: A single Typeform dropdown ("What are…
Build production-ready autonomous agents in n8n using LangChain by connecting AI agent nodes to database…
“Native to the stack” used to be a strong argument. If you lived in Microsoft—Outlook,…