Architecture Overview¶
This page explains how Dango's components work together.
System Architecture¶
Dango integrates four production-grade open-source tools into a unified platform:
┌─────────────────────────────────────────────────────────────┐
│ External Data Sources │
│ APIs (Stripe, HubSpot, GA4) • CSV Files • Databases │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ dlt (Data Load Tool) │
│ • 33 sources • OAuth • Incremental • Deduplication │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ DuckDB (Database) │
│ raw_{source}.* → staging.* → intermediate.* → marts.*│
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ dbt (Transformations) │
│ • Auto-generated staging • Custom SQL • Snapshots │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Visualization & Management Layer │
│ Metabase (BI) • Web UI • Notebooks (Marimo) • dbt-docs │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Platform Services │
│ Auth • Scheduling • Monitoring • Governance • Cloud Deploy │
└─────────────────────────────────────────────────────────────┘
Component Overview¶
dlt - Data Ingestion¶
Purpose: Load data from external sources into DuckDB
Key Features:
- 30+ sources with wizard support (Stripe, HubSpot, Google Analytics, Salesforce, etc.)
- Access to 60+ dlt sources via dlt_native for advanced users
- CSV file ingestion with auto-detection
- Custom REST API sources
- OAuth 2.0 authentication handling
- Incremental loading (only new/changed data)
- Automatic schema evolution (for dlt sources)
What it does:
dango sync
# → dlt connects to Stripe API
# → Fetches charges, customers, subscriptions
# → Writes to raw_stripe.* tables in DuckDB
# → Auto-generates staging models in dbt/models/staging/
# → Tracks metadata (_dlt_load_id, _dlt_extracted_at)
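The incremental loading and deduplication listed under Key Features can be pictured with a small conceptual sketch. This is not dlt's implementation: the function name `incremental_load`, the `updated_at` cursor field, and the `id` primary key are all assumptions chosen for illustration.

```python
from typing import Dict, Iterable, Optional


def incremental_load(
    source_rows: Iterable[dict],
    destination: Dict[object, dict],
    last_cursor: Optional[str],
    cursor_field: str = "updated_at",  # hypothetical cursor column
    primary_key: str = "id",           # hypothetical primary key
) -> Optional[str]:
    """Copy only rows newer than last_cursor and return the new cursor value."""
    newest = last_cursor
    for row in source_rows:
        value = row[cursor_field]  # ISO date strings compare correctly as text
        if last_cursor is not None and value <= last_cursor:
            continue  # already loaded by a previous run
        # Keying by primary key deduplicates: a newer version replaces the old row.
        destination[row[primary_key]] = row
        if newest is None or value > newest:
            newest = value
    return newest
```

Calling the function a second time with the returned cursor skips everything already loaded, which is the behavior "only new/changed data" describes.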
Learn more: Data Sources section
DuckDB - Analytics Database¶
Purpose: Store and query data with SQL
Why DuckDB?
- Embedded (no server to manage)
- OLAP-optimized (fast analytics queries)
- Efficient storage (columnar format)
- Full SQL support (window functions, CTEs, etc.)
- Handles datasets up to ~100GB comfortably on a modern laptop
For larger datasets
DuckDB is optimized for single-machine analytical workloads. For enterprise-scale needs (petabytes), consider cloud data warehouses like Snowflake or BigQuery.
Database file: data/warehouse.duckdb
Schema organization:
warehouse.duckdb
├── raw_{source_name} # Raw ingested data (e.g., raw_stripe, raw_hubspot)
├── staging # Cleaned data (tables, not views)
├── intermediate # Reusable business logic
└── marts # Final metrics for BI
Each data source gets its own raw_{source_name} schema (e.g., raw_stripe, raw_csv_orders).
Learn more: Data Layers
dbt - SQL Transformations¶
Purpose: Transform raw data into analytics-ready tables
What Dango automates:
- Auto-generates staging models during dango sync
- Creates data lineage documentation
- Generates dbt-docs website (documentation and lineage visualization)

What you control:
- Custom transformations in dbt/models/
- Business logic in SQL
- Data tests and documentation
Example workflow:
dango sync # Load raw data + auto-generate staging models
# Edit dbt/models/marts/customer_metrics.sql
dango run # Run dbt transformations
dango docs # Generate documentation
Learn more: Transformations section
Metabase - Business Intelligence¶
Purpose: Visualize data and create dashboards
Auto-configured on dango start:
- DuckDB connection established
- Admin account auto-provisioned (local development only)
- All tables and schemas discovered
- Sample collections set up

Access: Run dango start first, then access via:
- Web UI at http://localhost:8800 (recommended; includes navigation to all services)
- Direct Metabase at http://localhost:3000

What you can do:
- Write SQL queries with autocomplete
- Create visualizations (charts, tables, maps)
- Build dashboards
- Set up alerts and notifications
- Schedule email reports
- Share with stakeholders
Learn more: Dashboards section
Web UI - Monitoring & Management¶
Purpose: Monitor pipelines and manage sources
Built with: FastAPI (Python backend)
Access: http://localhost:8800
Features:
- Real-time sync status (WebSocket updates)
- Source management (add, remove, configure)
- CSV file uploads
- Validation reports
- Activity logs
- Links to Metabase and dbt-docs
Learn more: Web UI section
Data Flow Example¶
Let's walk through what happens when you add a Stripe source:
1. Configuration¶
Option A: Interactive wizard (recommended)
Option B: Manual configuration in .dango/sources.yml:
sources:
  - name: stripe_payments
    type: stripe
    enabled: true
    stripe:
      stripe_secret_key_env: STRIPE_API_KEY
      start_date: 2024-01-01
You can also add sources via the Web UI at http://localhost:8800.
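The stripe_secret_key_env setting holds the name of an environment variable rather than the secret itself, so the key never has to be written into sources.yml. A minimal sketch of that indirection follows; `resolve_secret` is a hypothetical helper written for this page, not part of Dango's API, and the config is shown as a plain dict to avoid depending on a YAML parser.

```python
import os
from typing import Mapping, Optional


def resolve_secret(
    source_config: Mapping[str, str],
    key: str,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Look up the environment variable named by source_config[key]."""
    env = os.environ if env is None else env
    var_name = source_config[key]  # e.g. "STRIPE_API_KEY"
    try:
        return env[var_name]
    except KeyError:
        raise RuntimeError(f"Environment variable {var_name} is not set") from None


# Mirrors the stripe: block from the YAML example above.
stripe_config = {
    "stripe_secret_key_env": "STRIPE_API_KEY",
    "start_date": "2024-01-01",
}
```

With STRIPE_API_KEY exported in the shell, `resolve_secret(stripe_config, "stripe_secret_key_env")` returns its value; if the variable is missing, the error names it.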
2. Ingestion + Staging Generation¶
When you run dango sync, Dango:
1. Reads the Stripe API credentials from the environment
2. Authenticates with Stripe via dlt
3. Fetches data (charges, customers, subscriptions)
4. Writes to DuckDB:
   - raw_stripe.charges
   - raw_stripe.customers
   - raw_stripe.subscriptions
5. Tracks load metadata (_dlt_load_id, _dlt_extracted_at)
6. Auto-generates staging models
3. Generated Staging Models¶
After sync, Dango creates:
dbt/models/staging/
├── stg_stripe_charges.sql
├── stg_stripe_customers.sql
├── stg_stripe_subscriptions.sql
├── _stg_stripe__sources.yml
└── _stg_stripe__schema.yml
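The file names in this listing appear to follow a simple pattern: one stg_{source}_{table}.sql model per raw table, plus two YAML files per source. The sketch below reproduces that pattern as inferred from the listing; `staging_files` is a hypothetical helper, not Dango's generator.

```python
from typing import List


def staging_files(source: str, tables: List[str]) -> List[str]:
    """Return the staging file names expected for one source (pattern inferred)."""
    models = [f"stg_{source}_{table}.sql" for table in tables]
    metadata = [f"_stg_{source}__sources.yml", f"_stg_{source}__schema.yml"]
    return models + metadata


for name in staging_files("stripe", ["charges", "customers", "subscriptions"]):
    print(name)
```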
4. Transformations (dbt)¶
Create custom business logic in dbt/models/marts/:
-- dbt/models/marts/customer_metrics.sql
{{ config(materialized='table') }}

WITH charges AS (
    SELECT
        customer_id,
        SUM(amount) as total_spent,
        COUNT(*) as order_count
    FROM {{ ref('stg_stripe_charges') }}
    GROUP BY customer_id
),
customers AS (
    SELECT
        id,
        email,
        created
    FROM {{ ref('stg_stripe_customers') }}
)
SELECT
    c.id,
    c.email,
    COALESCE(ch.total_spent, 0) as lifetime_value,
    COALESCE(ch.order_count, 0) as total_orders
FROM customers c
LEFT JOIN charges ch ON c.id = ch.customer_id
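To see what the model returns for a customer with no charges, the same join logic can be tried on a few invented rows. The sketch below uses Python's built-in sqlite3 as a stand-in for DuckDB and plain table names in place of the dbt ref() calls; the sample data is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_stripe_customers (id TEXT, email TEXT, created TEXT);
    CREATE TABLE stg_stripe_charges (customer_id TEXT, amount INTEGER);
    INSERT INTO stg_stripe_customers VALUES
        ('cus_1', 'a@example.com', '2024-01-05'),
        ('cus_2', 'b@example.com', '2024-02-10');
    -- Only cus_1 has charges; cus_2 exercises the COALESCE defaults.
    INSERT INTO stg_stripe_charges VALUES ('cus_1', 1000), ('cus_1', 2500);
""")

rows = conn.execute("""
    WITH charges AS (
        SELECT customer_id, SUM(amount) AS total_spent, COUNT(*) AS order_count
        FROM stg_stripe_charges
        GROUP BY customer_id
    )
    SELECT c.id, c.email,
           COALESCE(ch.total_spent, 0) AS lifetime_value,
           COALESCE(ch.order_count, 0) AS total_orders
    FROM stg_stripe_customers c
    LEFT JOIN charges ch ON c.id = ch.customer_id
    ORDER BY c.id
""").fetchall()

for row in rows:
    print(row)
# → ('cus_1', 'a@example.com', 3500, 2)
# → ('cus_2', 'b@example.com', 0, 0)
```

The LEFT JOIN keeps cus_2 in the result even though it has no charges, and COALESCE turns the resulting NULLs into zeros.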
Run the transformations with dango run.
5. Visualization (Metabase)¶
Start the platform with dango start if it is not already running.
Open the Web UI (http://localhost:8800) and navigate to Metabase, or go directly to http://localhost:3000. Then query your marts tables, such as the customer_metrics model defined in step 4.
Create charts, dashboards, and share with your team.
Tech Stack Details¶
Core Tools¶
| Tool | Version | Purpose | Language |
|---|---|---|---|
| dlt | 1.24.x | Data ingestion | Python |
| dbt | 1.10.x | SQL transformations | Python + SQL |
| DuckDB | 1.5.x | Analytics database | C++ |
| Metabase | 0.59.x | Business intelligence | Java/Clojure |
Dango Components¶
| Component | Technology | Purpose |
|---|---|---|
| Web Backend | FastAPI | REST API + WebSockets |
| Service Orchestration | Docker Compose | Metabase + dbt-docs containers |
| File Watcher | watchdog (Python) | Auto-sync on file changes |
| CLI | Click (Python) | Command-line interface |
| Config Management | YAML + TOML | Sources, project settings |
Service Management¶
When you run dango start, the following services are launched:
Local Services¶
- FastAPI Web UI - Port 8800
- File Watcher - Background process (if auto_sync: true)
Docker Containers¶
- Metabase - Port 3000
- dbt-docs - Port 8081
Checking service status prints a summary like this:
Project: my-analytics (Port: 8800)
Status: ● Running
Services:
FastAPI Web UI ● Running (http://localhost:8800)
File Watcher ● Running (auto-sync enabled)
Metabase ● Running (http://localhost:3000)
dbt-docs ● Running (http://localhost:8081)
Database: data/warehouse.duckdb (42.3 MB)
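Independently of the Dango CLI, a quick way to confirm that the services are reachable is to test whether their ports accept TCP connections. This is a minimal sketch using only the standard library; the port numbers come from the list above, while `is_listening` and `service_report` are hypothetical helpers. An open port only shows that something is listening, not that the service is healthy.

```python
import socket
from typing import Dict

# Ports as documented above for a default local setup.
SERVICES = {
    "FastAPI Web UI": 8800,
    "Metabase": 3000,
    "dbt-docs": 8081,
}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, or unreachable


def service_report(services: Dict[str, int] = SERVICES) -> Dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(port) for name, port in services.items()}
```

Calling `service_report()` returns a dictionary keyed by service name, which can be printed or checked in a script.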
Stop all services:
Local vs. Cloud Architecture¶
Local¶
┌─────────────────────────────────────┐
│ Your Laptop │
│ ┌──────────────────────────────┐ │
│ │ DuckDB (warehouse.duckdb) │ │
│ │ FastAPI (port 8800) │ │
│ │ File Watcher (auto-sync) │ │
│ │ Docker (Metabase, dbt-docs) │ │
│ └──────────────────────────────┘ │
└─────────────────────────────────────┘
Cloud¶
┌─────────────────────────────────────┐
│ Cloud Server (DO / BYOS) │
│ ┌──────────────────────────────┐ │
│ │ Caddy (HTTPS, reverse proxy)│ │
│ │ DuckDB (warehouse.duckdb) │ │
│ │ FastAPI (single worker) │ │
│ │ APScheduler (cron/interval) │ │
│ │ Docker (Metabase :ro mount) │ │
│ └──────────────────────────────┘ │
└─────────────────────────────────────┘
The same .dango/ configuration files work in both environments. See Local vs Cloud for detailed behavioral differences.
Module Dependency Hierarchy¶
Dango's codebase is organized into four dependency levels. Higher levels can import from lower levels, but never the reverse.
| Level | Role | Modules |
|---|---|---|
| L0 (base) | No internal imports | config/, utils/, security/, migrations/, templates/ |
| L1 (core) | Imports L0 only | oauth/, ingestion/, transformation/, auth/, governance/, notebooks/, analysis/ |
| L2 (platform) | Imports L0-L1 | platform/ (scheduling, cloud, local), web/, visualization/ |
| L3 (ui) | Imports any level | cli/ |
Import rules:
- Downward only — higher levels import lower levels, never reverse
- Same-level OK — as long as there are no circular dependencies
- Lazy imports — used sparingly for orchestration in the ingestion runner
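The first two import rules can be expressed as a simple level comparison. The sketch below is an illustration, not tooling from the Dango repository: the level numbers are taken from the table, `import_allowed` is a hypothetical helper, and it covers neither circular-dependency detection nor the stricter "no internal imports" role the table gives to L0.

```python
# Top-level packages mapped to their dependency level, per the table above.
LEVELS = {
    "config": 0, "utils": 0, "security": 0, "migrations": 0, "templates": 0,
    "oauth": 1, "ingestion": 1, "transformation": 1, "auth": 1,
    "governance": 1, "notebooks": 1, "analysis": 1,
    "platform": 2, "web": 2, "visualization": 2,
    "cli": 3,
}


def import_allowed(importer: str, imported: str) -> bool:
    """Downward or same-level imports pass; upward imports fail."""
    return LEVELS[importer] >= LEVELS[imported]


print(import_allowed("cli", "web"))        # True: L3 may import L2
print(import_allowed("ingestion", "web"))  # False: L1 may not import L2
```

A check like this could run in CI over the import statements of each package to flag upward imports early.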
Key modules added in v1
- auth/: Password + 2FA + API key authentication, 3 roles (admin/editor/viewer), 29 permissions, Metabase SSO bridge
- governance/: Schema drift detection (breaking/additive), PII scanning (Presidio + spaCy)
- analysis/: Custom monitoring metrics, 6 comparison types, drill-down analysis
- notebooks/: Marimo notebook management, DuckDB snapshots, file locking
- platform/scheduling/: Cron/interval schedules, webhook notifications, execution history
- platform/cloud/: DigitalOcean + BYOS deployment, SSH management, remote operations
Design Principles¶
1. Minimal Setup¶
Get started without complex infrastructure:
- DuckDB is embedded (single-file database)
- Metabase runs in Docker
- Web UI is a lightweight Python process
2. Configuration Over Code¶
Define sources in YAML, not Python:
# This...
sources:
  - name: stripe
    type: stripe
    stripe:
      stripe_secret_key_env: STRIPE_API_KEY

# ...instead of this (raw dlt code)
import os

import dlt

# stripe_source is the dlt source function for Stripe; its import is omitted here
pipeline = dlt.pipeline(...)
data = stripe_source(api_key=os.getenv("STRIPE_API_KEY"))
pipeline.run(data)
3. Auto-Generated Boilerplate¶
Dango generates the repetitive dbt staging models automatically during sync:
dango sync
# → Loads data from sources
# → Auto-generates staging models
# → You focus on business logic in marts/
4. Batteries Included¶
Set up a complete analytics stack with one command:
This gives you dlt, dbt, DuckDB, and Metabase—ready to go.
Then build your data pipeline:
dango source add # Add data sources
dango sync # Load data + generate staging
dango start # Launch platform
Next Steps¶
- Data Layers - Learn how data is organized across schemas
- DuckDB & Single-Writer - Understand the single-writer constraint
- Project Structure - Understand the directory layout
- Local vs Cloud - Differences between deployment modes