Architecture Overview

This page explains how Dango's components work together.

System Architecture

Dango integrates four production-grade open-source tools into a unified platform:

┌─────────────────────────────────────────────────────────────┐
│                    External Data Sources                    │
│  APIs (Stripe, HubSpot, GA4) • CSV Files • Databases       │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                    dlt (Data Load Tool)                     │
│  • 33 sources  • OAuth  • Incremental  • Deduplication     │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                    DuckDB (Database)                        │
│  raw_{source}.*  →  staging.*  →  intermediate.*  →  marts.*│
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                   dbt (Transformations)                     │
│  • Auto-generated staging  • Custom SQL  • Snapshots       │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│              Visualization & Management Layer               │
│  Metabase (BI) • Web UI • Notebooks (Marimo) • dbt-docs    │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│              Platform Services                              │
│  Auth • Scheduling • Monitoring • Governance • Cloud Deploy │
└─────────────────────────────────────────────────────────────┘

Component Overview

dlt - Data Ingestion

Purpose: Load data from external sources into DuckDB

Key Features:

- 30+ sources with wizard support (Stripe, HubSpot, Google Analytics, Salesforce, etc.)
- Access to 60+ dlt sources via dlt_native for advanced users
- CSV file ingestion with auto-detection
- Custom REST API sources
- OAuth 2.0 authentication handling
- Incremental loading (only new/changed data)
- Automatic schema evolution (for dlt sources)
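The incremental-loading and deduplication features listed above can be illustrated without dlt. The following is a minimal sketch of the general idea, not dlt's implementation: `load_incremental`, the `updated_at` cursor field, and the in-memory `table` dictionary are hypothetical names chosen for this example.

```python
from typing import Any

Row = dict[str, Any]


def load_incremental(
    table: dict[str, Row],
    fetched: list[Row],
    last_cursor: str,
    cursor_field: str = "updated_at",
    primary_key: str = "id",
) -> str:
    """Merge only new or changed rows into `table` and return the new cursor.

    Illustrative only: rows whose cursor value is not newer than
    `last_cursor` are skipped (incremental), and rows are keyed by
    `primary_key` so a re-sent record overwrites instead of duplicating.
    """
    new_cursor = last_cursor
    for row in fetched:
        if row[cursor_field] <= last_cursor:
            continue  # already loaded in an earlier run
        table[row[primary_key]] = row  # upsert: deduplicate on primary key
        new_cursor = max(new_cursor, row[cursor_field])
    return new_cursor


table: dict[str, Row] = {}
cursor = load_incremental(
    table,
    [
        {"id": "ch_1", "amount": 500, "updated_at": "2024-01-02"},
        {"id": "ch_2", "amount": 900, "updated_at": "2024-01-03"},
    ],
    last_cursor="2024-01-01",
)
# Second run: ch_1 is unchanged (skipped), ch_2 changed, ch_3 is new.
cursor = load_incremental(
    table,
    [
        {"id": "ch_1", "amount": 500, "updated_at": "2024-01-02"},
        {"id": "ch_2", "amount": 950, "updated_at": "2024-01-04"},
        {"id": "ch_3", "amount": 100, "updated_at": "2024-01-05"},
    ],
    last_cursor=cursor,
)
```

ISO-formatted date strings are used as the cursor here because they sort correctly as plain strings; a real source may expose a different cursor field.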

What it does:

dango sync
# → dlt connects to Stripe API
# → Fetches charges, customers, subscriptions
# → Writes to raw_stripe.* tables in DuckDB
# → Auto-generates staging models in dbt/models/staging/
# → Tracks metadata (_dlt_load_id, _dlt_extracted_at)

Learn more: Data Sources section

DuckDB - Analytics Database

Purpose: Store and query data with SQL

Why DuckDB?

- Embedded (no server to manage)
- OLAP-optimized (fast analytics queries)
- Efficient storage (columnar format)
- Full SQL support (window functions, CTEs, etc.)
- Handles datasets up to ~100GB comfortably on a modern laptop

For larger datasets

DuckDB is optimized for single-machine analytical workloads. For enterprise-scale needs (petabytes), consider cloud data warehouses like Snowflake or BigQuery.

Database file: data/warehouse.duckdb

Schema organization:

warehouse.duckdb
├── raw_{source_name}   # Raw ingested data (e.g., raw_stripe, raw_hubspot)
├── staging             # Cleaned data (tables, not views)
├── intermediate        # Reusable business logic
└── marts               # Final metrics for BI

Each data source gets its own raw_{source_name} schema (e.g., raw_stripe, raw_csv_orders).

Learn more: Data Layers

dbt - SQL Transformations

Purpose: Transform raw data into analytics-ready tables

What Dango automates:

- Auto-generates staging models during dango sync
- Creates data lineage documentation
- Generates dbt-docs website (documentation and lineage visualization)

What you control:

- Custom transformations in dbt/models/
- Business logic in SQL
- Data tests and documentation

Example workflow:

dango sync              # Load raw data + auto-generate staging models
# Edit dbt/models/marts/customer_metrics.sql
dango run               # Run dbt transformations
dango docs              # Generate documentation

Learn more: Transformations section

Metabase - Business Intelligence

Purpose: Visualize data and create dashboards

Auto-configured on dango start:

- DuckDB connection established
- Admin account auto-provisioned (local development only)
- All tables and schemas discovered
- Sample collections set up

Access: Run dango start first, then access via:

- Web UI at http://localhost:8800 (recommended - includes navigation to all services)
- Direct Metabase at http://localhost:3000

What you can do:

- Write SQL queries with autocomplete
- Create visualizations (charts, tables, maps)
- Build dashboards
- Set up alerts and notifications
- Schedule email reports
- Share with stakeholders

Learn more: Dashboards section

Web UI - Monitoring & Management

Purpose: Monitor pipelines and manage sources

Built with: FastAPI (Python backend)

Access: http://localhost:8800

Features:

- Real-time sync status (WebSocket updates)
- Source management (add, remove, configure)
- CSV file uploads
- Validation reports
- Activity logs
- Links to Metabase and dbt-docs

Learn more: Web UI section

Data Flow Example

Let's walk through what happens when you add a Stripe source:

1. Configuration

Option A: Interactive wizard (recommended)

dango source add
# Select "Stripe" from the list
# Follow the prompts

Option B: Manual configuration in .dango/sources.yml:

sources:
  - name: stripe
    type: stripe
    enabled: true
    stripe:
      stripe_secret_key_env: STRIPE_API_KEY
      start_date: 2024-01-01
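
The `stripe_secret_key_env` key stores the name of an environment variable rather than the secret itself. A minimal sketch of how such an indirection could be resolved follows; `resolve_secret` is a hypothetical helper, not part of Dango's API, and only the relevant part of the entry is shown as an already-parsed dictionary.

```python
import os
from typing import Mapping


def resolve_secret(source: Mapping, environ: Mapping[str, str] = os.environ) -> str:
    """Look up the secret named by the `stripe_secret_key_env` key.

    Hypothetical helper: assumes the type-specific block (here `stripe`)
    holds a key naming the environment variable that contains the secret.
    """
    env_name = source[source["type"]]["stripe_secret_key_env"]
    try:
        return environ[env_name]
    except KeyError:
        raise RuntimeError(f"Environment variable {env_name} is not set") from None


# Relevant part of the parsed sources.yml entry.
source = {
    "type": "stripe",
    "stripe": {"stripe_secret_key_env": "STRIPE_API_KEY", "start_date": "2024-01-01"},
}

# A plain dict stands in for the process environment in this example.
secret = resolve_secret(source, {"STRIPE_API_KEY": "sk_test_example"})
```

Keeping only the variable name in the YAML file means the configuration can be committed to version control without exposing credentials.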

You can also add sources via the Web UI at http://localhost:8800.

2. Ingestion + Staging Generation

When you run dango sync:

dango sync
# Or trigger from Web UI at http://localhost:8800

Dango executes:

1. Reads Stripe API credentials from environment
2. Authenticates with Stripe via dlt
3. Fetches data (charges, customers, subscriptions)
4. Writes to DuckDB:
   - raw_stripe.charges
   - raw_stripe.customers
   - raw_stripe.subscriptions
5. Tracks load metadata (_dlt_load_id, _dlt_extracted_at)
6. Auto-generates staging models

3. Generated Staging Models

After sync, Dango creates:

dbt/models/staging/
├── stg_stripe_charges.sql
├── stg_stripe_customers.sql
├── stg_stripe_subscriptions.sql
├── _stg_stripe__sources.yml
└── _stg_stripe__schema.yml
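
The naming pattern visible in this listing can be written down as a small helper. This is a sketch inferred from the file names shown on this page, not Dango's actual generator; `raw_schema` and `staging_files` are hypothetical names.

```python
def raw_schema(source_name: str) -> str:
    """Schema that holds the raw tables of one source, e.g. raw_stripe."""
    return f"raw_{source_name}"


def staging_files(source_name: str, tables: list[str]) -> list[str]:
    """File names expected under dbt/models/staging/ for one source."""
    models = [f"stg_{source_name}_{table}.sql" for table in tables]
    metadata = [
        f"_stg_{source_name}__sources.yml",
        f"_stg_{source_name}__schema.yml",
    ]
    return models + metadata


files = staging_files("stripe", ["charges", "customers", "subscriptions"])
```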

4. Transformations (dbt)

Create custom business logic in dbt/models/marts/:

-- dbt/models/marts/customer_metrics.sql
{{ config(materialized='table') }}

WITH charges AS (
    SELECT
        customer_id,
        SUM(amount) as total_spent,
        COUNT(*) as order_count
    FROM {{ ref('stg_stripe_charges') }}
    GROUP BY customer_id
),

customers AS (
    SELECT
        id,
        email,
        created
    FROM {{ ref('stg_stripe_customers') }}
)

SELECT
    c.id,
    c.email,
    COALESCE(ch.total_spent, 0) as lifetime_value,
    COALESCE(ch.order_count, 0) as total_orders
FROM customers c
LEFT JOIN charges ch ON c.id = ch.customer_id
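
To see what this model computes, the same aggregation and join can be run on toy data. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for DuckDB, and plain tables in place of the `ref()` calls; the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stg_stripe_customers (id TEXT, email TEXT, created TEXT);
    CREATE TABLE stg_stripe_charges (id TEXT, customer_id TEXT, amount INTEGER);

    INSERT INTO stg_stripe_customers VALUES
        ('cus_1', 'a@example.com', '2024-01-05'),
        ('cus_2', 'b@example.com', '2024-02-10');
    INSERT INTO stg_stripe_charges VALUES
        ('ch_1', 'cus_1', 500),
        ('ch_2', 'cus_1', 1500);
    """
)

# Same shape as customer_metrics.sql, with ref() replaced by table names.
rows = conn.execute(
    """
    WITH charges AS (
        SELECT customer_id, SUM(amount) AS total_spent, COUNT(*) AS order_count
        FROM stg_stripe_charges
        GROUP BY customer_id
    )
    SELECT
        c.id,
        c.email,
        COALESCE(ch.total_spent, 0) AS lifetime_value,
        COALESCE(ch.order_count, 0) AS total_orders
    FROM stg_stripe_customers c
    LEFT JOIN charges ch ON c.id = ch.customer_id
    ORDER BY lifetime_value DESC
    """
).fetchall()
conn.close()
```

The customer without any charges still appears in the result, with zeros, because the LEFT JOIN keeps every customer and COALESCE replaces the missing aggregates.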

Run transformations:

dango run

5. Visualization (Metabase)

Start the platform if not already running:

dango start

Open the Web UI (http://localhost:8800) and navigate to Metabase, or go directly to http://localhost:3000. Query your marts tables:

SELECT * FROM marts.customer_metrics
ORDER BY lifetime_value DESC
LIMIT 10

Create charts, dashboards, and share with your team.

Tech Stack Details

Core Tools

Tool	Version	Purpose	Language
dlt	1.24.x	Data ingestion	Python
dbt	1.10.x	SQL transformations	Python + SQL
DuckDB	1.5.x	Analytics database	C++
Metabase	0.59.x	Business intelligence	Java/Clojure

Dango Components

Component	Technology	Purpose
Web Backend	FastAPI	REST API + WebSockets
Service Orchestration	Docker Compose	Metabase + dbt-docs containers
File Watcher	watchdog (Python)	Auto-sync on file changes
CLI	Click (Python)	Command-line interface
Config Management	YAML + TOML	Sources, project settings

Service Management

When you run dango start, the following services are launched:

Local Services

FastAPI Web UI - Port 8800
File Watcher - Background process (if auto_sync: true)

Docker Containers

Metabase - Port 3000
dbt-docs - Port 8081

Check service status:

dango status

Output:

Project: my-analytics (Port: 8800)
Status: ● Running

Services:
  FastAPI Web UI     ● Running (http://localhost:8800)
  File Watcher       ● Running (auto-sync enabled)
  Metabase          ● Running (http://localhost:3000)
  dbt-docs          ● Running (http://localhost:8081)

Database: data/warehouse.duckdb (42.3 MB)
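
A script can approximate part of this status check by probing the ports listed above. This is a minimal sketch, not how `dango status` is implemented; `is_port_open`, `check_services`, and the `SERVICES` mapping are hypothetical names, with the port numbers taken from this page.

```python
import socket

# Ports as documented on this page.
SERVICES = {"Web UI": 8800, "Metabase": 3000, "dbt-docs": 8081}


def is_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "127.0.0.1") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:10s} {'running' if up else 'not reachable'}")
```

An open port only shows that something is listening; it does not confirm that the service behind it is healthy.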

Stop all services:

dango stop

Local vs. Cloud Architecture

Local

┌─────────────────────────────────────┐
│        Your Laptop                  │
│  ┌──────────────────────────────┐  │
│  │  DuckDB (warehouse.duckdb)   │  │
│  │  FastAPI (port 8800)         │  │
│  │  File Watcher (auto-sync)    │  │
│  │  Docker (Metabase, dbt-docs) │  │
│  └──────────────────────────────┘  │
└─────────────────────────────────────┘

Cloud

┌─────────────────────────────────────┐
│      Cloud Server (DO / BYOS)       │
│  ┌──────────────────────────────┐  │
│  │  Caddy (HTTPS, reverse proxy)│  │
│  │  DuckDB (warehouse.duckdb)   │  │
│  │  FastAPI (single worker)     │  │
│  │  APScheduler (cron/interval) │  │
│  │  Docker (Metabase :ro mount) │  │
│  └──────────────────────────────┘  │
└─────────────────────────────────────┘

The same .dango/ configuration files work in both environments. See Local vs Cloud for detailed behavioral differences.

Module Dependency Hierarchy

Dango's codebase is organized into four dependency levels. Higher levels can import from lower levels, but never the reverse.

Level	Role	Modules
L0 (base)	No internal imports	`config/`, `utils/`, `security/`, `migrations/`, `templates/`
L1 (core)	Imports L0 only	`oauth/`, `ingestion/`, `transformation/`, `auth/`, `governance/`, `notebooks/`, `analysis/`
L2 (platform)	Imports L0-L1	`platform/` (scheduling, cloud, local), `web/`, `visualization/`
L3 (ui)	Imports any level	`cli/`

Import rules:

- Downward only - higher levels import lower levels, never the reverse
- Same-level OK - as long as there are no circular dependencies
- Lazy imports - used sparingly for orchestration in the ingestion runner
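
The downward-only rule can be expressed as a simple level comparison. The sketch below is illustrative and not part of Dango's tooling: `LEVELS` is transcribed from the table above, while `import_allowed` and `violations` are hypothetical names. It does not detect circular dependencies, nor does it enforce the stricter "no internal imports" rule for L0.

```python
# Levels transcribed from the table above (platform/ subpackages omitted).
LEVELS = {
    "config": 0, "utils": 0, "security": 0, "migrations": 0, "templates": 0,
    "oauth": 1, "ingestion": 1, "transformation": 1, "auth": 1,
    "governance": 1, "notebooks": 1, "analysis": 1,
    "platform": 2, "web": 2, "visualization": 2,
    "cli": 3,
}


def import_allowed(importer: str, imported: str) -> bool:
    """Downward or same-level imports pass; upward imports fail."""
    return LEVELS[importer] >= LEVELS[imported]


def violations(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the (importer, imported) pairs that break the rule."""
    return [edge for edge in edges if not import_allowed(*edge)]


bad = violations(
    [("cli", "web"), ("web", "auth"), ("utils", "ingestion"), ("auth", "oauth")]
)
```

In this example only `utils` importing `ingestion` is flagged, since it is the one edge that points from a lower level to a higher one.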

Key modules added in v1

- auth/ - Password + 2FA + API key authentication, 3 roles (admin/editor/viewer), 29 permissions, Metabase SSO bridge
- governance/ - Schema drift detection (breaking/additive), PII scanning (Presidio + spaCy)
- analysis/ - Custom monitoring metrics, 6 comparison types, drill-down analysis
- notebooks/ - Marimo notebook management, DuckDB snapshots, file locking
- platform/scheduling/ - Cron/interval schedules, webhook notifications, execution history
- platform/cloud/ - DigitalOcean + BYOS deployment, SSH management, remote operations
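
As an illustration of the breaking/additive distinction mentioned for governance/, the sketch below compares two column-to-type mappings. The definitions are assumptions made for this example and are not taken from Dango's code: a newly added column counts as additive, while a removed or retyped column counts as breaking. `classify_drift` is a hypothetical name.

```python
def classify_drift(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Split column changes into 'additive' and 'breaking' lists.

    Assumed definitions (not taken from Dango): a column that only exists
    in `new` is additive; a column that was removed or changed type is
    breaking.
    """
    additive = sorted(set(new) - set(old))
    removed = set(old) - set(new)
    retyped = {col for col in set(old) & set(new) if old[col] != new[col]}
    return {"additive": additive, "breaking": sorted(removed | retyped)}


drift = classify_drift(
    {"id": "VARCHAR", "amount": "BIGINT", "currency": "VARCHAR"},
    {"id": "VARCHAR", "amount": "DOUBLE", "refunded": "BOOLEAN"},
)
```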

Design Principles

1. Minimal Setup

Get started without complex infrastructure:

- DuckDB is embedded (single file database)
- Metabase runs in Docker
- Web UI is a lightweight Python process

2. Configuration Over Code

Define sources in YAML, not Python:

# This...
sources:
  - name: stripe
    type: stripe
    stripe:
      stripe_secret_key_env: STRIPE_API_KEY

# ...instead of this (raw dlt code)
import os
import dlt
pipeline = dlt.pipeline(...)
data = stripe_source(api_key=os.getenv("STRIPE_API_KEY"))
pipeline.run(data)

3. Auto-Generated Boilerplate

Dango generates the repetitive dbt staging models automatically during sync:

dango sync
# → Loads data from sources
# → Auto-generates staging models
# → You focus on business logic in marts/

4. Batteries Included

Set up a complete analytics stack with one command:

curl -sSL https://getdango.dev/install.sh | bash

This gives you dlt, dbt, DuckDB, and Metabase—ready to go.

Then build your data pipeline:

dango source add      # Add data sources
dango sync           # Load data + generate staging
dango start          # Launch platform

Next Steps

Data Layers - Learn how data is organized across schemas
DuckDB & Single-Writer - Understand the single-writer constraint
Project Structure - Understand the directory layout
Local vs Cloud - Differences between deployment modes