Project Structure¶
Understanding the directory layout and configuration files.
Overview¶
A Dango project has a standard directory structure created by dango init:
my-analytics/
├── .dango/ # Project configuration, runtime state, and logs
│ ├── state/ # Runtime state (locks, sync status, deploy journal)
│ ├── logs/ # Application logs
│ └── snapshots/ # DuckDB read-only snapshots for notebooks
├── .dlt/ # dlt configuration
├── data/ # Database and uploads
├── dbt/ # dbt project
├── custom_sources/ # Custom dlt sources
├── notebooks/ # Marimo notebooks
├── dashboards/ # Metabase exports (optional)
├── docker-compose.yml # Docker services
├── Dockerfile.metabase # Metabase DuckDB driver
├── .gitignore # Git ignore patterns
├── .env # Environment variables
└── README.md # Auto-generated docs
.dango/ - Project Configuration¶
The .dango/ directory contains all Dango-specific configuration, runtime state, and logs.
Directory Contents¶
.dango/
├── project.yml # Project metadata and platform settings
├── sources.yml # Data source definitions
├── schedules.yml # Sync schedules (cron/interval)
├── monitors.yml # Metric analysis configuration
├── notifications.yml # Webhook notification configs
├── pii-overrides.yml # PII detection overrides
├── cloud.yml # Cloud deployment details (auto-generated)
├── metabase.yml # Metabase admin credentials (auto-generated)
├── state/ # Runtime state files
│ ├── dbt.lock # DuckDB write lock
│ ├── dbt.lock.json # Lock holder metadata
│ ├── sync_status_*.json # Sync progress tracking
│ └── deployments.jsonl # Deployment journal (append-only)
├── logs/ # Application logs
└── snapshots/ # DuckDB read-only snapshots for notebooks
Configuration Files (13 total)¶
| File | Purpose | Commit to Git? |
|---|---|---|
| .dango/project.yml | Project metadata, platform settings | Yes |
| .dango/sources.yml | Source definitions and config | Yes |
| .dango/schedules.yml | Sync schedules | Yes |
| .dango/monitors.yml | Metric analysis config | Yes |
| .dango/notifications.yml | Webhook configs | Yes |
| .dango/pii-overrides.yml | PII status overrides | Yes |
| .dango/metabase.yml | Metabase admin creds (auto-generated) | No |
| .dango/cloud.yml | Cloud server details (auto-generated) | No |
| .env | API keys, environment variables | No |
| .dlt/secrets.toml | OAuth tokens, dlt credentials | No |
| dbt/profiles.yml | DuckDB connection for dbt | Yes |
| dbt/dbt_project.yml | dbt project config | Yes |
| docker-compose.yml | Docker service definitions | Yes |
project.yml¶
Project metadata and platform configuration.
Location: .dango/project.yml
Example:
project:
  name: my-analytics
  organization: Acme Corp
  created: 2024-11-15
  created_by: [email protected]
  purpose: Track daily sales and customer behavior
  dango_version: 0.0.5 # Dango version this project was created with
  stakeholders:
    - name: Bob Chen
      role: CEO - Executive view
      contact: [email protected]
    - name: Sarah Lee
      role: Analyst - Power user
      contact: [email protected]
  sla: "Daily by 9am UTC"
  limitations: "Stripe data has 24h delay. GA4 takes 48h to backfill."
  getting_started: |
    1. Run: dango sync
    2. Open: http://localhost:8800
platform:
  port: 8800 # Web UI port
  metabase_port: 3000 # Metabase BI port
  dbt_docs_port: 8081 # dbt docs port
  auto_sync: true # Enable file watcher
  auto_dbt: true # Auto-run dbt after sync
  debounce_seconds: 600 # Wait 10 min before triggering
  watch_patterns:
    - "*.csv"
    - "*.json"
    - "*.jsonl"
    - "*.ndjson"
    - "*.parquet"
  watch_directories:
    - "data/uploads"
  duckdb_path: "./data/warehouse.duckdb"
  dbt_project_dir: "./dbt"
  data_dir: "./data"
Key sections:
- project - Human-readable metadata
- stakeholders - Who uses this project
- sla - Data freshness expectations
- limitations - Known data delays or gaps
- platform - Technical configuration
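The debounce_seconds setting is easier to picture with a small sketch. This is not Dango's implementation: the Debouncer class and its method names are hypothetical, and the clock is passed in explicitly so the behavior is easy to trace. The idea is that every new file event restarts the wait, and a sync fires only once the watched directory has been quiet for the full window.

```python
from typing import Optional


class Debouncer:
    """Hypothetical helper: fire once after `window` seconds without new events."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.last_event: Optional[float] = None

    def record_event(self, now: float) -> None:
        # Every file change restarts the quiet-period timer.
        self.last_event = now

    def should_fire(self, now: float) -> bool:
        # Fire only if an event is pending and the window has fully elapsed.
        if self.last_event is None:
            return False
        if now - self.last_event >= self.window:
            self.last_event = None  # consume the pending event
            return True
        return False


debouncer = Debouncer(window=600)       # mirrors debounce_seconds: 600
debouncer.record_event(now=0)           # first CSV lands
debouncer.record_event(now=300)         # second CSV lands, timer restarts
print(debouncer.should_fire(now=700))   # False: only 400 s of quiet
print(debouncer.should_fire(now=900))   # True: 600 s of quiet
```

With a 10-minute window, a batch of files dropped over several minutes results in one sync rather than one per file.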
sources.yml¶
Data source definitions.
Location: .dango/sources.yml
Example:
version: "1.0"
sources:
  # CSV source
  - name: orders
    type: csv
    enabled: true
    csv:
      directory: ./data/uploads
      file_pattern: "orders_*.csv"
      deduplication_strategy: latest_only
      primary_key: order_id
      timestamp_column: created_at
      timestamp_sort: desc
  # Stripe API source
  - name: stripe_payments
    type: stripe
    enabled: true
    stripe:
      stripe_secret_key_env: STRIPE_API_KEY
      start_date: 2024-01-01
  # Google Sheets source (OAuth)
  - name: forecast
    type: google_sheets
    enabled: true
    google_sheets:
      spreadsheet_url_or_id: "1a2b3c4d5e6f..."
      range_names: [Sheet1, Q4_Planning]
  # Custom dlt_native source
  - name: internal_api
    type: dlt_native
    enabled: true
    dlt_native:
      source_module: my_custom_source
      source_function: internal_api_source
      function_kwargs:
        api_url: "https://api.internal.com"
        api_key_env: INTERNAL_API_KEY
  # Disabled source
  - name: old_hubspot
    type: hubspot
    enabled: false
Common fields:
- name - Unique identifier (used in table names)
- type - Source type (csv, stripe, hubspot, etc.)
- enabled - Whether to sync this source
- <type> - Type-specific configuration
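To make the common fields concrete, here is a minimal sketch that works on the structure sources.yml would have after parsing. The dictionary is written inline to keep the example dependency-free. The enabled_sources helper is hypothetical, and treating a missing enabled key as true is an assumption of this sketch, not documented Dango behavior.

```python
from typing import Any, Dict, List

# Shape of sources.yml once parsed (for example by a YAML library).
config: Dict[str, Any] = {
    "version": "1.0",
    "sources": [
        {"name": "orders", "type": "csv", "enabled": True,
         "csv": {"directory": "./data/uploads", "file_pattern": "orders_*.csv"}},
        {"name": "stripe_payments", "type": "stripe", "enabled": True,
         "stripe": {"stripe_secret_key_env": "STRIPE_API_KEY"}},
        {"name": "old_hubspot", "type": "hubspot", "enabled": False},
    ],
}


def enabled_sources(cfg: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the enabled sources, rejecting duplicate names."""
    seen = set()
    result = []
    for source in cfg.get("sources", []):
        name = source["name"]
        if name in seen:
            raise ValueError(f"duplicate source name: {name}")
        seen.add(name)
        # Assumption: a source without an `enabled` key counts as enabled.
        if source.get("enabled", True):
            result.append(source)
    return result


for source in enabled_sources(config):
    # The type-specific block lives under a key named after the type.
    print(source["name"], source["type"], source.get(source["type"], {}))
```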
.dlt/ - dlt Configuration¶
The .dlt/ directory contains dlt pipeline configuration.
Directory Contents¶
.dlt/
├── config.toml # Pipeline settings (destination, etc.)
└── secrets.toml # API keys, OAuth tokens (git-ignored)
What to commit: config.toml
What to ignore: secrets.toml
config.toml¶
dlt pipeline configuration.
Location: .dlt/config.toml
Auto-generated content:
You rarely need to edit this - Dango manages it automatically.
secrets.toml¶
Sensitive credentials (API keys, OAuth tokens).
Location: .dlt/secrets.toml
Example:
[sources.stripe]
stripe_secret_key = "sk_live_abc123..."
[sources.google_analytics]
credentials = """
{
"type": "service_account",
"project_id": "my-ga-project",
"private_key": "-----BEGIN PRIVATE KEY-----\n...",
"client_email": "[email protected]"
}
"""
[sources.facebook_ads]
access_token = "EAABwzLix..."
account_id = "123456789"
Security:
- Never commit this file (it is included in .gitignore)
- Use .env or environment variables as an alternative
- dlt reads credentials in this priority order:
  1. .dlt/secrets.toml
  2. .env
  3. Environment variables
data/ - Database and Uploads¶
The data/ directory stores the DuckDB database and CSV uploads.
Directory Contents¶
data/
├── warehouse.duckdb # Main analytics database
├── warehouse.duckdb.wal # Write-ahead log (temporary)
├── uploads/ # Default CSV upload location
│ ├── orders_2024-12-01.csv
│ ├── orders_2024-12-02.csv
│ └── customers.csv
└── warehouse/ # Legacy data location (ignored)
What to commit: Nothing (all ignored by default)
What to back up: warehouse.duckdb, uploads/
warehouse.duckdb¶
The DuckDB analytics database file.
Location: data/warehouse.duckdb
Size: Typically 10-500 MB for small projects
Contents:
- All raw data ingested by dlt
- Staging models (tables)
- Intermediate models (tables)
- Marts models (tables)
- dlt metadata tables
Direct access:
# Open DuckDB CLI
duckdb data/warehouse.duckdb
# Run queries
SELECT * FROM marts.customer_metrics LIMIT 10;
Backup:
# Copy database file
cp data/warehouse.duckdb backups/warehouse-2024-12-01.duckdb
# Or export the whole database (schema and data) to a directory
duckdb data/warehouse.duckdb -c "EXPORT DATABASE 'export_dir'"
uploads/¶
Default location for CSV file uploads.
Location: data/uploads/
How it's used:
1. You drop CSV files here
2. File watcher detects changes (if auto_sync: true)
3. Dango auto-syncs after debounce period
4. Data lands in raw_{source}.* tables
File naming patterns:
# Match specific pattern
file_pattern: "orders_*.csv"
# → orders_2024-12-01.csv ✓
# → orders_2024-12-02.csv ✓
# → customers.csv ✗
# Match all CSVs
file_pattern: "*.csv"
# → Any .csv file ✓
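The matching rules above can be checked quickly with Python's standard fnmatch module. This assumes file_pattern follows ordinary shell-style globbing; the matcher Dango actually uses is not specified here, so treat this as a way to test a pattern rather than a description of the internals.

```python
from fnmatch import fnmatchcase
from typing import List

files = ["orders_2024-12-01.csv", "orders_2024-12-02.csv", "customers.csv"]


def matching(pattern: str, names: List[str]) -> List[str]:
    """Return the names that match a shell-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(matching("orders_*.csv", files))  # the two orders_ files
print(matching("*.csv", files))         # all three files
```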
dbt/ - dbt Project¶
The dbt/ directory contains all SQL transformation logic.
Directory Contents¶
dbt/
├── dbt_project.yml # dbt project configuration
├── profiles.yml # DuckDB connection profile
├── models/
│ ├── staging/ # Auto-generated staging models
│ │ ├── stg_stripe_charges.sql
│ │ ├── stg_stripe_customers.sql
│ │ ├── _stg_stripe__sources.yml
│ │ └── _stg_stripe__schema.yml
│ ├── intermediate/ # User-created business logic
│ │ ├── int_customer_ltv.sql
│ │ └── int_customer_segments.sql
│ └── marts/ # User-created final metrics
│ ├── customer_metrics.sql
│ ├── daily_sales.sql
│ └── schema.yml
├── macros/ # Reusable SQL macros
├── tests/ # Data tests
├── snapshots/ # SCD Type 2 snapshots
└── target/ # Build artifacts (git-ignored)
What to commit: Everything except target/, logs/, dbt_packages/
dbt_project.yml¶
dbt project configuration.
Location: dbt/dbt_project.yml
Auto-generated content:
name: my_analytics
version: 1.0.0
profile: duckdb
model-paths: [models]
test-paths: [tests]
data-paths: [data]
macro-paths: [macros]
snapshot-paths: [snapshots]
target-path: target
clean-targets: [target, dbt_packages]
models:
  my_analytics:
    staging:
      materialized: table # Tables for Metabase performance
      tags: [staging]
    intermediate:
      materialized: table # Pre-computed for reuse
      tags: [intermediate]
    marts:
      materialized: table # Optimized for BI queries
      tags: [marts]
profiles.yml¶
dbt DuckDB connection configuration.
Location: dbt/profiles.yml
Auto-generated content:
Connects dbt to: data/warehouse.duckdb
models/ Structure¶
staging/¶
Auto-generated by dango sync.
You can customize these files by editing the generated SQL. Remove the auto-generated comment at the top to prevent Dango from overwriting your changes.
Contents:
- stg_<source>__<table>.sql - Staging model SQL
- _stg_<source>__sources.yml - Documents raw tables
- _stg_<source>__schema.yml - Column descriptions
intermediate/¶
User-created reusable business logic.
You create these files.
Examples:
-- int_customer_ltv.sql
-- Reusable customer lifetime value calculation
-- int_customer_segments.sql
-- Customer segmentation logic
marts/¶
User-created final metrics for BI tools.
You create these files.
Examples:
-- customer_metrics.sql
-- Wide table with all customer KPIs
-- daily_sales.sql
-- Daily aggregated sales metrics
custom_sources/ - Custom dlt Sources¶
Write custom dlt sources without modifying Dango code.
Directory Contents¶
What to commit: All .py files
Example Custom Source¶
File: custom_sources/my_api_source.py
from typing import Any, Dict, Iterator

import dlt
import requests


@dlt.resource
def my_api_source(api_key: str, endpoint: str) -> Iterator[Dict[str, Any]]:
    """
    Custom dlt source for internal API.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    response = requests.get(endpoint, headers=headers)
    # Fail loudly on 4xx/5xx instead of loading an error payload
    response.raise_for_status()
    # Assumes the endpoint returns a JSON array; each element becomes one row
    yield from response.json()
Configuration in sources.yml:
- name: my_api_data
  type: dlt_native
  enabled: true
  dlt_native:
    source_module: my_api_source # Filename (without .py)
    source_function: my_api_source # Function name
    function_kwargs:
      api_key_env: MY_API_KEY # From .env or secrets.toml
      endpoint: "https://api.internal.com/data"
Learn more: Custom Sources guide
Docker Configuration¶
Docker Compose orchestrates Metabase and dbt-docs containers.
docker-compose.yml¶
Service definitions.
Location: docker-compose.yml
Auto-generated content (simplified):
services:
  metabase:
    build:
      context: .
      dockerfile: Dockerfile.metabase
    ports:
      - "3000:3000"
    volumes:
      - metabase-data:/metabase-data
      # :ro prevents Metabase JDBC driver from acquiring DuckDB write lock
      - ./data:/data:ro
      - ./metabase-plugins:/app/plugins
    environment:
      MB_DB_TYPE: h2
      MB_DB_FILE: /metabase-data/metabase.db
    restart: unless-stopped
  dbt-docs:
    image: nginx:alpine
    ports:
      - "8081:80"
    volumes:
      - ./dbt/target:/usr/share/nginx/html:ro
    restart: unless-stopped
volumes:
  metabase-data:
    driver: local
Why :ro?
The ./data:/data:ro mount prevents Metabase's JDBC driver from acquiring a DuckDB write lock. See DuckDB & Single-Writer for details.
Dockerfile.metabase¶
Custom Metabase image with DuckDB support.
Location: Dockerfile.metabase
Why a custom image? Standard Metabase images use Alpine Linux, which lacks the glibc/libstdc++ libraries that DuckDB requires. This Dockerfile uses the Ubuntu-based Eclipse Temurin image (the jammy tag) as its base instead.
FROM eclipse-temurin:21-jre-jammy
ENV MB_VERSION=v0.59.1
# Download Metabase jar
ADD https://downloads.metabase.com/${MB_VERSION}/metabase.jar /app/metabase.jar
.gitignore¶
Git ignore patterns to protect secrets and large files.
Location: .gitignore
Auto-generated content:
# Secrets
.env
.dlt/secrets.toml
*.key
*.pem
# Data (large files)
data/
*.duckdb
*.duckdb.wal
# dbt artifacts
dbt/target/
dbt/logs/
dbt/dbt_packages/
# Python
__pycache__/
*.pyc
.venv/
venv/
# System
.DS_Store
What gets committed:
- .dango/project.yml
- .dango/sources.yml
- dbt/models/ (your SQL)
- custom_sources/ (your Python)
- docker-compose.yml
- README.md

What stays local:
- data/warehouse.duckdb (database)
- .env (secrets)
- .dlt/secrets.toml (API keys)
.env - Environment Variables¶
Store secrets and environment-specific configuration.
Location: .env
Example:
# Stripe
STRIPE_API_KEY=sk_live_abc123...
# Google Analytics
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# HubSpot
HUBSPOT_API_KEY=pat-na1-xyz...
# Custom sources
MY_API_KEY=internal-api-key-here
INTERNAL_API_URL=https://api.internal.com
Security:
- Never commit (in .gitignore)
- Use for local development
- For production, use proper secrets management

Precedence:
1. .dlt/secrets.toml (highest)
2. .env
3. System environment variables (lowest)
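The lookup order stated above can be sketched as a first-match search across three layers. This only illustrates that ordering and is not Dango's or dlt's actual resolution code: parse_env_text and resolve are hypothetical helpers, and secrets.toml is simplified to a flat dictionary even though the real file is nested by section.

```python
from typing import Dict, Mapping, Optional


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve(key: str, secrets: Mapping[str, str], dotenv: Mapping[str, str],
            environ: Mapping[str, str]) -> Optional[str]:
    """Return the first value found, checking layers from highest to lowest."""
    for layer in (secrets, dotenv, environ):
        if key in layer:
            return layer[key]
    return None


secrets = {"STRIPE_API_KEY": "from-secrets-toml"}  # simplified, flat view
dotenv = parse_env_text("# Stripe\nSTRIPE_API_KEY=from-dotenv\nMY_API_KEY=from-dotenv\n")
environ = {"MY_API_KEY": "from-system-env", "HUBSPOT_API_KEY": "from-system-env"}

for name in ("STRIPE_API_KEY", "MY_API_KEY", "HUBSPOT_API_KEY"):
    print(name, "->", resolve(name, secrets, dotenv, environ))
```

In a real process the last layer would be os.environ; plain dictionaries are used here so the example has no side effects.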
dashboards/ - Metabase Exports (Optional)¶
Store exported Metabase dashboard definitions.
Location: dashboards/
Example:
How to use:
1. Export dashboard from Metabase UI
2. Save YAML to dashboards/
3. Commit to Git
4. Teammates import into their Metabase
Learn more: Metabase Overview
README.md¶
Auto-generated project documentation.
Location: README.md
Auto-generated on dango init:
# my-analytics
Track daily sales and customer behavior
## Quick Start
```bash
dango sync
dango start
```
Open: http://localhost:8800
## Stakeholders
- Bob Chen (CEO) - [email protected]
- Sarah Lee (Analyst) - [email protected]
## SLA
Daily by 9am UTC
## Limitations
Stripe data has 24h delay. GA4 takes 48h to backfill.
File Size Reference¶
Typical file sizes for a small project:
| File/Directory | Size | Notes |
|---|---|---|
| data/warehouse.duckdb | 10-500 MB | Grows with data volume |
| .dango/sources.yml | 1-5 KB | Depends on source count |
| dbt/models/ | 10-100 KB | Depends on model count |
| docker-compose.yml | 1 KB | Static |
| Total project | 50-1000 MB | Mostly database |
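To compare a real project against these figures, a short standard-library script can total the bytes under each path. This is a generic sketch, not a Dango command; the size_bytes helper is hypothetical and the paths are the ones from the table above.

```python
import os
from pathlib import Path


def size_bytes(path: Path) -> int:
    """Size of a file, or the total size of all regular files under a directory."""
    if path.is_file():
        return path.stat().st_size
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = Path(dirpath) / name
            if not file_path.is_symlink():  # avoid counting link targets
                total += file_path.stat().st_size
    return total


# Run from the project root; paths that do not exist are skipped.
for rel in ("data/warehouse.duckdb", ".dango/sources.yml", "dbt/models", "docker-compose.yml"):
    target = Path(rel)
    if target.exists():
        print(f"{rel}: {size_bytes(target) / 1_000_000:.2f} MB")
```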
Backup Strategy¶
What to back up:
1. Database: data/warehouse.duckdb
2. Configuration: .dango/, dbt/models/
3. Custom code: custom_sources/
4. Secrets: .env, .dlt/secrets.toml (separately!)

What NOT to back up:
- dbt/target/ (regenerated)
- Docker images (re-downloadable)
- metabase-plugins/ (re-downloadable)
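One way to act on this list is a small script that writes a dated archive of the non-secret paths. This is a sketch under assumptions: backup_project is a hypothetical helper and not part of Dango, secrets are left out on purpose so they can be stored separately, and it should run while no sync is writing to the database so the copied file is consistent.

```python
import tarfile
from datetime import date
from pathlib import Path
from typing import Iterable

# Paths from the list above; .env and .dlt/secrets.toml are excluded on purpose.
DEFAULT_PATHS = ("data/warehouse.duckdb", ".dango", "dbt/models", "custom_sources")


def backup_project(project_dir: Path, dest_dir: Path,
                   paths: Iterable[str] = DEFAULT_PATHS) -> Path:
    """Write a dated .tar.gz of the given project paths and return its location."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = date.today().isoformat()
    archive = dest_dir / f"{project_dir.resolve().name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for rel in paths:
            src = project_dir / rel
            if src.exists():  # skip anything this project does not have
                tar.add(src, arcname=rel)
    return archive
```

For example, backup_project(Path("."), Path("../backups")) run from the project root would write an archive named after the project directory and today's date. Keeping dest_dir outside the project avoids archiving earlier backups.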
Multi-Project Setup¶
Running multiple Dango projects on the same machine:
~/analytics/
├── project-a/
│ ├── .dango/project.yml # port: 8800
│ ├── data/warehouse.duckdb
│ └── dbt/
├── project-b/
│ ├── .dango/project.yml # port: 8801
│ ├── data/warehouse.duckdb
│ └── dbt/
└── project-c/
├── .dango/project.yml # port: 8802
├── data/warehouse.duckdb
└── dbt/
Each project gets unique ports:
- Project A: Web UI 8800, Metabase 3000, dbt-docs 8081
- Project B: Web UI 8801, Metabase 3001, dbt-docs 8082
- Project C: Web UI 8802, Metabase 3002, dbt-docs 8083
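The port layout follows a simple offset pattern, which the sketch below reproduces. The ports_for helper is hypothetical. Whether Dango assigns these values automatically is not covered here, so the output is best read as values to place under platform in each project's .dango/project.yml.

```python
from typing import Dict

# Base values taken from the platform section of project.yml.
BASE_PORTS = {"port": 8800, "metabase_port": 3000, "dbt_docs_port": 8081}


def ports_for(index: int) -> Dict[str, int]:
    """Ports for the project at a zero-based index, offsetting each base port."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return {name: base + index for name, base in BASE_PORTS.items()}


for index, project in enumerate(["project-a", "project-b", "project-c"]):
    print(project, ports_for(index))
```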
Manage all projects:
Next Steps¶
- Architecture - See how files relate to components
- Data Layers - Understand where data lives in DuckDB
- DuckDB & Single-Writer - How the database lock and snapshots work
- Quick Start - Create your first project