Data Sources
Connect to APIs, databases, and local files through Dango's unified data ingestion layer.
Overview
Dango supports 33 data sources through dlt (data load tool). Whether you're working with local CSV files, cloud APIs, or existing databases, Dango provides a unified configuration interface.
Wizard vs Manual Sources
Wizard-enabled sources (25 sources): Add through the interactive dango source add wizard, which handles authentication, configuration, and validation automatically.
Manual sources: Configure directly in sources.yml using dlt_native for any dlt verified source.
See the Source Catalog for the complete list of all 33 sources.
Source categories at a glance:
- Local Files — CSV, JSON, JSONL, Parquet from your filesystem
- OAuth Sources — Google Sheets, GA4, Google Ads, Facebook Ads (browser-based auth)
- API Key Sources — Stripe, HubSpot, Salesforce, GitHub, Slack, and more
- Database Sources — PostgreSQL, MongoDB, and others via dlt
- REST API — Connect to any API with JSON responses
- Custom Sources — Build integrations with Python and dlt
For dlt Users
If you're already familiar with dlt (data load tool), here's how Dango relates:
Dango wraps dlt with:
- YAML configuration instead of Python scripts
- Automatic dbt staging model generation
- Unified CLI (dango sync) for all sources
- Web UI for monitoring and management
What stays the same:
- Credentials in .dlt/secrets.toml (same format)
- All dlt verified sources available via dlt_native
- Standard dlt decorators (@dlt.source, @dlt.resource)
When to use what:
| Scenario | Use |
|---|---|
| Standard sources (Stripe, Google Sheets, etc.) | Dango wizard or YAML config |
| Custom API with simple logic | Dango dlt_native + Python file |
| Complex pipelines, custom destinations | Pure dlt (Dango not needed) |
Learn more:
- Custom Sources — "dlt vs. Dango Workflow" comparison
- Database Sources — "How This Differs from Standard dlt" table
- dlt Documentation — Official dlt docs for advanced topics
Quick Start
Add Your First Source
Choose your source type and follow the guide:
# Recommended: Use the wizard
dango source add
# Select "File Import (CSV, JSON, Parquet)" and follow prompts
Or configure manually in .dango/sources.yml:
sources:
  - name: sales_data
    type: local_files
    enabled: true
    local_files:
      directory: data/uploads/sales_data
      file_pattern: "*.csv"
Then copy your files into data/uploads/sales_data and run dango sync sales_data.
Database source via dlt_native (PostgreSQL shown):
# Configure .dlt/secrets.toml
[sources.sql_database]
credentials = "postgresql://user:pass@host:5432/db"
# Edit .dango/sources.yml
sources:
  - name: my_postgres
    type: dlt_native
    dlt_native:
      source_module: sql_database
      source_function: sql_database
      function_kwargs:
        schema: "public"
# Sync
dango sync my_postgres
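Custom source in Python:
# custom_sources/my_api.py
import dlt
import requests


@dlt.source
def my_api():
    @dlt.resource(name="data")
    def get_data():
        response = requests.get("https://api.example.com/data")
        response.raise_for_status()  # fail loudly on HTTP errors
        yield response.json()  # hand the parsed JSON payload to dlt

    return [get_data()]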
Source Type Guides
- Local Files: Load CSV, JSON, JSONL, and Parquet files with automatic schema detection and incremental sync.
  - 5 supported formats
  - File change tracking
  - Schema evolution support
- OAuth Sources: Connect to cloud services using OAuth 2.0 authentication.
  - Google Sheets, GA4, Google Ads, Facebook Ads
  - Automatic token management
  - Browser-based authentication
- Database Sources: Connect to PostgreSQL, MySQL, SQL Server via dlt.
  - Full table or incremental loading
  - SSL/TLS support
- Custom Sources: Build custom integrations using Python and dlt.
  - REST APIs
  - Web scraping
  - Custom data formats
- Source Catalog: Complete catalog of all 33 supported data sources.
  - Source types and auth methods
  - Configuration examples
  - Sync behavior details
- Adding Sources: Step-by-step wizard walkthrough and manual YAML configuration.
- Sync Modes: Incremental loading, full refresh, and date range syncs.
- Deduplication: Four strategies for handling duplicate records in your data.
Common Workflows
Adding a New Source
- Choose source type based on your data
- Run the wizard with dango source add (or edit sources.yml manually)
- Configure credentials (OAuth flow, API key in .env, or connection string)
- Sync with dango sync <name>
- Verify in Metabase or with dango db query
Managing Multiple Sources
# .dango/sources.yml
version: '1.0'
sources:
  # Production Stripe data
  - name: stripe_prod
    type: stripe
    enabled: true
    stripe:
      stripe_secret_key_env: STRIPE_PROD_API_KEY
  # Google Sheets for manual data
  - name: manual_overrides
    type: google_sheets
    enabled: true
  # PostgreSQL analytics database
  - name: analytics_db
    type: dlt_native
    enabled: true
    dlt_native:
      source_module: sql_database
      source_function: sql_database
  # Local CSV exports
  - name: finance_reports
    type: local_files
    enabled: true
    local_files:
      directory: data/uploads/finance_reports
      file_pattern: "*.csv"
Sync All Sources
# Sync all enabled sources
dango sync
# Sync specific source
dango sync stripe_prod
# List all sources
dango source list
Data Flow
Understanding how data flows from sources to your warehouse:
graph LR
    A[Data Source] --> B[dlt]
    B --> C[Raw Layer]
    C --> D[DuckDB]
    D --> E[dbt Staging]
    E --> F[dbt Marts]
    F --> G[Metabase]
    style A fill:#e1f5ff
    style B fill:#fff3e0
    style C fill:#f3e5f5
    style D fill:#e8f5e9
    style E fill:#fff9c4
    style F fill:#ffebee
    style G fill:#e0f2f1

- Source: External API, database, or file
- dlt: Fetches and normalizes data
- Raw Layer: Source data as-loaded in DuckDB
- Staging: Clean starting point (auto-generated by Dango)
- Marts: Business logic (custom SQL models you write)
- Metabase: Dashboards and queries
Learn more about data layers →
Source Configuration
sources.yml Structure
version: '1.0'
sources:
  - name: unique_source_name      # Identifier (lowercase, underscores)
    type: local_files             # Source type
    enabled: true                 # Toggle sync
    description: "Optional description"
    local_files:                  # Type-specific config
      directory: data/uploads/unique_source_name
      file_pattern: "*.csv"
Common Parameters
| Parameter | Required | Description |
|---|---|---|
| name | Yes | Unique identifier for this source |
| type | Yes | Source type from the catalog |
| enabled | No | Whether to include in sync (default: true) |
| description | No | Human-readable description |
| deduplication | No | Strategy: none, latest_only, append_only, scd_type2 |
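The table above can be read as a small set of rules for each entry. The following Python sketch checks one already-parsed sources.yml entry against them. It is an illustration only, not Dango's own validation (which dango validate performs); the function name check_source and the error messages are hypothetical.

```python
from typing import Any

# Strategy names copied from the Common Parameters table.
DEDUP_STRATEGIES = {"none", "latest_only", "append_only", "scd_type2"}


def check_source(entry: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one parsed sources.yml entry."""
    problems = []
    for key in ("name", "type"):  # the two required parameters
        if not entry.get(key):
            problems.append(f"missing required parameter: {key}")
    # enabled is optional and defaults to true when omitted.
    if not isinstance(entry.get("enabled", True), bool):
        problems.append("enabled must be true or false")
    dedup = entry.get("deduplication")
    if dedup is not None and dedup not in DEDUP_STRATEGIES:
        problems.append(f"unknown deduplication strategy: {dedup}")
    return problems


ok = {"name": "sales_data", "type": "local_files", "deduplication": "latest_only"}
bad = {"type": "local_files", "deduplication": "newest"}
print(check_source(ok))
print(check_source(bad))
```

An empty list means the entry satisfies the common parameters; type-specific blocks such as local_files are outside the scope of this sketch.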
Credentials Management
Never commit credentials! Use one of these methods:
- Recommended: .env file (persists across sessions)
- .dlt/secrets.toml (gitignored credential storage)
- Environment variables (current session only)
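For custom Python sources, the same rule applies: read secrets from the environment instead of hard-coding them. The sketch below assumes the variable has already been placed in the process environment (for example from .env or an export); how Dango itself loads .env is not shown here. The helper names read_secret and mask are hypothetical, and the key value is a made-up demo string.

```python
import os


def read_secret(env_var: str) -> str:
    """Return the value of env_var, or raise if it is unset or empty."""
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(f"{env_var} is not set; add it to .env or export it")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the value is safe to log."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Demo value only; a real key would come from .env or the shell.
os.environ.setdefault("STRIPE_PROD_API_KEY", "sk_test_example1234")
print(mask(read_secret("STRIPE_PROD_API_KEY")))
```

Failing early on a missing variable gives a clearer error than an authentication failure halfway through a sync, and masking keeps keys out of logs.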
Testing Status
| Source Type | Status | Notes |
|---|---|---|
| Local Files | CSV, JSON, JSONL, Parquet | |
| Stripe | All resources supported | |
| Google Sheets | OAuth flow verified | |
| Google Analytics 4 | OAuth flow verified | |
| Facebook Ads | OAuth flow verified | |
| Google Ads | OAuth flow verified | |
| HubSpot | Contacts, companies, deals, tickets | |
| GitHub | Issues, PRs, commits | |
| Salesforce | Service account auth | |
| Slack | Channels, messages, users | |
| PostgreSQL | Full table and incremental | |
| MongoDB | Collections with filtering | |
| REST API | Generic API connector | |
| dlt_native | Registry bypass for any dlt source | |
| Coming Soon sources | Pending | Shopify, Matomo, Jira, Asana, Strapi, Personio |
Best Practices
1. Use Descriptive Names
# Good
- name: stripe_production_payments
- name: marketing_facebook_ads
- name: finance_google_sheets
# Avoid
- name: source1
- name: data
2. Enable Only What You Need
Disable unused sources to speed up sync by setting enabled: false on their entries in sources.yml.
3. Document Your Sources
- name: crm_export
  type: local_files
  description: "Weekly CRM export from sales team, updated every Monday"
  local_files:
    directory: data/uploads/crm_export
    file_pattern: "*.csv"
4. Use Incremental When Possible
Incremental sync is the default — it loads only new and changed data. See Sync Modes for details on when to use full refresh instead.
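The idea behind incremental loading can be sketched with a cursor field: remember the highest value seen so far and load only rows beyond it on the next run. This is a generic illustration in plain Python, not Dango's or dlt's implementation; the function name select_new_rows and the updated_at field are assumptions made for the example.

```python
from typing import Any, Optional


def select_new_rows(
    rows: list[dict[str, Any]],
    cursor_field: str,
    last_value: Optional[Any] = None,
) -> tuple[list[dict[str, Any]], Optional[Any]]:
    """Return rows newer than last_value plus the cursor to store for next time."""
    new_rows = [
        row for row in rows
        if last_value is None or row[cursor_field] > last_value
    ]
    # Keep the old cursor when nothing new arrived.
    new_cursor = max((row[cursor_field] for row in new_rows), default=last_value)
    return new_rows, new_cursor


rows = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-05"},
]
# First sync: no cursor yet, so every row is loaded.
first_batch, cursor = select_new_rows(rows, "updated_at")
rows.append({"id": 3, "updated_at": "2024-01-09"})
# Later sync: only the row past the stored cursor is loaded.
second_batch, cursor = select_new_rows(rows, "updated_at", cursor)
```

A full refresh corresponds to discarding the stored cursor and loading everything again.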
5. Monitor Source Health
Troubleshooting
Source Not Syncing
- Check enabled: true in sources.yml
- Verify credentials in .env or .dlt/secrets.toml
- Run dango validate to see errors
- Check network connectivity
Authentication Failures
- API keys: Verify not expired, check permissions
- OAuth: Re-authenticate with dango oauth refresh <source_type>
- Database: Test connection outside Dango
Schema Mismatches
When APIs change:
1. Run dango sync (schema auto-updates for API sources, and staging models are regenerated)
2. Update custom dbt models if needed
File schema changes
For local file sources, schema is fixed on first load. Use --allow-schema-changes to add new columns, or --full-refresh to reload with a new schema. See Local Files.
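The fixed-schema rule for local files can be pictured as a comparison between the columns recorded on first load and the columns in a new file. The sketch below is hypothetical: the function name is invented, and raising an error when new columns appear without the flag is an assumption, since the actual behavior in that case is described on the Local Files page rather than here.

```python
def new_columns_to_add(
    existing: list[str],
    incoming: list[str],
    allow_schema_changes: bool = False,
) -> list[str]:
    """Return columns present in the new file but not in the stored schema."""
    known = set(existing)
    added = [col for col in incoming if col not in known]
    if added and not allow_schema_changes:
        # Assumed behavior for this sketch: refuse rather than silently widen the table.
        raise ValueError(
            "new columns " + ", ".join(added)
            + "; rerun with --allow-schema-changes or --full-refresh"
        )
    return added


stored = ["id", "amount"]
incoming = ["id", "amount", "currency"]
print(new_columns_to_add(stored, incoming, allow_schema_changes=True))
```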
Performance Issues
- Use incremental loading for large datasets (default behavior)
- Sync sources individually rather than all at once
- Check API rate limits
- Use --limit N during development to cap row counts
Next Steps
- Adding Sources: Step-by-step wizard walkthrough for your first source.
- Source Catalog: Explore all 33 supported data source types.
- Transformations: Transform your loaded data with dbt.