Source Registry
Complete reference for all data source types supported by Dango.
Overview
Dango supports 33 data sources across 10 categories, using 5 authentication types. Sources are added via dango source add (wizard-enabled) or through manual YAML configuration in .dango/sources.yml.
- 25 wizard-enabled sources can be configured interactively
- 8 wizard-disabled sources require manual YAML configuration or the dlt_native bypass
- Sources without a dedicated Pydantic config model use generic_config: dict in YAML
Source Summary
| Source | Type Key | Category | Auth | Wizard | Incremental |
|---|---|---|---|---|---|
| File Import (CSV, JSON, Parquet) | local_files | Local & Custom | None | Yes | Yes |
| REST API (Generic) | rest_api | Local & Custom | API Key | Yes | Yes |
| dlt Native Source (Advanced) | dlt_native | Local & Custom | None | Yes | No |
| CSV Files | csv | Local & Custom | None | No | Yes |
| Files & Cloud Storage | filesystem | Local & Custom | None | No | No |
| Google Sheets | google_sheets | Marketing & Analytics | OAuth | Yes | No |
| Facebook Ads | facebook_ads | Marketing & Analytics | OAuth | Yes | Yes |
| Google Analytics (GA4) | google_analytics | Marketing & Analytics | OAuth | Yes | Yes |
| Google Ads | google_ads | Marketing & Analytics | OAuth | Yes | No |
| Mux | mux | Marketing & Analytics | API Key | Yes | No |
| Airtable | airtable | Marketing & Analytics | API Key | Yes | No |
| Matomo Analytics | matomo | Marketing & Analytics | API Key | No | Yes |
| HubSpot | hubspot | Business & CRM | API Key | Yes | Yes |
| Salesforce | salesforce | Business & CRM | Service Account | Yes | Yes |
| Zendesk | zendesk | Business & CRM | Basic | Yes | Yes |
| Pipedrive | pipedrive | Business & CRM | API Key | Yes | Yes |
| Freshdesk | freshdesk | Business & CRM | API Key | Yes | Yes |
| Workable | workable | Business & CRM | API Key | Yes | Yes |
| Jira | jira | Business & CRM | Basic | No | No |
| Asana | asana | Business & CRM | API Key | No | Yes |
| Stripe | stripe | E-commerce & Payment | API Key | Yes | Yes |
| Shopify | shopify | E-commerce & Payment | OAuth | No | Yes |
| Notion | notion | Files & Storage | API Key | Yes | No |
| Email Inbox (IMAP) | inbox | Files & Storage | Basic | Yes | Yes |
| MongoDB | mongodb | Databases | Basic | Yes | Yes |
| PostgreSQL | postgres | Databases | Basic | Yes | Yes |
| GitHub | github | Development | API Key | Yes | Yes |
| Slack | slack | Communication | API Key | Yes | Yes |
| Apache Kafka | kafka | Streaming | None | Yes | Yes |
| Amazon Kinesis | kinesis | Streaming | Service Account | Yes | Yes |
| Chess.com | chess | Other | None | Yes | Yes |
| Strapi | strapi | Other | API Key | No | No |
| Personio | personio | Other | API Key | No | Yes |
Sources by Category
Local & Custom
File Import (local_files)
Load CSV, JSON, JSONL, or Parquet files from a directory. All matching files are combined into a single raw table. On re-sync, new or modified files are loaded and deleted files are removed.

    sources:
      - name: sales_data
        type: local_files
        local_files:
          directory: data/uploads/sales
          file_pattern: "*.csv"

| Field | Type | Default | Description |
|---|---|---|---|
| directory | path | -- | Directory containing files (required) |
| file_pattern | string | "*" | Glob pattern for files to load |
| notes | string | -- | Notes about how to regenerate files |
REST API (rest_api)
Connect to any REST API with configurable authentication (bearer, API key, basic, OAuth2 client credentials, custom header).

    sources:
      - name: custom_api
        type: rest_api
        rest_api:
          base_url: https://api.example.com/v1
          auth_type: bearer
          auth_token_env: API_TOKEN
          endpoints:
            - path: /users
            - path: /orders
              params:
                limit: 100

| Field | Type | Default | Description |
|---|---|---|---|
| base_url | string | -- | Base URL for API (required) |
| endpoints | list[dict] | -- | Endpoint definitions (required) |
| auth_type | string | "bearer" | Auth type: bearer, api_key, basic, oauth2_client_credentials, custom_header, none |
| auth_token_env | string | -- | Env var with auth token/key |
| api_key_name | string | -- | Header or query param name for API key auth |
| api_key_location | string | -- | Where to send API key: "header" or "query" |
| basic_username_env | string | -- | Env var for HTTP Basic username |
| basic_password_env | string | -- | Env var for HTTP Basic password |
| access_token_url | string | -- | OAuth2 token endpoint URL |
| client_id_env | string | -- | Env var for OAuth2 client ID |
| client_secret_env | string | -- | Env var for OAuth2 client secret |
| auth_header_name | string | -- | Custom auth header name (e.g., X-Shopify-Access-Token) |
| headers | dict | -- | Additional request headers |
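The example above covers bearer auth only. As a rough sketch of two other auth_type values from the table, the following assumes the auth fields sit alongside base_url inside rest_api, as auth_token_env does in the example above. The source names, env var names, header name, URLs, and endpoint paths are all hypothetical.

```yaml
sources:
  # API key sent in a request header
  - name: weather_api                  # hypothetical source name
    type: rest_api
    rest_api:
      base_url: https://api.example.com/v1
      auth_type: api_key
      auth_token_env: WEATHER_API_KEY  # hypothetical env var holding the key
      api_key_name: X-API-Key          # assumed header name; depends on the API
      api_key_location: header
      endpoints:
        - path: /observations          # hypothetical endpoint

  # OAuth2 client credentials
  - name: partner_api                  # hypothetical source name
    type: rest_api
    rest_api:
      base_url: https://partner.example.com/api
      auth_type: oauth2_client_credentials
      access_token_url: https://partner.example.com/oauth/token  # hypothetical token endpoint
      client_id_env: PARTNER_CLIENT_ID          # hypothetical env var
      client_secret_env: PARTNER_CLIENT_SECRET  # hypothetical env var
      endpoints:
        - path: /accounts              # hypothetical endpoint
```

Setting api_key_location to query instead would presumably send the key as a query parameter named by api_key_name.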
dlt Native Source (dlt_native)
Use any dlt verified source or custom source that is not in Dango's registry. Intended for advanced users.

    sources:
      - name: hubspot_crm
        type: dlt_native
        dlt_native:
          source_module: hubspot
          source_function: hubspot
          function_kwargs:
            api_key: "env:HUBSPOT_API_KEY"

| Field | Type | Default | Description |
|---|---|---|---|
| source_module | string | -- | Python module name (required) |
| source_function | string | -- | Function name to call (required) |
| function_kwargs | dict | {} | Arguments passed to the source function |
| pipeline_name | string | source name | Custom pipeline name |
| dataset_name | string | source name | Custom dataset name |
CSV Files (csv)
Hidden source
The csv type is hidden in the wizard. Use local_files instead, which supports CSV plus JSON, JSONL, and Parquet formats.
| Field | Type | Default | Description |
|---|---|---|---|
| directory | path | -- | Directory containing CSV files (required) |
| file_pattern | string | "*.csv" | Glob pattern for files |
| notes | string | -- | Regeneration notes |
Files & Cloud Storage (filesystem)
Hidden source
The filesystem type is hidden in the wizard. Use local_files for local files, or configure filesystem manually in YAML for cloud storage (S3, GCS, Azure).
Marketing & Analytics
Google Sheets (google_sheets)
Load data from Google Sheets (one or more tabs). Requires OAuth setup via dango oauth setup google.

    sources:
      - name: budgets
        type: google_sheets
        google_sheets:
          spreadsheet_url_or_id: https://docs.google.com/spreadsheets/d/1abc...
          range_names:
            - Monthly Budget
            - Quarterly Forecast
          deduplication: latest_only

| Field | Type | Default | Description |
|---|---|---|---|
| spreadsheet_url_or_id | string | -- | Spreadsheet URL or ID (required) |
| range_names | list[string] | -- | Sheet/tab names to load (required) |
| deduplication | enum | latest_only | Dedup strategy: none, latest_only, append_only, scd_type2 |
Facebook Ads (facebook_ads)
Load ad campaigns, ads, creatives, leads, and daily performance metrics.

    sources:
      - name: facebook_marketing
        type: facebook_ads
        facebook_ads:
          account_id: 123456789
          access_token_env: FB_ACCESS_TOKEN
          initial_load_past_days: 30

| Field | Type | Default | Description |
|---|---|---|---|
| account_id | string | -- | Facebook Ads Account ID with act_ prefix (required) |
| access_token_env | string | FB_ACCESS_TOKEN | Env var with access token |
| initial_load_past_days | integer | 30 | Historical days to load on first sync |
| start_date | date | -- | Start date (YYYY-MM-DD) |
| resources | list[string] | all | Resources to sync |

Default resources: campaigns, ads, ad_sets, facebook_insights
Available resources: campaigns, ads, ad_sets, ad_creatives, leads, facebook_insights
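To sync a resource outside the default set, the resources field can presumably be listed explicitly. A minimal sketch, reusing the account and token values from the example above and assuming resources accepts the names listed under Available resources:

```yaml
sources:
  - name: facebook_marketing
    type: facebook_ads
    facebook_ads:
      account_id: 123456789        # same placeholder ID as the example above
      access_token_env: FB_ACCESS_TOKEN
      resources:                   # explicit list instead of the defaults
        - campaigns
        - ad_creatives
        - leads
```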
Google Analytics (google_analytics)
Load website analytics data from Google Analytics 4. Supports custom report queries.

    sources:
      - name: website_analytics
        type: google_analytics
        google_analytics:
          property_id: "123456789"
          credentials_env: GOOGLE_CREDENTIALS
          start_date: "2024-01-01"

| Field | Type | Default | Description |
|---|---|---|---|
| property_id | string | -- | GA4 property ID (required) |
| credentials_env | string | GOOGLE_CREDENTIALS | Env var with credentials |
| start_date | string | -- | Start date (YYYY-MM-DD or relative like 90daysAgo) |
Google Ads (google_ads)
Load daily performance metrics from Google Ads via GAQL queries. Includes 5 default queries (campaign stats, ad group stats, keyword stats, ad stats, search term stats).
| Field | Type | Default | Description |
|---|---|---|---|
| property_id | string | -- | Google Ads customer ID (required) |
| credentials_env | string | GOOGLE_CREDENTIALS | Env var with credentials |
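No example is given for this source, so the following is only a sketch built from the two fields in the table. It assumes the configuration nests under a google_ads key, following the pattern of the other typed sources on this page. The source name and customer ID are placeholders.

```yaml
sources:
  - name: ads_performance          # hypothetical source name
    type: google_ads
    google_ads:                    # assumed key, mirroring the type key as other sources do
      property_id: "1234567890"    # placeholder Google Ads customer ID
      credentials_env: GOOGLE_CREDENTIALS
```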
Mux (mux)
Load video analytics data from Mux.
| Field | Type | Default | Description |
|---|---|---|---|
| generic_config | dict | -- | See generic_config |
Airtable (airtable)
Load tables from Airtable bases.
| Field | Type | Default | Description |
|---|---|---|---|
| generic_config | dict | -- | See generic_config |
Matomo Analytics (matomo)
Wizard disabled
Disabled because Matomo passes the auth token as a GET parameter, which is a security risk. Configure manually with dlt_native.
Business & CRM
HubSpot (hubspot)
Load contacts, companies, deals, and tickets from HubSpot CRM.

    sources:
      - name: hubspot_crm
        type: hubspot
        hubspot:
          api_key_env: HUBSPOT_API_KEY
          resources:
            - contacts
            - companies
            - deals

| Field | Type | Default | Description |
|---|---|---|---|
| api_key_env | string | HUBSPOT_API_KEY | Env var with API key |
| resources | list[string] | ["contacts", "companies", "deals", "tickets"] | Resources to sync |

Available resources: contacts, companies, deals, tickets, products, quotes, owners, properties, pipelines_deal, pipelines_ticket
Salesforce (salesforce)
Load data from Salesforce CRM using service account authentication.
| Field | Type | Default | Description |
|---|---|---|---|
| resources | list[string] | all | Resources to sync |

Default resources: account, contact, lead, opportunity, campaign
Available resources: account, contact, lead, opportunity, campaign, task, event, sf_user, user_role, product_2
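As a sketch of selecting resources, the following assumes the configuration nests under a salesforce key, as the other typed sources on this page do with their type key. Credential settings are not documented in this section, so they are left out here. The source name is hypothetical.

```yaml
sources:
  - name: salesforce_crm     # hypothetical source name
    type: salesforce
    salesforce:              # assumed key, mirroring the type key
      resources:             # names taken from the Available resources list
        - account
        - contact
        - opportunity
        - task
```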
Zendesk (zendesk)
Load support tickets, users, and chat data from Zendesk Support.
| Field | Type | Default | Description |
|---|---|---|---|
| generic_config | dict | -- | See generic_config |

Default resources: tickets, ticket_fields
Available resources: tickets, ticket_fields, ticket_events, ticket_metric_events
Pipedrive (pipedrive)
Load deals, contacts, and activities from Pipedrive CRM.
| Field | Type | Default | Description |
|---|---|---|---|
| generic_config | dict | -- | See generic_config |
Freshdesk (freshdesk)
Load support tickets, agents, and companies from Freshdesk.
| Field | Type | Default | Description |
|---|---|---|---|
| generic_config | dict | -- | See generic_config |
Workable (workable)
Load candidates, jobs, and events from Workable ATS.
| Field | Type | Default | Description |
|---|---|---|---|
| generic_config | dict | -- | See generic_config |
Jira (jira)
Wizard disabled
Disabled due to endpoint issues in the dlt source. Configure manually with dlt_native.
Asana (asana)
Wizard disabled
Disabled because the Asana SDK was removed from the dlt source. Configure manually with dlt_native.
E-commerce & Payment
Stripe (stripe)
Load payment data from Stripe (charges, customers, subscriptions, etc.).

    sources:
      - name: stripe_payments
        type: stripe
        stripe:
          stripe_secret_key_env: STRIPE_API_KEY
          endpoints:
            - charges
            - customers
            - invoices
          start_date: "2024-01-01"

| Field | Type | Default | Description |
|---|---|---|---|
| stripe_secret_key_env | string | STRIPE_API_KEY | Env var with Stripe secret key |
| endpoints | list[string] | all | Specific endpoints to sync |
| start_date | date | -- | Start date (YYYY-MM-DD) |
| end_date | date | -- | End date (YYYY-MM-DD) |
Shopify (shopify)
Wizard disabled
Disabled because Shopify requires the OAuth 2.0 Authorization Code Grant, which needs a dedicated dango oauth shopify provider (not yet implemented).
Files & Storage
Notion (notion)
Load pages and databases from Notion.
| Field | Type | Default | Description |
|---|---|---|---|
| generic_config | dict | -- | See generic_config |
Email Inbox (inbox)
Read messages and attachments from an email inbox via IMAP.
| Field | Type | Default | Description |
|---|---|---|---|
| generic_config | dict | -- | See generic_config |
Databases
MongoDB (mongodb)
Load collections from MongoDB databases with incremental support.
| Field | Type | Default | Description |
|---|---|---|---|
| generic_config | dict | -- | See generic_config |
PostgreSQL (postgres)
Load tables from PostgreSQL databases with schema filtering.
| Field | Type | Default | Description |
|---|---|---|---|
| generic_config | dict | -- | See generic_config |
Streaming
Apache Kafka (kafka)
Extract messages from Kafka topics.
| Field | Type | Default | Description |
|---|---|---|---|
| generic_config | dict | -- | See generic_config |
Amazon Kinesis (kinesis)
Read messages from Kinesis streams.
| Field | Type | Default | Description |
|---|---|---|---|
| generic_config | dict | -- | See generic_config |
Development
GitHub (github)
Load repository data, issues, pull requests, and commits from GitHub.

    sources:
      - name: my_repo
        type: github
        github:
          access_token_env: GITHUB_ACCESS_TOKEN
          owner: my-org
          name: my-repo

| Field | Type | Default | Description |
|---|---|---|---|
| access_token_env | string | GITHUB_ACCESS_TOKEN | Env var with personal access token |
| owner | string | -- | GitHub username or organization (required) |
| name | string | -- | Repository name (required) |
Communication
Slack (slack)
Load messages, channels, and user data from Slack.

    sources:
      - name: slack_data
        type: slack
        slack:
          access_token_env: SLACK_ACCESS_TOKEN
          selected_channels:
            - C01234ABCDE

| Field | Type | Default | Description |
|---|---|---|---|
| access_token_env | string | SLACK_ACCESS_TOKEN | Env var with Slack bot token |
| selected_channels | list[string] | all | Channel IDs to sync |
| start_date | date | -- | Start date for message history |
Other
Chess.com (chess)
Load player profiles and games from the Chess.com API. No authentication required.
Strapi (strapi)
Wizard disabled
Untested; requires a Strapi instance running in Docker.
Personio (personio)
Wizard disabled
Enterprise-only API.
generic_config
Sources without a dedicated Pydantic configuration model use the generic_config field. This applies to 21+ sources including Zendesk, Pipedrive, Freshdesk, Workable, Airtable, Mux, Notion, Inbox, MongoDB, PostgreSQL, Kafka, Kinesis, and others.
The generic_config field accepts any key-value pairs that the underlying dlt source function expects:

    sources:
      - name: my_zendesk
        type: zendesk
        generic_config:
          subdomain: mycompany
          email: user@example.com  # placeholder address

Refer to the dlt documentation for source-specific parameters.
Capabilities Matrix
| Capability | Description | Sources |
|---|---|---|
| Performance Metrics | Source provides built-in analytics/metrics | Facebook Ads, Google Analytics, Google Ads, Matomo, Mux |
| Date Range | Supports start_date/end_date filtering | Stripe, Shopify, Google Analytics, Google Ads, Zendesk, Workable, Slack, Mux |
| Incremental | Supports incremental loading (only new/changed data) | CSV, Local Files, REST API, Facebook Ads, Google Analytics, HubSpot, Salesforce, Zendesk, Pipedrive, Freshdesk, Workable, Stripe, Shopify, Slack, GitHub, Inbox, MongoDB, PostgreSQL, Kafka, Kinesis, Asana, Matomo, Chess, Personio |
| Custom Queries | Supports user-defined queries or report definitions | dlt Native, REST API, Google Analytics, Google Ads, Matomo |
Authentication Types
| Auth Type | Description | Sources |
|---|---|---|
| None | No authentication required | CSV, Local Files, dlt Native, Filesystem, Kafka, Chess |
| API Key | API key passed via environment variable | REST API, Stripe, HubSpot, GitHub, Slack, Pipedrive, Freshdesk, Workable, Airtable, Mux, Matomo, Notion, Asana, Strapi, Personio |
| OAuth | OAuth 2.0 flow via dango oauth <provider> | Google Sheets, Facebook Ads, Google Analytics, Google Ads, Shopify |
| Basic | Username/password or token authentication | Zendesk, Jira, Inbox, MongoDB, PostgreSQL |
| Service Account | Service account credentials (JSON key file) | Salesforce, Kinesis |
Common Source Fields
These fields are available on every source regardless of type:
| Field | Type | Default | Description |
|---|---|---|---|
| name | string | -- | Unique source name (required, lowercase alphanumeric + underscore) |
| type | SourceType | -- | Source type key (required) |
| enabled | boolean | true | Whether to include in syncs |
| description | string | -- | Human-readable description |
| tags | list[string] | [] | Metadata tags for organization |
| lookback_days | integer | -- | Re-load this many days on incremental sync (ignored on full refresh) |
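As an illustration of how these fields might combine with a type-specific block, here is a sketch based on the HubSpot source. It assumes the common fields sit at the same level as name and type. The description text, tag values, and lookback window are made up for the example.

```yaml
sources:
  - name: hubspot_crm            # lowercase alphanumeric + underscore
    type: hubspot
    enabled: true                # set to false to leave this source out of syncs
    description: CRM contacts and deals   # illustrative text
    tags:                        # illustrative tags
      - crm
      - sales
    lookback_days: 7             # illustrative: re-load the last 7 days on incremental sync
    hubspot:
      api_key_env: HUBSPOT_API_KEY
```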
Related Pages
- Source Catalog: User-friendly guide to choosing sources
- Adding Sources: Step-by-step source configuration
- Configuration Reference: Full sources.yml schema