Snapshots (SCD Type 2)¶
Track historical changes to your data using dbt snapshots. Snapshots capture the state of source tables over time, creating a complete audit trail of every change.
Overview¶
Dango supports two types of snapshots:
| Type | Purpose | Command |
|---|---|---|
| dbt Snapshots | SCD Type 2 change tracking — records when rows are inserted, updated, or deleted | dango snapshot add, dango snapshot run |
| DuckDB Snapshots | Read-only database copies for safe notebook exploration | dango snapshot db |
dbt snapshots add two columns to your data:
dbt_valid_from— when this version of the row became activedbt_valid_to— when this version was superseded (NULL= current)
This lets you answer questions like "what did this customer record look like last Tuesday?" or "when did this order status change?"
Quick Start¶
# 1. Create a snapshot (interactive wizard)
dango snapshot add
# 2. Run the snapshot to capture current state
dango snapshot run
# 3. Query historical data
duckdb data/warehouse.duckdb -c "
SELECT * FROM snapshots.snap_shopify_orders
WHERE dbt_valid_to IS NULL -- current records only
"
Creating Snapshots¶
The dango snapshot add wizard walks you through creating a dbt snapshot:
Step 1: Select a Source¶
The wizard queries DuckDB for schemas matching raw_* and lists available sources:
Note
Sources must have data loaded first. Run dango sync before creating snapshots.
Step 2: Select a Table¶
Tables from the selected source are listed, excluding internal dlt tables (_dlt_*):
Step 3: Detect Primary Key¶
Dango auto-detects primary key columns by looking for columns named exactly id, uuid, or key:
Warning
Columns with an _id suffix (like customer_id or order_id) are foreign keys, not primary keys. The auto-detection only matches exact column names.
Step 4: Choose Strategy¶
Two snapshot strategies are available:
Uses a timestamp column to detect changes. Dango auto-selects this strategy when it detects columns named updated_at, modified_at, or last_modified.
Best for: Tables with a reliable last-modified timestamp.
Step 5: Generated File¶
The wizard creates a snapshot SQL file:
Generated SQL Format¶
Snapshot files are generated in dbt/snapshots/ using a Jinja2 template:
-- Auto-generated by Dango on 2026-05-15 14:30
-- Snapshot: snap_shopify_orders
-- Source: shopify.orders
-- Strategy: timestamp
--
-- This file tracks changes (SCD Type 2) to the source table.
-- dbt adds dbt_valid_from and dbt_valid_to columns automatically.
{% snapshot snap_shopify_orders %}
{{
config(
target_schema='snapshots',
unique_key='id',
strategy='timestamp',
updated_at='updated_at'
)
}}
SELECT * FROM {{ source('shopify', 'orders') }}
{% endsnapshot %}
-- Auto-generated by Dango on 2026-05-15 14:30
-- Snapshot: snap_stripe_customers
-- Source: stripe.customers
-- Strategy: check
--
-- This file tracks changes (SCD Type 2) to the source table.
-- dbt adds dbt_valid_from and dbt_valid_to columns automatically.
{% snapshot snap_stripe_customers %}
{{
config(
target_schema='snapshots',
unique_key='id',
strategy='check',
check_cols='all'
)
}}
SELECT * FROM {{ source('stripe', 'customers') }}
{% endsnapshot %}
Key configuration:
| Parameter | Value | Description |
|---|---|---|
target_schema | snapshots | All snapshots write to the snapshots schema |
unique_key | Auto-detected | Column that uniquely identifies each row |
strategy | timestamp or check | How changes are detected |
updated_at | Column name | Only for timestamp strategy |
check_cols | all | Only for check strategy — compares every column |
Managing Snapshots¶
Listing Snapshots¶
Displays a table of configured snapshots:
┌───────────────────────────┬─────────┬───────────┬────────────┬─────────────────────────────────────────────┐
│ Name │ Source │ Strategy │ Unique Key │ File │
├───────────────────────────┼─────────┼───────────┼────────────┼─────────────────────────────────────────────┤
│ snap_shopify_orders │ shopify │ timestamp │ id │ dbt/snapshots/snap_shopify_orders.sql │
│ snap_stripe_customers │ stripe │ check │ id │ dbt/snapshots/snap_stripe_customers.sql │
└───────────────────────────┴─────────┴───────────┴────────────┴─────────────────────────────────────────────┘
The listing parses snapshot SQL files to extract metadata (name, source, strategy, unique key).
Running Snapshots¶
# Run all snapshots
dango snapshot run
# Run a specific snapshot
dango snapshot run -s snap_shopify_orders
Running a snapshot:
- Acquires a DbtLock (5-minute timeout) to prevent concurrent DuckDB writes
- Executes
dbt snapshotagainst your warehouse - Creates or updates the snapshot table in the
snapshotsschema
DbtLock Timeout
If another process (sync, dbt run) holds the DuckDB write lock, dango snapshot run waits up to 5 minutes. If the lock isn't released, you'll see a timeout error. Wait for the other process to finish, then retry.
Auto-Run After Sync¶
Snapshots run automatically after every dango sync when .sql files exist in dbt/snapshots/. This is handled by the post-sync hook.
The auto-run:
- Checks for
.sqlfiles indbt/snapshots/ - Calls
run_dbt_snapshots()if files exist - Logs failures but does not fail the sync — your data is still loaded even if snapshots encounter an error
# Post-sync hook execution order:
1. Column profiling
2. Staging test enrichment
3. PII scanning
4. Automated analysis
5. dbt snapshots ← runs here
6. Sync notification
Tip
You don't need to set up a separate schedule for snapshots. Once you create a snapshot with dango snapshot add, it runs automatically after every sync.
Querying Snapshot Data¶
Snapshot tables live in the snapshots schema and include two tracking columns added by dbt:
| Column | Type | Description |
|---|---|---|
dbt_valid_from | TIMESTAMP | When this version of the row became active |
dbt_valid_to | TIMESTAMP | When this version was superseded (NULL = current) |
Current State¶
Get the latest version of every row:
Point-in-Time Query¶
See what the data looked like at a specific moment:
SELECT *
FROM snapshots.snap_shopify_orders
WHERE dbt_valid_from <= '2026-05-01'
AND (dbt_valid_to > '2026-05-01' OR dbt_valid_to IS NULL)
Change History¶
Track all changes to a specific record:
SELECT
id,
status,
total_price,
dbt_valid_from,
dbt_valid_to
FROM snapshots.snap_shopify_orders
WHERE id = 12345
ORDER BY dbt_valid_from
Count Changes Over Time¶
SELECT
DATE_TRUNC('day', dbt_valid_from) AS change_date,
COUNT(*) AS changes
FROM snapshots.snap_shopify_orders
WHERE dbt_valid_to IS NOT NULL -- only changed records
GROUP BY 1
ORDER BY 1
DuckDB Snapshots (for Notebooks)¶
DuckDB snapshots create read-only copies of your warehouse for safe notebook exploration. This is separate from dbt snapshots — it's about preventing write lock conflicts.
# Create a snapshot for notebook use
dango snapshot db
# Create a named snapshot
dango snapshot db --user alice
Output:
✓ Snapshot created: warehouse_20260515_143000.duckdb
Path: .dango/snapshots/default/warehouse_20260515_143000.duckdb
Size: 45.2 MB
Key details:
- Snapshots are stored in
.dango/snapshots/{user}/ - Each user keeps up to 3 snapshots (oldest are automatically removed)
- The snapshot is a full copy of
data/warehouse.duckdb - Use the snapshot path in your notebook to avoid blocking syncs or dbt runs
Why DuckDB Snapshots?
DuckDB is single-writer. A read-only connection from a notebook can still acquire shared locks that block write operations. Creating a snapshot copy eliminates this conflict entirely.
Customizing Snapshots¶
The generated snapshot files are fully editable. Common customizations:
Filter Rows¶
Only snapshot active records:
{% snapshot snap_shopify_orders %}
{{ config(
target_schema='snapshots',
unique_key='id',
strategy='timestamp',
updated_at='updated_at'
) }}
SELECT *
FROM {{ source('shopify', 'orders') }}
WHERE cancelled_at IS NULL
{% endsnapshot %}
Select Specific Columns¶
Reduce snapshot table size:
{% snapshot snap_shopify_orders %}
{{ config(
target_schema='snapshots',
unique_key='id',
strategy='timestamp',
updated_at='updated_at'
) }}
SELECT
id,
order_number,
status,
total_price,
customer_id,
updated_at
FROM {{ source('shopify', 'orders') }}
{% endsnapshot %}
Check Specific Columns¶
Instead of comparing all columns, only track changes to specific ones:
{% snapshot snap_stripe_customers %}
{{ config(
target_schema='snapshots',
unique_key='id',
strategy='check',
check_cols=['email', 'name', 'plan']
) }}
SELECT * FROM {{ source('stripe', 'customers') }}
{% endsnapshot %}
Troubleshooting¶
DbtLock Timeout¶
Another process holds the DuckDB write lock. Check for running syncs or dbt operations:
Wait for the other process to complete, then retry.
No Raw Tables Found¶
You need to sync data before creating snapshots. Run dango sync to load data into DuckDB.
Snapshot Table Growing Large¶
Snapshot tables grow with every change detected. For high-volume tables:
- Use timestamp strategy — more efficient than check (only compares one column)
- Filter rows — only snapshot records you care about
- Select fewer columns — reduce the data stored per version
- Use
check_cols— track changes to specific columns instead of all
Strategy Mismatch¶
If you change the strategy of an existing snapshot (e.g., from check to timestamp), dbt may error. Drop the snapshot table and re-run:
Then run dango snapshot run to recreate it.
Next Steps¶
-
Test snapshot queries safely against a dev copy of your database.
-
Learn how staging models clean and deduplicate raw data before snapshotting.
-
Build marts that combine snapshot history with current data.
-
Write Metabase queries against snapshot tables.