Skip to content

Snapshots (SCD Type 2)

Track historical changes to your data using dbt snapshots. Snapshots capture the state of source tables over time, creating a complete audit trail of every change.


Overview

Dango supports two types of snapshots:

Type Purpose Command
dbt Snapshots SCD Type 2 change tracking — records when rows are inserted, updated, or deleted dango snapshot add, dango snapshot run
DuckDB Snapshots Read-only database copies for safe notebook exploration dango snapshot db

dbt snapshots add two columns to your data:

  • dbt_valid_from — when this version of the row became active
  • dbt_valid_to — when this version was superseded (NULL = current)

This lets you answer questions like "what did this customer record look like last Tuesday?" or "when did this order status change?"


Quick Start

# 1. Create a snapshot (interactive wizard)
dango snapshot add

# 2. Run the snapshot to capture current state
dango snapshot run

# 3. Query historical data
duckdb data/warehouse.duckdb -c "
  SELECT * FROM snapshots.snap_shopify_orders
  WHERE dbt_valid_to IS NULL  -- current records only
"

Creating Snapshots

The dango snapshot add wizard walks you through creating a dbt snapshot:

Step 1: Select a Source

The wizard queries DuckDB for schemas matching raw_* and lists available sources:

? Select source:
  ❯ shopify
    stripe
    google_sheets

Note

Sources must have data loaded first. Run dango sync before creating snapshots.

Step 2: Select a Table

Tables from the selected source are listed, excluding internal dlt tables (_dlt_*):

? Select table:
  ❯ orders
    customers
    products

Step 3: Detect Primary Key

Dango auto-detects primary key columns by looking for columns named exactly id, uuid, or key:

? Unique key column [id]: id

Warning

Columns with an _id suffix (like customer_id or order_id) are foreign keys, not primary keys. The auto-detection only matches exact column names.

Step 4: Choose Strategy

Two snapshot strategies are available:

Uses a timestamp column to detect changes. Dango auto-selects this strategy when it detects columns named updated_at, modified_at, or last_modified.

? Snapshot strategy:
  ❯ timestamp
    check
? Timestamp column: updated_at

Best for: Tables with a reliable last-modified timestamp.

Compares all column values to detect changes. Use this when no timestamp column exists.

? Snapshot strategy:
    timestamp
  ❯ check

Best for: Tables without timestamps, or when you need to catch changes to any column.

Step 5: Generated File

The wizard creates a snapshot SQL file:

✓ Created snapshot snap_shopify_orders
  Run 'dango snapshot run' to execute the snapshot.

Generated SQL Format

Snapshot files are generated in dbt/snapshots/ using a Jinja2 template:

-- Auto-generated by Dango on 2026-05-15 14:30
-- Snapshot: snap_shopify_orders
-- Source: shopify.orders
-- Strategy: timestamp
--
-- This file tracks changes (SCD Type 2) to the source table.
-- dbt adds dbt_valid_from and dbt_valid_to columns automatically.

{% snapshot snap_shopify_orders %}

  {{
    config(
      target_schema='snapshots',
      unique_key='id',
      strategy='timestamp',
      updated_at='updated_at'
    )
  }}

  SELECT * FROM {{ source('shopify', 'orders') }}

{% endsnapshot %}
-- Auto-generated by Dango on 2026-05-15 14:30
-- Snapshot: snap_stripe_customers
-- Source: stripe.customers
-- Strategy: check
--
-- This file tracks changes (SCD Type 2) to the source table.
-- dbt adds dbt_valid_from and dbt_valid_to columns automatically.

{% snapshot snap_stripe_customers %}

  {{
    config(
      target_schema='snapshots',
      unique_key='id',
      strategy='check',
      check_cols='all'
    )
  }}

  SELECT * FROM {{ source('stripe', 'customers') }}

{% endsnapshot %}

Key configuration:

Parameter Value Description
target_schema snapshots All snapshots write to the snapshots schema
unique_key Auto-detected Column that uniquely identifies each row
strategy timestamp or check How changes are detected
updated_at Column name Only for timestamp strategy
check_cols all Only for check strategy — compares every column

Managing Snapshots

Listing Snapshots

dango snapshot list

Displays a table of configured snapshots:

┌───────────────────────────┬─────────┬───────────┬────────────┬─────────────────────────────────────────────┐
│ Name                      │ Source  │ Strategy  │ Unique Key │ File                                        │
├───────────────────────────┼─────────┼───────────┼────────────┼─────────────────────────────────────────────┤
│ snap_shopify_orders       │ shopify │ timestamp │ id         │ dbt/snapshots/snap_shopify_orders.sql        │
│ snap_stripe_customers     │ stripe  │ check     │ id         │ dbt/snapshots/snap_stripe_customers.sql      │
└───────────────────────────┴─────────┴───────────┴────────────┴─────────────────────────────────────────────┘

The listing parses snapshot SQL files to extract metadata (name, source, strategy, unique key).

Running Snapshots

# Run all snapshots
dango snapshot run

# Run a specific snapshot
dango snapshot run -s snap_shopify_orders

Running a snapshot:

  1. Acquires a DbtLock (5-minute timeout) to prevent concurrent DuckDB writes
  2. Executes dbt snapshot against your warehouse
  3. Creates or updates the snapshot table in the snapshots schema
Running dbt snapshot...

✓ dbt snapshot completed successfully.

DbtLock Timeout

If another process (sync, dbt run) holds the DuckDB write lock, dango snapshot run waits up to 5 minutes. If the lock isn't released, you'll see a timeout error. Wait for the other process to finish, then retry.


Auto-Run After Sync

Snapshots run automatically after every dango sync when .sql files exist in dbt/snapshots/. This is handled by the post-sync hook.

The auto-run:

  • Checks for .sql files in dbt/snapshots/
  • Calls run_dbt_snapshots() if files exist
  • Logs failures but does not fail the sync — your data is still loaded even if snapshots encounter an error
# Post-sync hook execution order:
1. Column profiling
2. Staging test enrichment
3. PII scanning
4. Automated analysis
5. dbt snapshots  ← runs here
6. Sync notification

Tip

You don't need to set up a separate schedule for snapshots. Once you create a snapshot with dango snapshot add, it runs automatically after every sync.


Querying Snapshot Data

Snapshot tables live in the snapshots schema and include two tracking columns added by dbt:

Column Type Description
dbt_valid_from TIMESTAMP When this version of the row became active
dbt_valid_to TIMESTAMP When this version was superseded (NULL = current)

Current State

Get the latest version of every row:

SELECT *
FROM snapshots.snap_shopify_orders
WHERE dbt_valid_to IS NULL

Point-in-Time Query

See what the data looked like at a specific moment:

SELECT *
FROM snapshots.snap_shopify_orders
WHERE dbt_valid_from <= '2026-05-01'
  AND (dbt_valid_to > '2026-05-01' OR dbt_valid_to IS NULL)

Change History

Track all changes to a specific record:

SELECT
  id,
  status,
  total_price,
  dbt_valid_from,
  dbt_valid_to
FROM snapshots.snap_shopify_orders
WHERE id = 12345
ORDER BY dbt_valid_from

Count Changes Over Time

SELECT
  DATE_TRUNC('day', dbt_valid_from) AS change_date,
  COUNT(*) AS changes
FROM snapshots.snap_shopify_orders
WHERE dbt_valid_to IS NOT NULL  -- only changed records
GROUP BY 1
ORDER BY 1

DuckDB Snapshots (for Notebooks)

DuckDB snapshots create read-only copies of your warehouse for safe notebook exploration. This is separate from dbt snapshots — it's about preventing write lock conflicts.

# Create a snapshot for notebook use
dango snapshot db

# Create a named snapshot
dango snapshot db --user alice

Output:

✓ Snapshot created: warehouse_20260515_143000.duckdb
  Path: .dango/snapshots/default/warehouse_20260515_143000.duckdb
  Size: 45.2 MB

Key details:

  • Snapshots are stored in .dango/snapshots/{user}/
  • Each user keeps up to 3 snapshots (oldest are automatically removed)
  • The snapshot is a full copy of data/warehouse.duckdb
  • Use the snapshot path in your notebook to avoid blocking syncs or dbt runs

Why DuckDB Snapshots?

DuckDB is single-writer. A read-only connection from a notebook can still acquire shared locks that block write operations. Creating a snapshot copy eliminates this conflict entirely.


Customizing Snapshots

The generated snapshot files are fully editable. Common customizations:

Filter Rows

Only snapshot active records:

{% snapshot snap_shopify_orders %}

  {{ config(
    target_schema='snapshots',
    unique_key='id',
    strategy='timestamp',
    updated_at='updated_at'
  ) }}

  SELECT *
  FROM {{ source('shopify', 'orders') }}
  WHERE cancelled_at IS NULL

{% endsnapshot %}

Select Specific Columns

Reduce snapshot table size:

{% snapshot snap_shopify_orders %}

  {{ config(
    target_schema='snapshots',
    unique_key='id',
    strategy='timestamp',
    updated_at='updated_at'
  ) }}

  SELECT
    id,
    order_number,
    status,
    total_price,
    customer_id,
    updated_at
  FROM {{ source('shopify', 'orders') }}

{% endsnapshot %}

Check Specific Columns

Instead of comparing all columns, only track changes to specific ones:

{% snapshot snap_stripe_customers %}

  {{ config(
    target_schema='snapshots',
    unique_key='id',
    strategy='check',
    check_cols=['email', 'name', 'plan']
  ) }}

  SELECT * FROM {{ source('stripe', 'customers') }}

{% endsnapshot %}

Troubleshooting

DbtLock Timeout

Error: Could not acquire DbtLock within 300 seconds.

Another process holds the DuckDB write lock. Check for running syncs or dbt operations:

ps aux | grep -E "(dango|dbt)"

Wait for the other process to complete, then retry.

No Raw Tables Found

No raw tables found. Run 'dango sync' to load data first.

You need to sync data before creating snapshots. Run dango sync to load data into DuckDB.

Snapshot Table Growing Large

Snapshot tables grow with every change detected. For high-volume tables:

  1. Use timestamp strategy — more efficient than check (only compares one column)
  2. Filter rows — only snapshot records you care about
  3. Select fewer columns — reduce the data stored per version
  4. Use check_cols — track changes to specific columns instead of all

Strategy Mismatch

If you change the strategy of an existing snapshot (e.g., from check to timestamp), dbt may error. Drop the snapshot table and re-run:

-- In DuckDB
DROP TABLE IF EXISTS snapshots.snap_shopify_orders;

Then run dango snapshot run to recreate it.


Next Steps

  • Dev Workflow

    Test snapshot queries safely against a dev copy of your database.

  • Staging Models

    Learn how staging models clean and deduplicate raw data before snapshotting.

  • Custom Models

    Build marts that combine snapshot history with current data.

  • SQL Queries

    Write Metabase queries against snapshot tables.