Migration from Spreadsheets¶
Move your team from spreadsheet-based reporting to a proper analytics pipeline.
What You'll Build¶
By the end of this tutorial, you'll have:
- CSV exports imported into DuckDB (a fast analytics database)
- Data organized into clean layers (raw → staging → business models)
- A live dashboard replacing your spreadsheet reports
- A path to automated refresh
Duration: ~25 minutes
Why Migrate?¶
| Spreadsheets | Dango |
|---|---|
| Manual copy-paste updates | Automated data syncs |
| Conflicting versions across team | Single source of truth |
| VLOOKUP breaks at scale | SQL joins across any data |
| No access control | Role-based permissions |
| Formula spaghetti | Documented, testable transformations |
| Charts tied to one file | Shared, interactive dashboards |
Prerequisites¶
- Dango installed (
pip install getdango) - Docker running (for Metabase, the visualization tool)
- One or more CSV exports from your spreadsheets
- Basic SQL knowledge (we'll guide you through it)
Step 1: Export Your Spreadsheets¶
From Google Sheets¶
- Open your spreadsheet
- File → Download → Comma Separated Values (.csv)
- For multi-tab spreadsheets, download each tab separately
From Excel¶
- Open your workbook
- File → Save As → choose CSV UTF-8 format
- Save each sheet as a separate CSV
Multi-Tab Spreadsheets
Name your CSV files descriptively: sales_2024.csv, customers.csv, products.csv. Each file becomes a separate table in your database.
Step 2: Create a Dango Project¶
# Create a new project
dango init my-analytics
cd my-analytics
# Start services (DuckDB warehouse + Metabase for dashboards)
dango start
Authentication
dango init prompts you to set an admin password. This secures your web UI and dashboards. See Authentication for details.
Step 3: Import CSV Files¶
Add a File Import Source¶
Run the interactive wizard:
When prompted:
- Select File Import from the source categories
- Source name:
my_data(or something descriptive likesales_data) - Directory:
data/uploads - File pattern:
*.csv
Copy Your CSVs¶
mkdir -p data/uploads
cp ~/Downloads/sales_2024.csv data/uploads/
cp ~/Downloads/customers.csv data/uploads/
cp ~/Downloads/products.csv data/uploads/
Sync to Database¶
This loads each CSV as a table in DuckDB. Verify the import:
You should see tables like raw_my_data.sales_2024, raw_my_data.customers, etc.
What is DuckDB?
DuckDB is a fast, embedded analytics database. Think of it as a SQLite designed for analytical queries. It runs locally — no server to manage — and handles millions of rows efficiently.
Step 4: Understand Data Layers¶
Dango organizes data into layers, similar to how you might separate raw data from summary sheets in a spreadsheet:
graph LR
A[Raw Data] --> B[Staging]
B --> C[Business Models]
C --> D[Dashboard] | Layer | Purpose | Spreadsheet Equivalent |
|---|---|---|
| Raw | Exact copy of source data | Your original CSV data |
| Staging | Cleaned, renamed, typed | A "clean" copy of each sheet |
| Business Models | Calculations, joins, metrics | Summary/pivot sheets, VLOOKUP sheets |
See Data Layers for a deeper explanation.
Step 5: Explore Auto-Generated Staging Models¶
After syncing, Dango auto-generates staging models that clean your raw data:
Each staging model:
- Renames columns to
snake_case - Casts types (dates, numbers)
- Filters out empty rows
What is dbt?
dbt (data build tool) transforms data using SQL. Instead of writing formulas across cells, you write SQL queries that produce new tables. Each query is a "model" — a reusable, testable building block.
Step 6: Build Business Models¶
This is where you replace your spreadsheet formulas with SQL. Create models in dbt/models/marts/.
Example: Monthly Sales Summary¶
If your spreadsheet had a pivot table summarizing sales by month, replace it with:
-- dbt/models/marts/fct_monthly_sales.sql
{{ config(materialized='table') }}
with sales as (
select * from {{ ref('stg_my_data_sales_2024') }}
)
select
date_trunc('month', sale_date::date) as month,
count(*) as total_orders,
sum(amount) as total_revenue,
avg(amount) as avg_order_value,
count(distinct customer_id) as unique_customers
from sales
group by 1
order by 1
Example: Customer Summary¶
Replace your VLOOKUP-based customer sheet:
-- dbt/models/marts/dim_customer_summary.sql
{{ config(materialized='table') }}
with customers as (
select * from {{ ref('stg_my_data_customers') }}
),
sales as (
select * from {{ ref('stg_my_data_sales_2024') }}
)
select
c.customer_id,
c.name,
c.email,
c.region,
count(s.order_id) as total_orders,
coalesce(sum(s.amount), 0) as lifetime_value,
min(s.sale_date) as first_purchase,
max(s.sale_date) as last_purchase
from customers c
left join sales s using (customer_id)
group by 1, 2, 3, 4
Run Models¶
Tip
dango run executes dbt build, which runs both your models and any data quality tests in dependency order.
Step 7: Create Your First Dashboard¶
Open the Dango web UI at http://localhost:8800 and log in with your admin credentials. Navigate to Metabase via the sidebar.
What is Metabase?
Metabase is an open-source business intelligence tool. It connects to your DuckDB database and lets you create charts, tables, and dashboards — no code required.
Create Questions¶
Monthly Revenue (Line chart):
- Click + New → Question → Native query
- Enter:
- Click Visualize → choose Line chart
- Save as "Monthly Revenue"
Top Customers (Table):
- + New → Question → Native query
- Enter:
- Save as "Top Customers"
Key Metrics (Number cards):
- + New → Question → Native query
- Enter:
- Save as "Key Metrics"
Build Dashboard¶
- + New → Dashboard
- Name it "Sales Overview"
- Add your saved questions
- Drag to arrange:
┌─────────────────────────────────────────┐
│ Total Revenue │ Total Orders │ AOV │
├─────────────────────────────────────────┤
│ Monthly Revenue (Line chart) │
├─────────────────────────────────────────┤
│ Top Customers (Table) │
└─────────────────────────────────────────┘
Step 8: Set Up Ongoing Imports¶
Manual Refresh¶
When you get new CSV exports, drop them into data/uploads/ and re-sync:
Your dashboard updates automatically.
Connect Google Sheets Directly¶
Instead of exporting CSVs, connect Google Sheets as a live source:
# Authenticate with Google
dango oauth google_sheets
# Add via source wizard (select Google Sheets)
dango source add
See Google Sheets for setup details.
Schedule Automatic Syncs¶
For data that updates regularly:
The wizard walks you through setting up daily or weekly syncs. See Scheduled Syncs for details.
Step 9: Share with Your Team¶
Once your dashboard is ready, add team members so they can view it too:
# Add a viewer (can see dashboards, can't change data)
dango auth add-user [email protected] --role viewer
Share the invite link with your team. See Team Setup for the full guide on roles and permissions.
Spreadsheet vs Dango Reference¶
| Spreadsheet | Dango (SQL) | Example |
|---|---|---|
VLOOKUP(A2, Sheet2!A:B, 2) | JOIN ... USING (key) | FROM sales JOIN customers USING (customer_id) |
| Pivot table | GROUP BY | SELECT region, sum(amount) GROUP BY region |
SUMIFS(range, criteria) | SUM(CASE WHEN ...) | sum(case when status = 'paid' then amount end) |
COUNTIF(range, ">100") | COUNT + CASE WHEN | count(case when amount > 100 then 1 end) |
IF(A1>0, "Yes", "No") | CASE WHEN | case when amount > 0 then 'Yes' else 'No' end |
TEXT(date, "YYYY-MM") | date_trunc | date_trunc('month', sale_date) |
UNIQUE(range) | DISTINCT | select distinct customer_id from sales |
| Copy sheet for backup | dango snapshot db | Point-in-time copy of your database |
| Share via email | Role-based access | dango auth add-user ... --role viewer |
Summary¶
You've migrated from spreadsheets to a proper analytics pipeline:
- CSV data imported into DuckDB
- Data organized into clean layers
- SQL models replacing spreadsheet formulas
- Interactive dashboard replacing static charts
- Path to automated refresh and team sharing
What's Different Now¶
- One source of truth — no more conflicting spreadsheet versions
- Automatic updates — sync new data without copy-paste
- Testable logic — SQL models can be validated, unlike hidden formulas
- Team access — role-based permissions instead of sharing files
- Scales — handles millions of rows without slowdown
Next Steps¶
- E-commerce Tutorial - Build a more complex pipeline
- Team Setup - Add team members with roles
- Google Sheets - Connect sheets directly instead of CSV exports
- Scheduled Syncs - Automate data refresh