What is SPAN?
Identity as Infrastructure
SPAN is a native Snowflake primitive. It turns identity resolution from logic into durable data.
In most data stacks, identity lives in code: in JOIN clauses, CASE statements, and dbt models. That logic is transient. It is re-executed every time a query runs.
SPAN persists identity as state. It saves the relationships between customer identifiers in tables. You query these tables with SQL.
Core Concept: Identity as State
SPAN handles identity upstream. It does not calculate it downstream.
Ingest: SPAN reads raw identifiers (emails, device IDs, account IDs) from your source tables.
Link: It maps connections between these identifiers with deterministic logic.
Persist: It writes the results to a shared schema in your Snowflake account.
Query: Downstream models join to the SPAN tables to fetch a stable span_id.
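The Link step can be pictured as grouping identifiers that were observed together. The sketch below is only an illustration of deterministic linking, not SPAN's actual implementation: it assumes links arrive as pairs of (source_type, source_id) tuples, uses a union-find structure to cluster them, and derives a hypothetical span_id by hashing the smallest member of each cluster. The function name resolve_identities and the ID format are assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def resolve_identities(links):
    """Group identifiers connected by at least one observed link.

    `links` is an iterable of pairs of (source_type, source_id) tuples,
    e.g. an email and an account ID seen on the same record.
    Returns a dict mapping each identifier to a hypothetical span_id.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        # Walk to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Always keep the smaller root, so input order does not matter.
            low, high = sorted((root_a, root_b))
            parent[high] = low

    for a, b in links:
        union(a, b)

    clusters = defaultdict(list)
    for identifier in parent:
        clusters[find(identifier)].append(identifier)

    identity_map = {}
    for members in clusters.values():
        # Hash the smallest member: the same cluster yields the same ID on every run.
        anchor = min(members)
        digest = hashlib.sha256("|".join(anchor).encode()).hexdigest()[:12]
        for identifier in members:
            identity_map[identifier] = f"span_{digest}"
    return identity_map


links = [
    (("backend_db", "user_123"), ("email", "jdoe@gmail.com")),
    (("segment", "cookie_xyz"), ("email", "jdoe@gmail.com")),
    (("email", "other@example.com"), ("backend_db", "user_999")),
]
identity_map = resolve_identities(links)
```

In this sketch the ID changes if a smaller identifier later joins a cluster. A system that persists identity as state would instead store the assigned ID and reuse it, which is the point of the Persist step.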
Architecture
SPAN runs inside your Snowflake account. It fits teams with strict security rules.
Native Compute: SPAN runs on your existing Snowflake warehouses. You control the cost.
Zero Data Movement: No data leaves your security perimeter. SPAN reads from your raw layer and writes to your analytics layer, both inside your own Snowflake account.
Standard SQL: The output is standard Snowflake tables. It works with dbt, Looker, Tableau, and any tool that speaks SQL.
The Data Model
SPAN builds tables that map the identifiers in your customer data to one another.
The primary interface is the Identity Map. This table links each source identifier to a stable span_id.
Table Structure:
source_id        source_type   span_id
user_123         backend_db    span_abc123
jdoe@gmail.com   email         span_abc123
cookie_xyz       segment       span_abc123
Engineers treat this table as infrastructure. You join against it to resolve identity for any record.
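A join against the Identity Map might look like the following sketch. It uses Python's built-in sqlite3 module as a local stand-in for Snowflake so the example runs anywhere. The table names identity_map and orders, and the customer_email column, are assumptions for illustration. Only the three Identity Map columns and sample rows come from the table above.

```python
import sqlite3

# In-memory SQLite database standing in for a Snowflake schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE identity_map (source_id TEXT, source_type TEXT, span_id TEXT);
INSERT INTO identity_map VALUES
  ('user_123',       'backend_db', 'span_abc123'),
  ('jdoe@gmail.com', 'email',      'span_abc123'),
  ('cookie_xyz',     'segment',    'span_abc123');

-- Hypothetical downstream table keyed by email.
CREATE TABLE orders (order_id INTEGER, customer_email TEXT);
INSERT INTO orders VALUES (1, 'jdoe@gmail.com'), (2, 'unknown@example.com');
""")

# Match on both the identifier and its type, so an email never
# collides with an identical string from another source.
resolved_orders = conn.execute("""
    SELECT o.order_id, m.span_id
    FROM orders AS o
    LEFT JOIN identity_map AS m
      ON m.source_id = o.customer_email
     AND m.source_type = 'email'
    ORDER BY o.order_id
""").fetchall()

print(resolved_orders)  # [(1, 'span_abc123'), (2, None)]
```

The LEFT JOIN keeps records whose identifier is not yet in the map. They come back with a NULL span_id rather than disappearing from the result.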
Integration Patterns
Teams treat SPAN as a read-only source of truth.
1. The Universal Join Key
Analysts use span_id as the standard grouping key for metrics. To calculate Active Users, they first join the raw event stream to the Identity Map and then count distinct span_id values. A user who logs in from three devices then counts as one entity, as long as all three device identifiers are linked in the Identity Map.
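The difference between counting devices and counting resolved users can be sketched as follows, again with sqlite3 standing in for Snowflake. The events table, its columns, and the cookie and span_id values are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE identity_map (source_id TEXT, source_type TEXT, span_id TEXT);
INSERT INTO identity_map VALUES
  ('cookie_a', 'segment', 'span_abc123'),
  ('cookie_b', 'segment', 'span_abc123'),
  ('cookie_c', 'segment', 'span_abc123'),
  ('cookie_d', 'segment', 'span_def456');

-- Hypothetical raw event stream keyed by device cookie.
CREATE TABLE events (device_id TEXT, event_date TEXT);
INSERT INTO events VALUES
  ('cookie_a', '2024-01-01'),
  ('cookie_b', '2024-01-01'),
  ('cookie_c', '2024-01-01'),
  ('cookie_d', '2024-01-01');
""")

# Counting raw device IDs treats every device as a separate user.
naive_count = conn.execute(
    "SELECT COUNT(DISTINCT device_id) FROM events"
).fetchone()[0]

# Joining to the Identity Map first collapses devices into people.
active_users = conn.execute("""
    SELECT COUNT(DISTINCT m.span_id)
    FROM events AS e
    JOIN identity_map AS m
      ON m.source_id = e.device_id
     AND m.source_type = 'segment'
""").fetchone()[0]

print(naive_count, active_users)  # 4 2
```

The inner join used here drops events whose device is not in the map. Whether to drop them or count them under a fallback key is a choice each metric definition has to make.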
2. Simplifying dbt Models
Data engineers remove complex resolution logic from their marts. They do not write 50-line CASE statements to resolve user IDs. They perform a single LEFT JOIN to the SPAN table.
3. Cross-Domain Mapping
Marketing systems use emails. Product systems use UUIDs. SPAN maps both to a single span_id. Teams use this bridge to link ad spend directly to feature usage without custom scripts.
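One way to picture this bridge is the sketch below, which links spend recorded by email to usage recorded by UUID through span_id. It runs on sqlite3 as a stand-in for Snowflake. The ad_spend and feature_usage tables, the product_uuid source type, and all sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE identity_map (source_id TEXT, source_type TEXT, span_id TEXT);
INSERT INTO identity_map VALUES
  ('jdoe@gmail.com',  'email',        'span_abc123'),
  ('uuid-1',          'product_uuid', 'span_abc123'),
  ('ann@example.com', 'email',        'span_def456'),
  ('uuid-2',          'product_uuid', 'span_def456');

-- Hypothetical marketing table keyed by email.
CREATE TABLE ad_spend (email TEXT, spend_usd REAL);
INSERT INTO ad_spend VALUES
  ('jdoe@gmail.com', 12.5), ('jdoe@gmail.com', 7.5), ('ann@example.com', 4.0);

-- Hypothetical product table keyed by UUID.
CREATE TABLE feature_usage (user_uuid TEXT, feature TEXT);
INSERT INTO feature_usage VALUES
  ('uuid-1', 'export'), ('uuid-1', 'dashboard'), ('uuid-1', 'export'),
  ('uuid-2', 'export');
""")

# Aggregate each side per span_id before joining them, so rows on one
# side do not multiply the totals on the other.
spend_vs_usage = conn.execute("""
    WITH spend_by_user AS (
        SELECT m.span_id, SUM(a.spend_usd) AS total_spend
        FROM ad_spend AS a
        JOIN identity_map AS m
          ON m.source_id = a.email AND m.source_type = 'email'
        GROUP BY m.span_id
    ),
    usage_by_user AS (
        SELECT m.span_id, COUNT(*) AS feature_events
        FROM feature_usage AS f
        JOIN identity_map AS m
          ON m.source_id = f.user_uuid AND m.source_type = 'product_uuid'
        GROUP BY m.span_id
    )
    SELECT s.span_id, s.total_spend, u.feature_events
    FROM spend_by_user AS s
    JOIN usage_by_user AS u ON u.span_id = s.span_id
    ORDER BY s.span_id
""").fetchall()

print(spend_vs_usage)  # [('span_abc123', 20.0, 3), ('span_def456', 4.0, 1)]
```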