What is SPAN?

Identity as Infrastructure

SPAN is a native Snowflake primitive. It turns identity resolution from logic into durable data.

In most data stacks, identity lives in code: JOIN clauses, CASE statements, and dbt models. That logic is ephemeral. It is recomputed every time a query runs.

SPAN persists identity as state. It saves the relationships between customer identifiers in tables. You query these tables with SQL.

Core Concept: Identity as State

SPAN resolves identity once, upstream, and stores the result. Downstream queries read that result instead of recalculating it.

  1. Ingest: SPAN reads raw identifiers (emails, device IDs, account IDs) from your source tables.

  2. Link: It maps connections between these identifiers with deterministic logic.

  3. Persist: It writes the results to a shared schema in your Snowflake account.

  4. Query: Downstream models join to the SPAN tables to fetch a stable span_id.
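Steps 1 through 3 above can be sketched in a few lines of Python. The source says only that linking is deterministic; it does not describe the algorithm. The union-find approach below is therefore an assumption chosen for illustration, as are the function name build_identity_map, the sample identifier pairs, and the span_0001-style IDs. SPAN itself writes its results to Snowflake tables; this sketch only returns rows in memory.

```python
# Hypothetical input: pairs of identifiers seen together in a source row.
# Each identifier is a (source_type, source_id) tuple.
observed_links = [
    (("backend_db", "user_123"), ("email", "jdoe@gmail.com")),
    (("email", "jdoe@gmail.com"), ("segment", "cookie_xyz")),
    (("backend_db", "user_456"), ("email", "asmith@example.com")),
]


def build_identity_map(links):
    """Group linked identifiers into connected components with union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Always keep the smaller root, so the result does not depend
            # on the order in which links arrive.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    for left, right in links:
        union(left, right)

    # One illustrative span_id per component (the real ID format is unknown).
    roots = sorted({find(node) for node in list(parent)})
    span_ids = {root: f"span_{i:04d}" for i, root in enumerate(roots, start=1)}

    return [
        {
            "source_id": source_id,
            "source_type": source_type,
            "span_id": span_ids[find((source_type, source_id))],
        }
        for (source_type, source_id) in sorted(parent)
    ]


identity_map = build_identity_map(observed_links)
for row in identity_map:
    print(row)
```

Because every merge keeps the smaller root, the same set of links yields the same mapping regardless of input order, which is one way to read "deterministic" here.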

Architecture

SPAN runs inside your Snowflake account. It fits teams with strict security rules.

  • Native Compute: SPAN runs on your existing Snowflake warehouses. You control the cost.

  • Zero Data Movement: No data leaves your security perimeter. SPAN reads from your raw stage and writes to your analytics stage.

  • Standard SQL: The output is standard Snowflake tables. It works with dbt, Looker, Tableau, and any tool that speaks SQL.

The Data Model

SPAN builds tables that map your customers' identifiers to one another.

The primary interface is the Identity Map. This table links every specific identifier to a stable span_id.

Table Structure:

  source_id        source_type   span_id
  user_123         backend_db    span_abc123
  jdoe@gmail.com   email         span_abc123
  cookie_xyz       segment       span_abc123

Engineers treat this table as infrastructure. You join against it to resolve identity for any record.
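As a rough illustration of resolving one record, the sketch below loads the three sample rows into an in-memory SQLite database and looks up a span_id. SQLite stands in for Snowflake only so the example runs anywhere. The physical table name identity_map, the composite key on (source_type, source_id), and the resolve helper are assumptions, not documented SPAN objects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed physical layout of the Identity Map; the column names come from
# the table above, the table name and key are illustrative.
conn.execute(
    """
    CREATE TABLE identity_map (
        source_id   TEXT NOT NULL,
        source_type TEXT NOT NULL,
        span_id     TEXT NOT NULL,
        PRIMARY KEY (source_type, source_id)
    )
    """
)
conn.executemany(
    "INSERT INTO identity_map (source_id, source_type, span_id) VALUES (?, ?, ?)",
    [
        ("user_123", "backend_db", "span_abc123"),
        ("jdoe@gmail.com", "email", "span_abc123"),
        ("cookie_xyz", "segment", "span_abc123"),
    ],
)


def resolve(source_type, source_id):
    """Return the span_id for one identifier, or None if it is unmapped."""
    row = conn.execute(
        "SELECT span_id FROM identity_map WHERE source_type = ? AND source_id = ?",
        (source_type, source_id),
    ).fetchone()
    return row[0] if row else None


print(resolve("email", "jdoe@gmail.com"))
print(resolve("email", "nobody@example.com"))
```

Matching on both source_type and source_id avoids collisions when two systems happen to issue the same raw ID string.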

Integration Patterns

Teams treat SPAN as a read-only source of truth.

1. The Universal Join Key

Analysts use the span_id as the standard grouping key for metrics. When they calculate Active Users, they join the raw event stream to the Identity Map first. A user who logs in from three devices then counts as one entity, provided those devices have been linked to the same span_id.
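The difference this makes to an Active Users count can be shown with a small sketch. It again uses in-memory SQLite in place of Snowflake. The raw_events table, its columns, the dates, and the second span_id (span_def456) are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE identity_map (source_id TEXT, source_type TEXT, span_id TEXT);
    INSERT INTO identity_map VALUES
        ('user_123',       'backend_db', 'span_abc123'),
        ('jdoe@gmail.com', 'email',      'span_abc123'),
        ('cookie_xyz',     'segment',    'span_abc123'),
        ('user_456',       'backend_db', 'span_def456');

    -- Hypothetical event stream: each row carries whichever identifier
    -- the emitting system knew about.
    CREATE TABLE raw_events (
        event_id INTEGER, source_id TEXT, source_type TEXT, event_date TEXT
    );
    INSERT INTO raw_events VALUES
        (1, 'user_123',       'backend_db', '2024-01-01'),
        (2, 'jdoe@gmail.com', 'email',      '2024-01-01'),
        (3, 'cookie_xyz',     'segment',    '2024-01-01'),
        (4, 'user_456',       'backend_db', '2024-01-01');
    """
)

# Counting raw identifiers treats every identifier as a separate user.
naive_count = conn.execute(
    "SELECT COUNT(DISTINCT source_id) FROM raw_events WHERE event_date = '2024-01-01'"
).fetchone()[0]

# Joining to the identity map first collapses linked identifiers.
active_users = conn.execute(
    """
    SELECT COUNT(DISTINCT m.span_id)
    FROM raw_events AS e
    JOIN identity_map AS m
      ON m.source_type = e.source_type
     AND m.source_id   = e.source_id
    WHERE e.event_date = '2024-01-01'
    """
).fetchone()[0]

print(naive_count, active_users)
```

With this sample data the naive count is 4 and the span_id count is 2, because three of the four identifiers belong to the same person.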

2. Simplifying dbt Models

Data engineers remove complex resolution logic from their marts. They do not write 50-line CASE statements to resolve user IDs. They perform a single LEFT JOIN to the SPAN table.
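A minimal sketch of that single LEFT JOIN, run against in-memory SQLite rather than Snowflake. The stg_orders table, its columns, and the guest_999 identifier are hypothetical. In a dbt project the SELECT would sit in a model file; it is embedded in Python here only to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE identity_map (source_id TEXT, source_type TEXT, span_id TEXT);
    INSERT INTO identity_map VALUES
        ('user_123',       'backend_db', 'span_abc123'),
        ('jdoe@gmail.com', 'email',      'span_abc123');

    -- Hypothetical staging table with mixed identifier types.
    CREATE TABLE stg_orders (
        order_id INTEGER, customer_ref TEXT, ref_type TEXT, amount REAL
    );
    INSERT INTO stg_orders VALUES
        (1, 'user_123',       'backend_db', 20.0),
        (2, 'jdoe@gmail.com', 'email',      35.0),
        (3, 'guest_999',      'backend_db', 10.0);
    """
)

# LEFT JOIN keeps every order; unmapped identifiers surface as NULL span_id
# instead of silently dropping out of the mart.
mart_rows = conn.execute(
    """
    SELECT o.order_id, m.span_id, o.amount
    FROM stg_orders AS o
    LEFT JOIN identity_map AS m
      ON m.source_type = o.ref_type
     AND m.source_id   = o.customer_ref
    ORDER BY o.order_id
    """
).fetchall()

for row in mart_rows:
    print(row)
```

The LEFT JOIN, rather than an inner join, is what keeps order 3 in the output with a NULL span_id, so unmapped records stay visible for follow-up.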

3. Cross-Domain Mapping

Marketing systems use emails. Product systems use UUIDs. SPAN maps both to a single span_id. Teams use this bridge to link ad spend directly to feature usage without custom scripts.
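One way such a bridge could look, sketched with in-memory SQLite in place of Snowflake. The ad_spend and feature_usage tables, their columns, the product_uuid source type, and the sample UUID are all hypothetical. Each side is aggregated to span_id before the final join so that multiple spend rows and multiple usage rows do not multiply each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE identity_map (source_id TEXT, source_type TEXT, span_id TEXT);
    INSERT INTO identity_map VALUES
        ('jdoe@gmail.com',                       'email',        'span_abc123'),
        ('3f2b8c1e-0000-4000-8000-000000000001', 'product_uuid', 'span_abc123');

    -- Marketing side: keyed by email.
    CREATE TABLE ad_spend (email TEXT, campaign TEXT, spend_usd REAL);
    INSERT INTO ad_spend VALUES
        ('jdoe@gmail.com', 'spring_sale', 12.5),
        ('jdoe@gmail.com', 'retargeting',  7.5);

    -- Product side: keyed by UUID.
    CREATE TABLE feature_usage (user_uuid TEXT, feature TEXT);
    INSERT INTO feature_usage VALUES
        ('3f2b8c1e-0000-4000-8000-000000000001', 'export'),
        ('3f2b8c1e-0000-4000-8000-000000000001', 'dashboard'),
        ('3f2b8c1e-0000-4000-8000-000000000001', 'export');
    """
)

spend_vs_usage = conn.execute(
    """
    WITH spend AS (
        SELECT m.span_id, SUM(a.spend_usd) AS spend_usd
        FROM ad_spend AS a
        JOIN identity_map AS m
          ON m.source_type = 'email' AND m.source_id = a.email
        GROUP BY m.span_id
    ),
    feature_use AS (
        SELECT m.span_id, COUNT(*) AS feature_events
        FROM feature_usage AS f
        JOIN identity_map AS m
          ON m.source_type = 'product_uuid' AND m.source_id = f.user_uuid
        GROUP BY m.span_id
    )
    SELECT s.span_id, s.spend_usd, u.feature_events
    FROM spend AS s
    JOIN feature_use AS u ON u.span_id = s.span_id
    ORDER BY s.span_id
    """
).fetchall()

print(spend_vs_usage)
```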

