# What is SPAN?

## Identity as Infrastructure

SPAN is a Snowflake Native App that turns identity resolution logic and outputs into durable data.

In most data stacks, identity lives in code. It exists as `JOIN` clauses, `CASE` statements, `COALESCE` logic, and DBT models or dynamic tables. This logic is ephemeral and often fragmented: it is re-executed every time a DBT job runs or a source table updates, and changes often happen silently and cascade unpredictably.

SPAN persists identity, along with the key dimensions and metrics that sit on top of it, as stateful, first-class artifacts in your Snowflake AI Data Cloud. Identity resolution logic is stored and versioned, current identities are resolved and persisted to a Snowflake table, and overrides to the core logic are strictly governed so they can be persisted. For all SPAN artifacts, history and lineage are stored, managed, and readily available via SnowSQL, so you have full visibility into your data at all times.

### Core Concept

SPAN handles identity and its associated dimensions and metrics as stateful artifacts in your Snowflake AI Data Cloud.

1. **Ingest**: SPAN reads your source tables to identify personal identifiers such as emails, names, addresses, and device IDs.
2. **Link**: It maps connections between these identifiers in the ID Graph using matching logic that you control.
3. **Persist**: It writes the results to a SPAN-managed schema in your Snowflake environment.
4. **Compile**: SPAN compiles critical dimension and fact tables using the stable `profile_id` generated by the ID Graph.
5. **Query / Access**: All data is materialized as views in a shared access schema, so you can connect SPAN artifacts easily and securely to downstream data models, dashboards, or AI-powered applications.
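The end of that flow is plain SQL. The sketch below shows a downstream query against the shared access schema; the schema and view names (`span_share.access.demographics`, `span_share.access.metrics`) and column names beyond `PROFILE_ID` are illustrative assumptions, not the app's actual names:

```sql
-- Illustrative sketch: view and column names are assumptions.
-- Join SPAN's compiled dimensions and metrics through the shared access schema.
SELECT
    d.profile_id,
    d.email,
    m.lifetime_value
FROM span_share.access.demographics AS d   -- one row per resolved profile
JOIN span_share.access.metrics      AS m
  ON m.profile_id = d.profile_id           -- PROFILE_ID is the universal join key
WHERE m.lifetime_value > 1000;
```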

> When you use SPAN, identity is defined once, centrally, and used everywhere.

### Architecture

SPAN runs inside your Snowflake account, leveraging all of the built-in governance and compliance that your Snowflake environment provides.

* **Native Compute**: SPAN runs using the Snowflake warehouse you choose. You control the cost.
* **Zero Data Movement**: No data leaves your security perimeter. SPAN reads from your existing tables, computation occurs on the provisioned Snowflake compute pool you define at setup, and results are materialized to your Snowflake environment.
* **SQL Native**: The output artifacts are standard Snowflake views, making it easy to connect them directly to DBT pipelines, Dynamic Tables, Looker, Tableau, and any other tool connected to your Snowflake environment.
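Because the outputs are ordinary views, they drop straight into existing Snowflake patterns. A hedged sketch of one such pattern, a Dynamic Table refreshing downstream of SPAN (the database, schema, view, and warehouse names are hypothetical):

```sql
-- Illustrative: keep a customer 360 model fresh downstream of SPAN views.
-- All object names here are assumptions for the sketch.
CREATE OR REPLACE DYNAMIC TABLE marketing.analytics.customer_360
  TARGET_LAG = '1 hour'
  WAREHOUSE  = transform_wh          -- you choose (and pay for) the compute
AS
SELECT
    d.profile_id,
    d.email,
    m.lifetime_value
FROM span_share.access.demographics AS d
JOIN span_share.access.metrics      AS m
  ON m.profile_id = d.profile_id;
```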

### The Data Model

#### **Identity Table (ID Graph)**

*Required fields*

* `SOURCE_ID`: \[STRING] The primary key from source tables
* `SOURCE_DATASET`: \[STRING] The fully qualified table name for each source dataset
* `PROFILE_ID`: \[UUID] The default column name for SPAN identity
* `CREATED_DATE`: \[TIMESTAMP] Date/time when a source record (`SOURCE_ID` + `SOURCE_DATASET`) is first ingested by the SPAN ID Graph
* `MODIFIED_DATE`: \[TIMESTAMP] Last date/time when the identity was changed (e.g., a new source record added or a source record removed)
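For example, the ID Graph can be queried directly to see every source record that resolves to a single identity. A minimal sketch, assuming the ID Graph is exposed as a view named `span_share.access.id_graph` (the name is hypothetical):

```sql
-- Illustrative: list all source records collapsed into one SPAN identity.
-- The view name is an assumption; the columns match the schema above.
SELECT
    profile_id,
    source_dataset,
    source_id,
    created_date
FROM span_share.access.id_graph
WHERE profile_id = :target_profile   -- bind a PROFILE_ID of interest
ORDER BY created_date;
```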

<figure><img src="https://1294232733-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvuGs6oah0Sb73W57eyvW%2Fuploads%2FnOHtkiwtJZ4nqQnd4Qpk%2Fimage.png?alt=media&#x26;token=badc2ef4-c208-408a-a114-38bba05c2b65" alt=""><figcaption></figcaption></figure>

> The user has the option to add columns used for record matching to the ID Graph output. This can be useful for review and validation of identities.

#### **Dimensions Tables (Compiler)**

There are only two default, required attributes for dimension tables, `PROFILE_ID` and `COMPUTED_AT`. All other columns are defined by the user.

* `PROFILE_ID` is the unique identity defined by the ID Graph.
* `COMPUTED_AT` is the timestamp of the most recent run of the Compiler.

**Demographic:** a transformation module that creates a unified table with one record per `PROFILE_ID` by consolidating "demographic" data from multiple source systems. This table answers: *What do we know about this customer?* (identity, contact, address). It produces one canonical value per profile per field based on waterfall precedence. The user has full control over which demographic fields to include and which precedence rules to apply.
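Waterfall precedence of this kind is commonly expressed with source-ranked deduplication. A minimal sketch of the idea, where the source names and their ranking are hypothetical, not SPAN's configuration syntax:

```sql
-- Sketch of waterfall precedence: pick one canonical email per profile,
-- preferring the CRM over the e-commerce platform over the newsletter tool.
-- Table and source names are assumptions for illustration.
SELECT profile_id, email
FROM (
    SELECT
        profile_id,
        email,
        ROW_NUMBER() OVER (
            PARTITION BY profile_id
            ORDER BY CASE source_dataset      -- precedence rank: lowest wins
                         WHEN 'crm'        THEN 1
                         WHEN 'ecommerce'  THEN 2
                         WHEN 'newsletter' THEN 3
                         ELSE 99
                     END
        ) AS precedence_rank
    FROM staged_emails
    WHERE email IS NOT NULL
) AS ranked
WHERE precedence_rank = 1;
```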

<figure><img src="https://1294232733-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvuGs6oah0Sb73W57eyvW%2Fuploads%2F4txQfK0hazoOMIuIdhJd%2Fimage.png?alt=media&#x26;token=1b39f5a2-eec4-4a41-92e7-7efadcf8d9e2" alt=""><figcaption></figcaption></figure>

**Metric:** a transformation module that calculates a unified table with one record per `PROFILE_ID` by combining rolling-window metric calculations with aggregation methods. Source fact tables (transactional events like orders, ticket purchases, and newsletter signups) are used to calculate all metrics. This table answers: *What did this customer do over time?* e.g. lifetime value (LTV), purchase frequency, recency. Metrics also serve as pre-computed KPIs for segmentation, targeting, and reporting. The user has full control over which metrics to include and how they are calculated.
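A rolling-window metric of this shape reduces to a windowed aggregation over a fact table. A hedged sketch (the fact table and its columns are hypothetical):

```sql
-- Sketch of rolling-window metrics: 365-day spend, frequency, and recency
-- per profile. Table and column names are illustrative assumptions.
SELECT
    profile_id,
    SUM(order_total) AS ltv_365d,        -- rolling-window lifetime value
    COUNT(*)         AS orders_365d,     -- purchase frequency
    MAX(order_date)  AS last_order_date  -- recency
FROM orders_fact
WHERE order_date >= DATEADD(day, -365, CURRENT_DATE)
GROUP BY profile_id;
```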

<figure><img src="https://1294232733-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvuGs6oah0Sb73W57eyvW%2Fuploads%2Fd9RVNqRsw26WrwddvqGi%2Fimage.png?alt=media&#x26;token=fab3ba8c-93bb-4e74-a0e2-7b286112d618" alt=""><figcaption></figcaption></figure>

**Segment:** a transformation module that creates customer segment flags and cohorts by evaluating business rules against compiled demographics and metrics. It enables customer segmentation for marketing campaigns, personalization, and business intelligence by defining pre-computed, consistent customer classifications. The user has full control over which segments to include and how they are calculated.
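Segment flags of this kind are boolean expressions over the compiled tables. A minimal sketch, where the thresholds, table names, and column names are all hypothetical:

```sql
-- Sketch of segment flags evaluated against compiled demographics and
-- metrics. Thresholds and object names are assumptions for illustration.
SELECT
    m.profile_id,
    m.ltv_365d >= 5000                                    AS is_high_value,
    m.last_order_date < DATEADD(day, -180, CURRENT_DATE)  AS is_lapsed,
    d.country = 'US' AND m.orders_365d >= 4               AS is_us_frequent_buyer
FROM compiled_metrics      AS m
JOIN compiled_demographics AS d
  ON d.profile_id = m.profile_id;
```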

<figure><img src="https://1294232733-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvuGs6oah0Sb73W57eyvW%2Fuploads%2FyZWQHtJXzCEnRPTOyy7J%2Fimage.png?alt=media&#x26;token=195bdbf6-cbb3-4868-adeb-27f7642f6a95" alt=""><figcaption></figcaption></figure>

#### **Facts (Compiler)**

**Consent:** a transformation module that creates a unified privacy consent fact table by consolidating consent data from multiple source systems with inconsistent formats. It implements configurable conflict resolution using precedence rules (typically: denied > unknown > granted) to create a single source of truth for customer consent, supporting both explicit consent columns and implicit consent inferred from transactional events. In SPAN, consent is organized by where the consent came from (consent source), where direct communication will be delivered (consent medium), and purpose (promotional or transactional messaging). Consent is the only artifact generated by the Compiler with a strict data schema.

* `HASH_ID`: \[STRING] Hash function applied to generate a unique key for the record
* `PROFILE_ID`: \[UUID] The default column name for SPAN identity
* `CONSENT_SOURCE`: \[STRING] Source dataset for the consent signal (uses the SPAN configuration label rather than the fully qualified name for easier semantics)
* `CONSENT_MEDIUM`: \[STRING] The intended distribution medium for the consent signal. Default values: \[email, sms, targeted\_advertising]
* `CONSENT_PURPOSE`: \[STRING] The intended purpose for the consent signal. Default values: \[promotional, transactional]
* `CONSENT_STATE`: \[STRING] The status of the consent signal. Default values: \[granted, denied, unknown]
* `COMPUTED_AT`: \[TIMESTAMP] When this table was last compiled
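Because the Compiler has already applied the precedence rules (denied > unknown > granted), downstream consumers can filter the table directly. A sketch of a typical audience query, assuming the table is exposed as `span_share.access.consent` (the view name is hypothetical; the columns and default values match the schema above):

```sql
-- Illustrative: profiles eligible for promotional email. The precedence
-- rules are already resolved upstream, so a simple filter suffices.
SELECT profile_id
FROM span_share.access.consent
WHERE consent_medium  = 'email'
  AND consent_purpose = 'promotional'
  AND consent_state   = 'granted';
```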

<figure><img src="https://1294232733-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvuGs6oah0Sb73W57eyvW%2Fuploads%2F8sEokycKRn0QRphTQqJb%2Fimage.png?alt=media&#x26;token=eb96d4a6-cf75-4758-9706-7ec778ea29b8" alt=""><figcaption></figcaption></figure>

**Helper Fact Tables:** a transformation module that enriches the fact tables (transactional event data like orders, ticket purchases, etc.) used for calculating metrics. The key enrichment is adding `PROFILE_ID`, `FACT_PK`, and `COMPUTED_AT`, which makes it easy to join fact tables to the core dimension and fact tables generated by the Compiler. This increases ease of use and flexibility downstream of SPAN.

`PROFILE_ID` and `COMPUTED_AT` are consistent with the other tables generated by the SPAN Compiler. `FACT_PK` is similar to `HASH_ID` on the Consent table: it is a composite key generated using a standard hashing function.
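One practical consequence of the stamped `FACT_PK` is that every enriched event carries a stable key, which makes incremental, idempotent loads downstream of SPAN straightforward. A sketch (the table name is hypothetical):

```sql
-- Illustrative: FACT_PK gives each enriched event a stable composite key,
-- so deduplicating to the latest compiled version is a one-liner.
SELECT fact_pk, profile_id, order_total, computed_at
FROM enriched_orders   -- hypothetical helper fact table
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY fact_pk ORDER BY computed_at DESC
) = 1;
```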

Here is an example fact table:

<figure><img src="https://1294232733-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvuGs6oah0Sb73W57eyvW%2Fuploads%2FLDyQhsQScKd6wNFCzr4w%2Fimage.png?alt=media&#x26;token=ccdab367-614f-4535-a7af-bbee91c67a75" alt=""><figcaption></figcaption></figure>

> SPAN tables are designed so Analytics Engineers can treat each SPAN artifact as a piece of infrastructure that supports downstream data products.

### Integration Patterns

Data teams treat SPAN as a read-only source of truth for their customer data where identity, demographic, metric, segmentation, and consent logic is defined once and used everywhere.

**1. The Universal Join Key**

The `PROFILE_ID` is a single key that defines an identity. It is the standard primary and foreign key relating all compiled dimension and fact tables.

**2. Simplifying Data Models**

Data and analytics engineers can remove complex resolution logic from their models. With SPAN, they do not need to write 50-line `CASE` statements to resolve user IDs or to define business metrics. They simply perform a single `JOIN` to the SPAN tables, using the standard `PROFILE_ID` when building downstream of SPAN, or `SOURCE_ID` and `SOURCE_DATASET` when working with models upstream of SPAN.
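The upstream pattern can be sketched as a single join through the ID Graph. Everything here except the `SOURCE_ID` / `SOURCE_DATASET` / `PROFILE_ID` columns is a hypothetical name for illustration:

```sql
-- Before SPAN (sketch): resolution logic repeated in every model, e.g.
--   COALESCE(crm.user_id, shop.customer_id, web.anon_id) AS user_key ...

-- With SPAN: a single join to the ID Graph stamps the identity.
SELECT
    o.order_id,
    o.order_total,
    g.profile_id
FROM raw.shop.orders AS o                       -- hypothetical upstream table
JOIN span_share.access.id_graph AS g            -- assumed ID Graph view name
  ON g.source_id      = o.customer_id
 AND g.source_dataset = 'RAW.SHOP.ORDERS';      -- fully qualified source name
```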

**3. Cross-Domain Mapping**

Marketing systems use emails, product systems use UUIDs, and transactional platforms use accounting IDs. SPAN maps all of these to a single `PROFILE_ID`. Teams use this bridge to link ad spend directly to feature usage, checkout A/B testing, and revenue metrics without a spider web of custom scripts.
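For instance, email-keyed ad data and UUID-keyed product data can meet through the ID Graph in one query. All table, column, and dataset names below are hypothetical:

```sql
-- Illustrative: ad spend (keyed by email) joined to product usage
-- (keyed by PROFILE_ID) through the ID Graph, with no custom crosswalks.
SELECT
    ads.campaign,
    SUM(ads.spend)                   AS total_spend,
    COUNT(DISTINCT usage.profile_id) AS active_profiles
FROM marketing.ads.ad_spend AS ads
JOIN span_share.access.id_graph AS g
  ON g.source_id      = ads.email
 AND g.source_dataset = 'MARKETING.ADS.AD_SPEND'
JOIN product.events.feature_usage AS usage
  ON usage.profile_id = g.profile_id
GROUP BY ads.campaign;
```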

> Data doesn't need to live in silos anymore: SPAN creates a lingua franca between your data sources via the `PROFILE_ID`.

***
