Setup Guide
SPAN Identity Graph enables identity resolution directly inside your Snowflake account. All processing runs within Snowflake. No data leaves your environment.
Overview
SPAN clusters records that refer to the same real-world entity and assigns a stable profile_id to each resolved identity.
The Identity Graph:
Normalizes identity fields
Applies configurable matching rules
Clusters related records
Writes results back into Snowflake
SPAN runs entirely as a Snowflake Native App.
Prerequisites
Before installing SPAN, ensure you have:
Permission to install applications from Snowflake Marketplace
Access to a running Snowflake warehouse
A source table containing identity-related fields (e.g., name, email, phone)
Permission to create tables in your selected output schema
SPAN does not require:
External API access
Data export
Python environment setup
Local execution
Installation
Navigate to Snowflake Marketplace.
Search for SPAN Identity Graph.
Click Install.
Select:
A warehouse
An application database (auto-created)
Review and approve the requested privileges.
Click Launch.
After installation, SPAN runs entirely inside your Snowflake account.
First-Time Setup
After launching SPAN:
1. Select a Source Table
Choose the Snowflake table containing records you want to resolve.
Example:
Each record must include a unique key column (e.g., ACCOUNT_ID).
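Before running SPAN, you can confirm that your chosen key column is actually unique. The following minimal Python sketch assumes the rows have already been fetched as dictionaries. The function name `find_duplicate_keys` and the sample rows are hypothetical. In practice you would run the equivalent check as a query in Snowflake.

```python
from collections import Counter
from typing import Iterable, Mapping


def find_duplicate_keys(rows: Iterable[Mapping[str, object]], key_column: str) -> dict:
    """Return {key_value: count} for every key value that appears more than once."""
    counts = Counter(row.get(key_column) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample rows; ACCOUNT_ID is the key column from the example above.
rows = [
    {"ACCOUNT_ID": "A1", "FIRSTNAME": "Ana"},
    {"ACCOUNT_ID": "A2", "FIRSTNAME": "Ben"},
    {"ACCOUNT_ID": "A2", "FIRSTNAME": "Benjamin"},
]
print(find_duplicate_keys(rows, "ACCOUNT_ID"))  # {'A2': 2}
```

An empty result means every key value occurs exactly once.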
2. Map Identity Fields
SPAN requires a field mapping so it can interpret your schema.
You will map source columns to standard identity fields.
Example mapping:
Source Column | Type | SPAN Field
FIRSTNAME | text | first_name
LASTNAME | text | last_name
USER_EMAIL |  | email
DOB | text | birthdate
PHONE | phonenumber | phone
STREET_ADDRESS | text | street_address
CITY | text | city
STATE | text | state
ZIP_CODE | text | zip_code
GENDER | text | gender
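One way to think about the mapping is as a lookup from each source column to a (type, SPAN field) pair. The sketch below is illustrative only. The dictionary layout and the `to_span_fields` helper are hypothetical and do not represent SPAN's configuration format. USER_EMAIL is left out because the example does not give its type.

```python
# Hypothetical in-memory form of part of the example mapping:
# source column -> (field type, SPAN field).
FIELD_MAPPING = {
    "FIRSTNAME": ("text", "first_name"),
    "LASTNAME": ("text", "last_name"),
    "DOB": ("text", "birthdate"),
    "PHONE": ("phonenumber", "phone"),
    "ZIP_CODE": ("text", "zip_code"),
}


def to_span_fields(row: dict, mapping: dict = FIELD_MAPPING) -> dict:
    """Rename mapped source columns to SPAN field names.

    Columns that are not in the mapping (e.g. ACCOUNT_ID, NOTES) are dropped,
    and mapped columns missing from the row are skipped.
    """
    return {span_field: row[col] for col, (_type, span_field) in mapping.items() if col in row}


source_row = {"ACCOUNT_ID": "A1", "FIRSTNAME": "Ana", "PHONE": "555-0100", "NOTES": "vip"}
print(to_span_fields(source_row))  # {'first_name': 'Ana', 'phone': '555-0100'}
```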
Field types determine:
Normalization rules
Matching semantics
Comparison logic
Default configurations are provided for common identity schemas.
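The following hypothetical sketch shows how a field type could drive normalization. It uses two made-up rules: `text` values are trimmed, have their internal whitespace collapsed, and are case-folded, while `phonenumber` values are reduced to their digits. SPAN's actual normalization rules are not described here and may differ.

```python
import re


def normalize_text(value: str) -> str:
    # Trim, collapse internal whitespace, and case-fold for case-insensitive comparison.
    return " ".join(value.split()).casefold()


def normalize_phone(value: str) -> str:
    # Keep digits only, so "(555) 010-0100" and "555.010.0100" compare equal.
    return re.sub(r"\D", "", value)


NORMALIZERS = {"text": normalize_text, "phonenumber": normalize_phone}


def normalize(value, field_type: str):
    """Normalize a value according to its (hypothetical) field type."""
    if value is None:
        return None
    # Unknown types pass through unchanged in this sketch.
    return NORMALIZERS.get(field_type, lambda v: v)(value)


print(normalize("  Mary   Ann ", "text"))          # mary ann
print(normalize("(555) 010-0100", "phonenumber"))  # 5550100100
```

Normalizing before matching means two differently formatted values for the same phone number or name can still satisfy the same rule.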
3. Configure Matching Rules
SPAN uses deterministic blocking rules to efficiently compare records.
Example rule patterns:
first_name + last_name + birthdate + state
first_name + last_name + zip_code
email + phone
phone + gender
These rules define how records are grouped and compared during clustering.
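A common way to implement rules like these is blocking. Records that agree exactly on every field of a rule land in the same block, and only records that share a block become candidate pairs. The sketch below assumes that interpretation and already-normalized records. The rule list mirrors the example patterns above, but the function and data are hypothetical and do not describe SPAN's internals.

```python
from collections import defaultdict
from itertools import combinations

RULES = [
    ("first_name", "last_name", "birthdate", "state"),
    ("first_name", "last_name", "zip_code"),
    ("email", "phone"),
    ("phone", "gender"),
]


def candidate_pairs(records: dict, rules=RULES) -> set:
    """Return pairs of record keys that share a block under at least one rule.

    `records` maps a unique key to a dict of normalized SPAN fields.
    A record is skipped for a rule if any field in that rule is missing.
    """
    pairs = set()
    for rule in rules:
        blocks = defaultdict(list)
        for key, rec in records.items():
            values = tuple(rec.get(field) for field in rule)
            if None in values:
                continue
            blocks[values].append(key)
        for members in blocks.values():
            for a, b in combinations(sorted(members), 2):
                pairs.add((a, b))
    return pairs


records = {
    "A1": {"first_name": "ana", "last_name": "lee", "zip_code": "10001", "phone": "5550100100", "gender": "f"},
    "A2": {"first_name": "ana", "last_name": "lee", "zip_code": "10001"},
    "A3": {"phone": "5550100100", "gender": "f"},
    "A4": {"first_name": "bo", "last_name": "kim", "zip_code": "94105"},
}
print(sorted(candidate_pairs(records)))  # [('A1', 'A2'), ('A1', 'A3')]
```

Here A1 and A2 pair under the name + zip_code rule, and A1 and A3 pair under the phone + gender rule. The sketch treats sharing a block as a match. A production system might score candidate pairs further before accepting them.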
Default rule sets are pre-configured for initial use. Advanced tuning can be done later.
4. Run Identity Graph
Click Run Identity Graph.
SPAN will:
Load source data
Normalize identity fields
Apply matching rules
Cluster related records
Assign a profile_id to each resolved entity
Processing uses your selected Snowflake warehouse.
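The clustering step can be understood as finding connected components. If A matches B and B matches C, all three belong to one entity even if A and C never matched directly. The following minimal union-find sketch assumes the matched pairs have already been produced by the matching rules (they are hard-coded here). Using the smallest record key in each component as the profile_id is a hypothetical scheme chosen for illustration, not SPAN's documented ID format.

```python
def cluster(keys, matched_pairs):
    """Group record keys into connected components using union-find.

    Returns {record_key: profile_id}, where (in this sketch) the profile_id
    is the smallest record key in the component, so the result does not
    depend on the order in which pairs are processed.
    """
    parent = {k: k for k in keys}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the larger root under the smaller one, so each root
            # is always the minimum key of its component.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    return {k: find(k) for k in keys}


keys = ["A1", "A2", "A3", "A4", "A5"]
pairs = [("A2", "A3"), ("A1", "A2"), ("A4", "A5")]
print(cluster(keys, pairs))
# {'A1': 'A1', 'A2': 'A1', 'A3': 'A1', 'A4': 'A4', 'A5': 'A4'}
```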
Output Location
SPAN writes results to a Snowflake table you specify.
Example:
The output table contains:
All original source columns
A new column: profile_id
profile_id identifies the resolved entity. Records that SPAN determines refer to the same real-world entity share the same profile_id.
Source tables are never modified.
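Conceptually, the output is each source row copied unchanged, plus one extra column. The Python sketch below assumes a `{record_key: profile_id}` assignment produced by clustering (hard-coded here) and a key column named ACCOUNT_ID, as in the earlier example. The function name is hypothetical. The sketch builds new dictionaries rather than updating the input, which mirrors the guarantee that source tables are never modified.

```python
def build_output(source_rows, assignments, key_column="ACCOUNT_ID"):
    """Return new rows containing every original column plus a profile_id column.

    The input rows are not modified; each output row is a fresh dict.
    """
    return [{**row, "profile_id": assignments[row[key_column]]} for row in source_rows]


source_rows = [
    {"ACCOUNT_ID": "A1", "FIRSTNAME": "Ana"},
    {"ACCOUNT_ID": "A2", "FIRSTNAME": "Ana"},
]
assignments = {"A1": "A1", "A2": "A1"}
output = build_output(source_rows, assignments)
print(output[1])       # {'ACCOUNT_ID': 'A2', 'FIRSTNAME': 'Ana', 'profile_id': 'A1'}
print(source_rows[1])  # {'ACCOUNT_ID': 'A2', 'FIRSTNAME': 'Ana'}
```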
Objects Created by SPAN
Upon installation, SPAN creates:
Application database (managed)
Core processing schemas
Temporary processing tables (auto-managed)
When running the Identity Graph, SPAN creates:
Output identity table (in your selected schema)
SPAN does not:
Modify source tables
Store data externally
Create background scheduled tasks (unless explicitly configured)
Cost & Compute Notes
SPAN uses your selected Snowflake warehouse.
Compute usage depends on:
Table size
Number of blocking rules
Record similarity
No compute is consumed unless the Identity Graph is actively running.
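The following rough arithmetic suggests why blocking rules and record similarity drive cost. It assumes that every pair of records inside a block is compared, which is an assumption about how comparisons scale and not a documented SPAN cost formula. Under that assumption, a block of n records yields n(n-1)/2 pairs. A few large blocks of very similar records can therefore dominate compute, and each additional rule adds its own set of blocks.

```python
def pair_count(block_sizes):
    """Number of within-block record pairs: the sum of n*(n-1)/2 over all blocks."""
    return sum(n * (n - 1) // 2 for n in block_sizes)


# 1,000 records split into 100 blocks of 10, versus one block of 1,000.
print(pair_count([10] * 100))  # 4500
print(pair_count([1000]))      # 499500
```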
Troubleshooting
Insufficient Privileges
Ensure:
You can read from source tables
You can create tables in the output schema
Warehouse Suspended
Resume the selected warehouse before running the graph.
Column Mapping Errors
Confirm:
Selected columns exist in the source table
The key column uniquely identifies each record
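You can run the column-existence check as a small pre-flight function before launching a run. In this hypothetical Python sketch, `source_columns` stands in for the column list you would read from the table's metadata (for example, with `DESCRIBE TABLE` in Snowflake), and the function name is an assumption. The comparison is case-insensitive, which matches unquoted Snowflake identifiers (stored in upper case). Quoted, case-sensitive identifiers are not handled.

```python
def missing_columns(source_columns, mapped_columns, key_column):
    """Return the mapped or key columns that do not exist in the source table."""
    # Compare case-insensitively; unquoted Snowflake identifiers are stored in upper case.
    available = {c.upper() for c in source_columns}
    return [col for col in [key_column, *mapped_columns] if col.upper() not in available]


print(missing_columns(
    source_columns=["ACCOUNT_ID", "FIRSTNAME", "LASTNAME"],
    mapped_columns=["FIRSTNAME", "LAST_NAME"],
    key_column="ACCOUNT_ID",
))
# ['LAST_NAME']
```

An empty list means every mapped column, and the key column, was found.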
For additional troubleshooting steps, see the Troubleshoot article.
Security Model
All processing occurs inside your Snowflake account.
No data leaves your environment.
SPAN operates under Snowflake RBAC.
All activity is auditable via Snowflake system logs.
Next Steps
After generating your first Identity Graph:
Validate record counts
Compare cluster sizes
Join downstream models on profile_id
Iterate on matching rules if needed
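You can run the first two checks directly on the output table. The following Python sketch works over output rows and assumes each row carries the profile_id column. The summary key names are hypothetical. In Snowflake itself you would express the same aggregates in SQL (COUNT, COUNT(DISTINCT profile_id), GROUP BY profile_id).

```python
from collections import Counter


def summarize(output_rows):
    """Summarize record and cluster counts for an output identity table."""
    cluster_sizes = Counter(row["profile_id"] for row in output_rows)
    size_histogram = Counter(cluster_sizes.values())  # cluster size -> number of clusters
    return {
        "records": len(output_rows),
        "profiles": len(cluster_sizes),
        "largest_cluster": max(cluster_sizes.values(), default=0),
        "size_histogram": dict(sorted(size_histogram.items())),
    }


rows = [{"profile_id": p} for p in ["P1", "P1", "P1", "P2", "P3", "P3"]]
print(summarize(rows))
# {'records': 6, 'profiles': 3, 'largest_cluster': 3, 'size_histogram': {1: 1, 2: 1, 3: 1}}
```

The records count should equal the source table's row count. An unexpectedly large cluster can be a sign that a matching rule is too broad.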
SPAN is designed to make identity resolution a governed, queryable data primitive inside Snowflake.