Setup Guide

SPAN Identity Graph enables identity resolution directly inside your Snowflake account. All processing runs within Snowflake. No data leaves your environment.

Overview

SPAN clusters records that refer to the same real-world entity and assigns a stable profile_id to each resolved identity.

The Identity Graph:

  • Normalizes identity fields

  • Applies configurable matching rules

  • Clusters related records

  • Writes results back into Snowflake

SPAN runs entirely as a Snowflake Native App.


Prerequisites

Before installing SPAN, ensure you have:

  • Permission to install applications from Snowflake Marketplace

  • Access to a running Snowflake warehouse

  • A source table containing identity-related fields (e.g., name, email, phone)

  • Permission to create tables in your selected output schema

SPAN does not require:

  • External API access

  • Data export

  • Python environment setup

  • Local execution


Installation

  1. Navigate to Snowflake Marketplace.

  2. Search for SPAN Identity Graph.

  3. Click Install.

  4. Select:

      • A warehouse

      • An application database (auto-created)

  5. Review and approve requested privileges.

  6. Click Launch.

After installation, SPAN runs entirely inside your Snowflake account.


First-Time Setup

After launching SPAN:

1. Select a Source Table

Choose the Snowflake table containing records you want to resolve.

Example:

Each record must include a unique key column (e.g., ACCOUNT_ID).
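Before you select a table, you can confirm that its key column really is unique. The Python sketch below assumes the rows have already been fetched into a list of dictionaries. The function name `check_key_column` is hypothetical and is not part of SPAN. This guide does not say how SPAN treats null keys, so the sketch reports them alongside duplicates.

```python
from collections import Counter
from typing import Any


def check_key_column(rows: list[dict[str, Any]], key_column: str) -> dict[str, Any]:
    """Count missing key values and list key values that occur more than once."""
    values = [row.get(key_column) for row in rows]
    counts = Counter(v for v in values if v is not None)
    return {
        "missing": sum(1 for v in values if v is None),
        "duplicates": {key: n for key, n in counts.items() if n > 1},
    }


# Hypothetical sample rows; ACCOUNT_ID is the example key column named above.
rows = [
    {"ACCOUNT_ID": 1, "USER_EMAIL": "ana@example.com"},
    {"ACCOUNT_ID": 2, "USER_EMAIL": "ben@example.com"},
    {"ACCOUNT_ID": 2, "USER_EMAIL": "ben.b@example.com"},
    {"ACCOUNT_ID": None, "USER_EMAIL": "cy@example.com"},
]
print(check_key_column(rows, "ACCOUNT_ID"))  # {'missing': 1, 'duplicates': {2: 2}}
```

Inside Snowflake, the same check is a GROUP BY on the key column with HAVING COUNT(*) > 1.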

2. Map Identity Fields

SPAN requires a field mapping so it can interpret your schema.

You will map source columns to standard identity fields.

Example mapping:

Source Column     Type          SPAN Field
FIRSTNAME         text          first_name
LASTNAME          text          last_name
USER_EMAIL        email         email
DOB               text          birthdate
PHONE             phonenumber   phone
STREET_ADDRESS    text          street_address
CITY              text          city
STATE             text          state
ZIP_CODE          text          zip_code
GENDER            text          gender

Field types determine:

  • Normalization rules

  • Matching semantics

  • Comparison logic
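A small Python sketch shows how a field type can select a normalization rule. This guide does not document SPAN's actual normalization rules. The rules below are illustrative assumptions: accent stripping and lowercasing for `text`, trimming and lowercasing for `email`, and digits only for `phonenumber`, with a hypothetical US country-code rule. The names `NORMALIZERS` and `normalize` are also hypothetical.

```python
import re
import unicodedata
from typing import Callable, Optional


def normalize_text(value: str) -> str:
    # Strip accents, collapse runs of whitespace, and lowercase.
    decomposed = unicodedata.normalize("NFKD", value)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_marks.split()).lower()


def normalize_email(value: str) -> str:
    # Assumption: emails are compared case-insensitively.
    return value.strip().lower()


def normalize_phone(value: str) -> str:
    digits = re.sub(r"\D", "", value)
    # Assumption: an 11-digit number starting with 1 is a US number with country code.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


# Hypothetical mapping from the field types in the table above to normalizers.
NORMALIZERS: dict[str, Callable[[str], str]] = {
    "text": normalize_text,
    "email": normalize_email,
    "phonenumber": normalize_phone,
}


def normalize(value: Optional[str], field_type: str) -> Optional[str]:
    """Normalize one value by field type; empty results are treated as missing."""
    if value is None:
        return None
    result = NORMALIZERS[field_type](value)
    return result or None


print(normalize("  José   García ", "text"))         # jose garcia
print(normalize(" Ana@Example.COM ", "email"))        # ana@example.com
print(normalize("+1 (555) 010-2030", "phonenumber"))  # 5550102030
```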

Default configurations are provided for common identity schemas.

3. Configure Matching Rules

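SPAN uses deterministic blocking rules to compare records efficiently. Rather than comparing every record with every other record, it compares only records that share the same values for the fields named in a rule.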

Example rule patterns:

  • first_name + last_name + birthdate + state

  • first_name + last_name + zip_code

  • email + phone

  • phone + gender

These rules define how records are grouped and compared during clustering.
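The grouping step can be sketched as follows. Each rule turns a record into a blocking key, and only records that share a key become candidate pairs. This Python sketch assumes the records are already normalized and keyed by their unique key column. `build_blocks` and `candidate_pairs` are hypothetical names. The sketch stops at candidate generation, because this guide does not say how SPAN compares records within a block.

```python
from collections import defaultdict
from itertools import combinations
from typing import Any

# The example rule patterns above, written as lists of SPAN field names.
RULES: list[list[str]] = [
    ["first_name", "last_name", "birthdate", "state"],
    ["first_name", "last_name", "zip_code"],
    ["email", "phone"],
    ["phone", "gender"],
]


def build_blocks(records: dict[int, dict[str, Any]], rules: list[list[str]]) -> dict[tuple, list[int]]:
    """Group record keys by (rule index, values of the rule's fields).

    A record missing any field of a rule does not take part in that rule.
    """
    blocks: dict[tuple, list[int]] = defaultdict(list)
    for record_key, fields in records.items():
        for rule_index, rule in enumerate(rules):
            values = tuple(fields.get(field) for field in rule)
            if any(value is None for value in values):
                continue
            blocks[(rule_index, values)].append(record_key)
    return dict(blocks)


def candidate_pairs(blocks: dict[tuple, list[int]]) -> set[tuple[int, int]]:
    """Every pair of records that shares at least one block, smaller key first."""
    pairs: set[tuple[int, int]] = set()
    for keys in blocks.values():
        pairs.update(combinations(sorted(keys), 2))
    return pairs


# Hypothetical, already-normalized records keyed by ACCOUNT_ID.
records = {
    1: {"first_name": "ana", "last_name": "diaz", "zip_code": "94107",
        "email": "ana@example.com", "phone": "5550102030"},
    2: {"first_name": "ana", "last_name": "diaz", "zip_code": "94107"},
    3: {"email": "ana@example.com", "phone": "5550102030", "gender": "f"},
    4: {"first_name": "ben", "last_name": "ng", "zip_code": "10001"},
}
print(sorted(candidate_pairs(build_blocks(records, RULES))))  # [(1, 2), (1, 3)]
```

Records 2 and 3 share no block, yet each pairs with record 1. Placing all three in one entity is the job of the clustering step.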

For initial use, default rule sets are pre-configured. Advanced tuning can be performed later.

4. Run Identity Graph

Click Run Identity Graph.

SPAN will:

  1. Load source data

  2. Normalize identity fields

  3. Apply matching rules

  4. Cluster related records

  5. Assign a profile_id to each resolved entity

Processing uses your selected Snowflake warehouse.
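Steps 4 and 5 group records that are linked, either directly or through other records, and give each group one identifier. This guide does not specify SPAN's clustering algorithm. The Python sketch below uses union-find (disjoint sets), one common way to compute these connected components, and assumes the matched pairs come from the rule step. `assign_profile_ids` and the `P<smallest key>` format are hypothetical choices for illustration, not SPAN's actual profile_id scheme.

```python
from typing import Iterable


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, items: Iterable[int]) -> None:
        self.parent = {item: item for item in items}
        self.size = {item: 1 for item in self.parent}

    def find(self, item: int) -> int:
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]  # path halving
            item = self.parent[item]
        return item

    def union(self, a: int, b: int) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a  # attach the smaller tree under the larger
        self.size[root_a] += self.size[root_b]


def assign_profile_ids(record_keys: list[int], matched_pairs: Iterable[tuple[int, int]]) -> dict[int, str]:
    """Map each record key to the profile_id of its cluster."""
    ds = DisjointSet(record_keys)
    for a, b in matched_pairs:
        ds.union(a, b)
    clusters: dict[int, list[int]] = {}
    for key in record_keys:
        clusters.setdefault(ds.find(key), []).append(key)
    profile_ids: dict[int, str] = {}
    for members in clusters.values():
        profile_id = f"P{min(members)}"  # hypothetical: derived from the smallest member key
        for key in members:
            profile_ids[key] = profile_id
    return profile_ids


# 1-2 and 2-3 match, so 1 and 3 join the same cluster without a direct match.
print(assign_profile_ids([1, 2, 3, 4, 5], [(1, 2), (2, 3), (5, 4)]))
```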


Output Location

SPAN writes results to a Snowflake table you specify.

Example:

The output table contains:

  • All original source columns

  • A new column: profile_id

profile_id identifies the resolved entity. Records that share a profile_id have been resolved to the same real-world entity.

Source tables are never modified.
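Conceptually, each output row is a copy of a source row with profile_id appended, and the source is left untouched. The following is a minimal Python sketch. It assumes rows are dictionaries and that a key-to-profile_id lookup has been produced by the clustering step. `build_output_rows` is a hypothetical name.

```python
from typing import Any


def build_output_rows(
    source_rows: list[dict[str, Any]],
    key_column: str,
    profile_ids: dict[Any, str],
) -> list[dict[str, Any]]:
    """Return new rows holding every source column plus profile_id.

    Each row is copied first, so the source rows themselves are never modified.
    """
    output = []
    for row in source_rows:
        new_row = dict(row)
        new_row["profile_id"] = profile_ids[row[key_column]]
        output.append(new_row)
    return output


# Hypothetical source rows and clustering result.
source_rows = [
    {"ACCOUNT_ID": 1, "USER_EMAIL": "ana@example.com"},
    {"ACCOUNT_ID": 2, "USER_EMAIL": "ANA@example.com"},
]
output_rows = build_output_rows(source_rows, "ACCOUNT_ID", {1: "P1", 2: "P1"})
print(output_rows[1])  # {'ACCOUNT_ID': 2, 'USER_EMAIL': 'ANA@example.com', 'profile_id': 'P1'}
```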


Objects Created by SPAN

Upon installation, SPAN creates:

  • Application database (managed)

  • Core processing schemas

  • Temporary processing tables (auto-managed)

When running the Identity Graph, SPAN creates:

  • Output identity table (in your selected schema)

SPAN does not:

  • Modify source tables

  • Store data externally

  • Create background scheduled tasks (unless explicitly configured)


Cost & Compute Notes

  • SPAN uses your selected Snowflake warehouse.

  • Compute usage depends on:

      • Table size

      • Number of blocking rules

      • Record similarity

  • No compute is consumed unless the Identity Graph is actively running.
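Counting comparisons shows why the number of rules and the similarity of records matter. If records are compared pairwise within each block, a block of n records costs n(n-1)/2 comparisons, so one oversized block can dominate a run. The Python sketch below applies that formula per rule. It assumes pairwise comparison within blocks, which this guide does not confirm. `estimate_comparisons` is a hypothetical helper, not a SPAN cost model.

```python
from collections import Counter
from typing import Any


def estimate_comparisons(records: list[dict[str, Any]], rules: list[list[str]]) -> dict[int, int]:
    """Per rule index, sum n*(n-1)/2 over the sizes n of that rule's blocks."""
    estimates: dict[int, int] = {}
    for rule_index, rule in enumerate(rules):
        block_sizes = Counter(
            tuple(record.get(field) for field in rule)
            for record in records
            if all(record.get(field) is not None for field in rule)  # skip incomplete records
        )
        estimates[rule_index] = sum(n * (n - 1) // 2 for n in block_sizes.values())
    return estimates


# Hypothetical data: unique emails, but a placeholder phone number shared by every record.
records = [
    {"email": f"user{i}@example.com", "phone": "0000000000", "gender": "f"}
    for i in range(1000)
]
rules = [["email", "phone"], ["phone", "gender"]]
print(estimate_comparisons(records, rules))  # {0: 0, 1: 499500}
```

Here the placeholder phone number puts all 1,000 records into one phone + gender block, which produces nearly half a million comparisons. The email + phone rule produces none.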


Troubleshooting

Insufficient Privileges

Ensure:

  • You can read from source tables

  • You can create tables in the output schema

Warehouse Suspended

Resume the selected warehouse before running the graph.

Column Mapping Errors

Confirm:

  • Selected columns exist in the source table

  • The key column uniquely identifies each record
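You can sketch the first check as a comparison between the mapped column names and the source table's columns. This Python sketch assumes you already have the list of column names (for example, from DESCRIBE TABLE) and matches names exactly as written. `find_mapping_errors` is a hypothetical helper. Snowflake stores unquoted identifiers in uppercase, so compare against the names as DESCRIBE TABLE reports them.

```python
def find_mapping_errors(
    source_columns: list[str], mapping: dict[str, str], key_column: str
) -> list[str]:
    """List the key column and any mapped columns that do not exist in the source table."""
    available = set(source_columns)
    errors = []
    if key_column not in available:
        errors.append(f"key column {key_column} not found")
    for source_column, span_field in mapping.items():
        if source_column not in available:
            errors.append(f"{source_column} (mapped to {span_field}) not found")
    return errors


# Hypothetical table columns and a mapping that references a missing PHONE column.
columns = ["ACCOUNT_ID", "FIRSTNAME", "LASTNAME", "USER_EMAIL"]
mapping = {"FIRSTNAME": "first_name", "LASTNAME": "last_name",
           "USER_EMAIL": "email", "PHONE": "phone"}
print(find_mapping_errors(columns, mapping, "ACCOUNT_ID"))
# ['PHONE (mapped to phone) not found']
```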

For additional troubleshooting steps, see the Troubleshoot article.


Security Model

  • All processing occurs inside your Snowflake account.

  • No data leaves your environment.

  • SPAN operates under Snowflake RBAC.

  • All activity is auditable via Snowflake system logs.


Next Steps

After generating your first Identity Graph:

  • Validate record counts

  • Compare cluster sizes

  • Join downstream models on profile_id

  • Iterate on matching rules if needed
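You can run the first two checks on the profile_id values of the output table. This Python sketch assumes the output holds one row per source record and that the values have been fetched into a list. `summarize_clusters` is a hypothetical helper. In Snowflake, the same numbers come from COUNT(*) and a GROUP BY on profile_id.

```python
from collections import Counter
from typing import Any


def summarize_clusters(profile_ids: list[str], source_row_count: int) -> dict[str, Any]:
    """Summarize the profile_id column of an output table."""
    cluster_sizes = Counter(profile_ids)  # profile_id -> number of records in that cluster
    return {
        "records": len(profile_ids),
        "matches_source_count": len(profile_ids) == source_row_count,
        "profiles": len(cluster_sizes),
        "largest_cluster": max(cluster_sizes.values(), default=0),
        # cluster size -> how many profiles have that size
        "size_distribution": dict(sorted(Counter(cluster_sizes.values()).items())),
    }


# Hypothetical profile_id values from a six-row output table.
summary = summarize_clusters(["P1", "P1", "P1", "P4", "P4", "P6"], source_row_count=6)
print(summary)
```

A largest cluster far above the rest often signals that a blocking rule is too broad, for example one keyed on a placeholder value.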

SPAN is designed to make identity resolution a governed, queryable data primitive inside Snowflake.

