02-sops

bq to snowflake ga4 via gard connector

2026-04-15 · sop · status: draft-v1

GA4 → Snowflake via GARD — Setup Guide

What this guide does

This SOP installs the Snowflake Connector for Google Analytics Raw Data (GARD) from the Snowflake Marketplace and configures it to mirror raw GA4 event data from a client’s BigQuery analytics_<property_id>.events_* (and events_intraday_*) tables into a dedicated Snowflake database. The connector preserves nested event_params and user_properties as VARIANT columns, so downstream LATERAL FLATTEN queries reproduce the same shape as BigQuery UNNEST. The connector listing itself is free; you pay only for Snowflake compute (warehouse credits) on the ingest side and BigQuery Storage Read API read charges on the source side (~$1.10/TB read). No GA4 transformation, dbt modeling, or BI wiring is performed by this guide — see What this guide does NOT cover.

Prerequisites

Roles & handoff overview

Step | Owner | What they do | Hand to | Via
1 | BQ admin | Verify GA4 export, create service account, grant IAM, download JSON key | Snowflake admin | 1Password / HashiCorp Vault — never email or Slack
2 | BQ admin | Set BQ budget alert | (none — internal) | GCP console
3 | Snowflake admin | Receive credential bundle, install GARD from Marketplace, configure source, run partition-scoped validation | BQ admin | Joint validation call
4 | Both | Joint 30-min validation call: row counts match, UNNEST works, cost monitors armed | (done) | Zoom / Meet

Sequence is strict. The Snowflake admin cannot start until the BQ admin has handed over the JSON key, project ID, and dataset name. Block out a 15-min sync between the two admins to confirm the credential transfer is in 1Password and accessible.

Section A: BigQuery / GCP Admin steps

Audience: the GCP / BigQuery admin at the client. You can read this section in isolation — Section B is for your Snowflake counterpart and you do not need to execute it.

A1. Verify GA4 export is landing

Open the BQ console, switch to the GA4 export project, and run:

SELECT
  _TABLE_SUFFIX AS partition_date,
  COUNT(*)      AS event_count
FROM `<project>.analytics_<property_id>.events_*`
WHERE _TABLE_SUFFIX BETWEEN
      FORMAT_DATE('%Y%m%d', CURRENT_DATE() - 7)
  AND FORMAT_DATE('%Y%m%d', CURRENT_DATE() - 1)
GROUP BY 1
ORDER BY 1 DESC;

You should see 7 rows, one per day, each with a non-trivial event count. Also confirm events_intraday_<today> exists if the client expects intra-day freshness:

SELECT COUNT(*) FROM `<project>.analytics_<property_id>.events_intraday_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', CURRENT_DATE());

If the intraday table doesn’t exist, GA4 streaming export is not enabled for this property — note this and tell the Snowflake admin so they know not to schedule sub-daily syncs.

A2. Create a GCP service account dedicated to GARD

Use a dedicated SA — do not reuse an existing one. Naming convention:

gcloud iam service-accounts create gard-snowflake-reader \
  --project=<project> \
  --display-name="GARD Snowflake Connector Reader" \
  --description="Read-only access for Snowflake GARD connector to GA4 BQ export"

Result: gard-snowflake-reader@<project>.iam.gserviceaccount.com.
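
If you want to confirm the account exists before granting roles, a quick read-only check:

gcloud iam service-accounts describe \
  gard-snowflake-reader@<project>.iam.gserviceaccount.com \
  --project=<project>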

A3. Grant minimum-necessary IAM roles

Three grants. The first is dataset-scoped, the other two are project-scoped because BQ does not support scoping them lower.

3a. Dataset-scoped: roles/bigquery.dataViewer

In the BQ console, navigate to the analytics_<property_id> dataset → SHARING → Permissions → ADD PRINCIPAL. Add the SA email with role BigQuery Data Viewer. There is no direct bq CLI equivalent for dataset-level bindings (bq add-iam-policy-binding only covers tables and views), but the same grant can be done in SQL from the BQ console query editor:

GRANT `roles/bigquery.dataViewer`
ON SCHEMA `<project>.analytics_<property_id>`
TO "serviceAccount:gard-snowflake-reader@<project>.iam.gserviceaccount.com";

3b. Project-scoped: roles/bigquery.readSessionUser (required for the Storage Read API the connector uses for high-throughput pulls)

gcloud projects add-iam-policy-binding <project> \
  --member="serviceAccount:gard-snowflake-reader@<project>.iam.gserviceaccount.com" \
  --role="roles/bigquery.readSessionUser"

3c. Project-scoped: roles/bigquery.jobUser (required for query metadata jobs the connector runs to enumerate partitions)

gcloud projects add-iam-policy-binding <project> \
  --member="serviceAccount:gard-snowflake-reader@<project>.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

Do not grant roles/bigquery.dataEditor, roles/bigquery.admin, or any project-wide data-viewer role. If the client has multiple GA4 properties exporting to the same project and they all need to land in Snowflake, repeat 3a per dataset rather than promoting to a project-wide grant.
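
To double-check the project-scoped grants landed (and that nothing broader slipped in), you can list every role bound to the SA — a read-only sketch using standard gcloud filtering:

gcloud projects get-iam-policy <project> \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:gard-snowflake-reader@<project>.iam.gserviceaccount.com" \
  --format="table(bindings.role)"
# Expect exactly roles/bigquery.readSessionUser and roles/bigquery.jobUser here;
# the dataset-scoped dataViewer grant lives on the dataset, not the project policy, so it won't appear.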

A4. Create and download the JSON key

gcloud iam service-accounts keys create ~/gard-snowflake-reader-key.json \
  --iam-account=gard-snowflake-reader@<project>.iam.gserviceaccount.com

The file is now on your local disk. Do not leave it there. Move it into 1Password (or the client’s Vault) immediately after step A5 and shred the local copy:

shred -u ~/gard-snowflake-reader-key.json   # Linux
rm -P ~/gard-snowflake-reader-key.json      # macOS

A5. Hand off to Snowflake admin

Create a single 1Password item (vault: client’s shared engineering vault) with these fields:

  • Service account JSON key (the full file contents from A4)
  • Service account email (gard-snowflake-reader@<project>.iam.gserviceaccount.com)
  • GCP project ID
  • GA4 dataset name (analytics_<property_id>)
  • Whether intraday/streaming export is enabled (from A1)
  • Any tables to exclude from the sync

Share the 1Password item with the Snowflake admin’s account. Confirm receipt verbally or in a video call. Never paste the key into Slack, email, a Google Doc, a ticket, or a chat tool — even ephemerally. If the client uses HashiCorp Vault instead, store under secret/clients/<client>/gard-connector with the same fields.
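
If the client is on HashiCorp Vault, a minimal handoff sketch (the field names are illustrative, not a required schema; key_json=@file reads the key file contents into the secret):

vault kv put secret/clients/<client>/gard-connector \
  project_id="<project>" \
  dataset="analytics_<property_id>" \
  sa_email="gard-snowflake-reader@<project>.iam.gserviceaccount.com" \
  key_json=@gard-snowflake-reader-key.json

Run this before shredding the local key file as described in A4.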

A6. Cost watch on the BQ side

The Storage Read API is the cost surface on BQ. A runaway connector or an unbounded backfill can quietly read terabytes. Set a project-level budget alert before you sleep tonight.

In the GCP console: Billing → Budgets & alerts → CREATE BUDGET.

Optionally also enable BigQuery’s per-project query quota under IAM & Admin → Quotas (filter for “BigQuery API”), capping daily Storage Read bytes if the client is risk-averse. Default quotas are very high; a 5-TB/day cap is reasonable for a mid-size GA4 property.
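
If you prefer to script the budget rather than click through the console, a sketch with gcloud (assumes you know the billing account ID; the $200/mo amount and 50%/90% thresholds are placeholders, and --filter-projects takes the project number rather than the project ID):

gcloud billing budgets create \
  --billing-account=<billing-account-id> \
  --display-name="GARD connector - BQ Storage Read spend" \
  --budget-amount=200USD \
  --threshold-rule=percent=0.5 \
  --threshold-rule=percent=0.9 \
  --filter-projects="projects/<project-number>"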


End of Section A. You are done. Confirm with the Snowflake admin that they have received the 1Password item and can proceed.

Section B: Snowflake Admin steps

Audience: the Snowflake admin (Mammoth Growth or client). You can read this section in isolation. Section A has already been completed by your BQ counterpart and the credential bundle is waiting for you in 1Password.

B1. Receive the credential bundle from the BQ admin

Open the 1Password item titled GARD Connector — GA4 BQ Service Account — <client>. Confirm it contains:

  • the service account JSON key (full file contents),
  • the service account email,
  • the GCP project ID,
  • the GA4 dataset name (analytics_<property_id>),
  • a note on whether intraday/streaming export is enabled, and
  • any tables the BQ admin wants excluded from the sync.

If anything is missing, ping the BQ admin before proceeding. Do not improvise the project ID or dataset name from a screenshot.

B2. Install the connector from Marketplace

Same flow as the GAAD install you’ve done before.

  1. Go to https://app.snowflake.com/marketplace.
  2. Search for “Snowflake Connector for Google Analytics Raw Data” (or “GARD”). The publisher is Snowflake Inc. Listing is free.
  3. Click Get → choose the role with marketplace install privileges (typically ACCOUNTADMIN or your dedicated MARKETPLACE_INSTALLER role) → confirm.

The connector installs as a Snowflake Native App into a database it creates (default name pattern: SNOWFLAKE_CONNECTOR_FOR_GOOGLE_ANALYTICS_RAW_DATA). Don’t rename it — downstream upgrades expect the default name.

B3. Create a dedicated database, schema, and warehouse for ingested data

The connector lands data into a destination database/schema you specify. Create them up front, separately from the connector’s app database, so you can grant downstream consumers access without touching the app:

USE ROLE ACCOUNTADMIN;

-- Destination database for raw GA4 data
CREATE DATABASE IF NOT EXISTS GA4_RAW
  COMMENT = 'Raw GA4 event data mirrored from BigQuery via GARD connector';

CREATE SCHEMA IF NOT EXISTS GA4_RAW.EVENTS
  COMMENT = 'GA4 events_* and events_intraday_* tables';

-- Dedicated ingest warehouse (XS to start, scale up only if backfill is slow)
CREATE WAREHOUSE IF NOT EXISTS WH_GA4_INGEST
  WAREHOUSE_SIZE       = XSMALL
  AUTO_SUSPEND         = 60
  AUTO_RESUME          = TRUE
  INITIALLY_SUSPENDED  = TRUE
  COMMENT              = 'Dedicated warehouse for GARD connector ingest jobs';

-- Connector role (the connector creates its own service role on install,
-- but you'll grant it usage on the destination DB and warehouse)
GRANT USAGE  ON DATABASE  GA4_RAW           TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_GOOGLE_ANALYTICS_RAW_DATA;
GRANT USAGE  ON SCHEMA    GA4_RAW.EVENTS    TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_GOOGLE_ANALYTICS_RAW_DATA;
GRANT CREATE TABLE ON SCHEMA GA4_RAW.EVENTS TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_GOOGLE_ANALYTICS_RAW_DATA;
GRANT USAGE  ON WAREHOUSE WH_GA4_INGEST     TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_GOOGLE_ANALYTICS_RAW_DATA;

If the application name differs in your account (Snowflake occasionally suffixes installs), check the actual name with:

SHOW APPLICATIONS LIKE '%GOOGLE_ANALYTICS_RAW%';

and substitute it in the GRANTs above.
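
Before leaving B3, it is worth confirming the grants actually landed on the app (a quick check, assuming the default application name; swap in the name from SHOW APPLICATIONS if yours differs):

SHOW GRANTS TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_GOOGLE_ANALYTICS_RAW_DATA;
-- Expect USAGE on GA4_RAW, USAGE + CREATE TABLE on GA4_RAW.EVENTS, and USAGE on WH_GA4_INGEST.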

B4. Configure source connection in the GARD UI

Open the connector app: Data Products → Apps → Snowflake Connector for Google Analytics Raw Data → Launch app.

In the connector’s configuration UI:

  1. Authentication: select Service Account JSON Key → paste the entire JSON contents (open the key file in a text editor, copy everything from { to }). The connector stores the pasted contents as a Snowflake secret; the raw key file itself never needs to be uploaded to or stored anywhere in Snowflake.

  2. Source:

    • GCP project ID: the project ID from the 1Password bundle.
    • BigQuery dataset: the GA4 dataset name (e.g. analytics_123456789).
    • Region: match the dataset’s BQ region (usually US or EU — check the BQ console if uncertain).
  3. Destination:

    • Database: GA4_RAW
    • Schema: EVENTS
    • Warehouse: WH_GA4_INGEST
  4. Tables to sync: start with the two table groups that matter:

    • events_* (daily-finalized event tables — the canonical source)
    • events_intraday_* (only if A1 confirmed intraday is enabled and the client wants today’s data)

    Exclude any tables the BQ admin flagged in the handoff bundle. Skip pseudonymous_users_* and users_* unless the client specifically asks — they pull additional data and are usually not needed for event analysis.

After saving, the connector validates the credentials by listing the dataset’s tables. If it errors, the usual suspects are:

  • a missing or not-yet-propagated IAM grant from A3 (re-check all three: dataViewer on the dataset, readSessionUser and jobUser on the project),
  • a mistyped project ID, dataset name, or region (verify against the 1Password bundle and the BQ console), or
  • a truncated JSON paste (re-copy the full key contents, { to }).

B5. Set sync schedule

In the connector’s Schedule tab:

  • events_* (daily-finalized tables): a once-daily sync is sufficient — these tables only change when GA4 finalizes a new day.
  • events_intraday_*: schedule sub-daily syncs only if A1 confirmed streaming export is enabled and the client actually wants today’s data; otherwise leave it out, per the BQ admin’s note in the handoff bundle.

Shred the local JSON key file immediately after B4 succeeds:

rm -P /tmp/gard-snowflake-reader-key.json    # macOS
shred -u /tmp/gard-snowflake-reader-key.json # Linux

B6. Run the initial sync on ONE date partition first

Do not kick off a full backfill yet. Run the connector once against the most recent completed daily partition (yesterday’s events_YYYYMMDD) to validate end-to-end fidelity before committing to a multi-month read.

In the connector UI: Sync now → Custom range → Start = yesterday, End = yesterday → Tables = events_* → Run.

When it finishes (a few minutes for a typical day), validate:

Row count match:

-- In Snowflake
SELECT COUNT(*) AS sf_count
FROM GA4_RAW.EVENTS."events_YYYYMMDD";   -- substitute yesterday's date
-- In BigQuery (BQ admin runs this, or run yourself if you have read access)
SELECT COUNT(*) AS bq_count
FROM `<project>.analytics_<property_id>.events_YYYYMMDD`;

The two counts must match exactly. If they don’t, do not proceed — open a support ticket on the Snowflake side; this is usually a partition-cutoff issue.

event_params UNNEST fidelity:

-- Snowflake equivalent of BQ's UNNEST(event_params)
SELECT
  event_name,
  param.value:value:string_value::STRING AS page_location
FROM GA4_RAW.EVENTS."events_YYYYMMDD" e,
     LATERAL FLATTEN(input => e.event_params) param
WHERE param.value:key::STRING = 'page_location'
  AND event_name = 'page_view'
LIMIT 10;

You should see real URLs in page_location. If event_params is NULL or empty, the connector did not preserve the nested structure — check the connector’s “Schema mapping” setting and confirm RECORD ARRAY → VARIANT mapping is enabled (it is by default; explicit-flatten mode is a misconfiguration here).
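
A quick structural check separates "nested structure preserved" from "mapping misconfigured" (a sketch; with the default RECORD ARRAY → VARIANT mapping, TYPEOF should report ARRAY):

SELECT TYPEOF(event_params) AS params_type, COUNT(*) AS n
FROM GA4_RAW.EVENTS."events_YYYYMMDD"   -- substitute yesterday's date
GROUP BY 1;
-- A single row with params_type = 'ARRAY' is healthy; NULLs (or a missing column) point back at the mapping setting.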

Spot-check a known event: pick an event you know occurred yesterday (e.g. a purchase for an e-commerce client) and confirm a sample row in Snowflake matches the same event_timestamp / user_pseudo_id pair in BQ (the GA4 export has no single event ID column, so this pair is the practical join key).

B7. Backfill the historical range

Once B6 passes, run the historical backfill. Cap the first run.

  1. First backfill window: last 30 days. Sync now → Custom range → Start = 30 days ago, End = 2 days ago → Run.
  2. After it completes, check actual cost:
    • Snowflake: SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY WHERE WAREHOUSE_NAME = 'WH_GA4_INGEST' ORDER BY START_TIME DESC LIMIT 50;
    • BQ: GCP console → Billing → cost breakdown → BigQuery Storage Read API line item.
  3. Extrapolate: if 30 days cost $X (a per-day credit breakdown sketch follows this list), the client’s full retention (typically 14 months for GA4 free, longer for 360) will cost roughly retention_months × X, since the 30-day run approximates one month. If that’s within the budget approved in Prerequisites, proceed. If not, pause and renegotiate scope with the client.
  4. Second backfill: remaining range. Run in chunks of 90 days at a time so you can monitor cost mid-flight and abort if the rate is wrong.
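
For the Snowflake half of the step-2 cost check, a per-day rollup of the metering view is easier to extrapolate from than the raw rows (multiply credits by your contract rate; the Cost expectations table assumes $2/credit):

SELECT DATE_TRUNC('day', START_TIME) AS day,
       SUM(CREDITS_USED)             AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE WAREHOUSE_NAME = 'WH_GA4_INGEST'
  AND START_TIME >= DATEADD('day', -35, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY 1;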

After backfill, the daily schedule from B5 takes over and keeps the data current.


End of Section B. Confirm with the BQ admin and book the joint validation call (Section C).

Section C: Joint validation (both admins on a 30-min call)

Schedule a 30-min Zoom/Meet with both admins. Screen-share is mandatory — both sides need to see the queries running live.

C1. Row counts match end-to-end

Pick three days at random across the backfill range (one recent, one mid-range, one oldest). For each:

BQ admin runs:

SELECT COUNT(*) FROM `<project>.analytics_<property_id>.events_YYYYMMDD`;

Snowflake admin runs:

SELECT COUNT(*) FROM GA4_RAW.EVENTS."events_YYYYMMDD";

All three pairs must match exactly. If any pair differs by even one row, investigate before signing off — most often it’s a streaming-vs-batch boundary on the most recent day.

C2. event_params UNNEST fidelity walkthrough

Snowflake admin runs the LATERAL FLATTEN query from B6 against a different day. BQ admin runs the equivalent BQ query:

SELECT
  event_name,
  (SELECT value.string_value FROM UNNEST(event_params)
   WHERE key = 'page_location') AS page_location
FROM `<project>.analytics_<property_id>.events_YYYYMMDD`
WHERE event_name = 'page_view'
LIMIT 10;

Both queries should return the same set of (event_name, page_location) pairs for the same user/timestamp. Don’t worry about row order — confirm the values match.

Repeat the exercise for user_properties to confirm that nested field is also intact.
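
For the user_properties pass, the same pattern applies on both sides; a sketch (substitute the date as before):

-- Snowflake
SELECT
  user_pseudo_id,
  up.value:key::STRING                AS property_key,
  up.value:value:string_value::STRING AS property_value
FROM GA4_RAW.EVENTS."events_YYYYMMDD" e,
     LATERAL FLATTEN(input => e.user_properties) up
LIMIT 10;

-- BigQuery equivalent
SELECT user_pseudo_id, up.key AS property_key, up.value.string_value AS property_value
FROM `<project>.analytics_<property_id>.events_YYYYMMDD`, UNNEST(user_properties) AS up
LIMIT 10;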

C3. Cost monitoring is armed on both sides

C4. Sign-off

Both admins explicitly say “validated” on the call. Mammoth account lead notes the validation date in the engagement tracker. Hand off the daily-monitoring runbook (separate doc) to the client’s on-call rotation.

Cost expectations

Component | Cost surface | Realistic range (mid-size property, ~1M events/day)
GARD connector listing | Free | $0
BQ Storage Read API | $1.10 / TB read | ~$10–30/mo steady state, plus a one-time backfill spike (could be $50–200 for a 14-month backfill of a high-traffic property)
BQ query (metadata) | $6.25 / TB scanned | Negligible (<$1/mo) — the connector reads via Storage API, not query
Snowflake warehouse (WH_GA4_INGEST, XS) | $2/credit × ~1–2 hr/day | $50–100/mo steady state
Snowflake storage | $23 / TB/mo (compressed) | $5–20/mo for a 1M-events/day property — GA4 raw compresses ~5–8x in Snowflake
Total steady state |  | $70–150/mo
One-time backfill | BQ Storage Read + Snowflake compute | $100–500 depending on retention and event volume

For the small-property end (under 100K events/day): expect under $50/mo total steady state.

For the high-traffic end (10M+ events/day): expect $300–1000/mo Snowflake (warehouse may need to scale to S or M for ingest), $50–150/mo BQ.

The Snowflake warehouse is the largest controllable cost. If steady-state usage settles below 30 min/day of compute, drop AUTO_SUSPEND to 30s. If backfills are slow, scale the warehouse up just for the backfill and back down after.
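
As a sanity check on those ranges: an XS warehouse burns 1 credit per hour while running, so ~1.5 hr/day of ingest at the $2/credit assumed above works out to 1 credit/hr × 1.5 hr × $2 × 30 days ≈ $90/mo, squarely inside the $50–100/mo steady-state line in the table.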

Common gotchas

What this guide does NOT cover