GA4 → Snowflake via GARD — Setup Guide
What this guide does
This SOP installs the Snowflake Connector for Google Analytics Raw Data (GARD) from the Snowflake Marketplace and configures it to mirror raw GA4 event data from a client’s BigQuery analytics_<property_id>.events_* (and events_intraday_*) tables into a dedicated Snowflake database. The connector preserves nested event_params and user_properties as VARIANT columns, so downstream LATERAL FLATTEN queries reproduce the same shape as BigQuery UNNEST. The connector listing itself is free; you pay only for Snowflake compute (warehouse credits) on the ingest side and BigQuery Storage Read API egress on the source side (~$1.10/TB read). No GA4 transformation, dbt modeling, or BI wiring is performed by this guide — see What this guide does NOT cover.
Prerequisites
- GA4 → BigQuery export already linked and landing daily. Verify by running this query in the BQ console (replace <project> and <property_id>):

SELECT
  MAX(_TABLE_SUFFIX) AS latest_partition,
  COUNT(*) AS row_count
FROM `<project>.analytics_<property_id>.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', CURRENT_DATE() - 1);

Expect a non-zero row count for yesterday. If the partition is missing, the GA4 → BQ link is not healthy — fix that before proceeding.
- GCP project access for the BQ admin: Owner or Editor on the GA4 BQ export project, or at minimum the ability to create service accounts and grant IAM roles on the GA4 dataset.
- Snowflake account with ACCOUNTADMIN or a role that has the IMPORT SHARE, CREATE DATABASE, CREATE WAREHOUSE, and CREATE INTEGRATION privileges (the marketplace install role pattern Mammoth uses for GAAD).
- Approved spend on both sides. Rough ranges:
- Small property (<100K events/day): under $50/mo total.
- Mid-size (~1M events/day): $50–200/mo Snowflake, $10–30/mo BQ.
- High-traffic (10M+ events/day): $200–1000/mo Snowflake, $30–150/mo BQ. Confirm with the client before kicking off the backfill — see the bill-shock note in Common gotchas. The sizing query below gives a quick read on the property's volume before you commit.
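A quick way to ground these ranges in the client's actual volume is the export dataset's own metadata. A minimal sizing sketch, assuming the dataset follows the standard analytics_<property_id> naming and you have read access (the BQ admin can run it as part of A1):

SELECT
  COUNT(*)                                 AS daily_tables,
  ROUND(SUM(row_count) / COUNT(*))         AS avg_events_per_day,
  ROUND(SUM(size_bytes) / POW(10, 9), 1)   AS total_gb_uncompressed  -- rough proxy for Storage Read volume on a full backfill
FROM `<project>.analytics_<property_id>.__TABLES__`
WHERE table_id LIKE 'events_2%';  -- daily shards only; excludes events_intraday_* and users_* tables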
Roles & handoff overview
| Step | Owner | What they do | Hand to | Via |
|---|---|---|---|---|
| 1 | BQ admin | Verify GA4 export, create service account, grant IAM, download JSON key | Snowflake admin | 1Password / HashiCorp Vault — never email or Slack |
| 2 | BQ admin | Set BQ budget alert | (none — internal) | GCP console |
| 3 | Snowflake admin | Receive credential bundle, install GARD from Marketplace, configure source, run partition-scoped validation | BQ admin | Joint validation call |
| 4 | Both | Joint 30-min validation call: row counts match, UNNEST works, cost monitors armed | (done) | Zoom / Meet |
Sequence is strict. The Snowflake admin cannot start until the BQ admin has handed over the JSON key, project ID, and dataset name. Block out a 15-min sync between the two admins to confirm the credential transfer is in 1Password and accessible.
Section A: BigQuery / GCP Admin steps
Audience: the GCP / BigQuery admin at the client. You can read this section in isolation — Section B is for your Snowflake counterpart and you do not need to execute it.
A1. Verify GA4 export is landing
Open the BQ console, switch to the GA4 export project, and run:
SELECT
_TABLE_SUFFIX AS partition_date,
COUNT(*) AS event_count
FROM `<project>.analytics_<property_id>.events_*`
WHERE _TABLE_SUFFIX BETWEEN
FORMAT_DATE('%Y%m%d', CURRENT_DATE() - 7)
AND FORMAT_DATE('%Y%m%d', CURRENT_DATE() - 1)
GROUP BY 1
ORDER BY 1 DESC;
You should see 7 rows, one per day, each with a non-trivial event count. Also confirm events_intraday_<today> exists if the client expects intra-day freshness:
SELECT COUNT(*) FROM `<project>.analytics_<property_id>.events_intraday_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', CURRENT_DATE());
If the intraday table doesn’t exist, GA4 streaming export is not enabled for this property — note this and tell the Snowflake admin so they know not to schedule sub-daily syncs.
A2. Create a GCP service account dedicated to GARD
Use a dedicated SA — do not reuse an existing one. Naming convention:
gcloud iam service-accounts create gard-snowflake-reader \
--project=<project> \
--display-name="GARD Snowflake Connector Reader" \
--description="Read-only access for Snowflake GARD connector to GA4 BQ export"
Result: gard-snowflake-reader@<project>.iam.gserviceaccount.com.
A3. Grant minimum-necessary IAM roles
Three grants. The first is dataset-scoped, the other two are project-scoped because BQ does not support scoping them lower.
3a. Dataset-scoped: roles/bigquery.dataViewer
In the BQ console, navigate to the analytics_<property_id> dataset → SHARING → Permissions → ADD PRINCIPAL. Add the SA email with role BigQuery Data Viewer. Via CLI, dataset-level access is managed through the dataset's access entries (bq add-iam-policy-binding covers tables and views, not datasets): dump the entries with bq show, append the SA as a READER, and push the file back with bq update:
bq show --format=prettyjson <project>:analytics_<property_id> > dataset_access.json
# edit dataset_access.json: add {"role": "READER", "userByEmail": "gard-snowflake-reader@<project>.iam.gserviceaccount.com"} to the "access" array
bq update --source dataset_access.json <project>:analytics_<property_id>
3b. Project-scoped: roles/bigquery.readSessionUser (required for the Storage Read API the connector uses for high-throughput pulls)
gcloud projects add-iam-policy-binding <project> \
--member="serviceAccount:gard-snowflake-reader@<project>.iam.gserviceaccount.com" \
--role="roles/bigquery.readSessionUser"
3c. Project-scoped: roles/bigquery.jobUser (required for query metadata jobs the connector runs to enumerate partitions)
gcloud projects add-iam-policy-binding <project> \
--member="serviceAccount:gard-snowflake-reader@<project>.iam.gserviceaccount.com" \
--role="roles/bigquery.jobUser"
Do not grant roles/bigquery.dataEditor, roles/bigquery.admin, or any project-wide data-viewer role. If the client has multiple GA4 properties exporting to the same project and they all need to land in Snowflake, repeat 3a per dataset rather than promoting to a project-wide grant.
A4. Create and download the JSON key
gcloud iam service-accounts keys create ~/gard-snowflake-reader-key.json \
--iam-account=gard-snowflake-reader@<project>.iam.gserviceaccount.com
The file is now on your local disk. Do not leave it there. Move it into 1Password (or the client’s Vault) immediately after step A5 and shred the local copy:
shred -u ~/gard-snowflake-reader-key.json # Linux
rm -P ~/gard-snowflake-reader-key.json # macOS
A5. Hand off to Snowflake admin
Create a single 1Password item (vault: client’s shared engineering vault) with these fields:
- Title: GARD Connector — GA4 BQ Service Account — <client>
- JSON key file (attached as file)
- GCP project ID (text field): e.g. acme-analytics-prod
- GA4 dataset name (text field): e.g. analytics_123456789
- GA4 property ID (text field): e.g. 123456789
- Excluded tables (notes, optional): list any tables to omit (e.g. events_intraday_* if the client only wants finalized daily data)
- SA email (text field): gard-snowflake-reader@<project>.iam.gserviceaccount.com
Share the 1Password item with the Snowflake admin’s account. Confirm receipt verbally or in a video call. Never paste the key into Slack, email, a Google Doc, a ticket, or a chat tool — even ephemerally. If the client uses HashiCorp Vault instead, store under secret/clients/<client>/gard-connector with the same fields.
A6. Cost watch on the BQ side
The Storage Read API is the cost surface on BQ. A runaway connector or an unbounded backfill can quietly read terabytes. Set a project-level budget alert before you sleep tonight.
In the GCP console: Billing → Budgets & alerts → CREATE BUDGET.
- Scope: Project = the GA4 export project.
- Services filter: BigQuery (this captures both query and Storage Read API spend).
- Budget amount: Suggest 3× the client's pre-GARD monthly BQ spend, or $200/mo if they had near-zero BQ spend before. Tighter is fine — the goal is to catch a runaway, not to budget normal usage.
- Alert thresholds: 50%, 90%, 100% of budget — email the BQ admin, the Snowflake admin, and the Mammoth account lead.
Optionally also enable BigQuery’s per-project query quota under IAM & Admin → Quotas (filter for “BigQuery API”), capping daily Storage Read bytes if the client is risk-averse. Default quotas are very high; a 5-TB/day cap is reasonable for a mid-size GA4 property.
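If you also want day-to-day visibility into what the connector is doing, the jobs metadata view gives a cheap read on its query-job footprint. A sketch, assuming the dataset lives in the US multi-region (swap in `region-eu` otherwise) and that your role can list all jobs in the project; note this covers only query/metadata jobs, while Storage Read API bytes show up in the Billing report above, not here:

SELECT
  DATE(creation_time)                                 AS job_date,
  COUNT(*)                                            AS jobs,
  ROUND(SUM(total_bytes_processed) / POW(10, 9), 2)   AS gb_processed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE user_email = 'gard-snowflake-reader@<project>.iam.gserviceaccount.com'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY 1
ORDER BY 1 DESC;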
End of Section A. You are done. Confirm with the Snowflake admin that they have received the 1Password item and can proceed.
Section B: Snowflake Admin steps
Audience: the Snowflake admin (Mammoth Growth or client). You can read this section in isolation. Section A has already been completed by your BQ counterpart and the credential bundle is waiting for you in 1Password.
B1. Receive the credential bundle from the BQ admin
Open the 1Password item titled GARD Connector — GA4 BQ Service Account — <client>. Confirm it contains:
- A JSON key file (download it to a tmp location — you’ll paste its contents into Snowflake in B4, then shred)
- GCP project ID
- GA4 dataset name (e.g. analytics_123456789)
- GA4 property ID
- SA email
- (Optional) list of excluded tables
If anything is missing, ping the BQ admin before proceeding. Do not improvise the project ID or dataset name from a screenshot.
B2. Install the connector from Marketplace
Same flow as the GAAD install you’ve done before.
- Go to https://app.snowflake.com/marketplace.
- Search for “Snowflake Connector for Google Analytics Raw Data” (or “GARD”). The publisher is Snowflake Inc. Listing is free.
- Click Get → choose the role with marketplace install privileges (typically ACCOUNTADMIN or your dedicated MARKETPLACE_INSTALLER role) → confirm.
The connector installs as a Snowflake Native App into a database it creates (default name pattern: SNOWFLAKE_CONNECTOR_FOR_GOOGLE_ANALYTICS_RAW_DATA). Don’t rename it — downstream upgrades expect the default name.
B3. Create a dedicated database, schema, and warehouse for ingested data
The connector lands data into a destination database/schema you specify. Create them up front, separately from the connector’s app database, so you can grant downstream consumers access without touching the app:
USE ROLE ACCOUNTADMIN;
-- Destination database for raw GA4 data
CREATE DATABASE IF NOT EXISTS GA4_RAW
COMMENT = 'Raw GA4 event data mirrored from BigQuery via GARD connector';
CREATE SCHEMA IF NOT EXISTS GA4_RAW.EVENTS
COMMENT = 'GA4 events_* and events_intraday_* tables';
-- Dedicated ingest warehouse (XS to start, scale up only if backfill is slow)
CREATE WAREHOUSE IF NOT EXISTS WH_GA4_INGEST
WAREHOUSE_SIZE = XSMALL
AUTO_SUSPEND = 60
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = TRUE
COMMENT = 'Dedicated warehouse for GARD connector ingest jobs';
-- Connector role (the connector creates its own service role on install,
-- but you'll grant it usage on the destination DB and warehouse)
GRANT USAGE ON DATABASE GA4_RAW TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_GOOGLE_ANALYTICS_RAW_DATA;
GRANT USAGE ON SCHEMA GA4_RAW.EVENTS TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_GOOGLE_ANALYTICS_RAW_DATA;
GRANT CREATE TABLE ON SCHEMA GA4_RAW.EVENTS TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_GOOGLE_ANALYTICS_RAW_DATA;
GRANT USAGE ON WAREHOUSE WH_GA4_INGEST TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_GOOGLE_ANALYTICS_RAW_DATA;
If the application name differs in your account (Snowflake occasionally suffixes installs), check the actual name with:
SHOW APPLICATIONS LIKE '%GOOGLE_ANALYTICS_RAW%';
and substitute it in the GRANTs above.
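To double-check that the grants landed before configuring the app, a quick sanity check (a sketch; assumes SHOW GRANTS TO APPLICATION is available to your role, and substitute the actual app name if it differs):

SHOW GRANTS TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_GOOGLE_ANALYTICS_RAW_DATA;
-- Expect four rows: USAGE on GA4_RAW, USAGE and CREATE TABLE on GA4_RAW.EVENTS, USAGE on WH_GA4_INGEST.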
B4. Configure source connection in the GARD UI
Open the connector app: Data Products → Apps → Snowflake Connector for Google Analytics Raw Data → Launch app.
In the connector’s configuration UI:
- Authentication: select Service Account JSON Key → paste the entire JSON contents (open the key file in a text editor, copy everything from { to }). The connector stores it as a Snowflake secret; the file should never live on Snowflake VM disk.
- Source:
  - GCP project ID: the project ID from the 1Password bundle.
  - BigQuery dataset: the GA4 dataset name (e.g. analytics_123456789).
  - Region: match the dataset's BQ region (usually US or EU — check the BQ console if uncertain).
- Destination:
  - Database: GA4_RAW
  - Schema: EVENTS
  - Warehouse: WH_GA4_INGEST
- Tables to sync: start with the two table groups that matter:
  - events_* (daily-finalized event tables — the canonical source)
  - events_intraday_* (only if A1 confirmed intraday is enabled and the client wants today's data)
  Exclude any tables the BQ admin flagged in the handoff bundle. Skip pseudonymous_users_* and users_* unless the client specifically asks — they pull additional data and are usually not needed for event analysis.
After saving, the connector validates the credentials by listing the dataset’s tables. If it errors:
- Permission denied on dataset → IAM grant A3a didn't propagate yet (wait 60 seconds, retry) or the SA email in the JSON doesn't match the one granted access (re-check the bundle).
- Storage Read API quota exceeded → A6's quota cap is too tight. Raise it.
- Invalid JSON → the paste truncated. Re-copy the full file.
B5. Set sync schedule
In the connector’s Schedule tab:
- Default (GA4 free tier, daily export): set frequency to daily, run time 04:00 UTC (after GA4’s typical 02:00–03:00 UTC export completion). Running earlier risks pulling a partial partition.
- GA4 360 or streaming export enabled: set frequency to hourly for events_intraday_* and keep daily for events_*. The connector supports per-table-group schedules.
- Aggressive freshness (15-min): only if the client is paying GA4 360 and explicitly needs sub-hour latency. This will cost noticeably more in both BQ Storage Read and Snowflake warehouse credits — get sign-off.
Shred the local JSON key file immediately after B4 succeeds:
rm -P /tmp/gard-snowflake-reader-key.json # macOS
shred -u /tmp/gard-snowflake-reader-key.json # Linux
B6. Run the initial sync on ONE date partition first
Do not kick off a full backfill yet. Run the connector once against the most recent completed daily partition (yesterday’s events_YYYYMMDD) to validate end-to-end fidelity before committing to a multi-month read.
In the connector UI: Sync now → Custom range → Start = yesterday, End = yesterday → Tables = events_* → Run.
When it finishes (a few minutes for a typical day), validate:
Row count match:
-- In Snowflake
SELECT COUNT(*) AS sf_count
FROM GA4_RAW.EVENTS."events_YYYYMMDD"; -- substitute yesterday's date
-- In BigQuery (BQ admin runs this, or run yourself if you have read access)
SELECT COUNT(*) AS bq_count
FROM `<project>.analytics_<property_id>.events_YYYYMMDD`;
The two counts must match exactly. If they don’t, do not proceed — open a support ticket on the Snowflake side; this is usually a partition-cutoff issue.
event_params UNNEST fidelity:
-- Snowflake equivalent of BQ's UNNEST(event_params)
SELECT
event_name,
  param.value:value:string_value::STRING AS page_location
FROM GA4_RAW.EVENTS."events_YYYYMMDD" e,
LATERAL FLATTEN(input => e.event_params) param
WHERE param.value:key::STRING = 'page_location'
AND event_name = 'page_view'
LIMIT 10;
You should see real URLs in page_location. If event_params is NULL or empty, the connector did not preserve the nested structure — check the connector’s “Schema mapping” setting and confirm RECORD ARRAY → VARIANT mapping is enabled (it is by default; explicit-flatten mode is a misconfiguration here).
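If the flatten comes back empty, a structural probe narrows down whether the column is missing, empty, or just shaped differently than expected. A sketch, assuming event_params landed as a VARIANT holding an array of key/value objects:

SELECT
  TYPEOF(event_params)      AS params_type,   -- expect ARRAY
  ARRAY_SIZE(event_params)  AS params_len,    -- expect > 0 for most events
  event_params[0]           AS first_param    -- inspect one key/value pair to confirm the shape
FROM GA4_RAW.EVENTS."events_YYYYMMDD"
WHERE event_name = 'page_view'
LIMIT 5;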
Spot-check a known event: pick an event you know exists yesterday (e.g. a purchase for an e-commerce client) and confirm a sample row in Snowflake matches the same event_timestamp / user_pseudo_id in BQ.
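A minimal spot-check sketch, assuming an e-commerce client with purchase events; pick any values the BQ admin can look up on their side:

-- Snowflake side: grab a handful of purchases to cross-check
SELECT event_timestamp, user_pseudo_id, event_name
FROM GA4_RAW.EVENTS."events_YYYYMMDD"
WHERE event_name = 'purchase'
LIMIT 5;

-- BQ side (BQ admin runs): confirm the same (event_timestamp, user_pseudo_id) pair exists
-- SELECT event_timestamp, user_pseudo_id
-- FROM `<project>.analytics_<property_id>.events_YYYYMMDD`
-- WHERE event_name = 'purchase' AND user_pseudo_id = '<value from the Snowflake result>';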
B7. Backfill the historical range
Once B6 passes, run the historical backfill. Cap the first run.
- First backfill window: last 30 days. Sync now → Custom range → Start = 30 days ago, End = 2 days ago → Run.
- After it completes, check actual cost:
  - Snowflake: SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY WHERE WAREHOUSE_NAME = 'WH_GA4_INGEST' ORDER BY START_TIME DESC LIMIT 50;
  - BQ: GCP console → Billing → cost breakdown → BigQuery Storage Read API line item.
- Extrapolate: if 30 days cost $X, the client's full retention (typically 14 months for GA4 free, longer for 360) will cost roughly retention_months × X, since the 30-day run covers about one month (the roll-up sketch after this list turns metering history into a dollar figure). If that's within the budget approved in Prerequisites, proceed. If not, pause and renegotiate scope with the client.
- Second backfill: remaining range. Run in chunks of 90 days at a time so you can monitor cost mid-flight and abort if the rate is wrong.
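To turn the metering history into a dollar figure for the extrapolation, a roll-up sketch (assumes roughly $2/credit as in the Cost expectations table, and that ACCOUNT_USAGE latency of up to a few hours has elapsed):

SELECT
  ROUND(SUM(credits_used), 2)      AS credits,
  ROUND(SUM(credits_used) * 2, 2)  AS approx_usd   -- $2/credit assumption; use your contract rate
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE warehouse_name = 'WH_GA4_INGEST'
  AND start_time >= DATEADD('day', -2, CURRENT_TIMESTAMP());  -- widen the window to cover the whole backfill run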
After backfill, the daily schedule from B5 takes over and keeps the data current.
End of Section B. Confirm with the BQ admin and book the joint validation call (Section C).
Section C: Joint validation (both admins on a 30-min call)
Schedule a 30-min Zoom/Meet with both admins. Screen-share is mandatory — both sides need to see the queries running live.
C1. Row counts match end-to-end
Pick three days at random across the backfill range (one recent, one mid-range, one oldest). For each:
BQ admin runs:
SELECT COUNT(*) FROM `<project>.analytics_<property_id>.events_YYYYMMDD`;
Snowflake admin runs:
SELECT COUNT(*) FROM GA4_RAW.EVENTS."events_YYYYMMDD";
All three pairs must match exactly. If any pair differs, investigate before signing off — most often it's a streaming-vs-batch boundary on the most recent day.
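To pick the three dates, it helps to see what has actually landed. A quick sketch on the Snowflake side (the BQ admin cross-checks the oldest and newest shard against the dataset):

-- Lists the landed daily shards; pick one recent, one mid-range, one oldest
SHOW TABLES LIKE 'events_2%' IN SCHEMA GA4_RAW.EVENTS;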
C2. event_params UNNEST fidelity walkthrough
Snowflake admin runs the LATERAL FLATTEN query from B6 against a different day. BQ admin runs the equivalent BQ query:
SELECT
event_name,
(SELECT value.string_value FROM UNNEST(event_params)
WHERE key = 'page_location') AS page_location
FROM `<project>.analytics_<property_id>.events_YYYYMMDD`
WHERE event_name = 'page_view'
LIMIT 10;
Both queries should return the same set of (event_name, page_location) pairs for the same user/timestamp. Don’t worry about row order — confirm the values match.
Repeat the exercise for user_properties to confirm that nested field is also intact.
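A user_properties flatten sketch for the Snowflake side, assuming it landed with the same key/value VARIANT shape as event_params (the BQ admin mirrors it with UNNEST(user_properties)):

SELECT
  user_pseudo_id,
  up.value:key::STRING                  AS property_key,
  up.value:value:string_value::STRING   AS property_value
FROM GA4_RAW.EVENTS."events_YYYYMMDD" e,
  LATERAL FLATTEN(input => e.user_properties) up
LIMIT 10;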
C3. Cost monitoring is armed on both sides
- BQ admin confirms the budget alert from A6 is active. Show the budget config screen. Confirm the alert recipients are correct (BQ admin + Snowflake admin + Mammoth account lead).
- Snowflake admin sets a resource monitor on WH_GA4_INGEST:

CREATE RESOURCE MONITOR RM_GA4_INGEST WITH
  CREDIT_QUOTA = 100   -- adjust to expected monthly credit usage × 1.5
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 75 PERCENT DO NOTIFY
    ON 90 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE WH_GA4_INGEST SET RESOURCE_MONITOR = RM_GA4_INGEST;

Confirm the monitor's notification recipients include both admins.
C4. Sign-off
Both admins explicitly say “validated” on the call. Mammoth account lead notes the validation date in the engagement tracker. Hand off the daily-monitoring runbook (separate doc) to the client’s on-call rotation.
Cost expectations
| Component | Cost surface | Realistic range (mid-size property, ~1M events/day) |
|---|---|---|
| GARD connector listing | Free | $0 |
| BQ Storage Read API | $1.10 / TB read | ~$10–30/mo steady state, plus a one-time backfill spike (could be $50–200 for a 14-month backfill of a high-traffic property) |
| BQ query (metadata) | $6.25 / TB scanned | Negligible (<$1/mo) — the connector reads via Storage API, not query |
| Snowflake warehouse (WH_GA4_INGEST, XS) | $2/credit × ~1–2 hr/day | $50–100/mo steady state |
| Snowflake storage | $23 / TB/mo (compressed) | $5–20/mo for a 1M-events/day property — GA4 raw compresses ~5–8x in Snowflake |
| Total steady state | | $70–150/mo |
| One-time backfill | BQ Storage Read + Snowflake compute | $100–500 depending on retention and event volume |
For the small-property end (under 100K events/day): expect under $50/mo total steady state.
For the high-traffic end (10M+ events/day): expect $300–1000/mo Snowflake (warehouse may need to scale to S or M for ingest), $50–150/mo BQ.
The Snowflake warehouse is the largest controllable cost. If steady-state usage settles below 30 min/day of compute, drop AUTO_SUSPEND to 30s. If backfills are slow, scale the warehouse up just for the backfill and back down after.
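Resizing for a backfill is a one-line change in each direction. A sketch of the sequence (sizes are illustrative; pick whatever the backfill actually needs):

-- Before a large backfill: scale up so the ingest finishes in hours, not days
ALTER WAREHOUSE WH_GA4_INGEST SET WAREHOUSE_SIZE = SMALL;

-- After the backfill completes: scale back down and tighten the idle timeout
ALTER WAREHOUSE WH_GA4_INGEST SET
  WAREHOUSE_SIZE = XSMALL
  AUTO_SUSPEND = 30;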
Common gotchas
- GA4 BQ export is daily by default. The connector cannot give you better-than-daily freshness if GA4 itself isn’t streaming. If the client expects “real-time,” confirm streaming export is enabled in GA4 Admin → BigQuery Links before you set the schedule in B5. Otherwise sub-hour syncs will just no-op against unchanged daily tables.
- Intraday is a separate table. Today's freshest data lives in events_intraday_<today>, which GA4 recreates daily (drops yesterday's intraday once that day's events_<date> is finalized). If the client cares about today's data, the connector must include events_intraday_* in B4. Without it, "today's data" never appears in Snowflake.
- Nested fields land as VARIANT, not as flattened columns. Downstream consumers expecting event_params.page_location as a top-level column will be disappointed. Educate the client (or their analytics team) that they need LATERAL FLATTEN — provide the B6 query as a template. Plan a thin view layer in the dbt engagement to flatten common params (a sketch follows this list).
- Daily-sharded table naming is schema-drift-adjacent. Each new day creates events_YYYYMMDD as a new table. The connector handles this automatically, but if GA4 ever changes its schema (added fields like is_active_user, etc.), the connector will pick up the new column on the next sync. Your downstream views should SELECT * defensively or use OBJECT_KEYS() to avoid breaking on schema additions.
- Service account key rotation. GCP keys do not auto-rotate. Set a calendar reminder for the BQ admin to rotate the key every 90–180 days. Rotation procedure: create a new key, paste into the connector (B4), validate one sync, delete the old key in GCP. Do not delete the old key first — the connector will fail until the new one is in place.
- Bill shock failure mode (most important). A naive backfill on a high-traffic property can read multiple TB from BQ Storage API in one go. At $1.10/TB this is bounded but unbudgeted spend will still surprise the client. Always cap the first backfill at 30 days (B7), validate the cost, then expand. A 14-month backfill of a 10M-events/day property could be 5–15 TB read = $5–17 in BQ. That’s tolerable; what’s not tolerable is also paying for an M-sized Snowflake warehouse running for 8 hours straight to ingest it. Pre-size both sides.
- Region mismatch. If the BQ dataset is in EU and the Snowflake account is in us-east-1, egress works but is slower and incurs Google egress fees on top of Storage Read. For EU-resident clients, prefer a Snowflake EU region.
- The timeZone column in GA4 events is the property's reporting timezone, not the event timestamp's timezone. Event timestamps are always UTC (microseconds since epoch in event_timestamp). Don't let downstream consumers double-convert (the sketch after this list shows the conversion).
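Two of these gotchas (VARIANT flattening and UTC microsecond timestamps) dominate downstream queries, so here is a minimal view sketch using hypothetical names. V_PAGE_VIEWS and the single hard-coded shard are placeholders; the real, all-shard view layer belongs to the dbt engagement:

CREATE OR REPLACE VIEW GA4_RAW.EVENTS.V_PAGE_VIEWS AS   -- hypothetical view name
SELECT
  e.user_pseudo_id,
  TO_TIMESTAMP_NTZ(e.event_timestamp, 6)     AS event_ts_utc,   -- event_timestamp is microseconds since epoch, UTC
  p.value:value:string_value::STRING         AS page_location
FROM GA4_RAW.EVENTS."events_YYYYMMDD" e,     -- placeholder for one daily shard
  LATERAL FLATTEN(input => e.event_params) p
WHERE e.event_name = 'page_view'
  AND p.value:key::STRING = 'page_location';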
What this guide does NOT cover
- dbt models on top of the landed data. Flattened fact tables, event_params pivots, session reconstruction — separate engagement.
- Aggregation or pre-computed marts (daily active users, funnel tables, etc.) — separate engagement.
- GA4 measurement protocol or custom event configuration — that’s GA4 admin work on the client side, out of scope for data engineering.
- BI tool wiring (Looker, Tableau, Hex, Sigma, etc.) on top of the Snowflake landing zone — downstream.
- Identity resolution / user stitching beyond what GA4 itself provides (user_pseudo_id, user_id) — separate engagement, usually done in dbt.
- PII review. GA4 raw data may contain user_id values that are PII (email hashes, customer IDs). Review with the client's privacy team before granting broad access to GA4_RAW.
- Snowflake → Snowflake share-out to other accounts (e.g. agency reporting). Configure separately via Snowflake Secure Data Sharing once the landing zone is stable.
Related
- ../06-reference/2026-04-16-research-bq-to-snowflake-routes — research note that selected GARD over alternatives (Fivetran, Airbyte, custom Storage Read pipeline). Note: file may not exist yet — this is a forward-reference for the upcoming research write-up.
- Snowflake official docs: https://docs.snowflake.com/en/connectors/google/gard/gard-connector-about
- Snowflake Marketplace listing: https://app.snowflake.com/marketplace (search “Google Analytics Raw Data”)
- Adswerve practitioner write-up on GARD vs custom pipelines (cited in the research note)
- mcp-server-setup — internal pattern for credentials handling (1Password-first); the same “no secrets on disk” principle applies to the JSON key in this SOP
- GA4 BigQuery export schema reference: https://support.google.com/analytics/answer/7029846