GA4 → Snowflake via GARD — Setup Guide
What this guide does
This SOP installs the Snowflake Connector for Google Analytics Raw Data (GARD) from the Snowflake Marketplace and configures it to mirror raw GA4 event data from a client’s BigQuery analytics_<property_id>.events_* (and events_intraday_*) tables into a dedicated Snowflake database. The connector preserves nested event_params and user_properties as VARIANT columns, so downstream LATERAL FLATTEN queries reproduce the same shape as BigQuery UNNEST. The connector listing itself is free; you pay only for Snowflake compute (warehouse credits) on the ingest side and BigQuery Storage Read API egress on the source side (~$1.10/TB read). No GA4 transformation, dbt modeling, or BI wiring is performed by this guide — see What this guide does NOT cover.
Prerequisites
- GA4 → BigQuery export already linked and landing daily. Verify by running this query in the BQ console (replace <project> and <property_id>):

SELECT
  MAX(_TABLE_SUFFIX) AS latest_partition,
  COUNT(*) AS row_count
FROM `<project>.analytics_<property_id>.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', CURRENT_DATE() - 1);

Expect a non-zero row count for yesterday. If the partition is missing, the GA4 → BQ link is not healthy — fix that before proceeding.
- GCP project access for the BQ admin: Owner or Editor on the GA4 BQ export project, or at minimum the ability to create service accounts and grant IAM roles on the GA4 dataset.
- Snowflake account with ACCOUNTADMIN or a role that has the IMPORT SHARE, CREATE DATABASE, CREATE WAREHOUSE, and CREATE INTEGRATION privileges (the marketplace install role pattern Mammoth uses for GAAD).
- Approved spend on both sides. Rough ranges:
- Small property (<100K events/day): under $50/mo total.
- Mid-size (~1M events/day): $50–200/mo Snowflake, $10–30/mo BQ.
- High-traffic (10M+ events/day): $200–1000/mo Snowflake, $30–150/mo BQ. Confirm with the client before kicking off the backfill — see the bill-shock note in Common gotchas. The sizing query below gives a quick read on the property's volume before you commit.
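A quick way to ground these ranges in the client's actual volume is the export dataset's own metadata. A minimal sizing sketch, assuming the dataset follows the standard analytics_<property_id> naming and you have read access (the BQ admin can run it as part of A1):

SELECT
  COUNT(*)                                 AS daily_tables,
  ROUND(SUM(row_count) / COUNT(*))         AS avg_events_per_day,
  ROUND(SUM(size_bytes) / POW(10, 9), 1)   AS total_gb_uncompressed  -- rough proxy for Storage Read volume on a full backfill
FROM `<project>.analytics_<property_id>.__TABLES__`
WHERE table_id LIKE 'events_2%';  -- daily shards only; excludes events_intraday_* and users_* tables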
Roles & handoff overview
| Step | Owner | What they do | Hand to | Via |
|---|---|---|---|---|
| 1 | BQ admin | Verify GA4 export, create service account, grant IAM, download JSON key | Snowflake admin | 1Password / HashiCorp Vault — never email or Slack |
| 2 | BQ admin | Set BQ budget alert | (none — internal) | GCP console |
| 3 | Snowflake admin | Receive credential bundle, install GARD from Marketplace, configure source, run partition-scoped validation | BQ admin | Joint validation call |
| 4 | Both | Joint 30-min validation call: row counts match, UNNEST works, cost monitors armed | (done) | Zoom / Meet |
Sequence is strict. The Snowflake admin cannot start until the BQ admin has handed over the JSON key, project ID, and dataset name. Block out a 15-min sync between the two admins to confirm the credential transfer is in 1Password and accessible.
Section A: BigQuery / GCP Admin steps
Audience: the GCP / BigQuery admin at the client. You can read this section in isolation — Section B is for your Snowflake counterpart and you do not need to execute it.
A1. Verify GA4 export is landing
Open the BQ console, switch to the GA4 export project, and run:
SELECT
_TABLE_SUFFIX AS partition_date,
COUNT(*) AS event_count
FROM `<project>.analytics_<property_id>.events_*`
WHERE _TABLE_SUFFIX BETWEEN
FORMAT_DATE('%Y%m%d', CURRENT_DATE() - 7)
AND FORMAT_DATE('%Y%m%d', CURRENT_DATE() - 1)
GROUP BY 1
ORDER BY 1 DESC;
You should see 7 rows, one per day, each with a non-trivial event count. Also confirm events_intraday_<today> exists if the client expects intra-day freshness:
SELECT COUNT(*) FROM `<project>.analytics_<property_id>.events_intraday_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', CURRENT_DATE());
If the intraday table doesn’t exist, GA4 streaming export is not enabled for this property — note this and tell the Snowflake admin so they know not to schedule sub-daily syncs.
A2. Create a GCP service account dedicated to GARD
Use a dedicated SA — do not reuse an existing one. Naming convention:
gcloud iam service-accounts create gard-snowflake-reader \
--project=<project> \
--display-name="GARD Snowflake Connector Reader" \
--description="Read-only access for Snowflake GARD connector to GA4 BQ export"
Result: gard-snowflake-reader@<project>.iam.gserviceaccount.com.
A3. Grant minimum-necessary IAM roles
Three grants. The first is dataset-scoped, the other two are project-scoped because BQ does not support scoping them lower.
3a. Dataset-scoped: roles/bigquery.dataViewer
In the BQ console, navigate to the analytics_<property_id> dataset → SHARING → Permissions → ADD PRINCIPAL. Add the SA email with role BigQuery Data Viewer. Via CLI, dataset-level access is managed through the dataset's access entries (bq add-iam-policy-binding covers tables and views, not datasets): dump the entries with bq show, append the SA as a READER, and push the file back with bq update:
bq show --format=prettyjson <project>:analytics_<property_id> > dataset_access.json
# edit dataset_access.json: add {"role": "READER", "userByEmail": "gard-snowflake-reader@<project>.iam.gserviceaccount.com"} to the "access" array
bq update --source dataset_access.json <project>:analytics_<property_id>
3b. Project-scoped: roles/bigquery.readSessionUser (required for the Storage Read API the connector uses for high-throughput pulls)
gcloud projects add-iam-policy-binding <project> \
--member="serviceAccount:gard-snowflake-reader@<project>.iam.gserviceaccount.com" \
--role="roles/bigquery.readSessionUser"
3c. Project-scoped: roles/bigquery.jobUser (required for query metadata jobs the connector runs to enumerate partitions)
gcloud projects add-iam-policy-binding <project> \
--member="serviceAccount:gard-snowflake-reader@<project>.iam.gserviceaccount.com" \
--role="roles/bigquery.jobUser"
Do not grant roles/bigquery.dataEditor, roles/bigquery.admin, or any project-wide data-viewer role. If the client has multiple GA4 properties exporting to the same project and they all need to land in Snowflake, repeat 3a per dataset rather than promoting to a project-wide grant.
A4. Create and download the JSON key
gcloud iam service-accounts keys create ~/gard-snowflake-reader-key.json \
--iam-account=gard-snowflake-reader@<project>.iam.gserviceaccount.com
The file is now on your local disk. Do not leave it there. Move it into 1Password (or the client’s Vault) immediately after step A5 and shred the local copy:
shred -u ~/gard-snowflake-reader-key.json # Linux
rm -P ~/gard-snowflake-reader-key.json # macOS
A5. Hand off to Snowflake admin
Create a single 1Password item (vault: client’s shared engineering vault) with these fields:
- Title: GARD Connector — GA4 BQ Service Account — <client>
- JSON key file (attached as file)
- GCP project ID (text field): e.g. acme-analytics-prod
- GA4 dataset name (text field): e.g. analytics_123456789
- GA4 property ID (text field): e.g. 123456789
- Excluded tables (notes, optional): list any tables to omit (e.g. events_intraday_* if the client only wants finalized daily data)
- SA email (text field): gard-snowflake-reader@<project>.iam.gserviceaccount.com
Share the 1Password item with the Snowflake admin’s account. Confirm receipt verbally or in a video call. Never paste the key into Slack, email, a Google Doc, a ticket, or a chat tool — even ephemerally. If the client uses HashiCorp Vault instead, store under secret/clients/<client>/gard-connector with the same fields.
A6. Cost watch on the BQ side
The Storage Read API is the cost surface on BQ. A runaway connector or an unbounded backfill can quietly read terabytes. Set a project-level budget alert before you sleep tonight.
In the GCP console: Billing → Budgets & alerts → CREATE BUDGET.
- Scope: Project = the GA4 export project.
- Services filter: BigQuery (this captures both query and Storage Read API spend).
- Budget amount: Suggest 3× the client's pre-GARD monthly BQ spend, or $200/mo if they had near-zero BQ spend before. Tighter is fine — the goal is to catch a runaway, not to budget normal usage.
- Alert thresholds: 50%, 90%, 100% of budget — email the BQ admin, the Snowflake admin, and the Mammoth account lead.
Optionally also enable BigQuery’s per-project query quota under IAM & Admin → Quotas (filter for “BigQuery API”), capping daily Storage Read bytes if the client is risk-averse. Default quotas are very high; a 5-TB/day cap is reasonable for a mid-size GA4 property.
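If you also want day-to-day visibility into what the connector is doing, the jobs metadata view gives a cheap read on its query-job footprint. A sketch, assuming the dataset lives in the US multi-region (swap in `region-eu` otherwise) and that your role can list all jobs in the project; note this covers only query/metadata jobs, while Storage Read API bytes show up in the Billing report above, not here:

SELECT
  DATE(creation_time)                                 AS job_date,
  COUNT(*)                                            AS jobs,
  ROUND(SUM(total_bytes_processed) / POW(10, 9), 2)   AS gb_processed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE user_email = 'gard-snowflake-reader@<project>.iam.gserviceaccount.com'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY 1
ORDER BY 1 DESC;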
End of Section A. You are done. Confirm with the Snowflake admin that they have received the 1Password item and can proceed.
Section B: Snowflake Admin steps
Audience: the Snowflake admin (Mammoth Growth or client). You can read this section in isolation. Section A has already been completed by your BQ counterpart and the credential bundle is waiting for you in 1Password.
B1. Receive the credential bundle from the BQ admin
Open the 1Password item titled GARD Connector — GA4 BQ Service Account — <client>. Confirm it contains:
- A JSON key file (download it to a tmp location — you’ll paste its contents into Snowflake in B4, then shred)
- GCP project ID
- GA4 dataset name (e.g. analytics_123456789)
- GA4 property ID
- SA email
- (Optional) list of excluded tables
If anything is missing, ping the BQ admin before proceeding. Do not improvise the project ID or dataset name from a screenshot.
B2. Install the connector from Marketplace
Same flow as the GAAD install you’ve done before.
- Go to https://app.snowflake.com/marketplace.
- Search for “Snowflake Connector for Google Analytics Raw Data” (or “GARD”). The publisher is Snowflake Inc. Listing is free.
- Click Get → choose the role with marketplace install privileges (typically ACCOUNTADMIN or your dedicated MARKETPLACE_INSTALLER role) → confirm.
The connector installs as a Snowflake Native App into a database it creates (default name pattern: SNOWFLAKE_CONNECTOR_FOR_GOOGLE_ANALYTICS_RAW_DATA). Don’t rename it — downstream upgrades expect the default name.
B3. Create a dedicated database, schema, and warehouse for ingested data
The connector lands data into a destination database/schema you specify. Create them up front, separately from the connector’s app database, so you can grant downstream consumers access without touching the app:
USE ROLE ACCOUNTADMIN;
-- Destination database for raw GA4 data
CREATE DATABASE IF NOT EXISTS GA4_RAW
COMMENT = 'Raw GA4 event data mirrored from BigQuery via GARD connector';
CREATE SCHEMA IF NOT EXISTS GA4_RAW.EVENTS
COMMENT = 'GA4 events_* and events_intraday_* tables';
-- Dedicated ingest warehouse (XS to start, scale up only if backfill is slow)
CREATE WAREHOUSE IF NOT EXISTS WH_GA4_INGEST
WAREHOUSE_SIZE = XSMALL
AUTO_SUSPEND = 60
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = TRUE
COMMENT = 'Dedicated warehouse for GARD connector ingest jobs';
-- Connector role (the connector creates its own service role on install,
-- but you'll grant it usage on the destination DB and warehouse)
GRANT USAGE ON DATABASE GA4_RAW TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_GOOGLE_ANALYTICS_RAW_DATA;
GRANT USAGE ON SCHEMA GA4_RAW.EVENTS TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_GOOGLE_ANALYTICS_RAW_DATA;
GRANT CREATE TABLE ON SCHEMA GA4_RAW.EVENTS TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_GOOGLE_ANALYTICS_RAW_DATA;
GRANT USAGE ON WAREHOUSE WH_GA4_INGEST TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_GOOGLE_ANALYTICS_RAW_DATA;
If the application name differs in your account (Snowflake occasionally suffixes installs), check the actual name with:
SHOW APPLICATIONS LIKE '%GOOGLE_ANALYTICS_RAW%';
and substitute it in the GRANTs above.
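To double-check that the grants landed before configuring the app, a quick sanity check (a sketch; assumes SHOW GRANTS TO APPLICATION is available to your role, and substitute the actual app name if it differs):

SHOW GRANTS TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_GOOGLE_ANALYTICS_RAW_DATA;
-- Expect four rows: USAGE on GA4_RAW, USAGE and CREATE TABLE on GA4_RAW.EVENTS, USAGE on WH_GA4_INGEST.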
B4. Configure source connection in the GARD UI
Open the connector app: Data Products → Apps → Snowflake Connector for Google Analytics Raw Data → Launch app.
In the connector’s configuration UI:
- Authentication: select Service Account JSON Key → paste the entire JSON contents (open the key file in a text editor, copy everything from { to }). The connector stores it as a Snowflake secret; the file should never live on Snowflake VM disk.
- Source:
  - GCP project ID: the project ID from the 1Password bundle.
  - BigQuery dataset: the GA4 dataset name (e.g. analytics_123456789).
  - Region: match the dataset's BQ region (usually US or EU — check the BQ console if uncertain).
- Destination:
  - Database: GA4_RAW
  - Schema: EVENTS
  - Warehouse: WH_GA4_INGEST
- Tables to sync: start with the two table groups that matter:
  - events_* (daily-finalized event tables — the canonical source)
  - events_intraday_* (only if A1 confirmed intraday is enabled and the client wants today's data)
  Exclude any tables the BQ admin flagged in the handoff bundle. Skip pseudonymous_users_* and users_* unless the client specifically asks — they pull additional data and are usually not needed for event analysis.
After saving, the connector validates the credentials by listing the dataset’s tables. If it errors:
- Permission denied on dataset → IAM grant A3a didn't propagate yet (wait 60 seconds, retry) or the SA email in the JSON doesn't match the one granted access (re-check the bundle).
- Storage Read API quota exceeded → A6's quota cap is too tight. Raise it.
- Invalid JSON → the paste truncated. Re-copy the full file.
B5. Set sync schedule
In the connector’s Schedule tab:
- Default (GA4 free tier, daily export): set frequency to daily, run time 04:00 UTC (after GA4’s typical 02:00–03:00 UTC export completion). Running earlier risks pulling a partial partition.
- GA4 360 or streaming export enabled: set frequency to hourly for events_intraday_* and keep daily for events_*. The connector supports per-table-group schedules.
- Aggressive freshness (15-min): only if the client is paying GA4 360 and explicitly needs sub-hour latency. This will cost noticeably more in both BQ Storage Read and Snowflake warehouse credits — get sign-off.
Shred the local JSON key file immediately after B4 succeeds:
rm -P /tmp/gard-snowflake-reader-key.json # macOS
shred -u /tmp/gard-snowflake-reader-key.json # Linux
B6. Run the initial sync on ONE date partition first
Do not kick off a full backfill yet. Run the connector once against the most recent completed daily partition (yesterday’s events_YYYYMMDD) to validate end-to-end fidelity before committing to a multi-month read.
In the connector UI: Sync now → Custom range → Start = yesterday, End = yesterday → Tables = events_* → Run.
When it finishes (a few minutes for a typical day), validate:
Row count match:
-- In Snowflake
SELECT COUNT(*) AS sf_count
FROM GA4_RAW.EVENTS."events_YYYYMMDD"; -- substitute yesterday's date
-- In BigQuery (BQ admin runs this, or run yourself if you have read access)
SELECT COUNT(*) AS bq_count
FROM `<project>.analytics_<property_id>.events_YYYYMMDD`;
The two counts must match exactly. If they don’t, do not proceed — open a support ticket on the Snowflake side; this is usually a partition-cutoff issue.
event_params UNNEST fidelity:
-- Snowflake equivalent of BQ's UNNEST(event_params)
SELECT
event_name,
  param.value:value:string_value::STRING AS page_location
FROM GA4_RAW.EVENTS."events_YYYYMMDD" e,
LATERAL FLATTEN(input => e.event_params) param
WHERE param.value:key::STRING = 'page_location'
AND event_name = 'page_view'
LIMIT 10;
You should see real URLs in page_location. If event_params is NULL or empty, the connector did not preserve the nested structure — check the connector’s “Schema mapping” setting and confirm RECORD ARRAY → VARIANT mapping is enabled (it is by default; explicit-flatten mode is a misconfiguration here).
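If the flatten comes back empty, a structural probe narrows down whether the column is missing, empty, or just shaped differently than expected. A sketch, assuming event_params landed as a VARIANT holding an array of key/value objects:

SELECT
  TYPEOF(event_params)      AS params_type,   -- expect ARRAY
  ARRAY_SIZE(event_params)  AS params_len,    -- expect > 0 for most events
  event_params[0]           AS first_param    -- inspect one key/value pair to confirm the shape
FROM GA4_RAW.EVENTS."events_YYYYMMDD"
WHERE event_name = 'page_view'
LIMIT 5;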
Spot-check a known event: pick an event you know exists yesterday (e.g. a purchase for an e-commerce client) and confirm a sample row in Snowflake matches the same event_timestamp / user_pseudo_id in BQ.
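A minimal spot-check sketch, assuming an e-commerce client with purchase events; pick any values the BQ admin can look up on their side:

-- Snowflake side: grab a handful of purchases to cross-check
SELECT event_timestamp, user_pseudo_id, event_name
FROM GA4_RAW.EVENTS."events_YYYYMMDD"
WHERE event_name = 'purchase'
LIMIT 5;

-- BQ side (BQ admin runs): confirm the same (event_timestamp, user_pseudo_id) pair exists
-- SELECT event_timestamp, user_pseudo_id
-- FROM `<project>.analytics_<property_id>.events_YYYYMMDD`
-- WHERE event_name = 'purchase' AND user_pseudo_id = '<value from the Snowflake result>';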
B7. Backfill the historical range
Once B6 passes, run the historical backfill. Cap the first run.
- First backfill window: last 30 days. Sync now → Custom range → Start = 30 days ago, End = 2 days ago → Run.
- After it completes, check actual cost:
  - Snowflake: SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY WHERE WAREHOUSE_NAME = 'WH_GA4_INGEST' ORDER BY START_TIME DESC LIMIT 50;
  - BQ: GCP console → Billing → cost breakdown → BigQuery Storage Read API line item.
- Extrapolate: if 30 days cost $X, the client's full retention (typically 14 months for GA4 free, longer for 360) will cost roughly retention_months × X, since the 30-day run covers about one month (the roll-up sketch after this list turns metering history into a dollar figure). If that's within the budget approved in Prerequisites, proceed. If not, pause and renegotiate scope with the client.
- Second backfill: remaining range. Run in chunks of 90 days at a time so you can monitor cost mid-flight and abort if the rate is wrong.
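To turn the metering history into a dollar figure for the extrapolation, a roll-up sketch (assumes roughly $2/credit as in the Cost expectations table, and that ACCOUNT_USAGE latency of up to a few hours has elapsed):

SELECT
  ROUND(SUM(credits_used), 2)      AS credits,
  ROUND(SUM(credits_used) * 2, 2)  AS approx_usd   -- $2/credit assumption; use your contract rate
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE warehouse_name = 'WH_GA4_INGEST'
  AND start_time >= DATEADD('day', -2, CURRENT_TIMESTAMP());  -- widen the window to cover the whole backfill run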
After backfill, the daily schedule from B5 takes over and keeps the data current.
End of Section B. Confirm with the BQ admin and book the joint validation call (Section C).
Section C: Joint validation (both admins on a 30-min call)
Schedule a 30-min Zoom/Meet with both admins. Screen-share is mandatory — both sides need to see the queries running live.
C1. Row counts match end-to-end
Pick three days at random across the backfill range (one recent, one mid-range, one oldest). For each:
BQ admin runs:
SELECT COUNT(*) FROM `<project>.analytics_<property_id>.events_YYYYMMDD`;
Snowflake admin runs:
SELECT COUNT(*) FROM GA4_RAW.EVENTS."events_YYYYMMDD";
All three pairs must match exactly. If any pair differs, investigate before signing off — most often it's a streaming-vs-batch boundary on the most recent day.
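To pick the three dates, it helps to see what has actually landed. A quick sketch on the Snowflake side (the BQ admin cross-checks the oldest and newest shard against the dataset):

-- Lists the landed daily shards; pick one recent, one mid-range, one oldest
SHOW TABLES LIKE 'events_2%' IN SCHEMA GA4_RAW.EVENTS;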
C2. event_params UNNEST fidelity walkthrough
Snowflake admin runs the LATERAL FLATTEN query from B6 against a different day. BQ admin runs the equivalent BQ query:
SELECT
event_name,
(SELECT value.string_value FROM UNNEST(event_params)
WHERE key = 'page_location') AS page_location
FROM `<project>.analytics_<property_id>.events_YYYYMMDD`
WHERE event_name = 'page_view'
LIMIT 10;
Both queries should return the same set of (event_name, page_location) pairs for the same user/timestamp. Don’t worry about row order — confirm the values match.
Repeat the exercise for user_properties to confirm that nested field is also intact.
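A user_properties flatten sketch for the Snowflake side, assuming it landed with the same key/value VARIANT shape as event_params (the BQ admin mirrors it with UNNEST(user_properties)):

SELECT
  user_pseudo_id,
  up.value:key::STRING                  AS property_key,
  up.value:value:string_value::STRING   AS property_value
FROM GA4_RAW.EVENTS."events_YYYYMMDD" e,
  LATERAL FLATTEN(input => e.user_properties) up
LIMIT 10;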
C3. Cost monitoring is armed on both sides
- BQ admin confirms the budget alert from A6 is active. Show the budget config screen. Confirm the alert recipients are correct (BQ admin + Snowflake admin + Mammoth account lead).
- Snowflake admin sets a resource monitor on WH_GA4_INGEST:

CREATE RESOURCE MONITOR RM_GA4_INGEST WITH
  CREDIT_QUOTA = 100   -- adjust to expected monthly credit usage × 1.5
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 75 PERCENT DO NOTIFY
    ON 90 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE WH_GA4_INGEST SET RESOURCE_MONITOR = RM_GA4_INGEST;

Confirm the monitor's notification recipients include both admins.
C4. Sign-off
Both admins explicitly say “validated” on the call. Mammoth account lead notes the validation date in the engagement tracker. Hand off the daily-monitoring runbook (separate doc) to the client’s on-call rotation.
Cost expectations
| Component | Cost surface | Realistic range (mid-size property, ~1M events/day) |
|---|---|---|
| GARD connector listing | Free | $0 |
| BQ Storage Read API | $1.10 / TB read | ~$10–30/mo steady state, plus a one-time backfill spike (could be $50–200 for a 14-month backfill of a high-traffic property) |
| BQ query (metadata) | $6.25 / TB scanned | Negligible (<$1/mo) — the connector reads via Storage API, not query |
| Snowflake warehouse (WH_GA4_INGEST, XS) | $2/credit × ~1–2 hr/day | $50–100/mo steady state |
| Snowflake storage | $23 / TB/mo (compressed) | $5–20/mo for a 1M-events/day property — GA4 raw compresses ~5–8x in Snowflake |
| Total steady state | | $70–150/mo |
| One-time backfill | BQ Storage Read + Snowflake compute | $100–500 depending on retention and event volume |
For the small-property end (under 100K events/day): expect under $50/mo total steady state.
For the high-traffic end (10M+ events/day): expect $300–1000/mo Snowflake (warehouse may need to scale to S or M for ingest), $50–150/mo BQ.
The Snowflake warehouse is the largest controllable cost. If steady-state usage settles below 30 min/day of compute, drop AUTO_SUSPEND to 30s. If backfills are slow, scale the warehouse up just for the backfill and back down after.
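Resizing for a backfill is a one-line change in each direction. A sketch of the sequence (sizes are illustrative; pick whatever the backfill actually needs):

-- Before a large backfill: scale up so the ingest finishes in hours, not days
ALTER WAREHOUSE WH_GA4_INGEST SET WAREHOUSE_SIZE = SMALL;

-- After the backfill completes: scale back down and tighten the idle timeout
ALTER WAREHOUSE WH_GA4_INGEST SET
  WAREHOUSE_SIZE = XSMALL
  AUTO_SUSPEND = 30;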
Common gotchas
- GA4 BQ export is daily by default. The connector cannot give you better-than-daily freshness if GA4 itself isn’t streaming. If the client expects “real-time,” confirm streaming export is enabled in GA4 Admin → BigQuery Links before you set the schedule in B5. Otherwise sub-hour syncs will just no-op against unchanged daily tables.
- Intraday is a separate table. Today's freshest data lives in events_intraday_<today>, which GA4 recreates daily (drops yesterday's intraday once that day's events_<date> is finalized). If the client cares about today's data, the connector must include events_intraday_* in B4. Without it, "today's data" never appears in Snowflake.
- Nested fields land as VARIANT, not as flattened columns. Downstream consumers expecting event_params.page_location as a top-level column will be disappointed. Educate the client (or their analytics team) that they need LATERAL FLATTEN — provide the B6 query as a template. Plan a thin view layer in the dbt engagement to flatten common params (a sketch follows this list).
- Daily-sharded table naming is schema-drift-adjacent. Each new day creates events_YYYYMMDD as a new table. The connector handles this automatically, but if GA4 ever changes its schema (added fields like is_active_user, etc.), the connector will pick up the new column on the next sync. Your downstream views should SELECT * defensively or use OBJECT_KEYS() to avoid breaking on schema additions.
- Service account key rotation. GCP keys do not auto-rotate. Set a calendar reminder for the BQ admin to rotate the key every 90–180 days. Rotation procedure: create a new key, paste into the connector (B4), validate one sync, delete the old key in GCP. Do not delete the old key first — the connector will fail until the new one is in place.
- Bill shock failure mode (most important). A naive backfill on a high-traffic property can read multiple TB from BQ Storage API in one go. At $1.10/TB this is bounded but unbudgeted spend will still surprise the client. Always cap the first backfill at 30 days (B7), validate the cost, then expand. A 14-month backfill of a 10M-events/day property could be 5–15 TB read = $5–17 in BQ. That’s tolerable; what’s not tolerable is also paying for an M-sized Snowflake warehouse running for 8 hours straight to ingest it. Pre-size both sides.
- Region mismatch. If the BQ dataset is in EU and the Snowflake account is in us-east-1, egress works but is slower and incurs Google egress fees on top of Storage Read. For EU-resident clients, prefer a Snowflake EU region.
- The timeZone column in GA4 events is the property's reporting timezone, not the event timestamp's timezone. Event timestamps are always UTC (microseconds since epoch in event_timestamp). Don't let downstream consumers double-convert (the sketch after this list shows the conversion).
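Two of these gotchas (VARIANT flattening and UTC microsecond timestamps) dominate downstream queries, so here is a minimal view sketch using hypothetical names. V_PAGE_VIEWS and the single hard-coded shard are placeholders; the real, all-shard view layer belongs to the dbt engagement:

CREATE OR REPLACE VIEW GA4_RAW.EVENTS.V_PAGE_VIEWS AS   -- hypothetical view name
SELECT
  e.user_pseudo_id,
  TO_TIMESTAMP_NTZ(e.event_timestamp, 6)     AS event_ts_utc,   -- event_timestamp is microseconds since epoch, UTC
  p.value:value:string_value::STRING         AS page_location
FROM GA4_RAW.EVENTS."events_YYYYMMDD" e,     -- placeholder for one daily shard
  LATERAL FLATTEN(input => e.event_params) p
WHERE e.event_name = 'page_view'
  AND p.value:key::STRING = 'page_location';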
What this guide does NOT cover
- dbt models on top of the landed data. Flattened fact tables, event_params pivots, session reconstruction — separate engagement.
- Aggregation or pre-computed marts (daily active users, funnel tables, etc.) — separate engagement.
- GA4 measurement protocol or custom event configuration — that’s GA4 admin work on the client side, out of scope for data engineering.
- BI tool wiring (Looker, Tableau, Hex, Sigma, etc.) on top of the Snowflake landing zone — downstream.
- Identity resolution / user stitching beyond what GA4 itself provides (user_pseudo_id, user_id) — separate engagement, usually done in dbt.
- PII review. GA4 raw data may contain user_id values that are PII (email hashes, customer IDs). Review with the client's privacy team before granting broad access to GA4_RAW.
- Snowflake → Snowflake share-out to other accounts (e.g. agency reporting). Configure separately via Snowflake Secure Data Sharing once the landing zone is stable.
Related
- ../06-reference/2026-04-16-research-bq-to-snowflake-routes — research note that selected GARD over alternatives (Fivetran, Airbyte, custom Storage Read pipeline). Note: file may not exist yet — this is a forward-reference for the upcoming research write-up.
- Snowflake official docs: https://docs.snowflake.com/en/connectors/google/gard/gard-connector-about
- Snowflake Marketplace listing: https://app.snowflake.com/marketplace (search “Google Analytics Raw Data”)
- Adswerve practitioner write-up on GARD vs custom pipelines (cited in the research note)
- mcp-server-setup — internal pattern for credentials handling (1Password-first); the same “no secrets on disk” principle applies to the JSON key in this SOP
- GA4 BigQuery export schema reference: https://support.google.com/analytics/answer/7029846