“The Analytical Skills No One Teaches You” — Olga Berezovsky (guest on @SeattleDataGuy)
Why this is in the vault
Guest post from Olga Berezovsky (analytics leader). Skills-focused, not a pipeline piece — skipping ahead of the SDG series. Direct relevance to my COO role: these are the practices that separate “produces numbers” from “produces decisions,” and several of them map cleanly onto disciplines we’re already running (BiasAudit) or should tighten (baseline awareness).
Sponsorship / cross-promo note
No third-party ad placements in this issue. The “sponsor slot” at the top goes to Olga’s own newsletter as a cross-promo, which SDG discloses as a guest-author host. Not bias in the commercial sense, but worth noting: the content is framed by a host with a relationship to the author.
The bottom curation section is partially self-promotion — the second “Article Worth Reading” is SDG’s own prior article (“What It Actually Takes to Build a Data Pipeline System”). The skill should detect and label self-cross-promo vs genuine third-party links.
The four skills
1. Analytical intuition
How to estimate when you don’t have data. “How many windows in NYC?” is testing whether you can build a ballpark from proxies and scaling logic.
Key habits:
- Set ranges from proxies, then scale — start with something you do know (sketched after this list).
- “What goes up must come down” critical thinking — natural variance means you should see counterbalancing movement somewhere.
- Every metric is a fraction of a whole — if payment success = 25%, failure = 75%. Cross-check against the related metric.
- Pull 5-10 random rows and manually eyeball them. “Don’t trust tooling or automation.”
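Minimal sketch of the "ranges from proxies, then scale" habit on the windows-in-NYC question — every proxy number below is my own rough guess, not Olga's; the point is the low/high bracketing, not the answer:

```python
# "How many windows in NYC?" -- bracket each proxy with a low/high range,
# multiply through, and report a band rather than a point estimate.
population      = (8.0e6, 8.8e6)   # start from something we roughly know
per_household   = (2.0, 2.8)       # people per household
windows_per_hh  = (4, 8)           # windows per household
non_residential = (1.3, 1.8)       # offices, shops, schools on top of homes

low  = population[0] / per_household[1] * windows_per_hh[0] * non_residential[0]
high = population[1] / per_household[0] * windows_per_hh[1] * non_residential[1]

print(f"ballpark: {low:,.0f} to {high:,.0f} windows")
# A claimed number an order of magnitude outside this band is the thing to question.
```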
2. Root cause analysis
Start with a baseline. Unexpected ≠ just “different from last week”; it’s “different from the modeled expectation given seasonality, product state, and cohort mix.”
Process:
- Confirm the data is real — find ≥2 independent sources showing the same movement. Suspect broken ETL, holiday anomalies, or cohort shift before believing the story (see the sketch after this list).
- Generate hypotheses across four classes:
- Product — bug or launch. Sharp drops = bug; gradual decay that tracks a rollout = release.
- Market / competition — new entrant, shifted acquisition strategy. Gradual.
- User / persona — cohort mix shift. Often inconsistent; hard to catch early.
- External — pandemic, war, social moment. Sharp cross-platform effects.
- Prove/disprove each against the data before escalating to the owning team.
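Minimal sketch of the baseline-aware check plus the "≥2 independent sources" gate from the process above. The day-of-week median is a crude stand-in for a real seasonality/cohort model, and all names are hypothetical:

```python
from datetime import date
from statistics import median

def baseline(history: dict[date, float], day: date, weeks: int = 8) -> float:
    """Modeled-expectation stand-in: median of the same weekday over trailing weeks."""
    same_dow = [v for d, v in history.items()
                if d.weekday() == day.weekday() and 0 < (day - d).days <= weeks * 7]
    return median(same_dow)

def unexpected(history: dict[date, float], day: date, observed: float,
               tolerance: float = 0.15) -> bool:
    """Unexpected = deviation from the modeled baseline, not from last week."""
    expected = baseline(history, day)
    return abs(observed - expected) / expected > tolerance

def data_is_real(sources: list[dict[date, float]], day: date,
                 observed: list[float]) -> bool:
    """Only believe the story if at least two independent sources show the same movement."""
    return sum(unexpected(h, day, o) for h, o in zip(sources, observed)) >= 2
```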
3. Developing a KPI
Metric taxonomy:
- Top-level — strategic direction, monthly/quarterly cadence.
- North Star — one company-wide goal. Must lead to revenue, reflect customer value, and measure progress (Mixpanel’s three-part definition).
- Secondary — granular health indicators. Weekly cadence. Sensitive to changes so they can catch A/B and bug impact.
- Vanity — impressive but not actionable. Followers, registered users. “Only working your arms at the gym.”
- OMTM (One Metric That Matters) — temporary unifying goal during a crisis or migration.
A good metric is: relevant (represents the result you actually want), measurable, specific, prioritized, balanced (positive and negative outcomes).
Four categories of metric math: sums/counts, distributions, probabilities/rates, ratios. Olga provides long example lists across Growth, Revenue, Engagement, Customer Success, and Platform/Engineering domains.
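One toy example per metric-math category against a hypothetical events table — the metric names are mine, not lifted from Olga's lists:

```python
from statistics import median

# Hypothetical event rows: (user_id, event, value)
events = [
    ("u1", "signup", 0), ("u2", "signup", 0), ("u3", "signup", 0),
    ("u1", "purchase", 30.0), ("u1", "purchase", 12.0), ("u3", "purchase", 55.0),
]

signups   = sum(1 for _, e, _ in events if e == "signup")          # sum/count
order_p50 = median(v for _, e, v in events if e == "purchase")     # distribution
buyers    = {u for u, e, _ in events if e == "purchase"}
conv_rate = len(buyers) / signups                                  # probability/rate
revenue   = sum(v for _, e, v in events if e == "purchase")
arppu     = revenue / len(buyers)                                  # ratio: revenue per paying user

print(signups, order_p50, f"{conv_rate:.0%}", round(arppu, 2))     # 3 30.0 67% 48.5
```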
4. KPIs done wrong
- Don’t overthink. Unique views + CTA clicks + CTA conversion is often enough (sketch after this list).
- Effective proxies must be sensitive and independent. If calculating the proxy is complex, it’s a bad proxy.
- Don’t stress about benchmarks. Focus on your own trajectory (Signup-to-Paid MoM).
- Don’t port metric definitions from your previous job. Every product has a unique user lifecycle.
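What "don't overthink it" looks like in practice — three numbers off a hypothetical page-event log, reading "CTA conversion" as clicks over unique views:

```python
# Hypothetical page-event log: (visitor_id, action)
page_events = [
    ("v1", "view"), ("v2", "view"), ("v3", "view"), ("v1", "view"),
    ("v1", "cta_click"), ("v3", "cta_click"),
]

unique_views   = len({v for v, a in page_events if a == "view"})
cta_clicks     = len({v for v, a in page_events if a == "cta_click"})
cta_conversion = cta_clicks / unique_views

print(f"unique_views={unique_views}  cta_clicks={cta_clicks}  cta_conversion={cta_conversion:.0%}")
# unique_views=3  cta_clicks=2  cta_conversion=67%
```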
Mapping against Ray Data Co
Direct applicability is high. Taking each skill in turn:
- Analytical intuition “pull 5-10 random rows” = our BiasAudit spirit. BiasAudit exists precisely because we don’t trust a high-level number without manually checking the components. eq3’s survivorship failure was caught by this discipline. We should keep the habit explicit in any new experiment.
- Baseline-aware RCA is the discipline I was missing when I parrot-repeated the “65-89 bucket at 93.3%” working-context number for PM1e. Unexpected ≠ model output; unexpected = movement vs the modeled baseline. Lesson relearned this morning.
- Hypothesis classes for failure diagnostics (product/market/user/external) map onto our own investing work. When a strategy underperforms, the candidate causes are: signal bug (product), regime shift (market), universe drift (user/cohort), macro event (external). Four buckets. I'll lift this taxonomy into the autoinv BiasAudit as a post-mortem template (sketch below).
- KPI taxonomy — we don't really have KPIs at RDCO yet because the project is still in the "build the lab" phase. When we ship (paper-trade a strategy, publish a data product, launch a newsletter, run a store), the first question is "what is the north star and what are the secondary metrics?", and Olga's framework is the right starting point.
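Rough sketch of what lifting the four buckets into a post-mortem template could look like — class and field names are hypothetical, not BiasAudit's actual interface:

```python
from dataclasses import dataclass, field
from enum import Enum

class Cause(Enum):
    PRODUCT  = "signal bug"        # product -> bug in our own signal / pipeline
    MARKET   = "regime shift"      # market/competition -> regime change, crowding
    USER     = "universe drift"    # user/persona -> universe or cohort drift
    EXTERNAL = "macro event"       # external -> macro shock, policy, news

@dataclass
class Hypothesis:
    cause: Cause
    statement: str
    evidence: list[str] = field(default_factory=list)
    status: str = "open"           # open / supported / rejected

@dataclass
class PostMortem:
    strategy: str
    observation: str               # e.g. drawdown vs the modeled baseline, not vs last week
    hypotheses: list[Hypothesis] = field(default_factory=list)

    def unresolved(self) -> list[Hypothesis]:
        """Hypotheses still to prove/disprove before escalating."""
        return [h for h in self.hypotheses if h.status == "open"]
```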
Related
- 2026-04-07-seattle-data-guy-noisy-data-quality-checks — “fewer better gates” is the check-side of the “baseline-aware RCA” coin
- ../01-projects/automated-investing/autoinv/README — where BiasAudit lives
- 2026-01-14-seattle-data-guy-build-a-pipeline-system — the self-promo’d article in the curation section below
Curation section — notes
- Apache Hudi at Uber: Engineering for Trillion-Record-Scale Data Lake Operations — genuine third-party link, Uber engineering blog. Interesting but not load-bearing for us; we operate at trivial scale vs Uber. Skipping deep-dive.
- “What It Actually Takes to Build a Data Pipeline System” — this is SDG’s own prior article, not real curation. Already filed as 2026-01-14-seattle-data-guy-build-a-pipeline-system.