DEDP 6.2 — Dynamic Query Design Pattern
The most directly actionable DEDP in the book. Solves the fundamental tension between flexibility (answer any question now) and performance (answer it fast). Every data platform eventually hits this wall — this chapter gives you the architecture to resolve it.
The Problem
Three forces collide:
- Visualization gap — raw numbers lack context; comparisons across dimensions reveal patterns single metrics cannot show
- Performance requirement — “if the answers are slow, the ad-hoc loses its appeal”; doubly true when AI agents query databases in real time
- Flexibility paradox — you cannot pre-compute every possible aggregation, but computing everything at query-time is too slow
Without a pattern, orgs default to data lakes where “it’s really hard to query data quickly” — requiring JSON parsing, domain knowledge, and patience.
The Core Tension: Pre-computation vs. Query-time
The chapter traces an evolution through this tension:
1980s DWH (caching aggregations) → Materialized views → OLAP cubes → Modern OLAP (ClickHouse, Druid) → DuckDB → RAG pipelines
Each step enables more ad-hoc querying but introduces new tradeoffs. “We want the simplicity of a database, but the speed of a cache or OLAP cube.” This connects directly to the cache pattern in 06-reference/2026-04-04-dedp-cache-pattern and the materialization strategies in 06-reference/2026-04-04-dedp-mv-obt-dbt-olap-dwa.
Solution: Universal Analytics API
Four integrated components:
1. No-code Interface
Domain experts define metrics in declarative artifacts (YAML, extended SQL) without writing implementation code. Democratizes metric definition beyond engineering teams.
2. Variables and Templates
Reusable, parameterized business logic. Prevents metric duplication across consumers. This is the Template Parameterization sub-pattern from 06-reference/2026-04-04-dedp-data-asset-reusability-pattern applied at the metrics layer.
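A minimal sketch of the idea, using Python's stdlib `string.Template` as a stand-in for a real templating engine (table and column names are illustrative): the business logic for a metric lives in one template, and each consumer renders it with its own parameters instead of duplicating the SQL.

```python
from string import Template

# One shared definition of the revenue metric, parameterized by
# grouping grain and date window (names are hypothetical examples).
REVENUE_TEMPLATE = Template("""
SELECT $grain, SUM(revenue) AS total_revenue
FROM ad_bids
WHERE event_date >= '$start_date'
GROUP BY $grain
""")

def render_metric(grain: str, start_date: str) -> str:
    """Render the shared metric for a given grain and window."""
    return REVENUE_TEMPLATE.substitute(grain=grain, start_date=start_date)

# Two consumers, one definition -- no copy-pasted SUM(revenue) logic.
daily = render_metric("event_date", "2024-01-01")
by_publisher = render_metric("publisher", "2024-01-01")
```

Production semantic layers (dbt, Cube, Rill) do this with richer templating, but the duplication-prevention mechanism is the same.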
3. Logical Data Model
Facts and dimensions decoupled from physical storage. “Add a new dimension without re-running any pipeline.” The system compiles logical definitions into optimized SQL at query-time. This is 06-reference/2026-04-04-dedp-semantic-layer-bi-olap-virtualization made operational.
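A toy sketch of that compilation step (all names hypothetical): the logical model maps dimension and measure names to expressions, and a query is compiled into SQL only when asked. Adding a new dimension is a one-line change to the model, with no pipeline re-run.

```python
# Hypothetical logical model: logical names -> physical expressions.
LOGICAL_MODEL = {
    "table": "ad_bids",
    "dimensions": {
        "publisher": "upper(publisher)",
        "day": "date_trunc('day', ts)",  # adding this line exposes a new dimension
    },
    "measures": {"total_revenue": "SUM(revenue)"},
}

def compile_query(measures, dimensions, model=LOGICAL_MODEL):
    """Compile a logical metric request into SQL at query time."""
    select = ", ".join(
        [f"{model['dimensions'][d]} AS {d}" for d in dimensions]
        + [f"{model['measures'][m]} AS {m}" for m in measures]
    )
    sql = f"SELECT {select} FROM {model['table']}"
    if dimensions:
        sql += " GROUP BY " + ", ".join(model["dimensions"][d] for d in dimensions)
    return sql

print(compile_query(["total_revenue"], ["publisher"]))
```

Real semantic layers add joins, filters, and access control to this step, but the decoupling is the same: consumers speak logical names, the compiler speaks storage.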
4. Materialization and Cache
Pre-aggregation for sub-second queries. Autonomous cache invalidation — updates only affected materializations when metric definitions change, not entire warehouses. Built on the 06-reference/2026-04-04-dedp-cache-pattern.
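One way to sketch the autonomous-invalidation idea (class and function names are hypothetical, not from any specific tool): key each materialization by a hash of the metric definition it was built from. When a definition changes, only that materialization rebuilds; everything else stays cached.

```python
import hashlib

def definition_hash(defn: str) -> str:
    """Fingerprint a metric definition so changes are detectable."""
    return hashlib.sha256(defn.encode()).hexdigest()

class MaterializationStore:
    """Rebuild a materialization only when its definition changed."""

    def __init__(self):
        self._cache = {}  # metric name -> (definition hash, result)

    def get_or_build(self, name, defn, build):
        key = definition_hash(defn)
        cached = self._cache.get(name)
        if cached and cached[0] == key:
            return cached[1]      # definition unchanged: serve the cache
        result = build()          # changed or missing: rebuild this one only
        self._cache[name] = (key, result)
        return result
```

A changed `SUM(revenue)` definition would invalidate just the `total_revenue` materialization, not the warehouse.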
Implementation Steps
Step 1: Choose an OLAP Engine
Depends on data scale, deployment complexity, and whether metrics layers are built-in:
| Scale | Options |
|---|---|
| Lightweight / embedded | DuckDB, MotherDuck |
| Semantic OLAP | Cube, Rill |
| High-scale dedicated | ClickHouse, Druid, Pinot, StarRocks |
| Cloud OLAP | Firebolt |
Step 2: Define Metrics Location
Select a semantic/metrics layer. Example from Rill:
```yaml
type: metrics_view
model: ad_bids
dimensions:
  - name: publisher
    expression: toUpper(publisher)
measures:
  - name: total_revenue
    expression: SUM(revenue)
```
Selection heuristics:
- Nested/JSON data → Malloy
- Large enterprise (tabular) → Cube
- dbt warehouse users → dbt Semantic Layer / MetricFlow
- Inside Looker → LookML
- Early-stage → Boring Semantic Layer, DuckDB macros
- AI/LLM workflows → Rill (MCP), Cube (AI API)
Step 3: Enable Unified API
REST, GraphQL, SQL, MCP — consistent metric access across notebooks, dashboards, BI tools, and AI agents.
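As a sketch of the REST side (endpoint path and payload shape are illustrative, not any specific product's API): every consumer, from a notebook to an AI agent, expresses the same metric request once and sends it to the semantic layer.

```python
import json
from urllib.request import Request

def build_metric_request(base_url, measures, dimensions):
    """Build a POST request for a metric query against a
    hypothetical semantic-layer REST endpoint."""
    payload = {"measures": measures, "dimensions": dimensions}
    return Request(
        f"{base_url}/v1/metrics/query",  # illustrative path
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# The same request serves a dashboard, a notebook, or an agent.
req = build_metric_request(
    "https://analytics.example.com", ["total_revenue"], ["publisher"]
)
```

The point of the pattern is that this request never mentions tables, joins, or storage; the layer resolves those.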
Common Pitfalls
- Over-engineering — start with SQL and materialized views; add semantic layers only when metric inconsistencies across tools create business friction
- Reinventing caching — use integrated caching features; Cube replaced Redis with its own Cube Store for good reasons
- Manual security — use the semantic layer’s built-in row-level security rather than repeating ACL logic in every consumer’s queries
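The last pitfall can be sketched in a few lines (policy table and function names are hypothetical): declare the row filter once, and let the compilation step inject it into every query, so no consumer carries its own ACL logic.

```python
# Hypothetical central policy registry: table -> row filter predicate.
ROW_POLICIES = {"ad_bids": "publisher = :current_publisher"}

def secured_sql(table: str, select_list: str) -> str:
    """Compile a simple SELECT with the table's row policy injected."""
    sql = f"SELECT {select_list} FROM {table}"
    policy = ROW_POLICIES.get(table)
    return f"{sql} WHERE {policy}" if policy else sql

print(secured_sql("ad_bids", "publisher, revenue"))
# A table without a registered policy compiles without a filter:
print(secured_sql("events", "*"))
```

Semantic layers with built-in row-level security do exactly this at compile time, parameterized by the authenticated user.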
Real-World Implementations
Google Dremel (2010): Columnar storage + multi-level execution trees + in-situ querying. Became BigQuery. Influenced Drill, Snowflake, Databricks.
Netflix DataJunction: Semantic graph with SQL parsing/generation. “Reduced metric onboarding from weeks to near-trivial effort.” Originated from Facebook’s internal MDF project (~2015).
Airbnb Minerva: 12,000+ metric definitions, 4,000+ dimensions in declarative YAML. Split-apply-combine strategy. Exposes MySQL protocol — any SQL tool consumes metrics without knowing physical storage.
Meta Scuba: In-memory OLAP handling ~1M queries/day at sub-second latency across hundreds of 144GB RAM servers. Accessed through Daiquery unified notebook.
Apache Gravitino: Federated metadata lake with Schema Registry and Hive Metastore support. Enables metadata-driven actions (compaction, TTL, PII identification).
Key Takeaway
The Dynamic Query pattern is not “pick one tool.” It is: logical decoupling (semantic layer) + autonomous caching (materialization triggered by model changes) + unified API (consistent access) + incremental adoption (start simple, add complexity when justified).
“There’s no such tool that solves all” — architecture choices depend on scale, existing tooling, and data characteristics. The pattern gives you the structure; the tools fill the slots.
Connections
- Cache foundation: 06-reference/2026-04-04-dedp-cache-pattern
- Materialized view convergence: 06-reference/2026-04-04-dedp-mv-obt-dbt-olap-dwa
- Semantic layer as logic encapsulation: 06-reference/2026-04-04-dedp-semantic-layer-bi-olap-virtualization
- Reusability sub-patterns: 06-reference/2026-04-04-dedp-data-asset-reusability-pattern
- DWH/Lake/CDP as underlying platforms: 06-reference/2026-04-04-dedp-dwh-mdm-datalake-reverse-etl-cdp
- Pattern intro: 06-reference/2026-04-04-dedp-design-patterns-intro
- Craft framing: 06-reference/concepts/analytics-as-craft