DEDP 6.2 — Dynamic Query Design Pattern
The most directly actionable DEDP in the book. Solves the fundamental tension between flexibility (answer any question now) and performance (answer it fast). Every data platform eventually hits this wall — this chapter gives you the architecture to resolve it.
The Problem
Three forces collide:
- Visualization gap — raw numbers lack context; comparisons across dimensions reveal patterns single metrics cannot show
- Performance requirement — “if the answers are slow, the ad-hoc loses its appeal”; doubly true when AI agents query databases in real time
- Flexibility paradox — you cannot pre-compute every possible aggregation, but computing everything at query-time is too slow
Without a pattern, orgs default to data lakes where “it’s really hard to query data quickly” — requiring JSON parsing, domain knowledge, and patience.
The Core Tension: Pre-computation vs. Query-time
The chapter traces an evolution through this tension:
1980s DWH (caching aggregations) → Materialized views → OLAP cubes → Modern OLAP (ClickHouse, Druid) → DuckDB → RAG pipelines
Each step enables more ad-hoc querying but introduces new tradeoffs. “We want the simplicity of a database, but the speed of a cache or OLAP cube.” This connects directly to the cache pattern in 06-reference/2026-04-04-dedp-cache-pattern and the materialization strategies in 06-reference/2026-04-04-dedp-mv-obt-dbt-olap-dwa.
Solution: Universal Analytics API
Four integrated components:
1. No-code Interface
Domain experts define metrics in declarative artifacts (YAML, extended SQL) without writing implementation code. Democratizes metric definition beyond engineering teams.
2. Variables and Templates
Reusable, parameterized business logic. Prevents metric duplication across consumers. This is the Template Parameterization sub-pattern from 06-reference/2026-04-04-dedp-data-asset-reusability-pattern applied at the metrics layer.
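A minimal sketch of the idea, using Python's stdlib `string.Template` as a stand-in for a real templating engine (table and column names are illustrative): the business logic for a metric lives in one template, and each consumer renders it with its own parameters instead of duplicating the SQL.

```python
from string import Template

# One shared definition of the revenue metric, parameterized by
# grouping grain and date window (names are hypothetical examples).
REVENUE_TEMPLATE = Template("""
SELECT $grain, SUM(revenue) AS total_revenue
FROM ad_bids
WHERE event_date >= '$start_date'
GROUP BY $grain
""")

def render_metric(grain: str, start_date: str) -> str:
    """Render the shared metric for a given grain and window."""
    return REVENUE_TEMPLATE.substitute(grain=grain, start_date=start_date)

# Two consumers, one definition -- no copy-pasted SUM(revenue) logic.
daily = render_metric("event_date", "2024-01-01")
by_publisher = render_metric("publisher", "2024-01-01")
```

Production semantic layers (dbt, Cube, Rill) do this with richer templating, but the duplication-prevention mechanism is the same.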
3. Logical Data Model
Facts and dimensions decoupled from physical storage. “Add a new dimension without re-running any pipeline.” The system compiles logical definitions into optimized SQL at query-time. This is 06-reference/2026-04-04-dedp-semantic-layer-bi-olap-virtualization made operational.
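A toy sketch of that compilation step (all names hypothetical): the logical model maps dimension and measure names to expressions, and a query is compiled into SQL only when asked. Adding a new dimension is a one-line change to the model, with no pipeline re-run.

```python
# Hypothetical logical model: logical names -> physical expressions.
LOGICAL_MODEL = {
    "table": "ad_bids",
    "dimensions": {
        "publisher": "upper(publisher)",
        "day": "date_trunc('day', ts)",  # adding this line exposes a new dimension
    },
    "measures": {"total_revenue": "SUM(revenue)"},
}

def compile_query(measures, dimensions, model=LOGICAL_MODEL):
    """Compile a logical metric request into SQL at query time."""
    select = ", ".join(
        [f"{model['dimensions'][d]} AS {d}" for d in dimensions]
        + [f"{model['measures'][m]} AS {m}" for m in measures]
    )
    sql = f"SELECT {select} FROM {model['table']}"
    if dimensions:
        sql += " GROUP BY " + ", ".join(model["dimensions"][d] for d in dimensions)
    return sql

print(compile_query(["total_revenue"], ["publisher"]))
```

Real semantic layers add joins, filters, and access control to this step, but the decoupling is the same: consumers speak logical names, the compiler speaks storage.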
4. Materialization and Cache
Pre-aggregation for sub-second queries. Autonomous cache invalidation — updates only affected materializations when metric definitions change, not entire warehouses. Built on the 06-reference/2026-04-04-dedp-cache-pattern.
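One way to sketch the autonomous-invalidation idea (class and function names are hypothetical, not from any specific tool): key each materialization by a hash of the metric definition it was built from. When a definition changes, only that materialization rebuilds; everything else stays cached.

```python
import hashlib

def definition_hash(defn: str) -> str:
    """Fingerprint a metric definition so changes are detectable."""
    return hashlib.sha256(defn.encode()).hexdigest()

class MaterializationStore:
    """Rebuild a materialization only when its definition changed."""

    def __init__(self):
        self._cache = {}  # metric name -> (definition hash, result)

    def get_or_build(self, name, defn, build):
        key = definition_hash(defn)
        cached = self._cache.get(name)
        if cached and cached[0] == key:
            return cached[1]      # definition unchanged: serve the cache
        result = build()          # changed or missing: rebuild this one only
        self._cache[name] = (key, result)
        return result
```

A changed `SUM(revenue)` definition would invalidate just the `total_revenue` materialization, not the warehouse.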
Implementation Steps
Step 1: Choose an OLAP Engine
Depends on data scale, deployment complexity, and whether metrics layers are built-in:
| Scale | Options |
|---|---|
| Lightweight / embedded | DuckDB, MotherDuck |
| Semantic OLAP | Cube, Rill |
| High-scale dedicated | ClickHouse, Druid, Pinot, StarRocks |
| Cloud OLAP | Firebolt |
Step 2: Define Metrics Location
Select a semantic/metrics layer. Example from Rill:
```yaml
type: metrics_view
model: ad_bids
dimensions:
  - name: publisher
    expression: toUpper(publisher)
measures:
  - name: total_revenue
    expression: SUM(revenue)
```
Selection heuristics:
- Nested/JSON data → Malloy
- Large enterprise (tabular) → Cube
- dbt warehouse users → dbt Semantic Layer / MetricFlow
- Inside Looker → LookML
- Early-stage → Boring Semantic Layer, DuckDB macros
- AI/LLM workflows → Rill (MCP), Cube (AI API)
Step 3: Enable Unified API
REST, GraphQL, SQL, MCP — consistent metric access across notebooks, dashboards, BI tools, and AI agents.
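As a sketch of the REST side (endpoint path and payload shape are illustrative, not any specific product's API): every consumer, from a notebook to an AI agent, expresses the same metric request once and sends it to the semantic layer.

```python
import json
from urllib.request import Request

def build_metric_request(base_url, measures, dimensions):
    """Build a POST request for a metric query against a
    hypothetical semantic-layer REST endpoint."""
    payload = {"measures": measures, "dimensions": dimensions}
    return Request(
        f"{base_url}/v1/metrics/query",  # illustrative path
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# The same request serves a dashboard, a notebook, or an agent.
req = build_metric_request(
    "https://analytics.example.com", ["total_revenue"], ["publisher"]
)
```

The point of the pattern is that this request never mentions tables, joins, or storage; the layer resolves those.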
Common Pitfalls
- Over-engineering — start with SQL and materialized views; add semantic layers only when metric inconsistencies across tools create business friction
- Reinventing caching — use integrated caching features; Cube replaced Redis with its own Cube Store for good reasons
- Manual security — use the semantic layer’s built-in row-level security rather than repeating ACL logic in every consumer’s queries
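The last pitfall can be sketched in a few lines (policy table and function names are hypothetical): declare the row filter once, and let the compilation step inject it into every query, so no consumer carries its own ACL logic.

```python
# Hypothetical central policy registry: table -> row filter predicate.
ROW_POLICIES = {"ad_bids": "publisher = :current_publisher"}

def secured_sql(table: str, select_list: str) -> str:
    """Compile a simple SELECT with the table's row policy injected."""
    sql = f"SELECT {select_list} FROM {table}"
    policy = ROW_POLICIES.get(table)
    return f"{sql} WHERE {policy}" if policy else sql

print(secured_sql("ad_bids", "publisher, revenue"))
# A table without a registered policy compiles without a filter:
print(secured_sql("events", "*"))
```

Semantic layers with built-in row-level security do exactly this at compile time, parameterized by the authenticated user.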
Real-World Implementations
Google Dremel (2010): Columnar storage + multi-level execution trees + in-situ querying. Became BigQuery. Influenced Drill, Snowflake, Databricks.
Netflix DataJunction: Semantic graph with SQL parsing/generation. “Reduced metric onboarding from weeks to near-trivial effort.” Originated from Facebook’s internal MDF project (~2015).
Airbnb Minerva: 12,000+ metric definitions, 4,000+ dimensions in declarative YAML. Split-apply-combine strategy. Exposes MySQL protocol — any SQL tool consumes metrics without knowing physical storage.
Meta Scuba: In-memory OLAP handling ~1M queries/day at sub-second latency across hundreds of 144GB RAM servers. Accessed through Daiquery unified notebook.
Apache Gravitino: Federated metadata lake with Schema Registry and Hive Metastore support. Enables metadata-driven actions (compaction, TTL, PII identification).
Key Takeaway
The Dynamic Query pattern is not “pick one tool.” It is: logical decoupling (semantic layer) + autonomous caching (materialization triggered by model changes) + unified API (consistent access) + incremental adoption (start simple, add complexity when justified).
“There’s no such tool that solves all” — architecture choices depend on scale, existing tooling, and data characteristics. The pattern gives you the structure; the tools fill the slots.
Connections
- Cache foundation: 06-reference/2026-04-04-dedp-cache-pattern
- Materialized view convergence: 06-reference/2026-04-04-dedp-mv-obt-dbt-olap-dwa
- Semantic layer as logic encapsulation: 06-reference/2026-04-04-dedp-semantic-layer-bi-olap-virtualization
- Reusability sub-patterns: 06-reference/2026-04-04-dedp-data-asset-reusability-pattern
- DWH/Lake/CDP as underlying platforms: 06-reference/2026-04-04-dedp-dwh-mdm-datalake-reverse-etl-cdp
- Pattern intro: 06-reference/2026-04-04-dedp-design-patterns-intro
- Craft framing: 06-reference/concepts/analytics-as-craft