While speed used to be essential to keep things running and deliver insight, there is a tipping point beyond which quickly developed models no longer scale. A framework that standardizes our transformations is key to reducing the risk of isolating ourselves in different parts of the company.
Note: Unified workflows reduce the risk of isolating knowledge in silos
While working our way upstream, we tried to create a single point of failure to align on definitions, reduce maintenance and allow for easy monitoring. This meant setting our definitions in a single place as far upstream as possible, so that all of our downstream models consume the same definition.
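A minimal sketch of what "one definition, as far upstream as possible" could look like, using Python's built-in `sqlite3` as a stand-in warehouse. The table and view names (`raw_bookings`, `derived_revenue`, `report_total_revenue`, `report_booking_count`) and the rule that only confirmed bookings count as revenue are assumptions made for illustration, not taken from the original article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_bookings (booking_id INTEGER, amount REAL, status TEXT);
INSERT INTO raw_bookings VALUES
    (1, 100.0, 'confirmed'),
    (2,  50.0, 'cancelled'),
    (3,  70.0, 'confirmed');

-- The single upstream place where "revenue" is defined
-- (hypothetical rule: only confirmed bookings count).
CREATE VIEW derived_revenue AS
SELECT booking_id, amount AS revenue
FROM raw_bookings
WHERE status = 'confirmed';

-- Downstream models consume the definition instead of re-deriving it.
CREATE VIEW report_total_revenue AS
SELECT SUM(revenue) AS total_revenue FROM derived_revenue;

CREATE VIEW report_booking_count AS
SELECT COUNT(*) AS paid_bookings FROM derived_revenue;
""")

total_revenue = conn.execute(
    "SELECT total_revenue FROM report_total_revenue"
).fetchone()[0]
paid_bookings = conn.execute(
    "SELECT paid_bookings FROM report_booking_count"
).fetchone()[0]
```

If the revenue rule changes, only `derived_revenue` needs to be edited, and both downstream views pick up the new definition.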
People make definitions quickly and according to their specific needs, which isolates a metric into a specific model. As time passes, some metrics provide valuable insight, while others are forgotten. As the number of models continues to grow, you end up in a situation where definitions are all over the place in downstream tables and “technical debt” accumulates.
Note: Setting definitions for valuable columns too late in the process is bad practice
The raw layer only contains the imports from our data generators and is the place where most of the data cleaning is done (timezone alignment, NULL checks, etc.). Good examples are our Snowplow data (click stream), booking data and Google Ads data.
All definitions are set in the derived layer, e.g. what is a “session”, how do we measure “revenue”, and what is a “search interaction”?
As a result, we only join derived tables in the “granular layer”, which is the first layer exposed to other teams and to visualization tools outside the data warehouse.
The aggregate layer collects tables from the granular layer to provide faster, more convenient tables for important dashboards.
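The four layers described above can be sketched as a chain of small Python functions. This is only an illustration under stated assumptions: the sample rows, the field names, and the concrete definitions (a session starts at its earliest event, a "search interaction" is an event named `search`) are hypothetical and not taken from the article.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical imports as delivered by the data generators.
RAW_CLICKS = [
    {"session_id": "s1", "ts": "2024-01-01T10:00:00+02:00", "event": "search"},
    {"session_id": "s1", "ts": "2024-01-01T10:05:00+02:00", "event": "view"},
    {"session_id": "s2", "ts": "2024-01-01T09:00:00+00:00", "event": "search"},
    {"session_id": None, "ts": "2024-01-01T09:30:00+00:00", "event": "view"},
]
RAW_BOOKINGS = [
    {"session_id": "s1", "amount": 120.0},
    {"session_id": "s2", "amount": None},
]

def raw_clicks(rows):
    """Raw layer: cleaning only (NULL check, timezone alignment to UTC)."""
    cleaned = []
    for row in rows:
        if row["session_id"] is None:
            continue
        ts_utc = datetime.fromisoformat(row["ts"]).astimezone(timezone.utc)
        cleaned.append({**row, "ts": ts_utc})
    return cleaned

def raw_bookings(rows):
    """Raw layer: drop bookings without an amount."""
    return [row for row in rows if row["amount"] is not None]

def derived_sessions(clicks):
    """Derived layer: the one place that defines a session and a search."""
    sessions = {}
    for click in clicks:
        session = sessions.setdefault(
            click["session_id"],
            {"session_id": click["session_id"], "start": click["ts"], "searches": 0},
        )
        session["start"] = min(session["start"], click["ts"])
        session["searches"] += int(click["event"] == "search")
    return list(sessions.values())

def derived_revenue(bookings):
    """Derived layer: the one place that defines revenue per session."""
    revenue = defaultdict(float)
    for booking in bookings:
        revenue[booking["session_id"]] += booking["amount"]
    return dict(revenue)

def granular_sessions(sessions, revenue):
    """Granular layer: only joins derived tables, adds no new definitions."""
    return [{**s, "revenue": revenue.get(s["session_id"], 0.0)} for s in sessions]

def aggregate_daily(granular):
    """Aggregate layer: small pre-computed table for dashboards."""
    daily = defaultdict(lambda: {"sessions": 0, "revenue": 0.0})
    for row in granular:
        day = row["start"].date().isoformat()
        daily[day]["sessions"] += 1
        daily[day]["revenue"] += row["revenue"]
    return dict(daily)

sessions = derived_sessions(raw_clicks(RAW_CLICKS))
revenue = derived_revenue(raw_bookings(RAW_BOOKINGS))
granular = granular_sessions(sessions, revenue)
daily = aggregate_daily(granular)
```

Each function reads only from the layer directly above it, so a change to a definition in the derived layer flows into both the granular and the aggregate output without further edits.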
We know that things usually break because of one of these factors:
underlying input is changing
we make changes to our models which have unexpected side effects
our understanding of the input is changing
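One possible way to monitor these three factors is a small check per factor. The sketch below is an assumption about how such checks might look, not the approach described in the article; the column names, the status values, the pinned fixture and the function names are all hypothetical.

```python
# Hypothetical expectations about a bookings input.
EXPECTED_COLUMNS = {"booking_id", "amount", "status"}
KNOWN_STATUSES = {"confirmed", "cancelled"}

# A fixed fixture with a pinned result, used to catch side effects of model changes.
FIXTURE = [
    {"booking_id": 1, "amount": 100.0, "status": "confirmed"},
    {"booking_id": 2, "amount": 50.0, "status": "cancelled"},
]
FIXTURE_EXPECTED_REVENUE = 100.0

def total_revenue(rows):
    """The model under test: revenue is the sum of confirmed bookings."""
    return sum(row["amount"] for row in rows if row["status"] == "confirmed")

def schema_drift(rows):
    """Factor 1: the underlying input changed (columns appeared or vanished)."""
    missing, unexpected = set(), set()
    for row in rows:
        missing |= EXPECTED_COLUMNS - set(row)
        unexpected |= set(row) - EXPECTED_COLUMNS
    return {"missing": sorted(missing), "unexpected": sorted(unexpected)}

def model_regressed():
    """Factor 2: a model change altered the result on the pinned fixture."""
    return total_revenue(FIXTURE) != FIXTURE_EXPECTED_REVENUE

def unknown_statuses(rows):
    """Factor 3: values that contradict our current understanding of the input."""
    seen = {row["status"] for row in rows if row.get("status") is not None}
    return sorted(seen - KNOWN_STATUSES)

# Hypothetical new batch with an extra column and an unseen status value.
new_batch = [
    {"booking_id": 3, "amount": 80.0, "status": "refunded", "currency": "EUR"},
]
drift = schema_drift(new_batch)
surprises = unknown_statuses(new_batch)
regressed = model_regressed()
```

Here `drift` reports the extra `currency` column and `surprises` reports the unseen `refunded` status, while `regressed` stays false because the model itself was not changed.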