The Gerrymandering of the Modern Data Stack — Categories Shape What We See
Drawing from Robert Sapolsky’s lecture on categorization, Benn Stancil argues that the categories we impose on the data tool landscape are arbitrary and consequential — they shape what we can see and build.
Core mental model
“The categories we create, though necessary to keep us from being overwhelmed by this infinite spectrum, affect what we can actually see. The artificial boundaries we define eventually come to define us.”
Applied to data: the line between BI reporting and analytical research is not a hard boundary but a fluid, shifting continuum. Answers to one-off questions become recurring reports. Reports feed executive dashboards. Dashboards get enriched with ML forecasts. Changing forecasts generate questions that need bespoke SQL. Elements from all of these flow into sales decks, operational systems, and customer-facing data.
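A minimal sketch of that continuum, assuming a hypothetical orders table and report function (none of this comes from the original essay): the same SQL drifts from a one-off answer into a recurring report, and the only thing separating "analysis" from "BI" is a small refactor, not a different category of tool.

```python
# Illustrative only: the table, data, and function names are assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("east", 100.0), ("west", 250.0), ("east", 50.0)])

# Stage 1: a one-off question, answered with ad-hoc SQL.
ADHOC = "SELECT SUM(amount) FROM orders WHERE region = 'east'"
one_off_answer = conn.execute(ADHOC).fetchone()[0]

# Stage 2: the same query, parameterized and wrapped so a scheduler can
# rerun it as a recurring report. The "boundary" crossed here is just
# this refactor, not a switch between tool categories.
def regional_revenue_report(conn, region):
    row = conn.execute(
        "SELECT SUM(amount) FROM orders WHERE region = ?", (region,)
    ).fetchone()
    return {"region": region, "revenue": row[0]}

report = regional_revenue_report(conn, "east")
print(one_off_answer)  # 150.0
print(report)          # {'region': 'east', 'revenue': 150.0}
```

The point of the sketch is how small the step is: nothing about the query changed, only its packaging, which is why a hard BI-vs-analysis tool boundary cuts through a naturally continuous workflow.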
The consumption layer problem
The traditional architecture splits data consumers along a technical boundary: SQL/Python users (analysts, data scientists) vs. code-free BI users (“business users”). But this is not how data is actually consumed. The boundary is technical, not experiential.
Implication
Tool boundaries should follow user experience, not technical implementation. The gerrymandering of categories creates artificial friction in workflows that are naturally fluid.
Connects to: analytics craft, data team operations, analytics is a mess.
Open questions
- What would a “category-free” data consumption layer actually look like?
- Are we making the same categorization mistakes in how we define data team roles?