The Next Big Challenge for Data Is Organizational — Four Principles of Scale
Software engineering has solved the organizational scaling problem through four principles. Data teams need to adopt the same playbook.
Four principles for scaling data organizations
- Specialization: clear, well-understood roles with defined boundaries. Frontend/backend is the obvious software parallel; data needs its own equivalent.
- Modularization: break problems into self-contained, extensible, “Lego-able” chunks. No chunk duplicates major work done by another, and shared code is factored out. This is enforced through services/APIs at the code level and through team structure at the org level. “Who owns what” must be clear.
- Clarity: interactions between modules/teams follow clear contracts at hand-off points. APIs are the contract in code; team agreements are the contract organizationally. Upstream teams must know who depends on their code and communicate with those teams regularly. Breaking changes ship only with fair warning.
- Buy-in: shared cultural expectations. In software, it is broadly accepted that shipping hacky code is faster in the short term but slower in the long term. Data teams need the same consensus: convincing data consumers of the value of “going slow to go fast” is key.
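One way to make the Clarity principle concrete for data is a machine-checkable contract at the hand-off point. The sketch below is a minimal illustration under stated assumptions, not a standard tool: `DataContract`, `validate`, the `orders` dataset, and the team names are all hypothetical. It treats missing or mistyped columns as hard contract breaks, and separately surfaces deprecated columns still in use along with the consumers who should be warned, mirroring "breaking changes ship only with fair warning."

```python
from dataclasses import dataclass, field


@dataclass
class DataContract:
    """Hypothetical hand-off contract between an upstream team and its consumers."""
    dataset: str
    owner: str                                                # team accountable for the dataset
    columns: dict[str, type]                                  # column name -> expected Python type
    consumers: list[str] = field(default_factory=list)       # teams known to depend on it
    deprecated: dict[str, str] = field(default_factory=dict)  # column -> deprecation note


def validate(contract: DataContract, rows: list[dict]) -> tuple[list[str], list[str]]:
    """Check rows against the contract.

    Returns (violations, notices): violations are hard contract breaks,
    notices are advance warnings about deprecated columns still present.
    """
    violations: list[str] = []
    for i, row in enumerate(rows):
        # Required columns the row does not provide.
        missing = sorted(contract.columns.keys() - row.keys())
        if missing:
            violations.append(f"row {i}: missing columns {missing}")
        # Present columns whose values have the wrong type.
        for col, expected in contract.columns.items():
            if col in row and not isinstance(row[col], expected):
                violations.append(
                    f"row {i}: {col} should be {expected.__name__}, "
                    f"got {type(row[col]).__name__}"
                )
    # Deprecated columns that still appear in the data, plus whom to notify.
    notices = [
        f"{contract.dataset}.{col} is deprecated ({note}); "
        f"notify {', '.join(contract.consumers)}"
        for col, note in contract.deprecated.items()
        if any(col in row for row in rows)
    ]
    return violations, notices


# Hypothetical usage: an upstream team publishing an "orders" dataset.
orders = DataContract(
    dataset="orders",
    owner="checkout-data",
    columns={"order_id": int, "amount_usd": float},
    consumers=["finance-analytics", "growth"],
    deprecated={"amount_cents": "use amount_usd instead"},
)
violations, notices = validate(orders, [
    {"order_id": 1, "amount_usd": 9.99, "amount_cents": 999},
    {"order_id": "2", "amount_usd": 5.0},
])
print(violations)
print(notices)
```

The design choice that matters here is keeping the owner and the known consumers in the same object as the schema: when a column has to change, the contract itself says who is responsible and whom to warn.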
The meta-challenge
Data/information architecture is a company-wide problem. Solving it, by getting non-data roles to value scalable data systems the same way they value scalable software, is the real holy grail.
Connects to: Uber data culture; downfall of the data engineer; The E-Myth Revisited (modularization = franchise-prototype thinking); systems over goals.
Open questions
- What does “modularization” look like for analytics work specifically? dbt models? Metric definitions?
- How do you win organizational buy-in when stakeholders demand data “right now”?