My interest here is in the second definition: a decentralized architecture in which teams are responsible for their own data products. The technical definition doesn’t resonate with me (although it’s not clear to me whether that’s more about me or the idea itself). But the distributed-ownership-of-data-products-by-distinct-teams thing… well, that’s important. (View Highlight)
Note: Early data mesh thoughts
High-functioning technology organizations largely devolve ownership of decisions down to the team level, or as close to the team level as they can get, to ensure that the folks who understand the problems are empowered to create appropriate solutions for them. Teams can then identify, write code to address, and ship solutions to solve customer problems without getting approval for their decisions from some central authority that would inevitably make the process slower and worse. (View Highlight)
Note: Empower the team to address problems with code - centralized governance will make decisions slower and worse
All of these teams and their component humans are, in essence, jointly evolving a single collective codebase. Even if engineers are working across hundreds or thousands of code repositories, there are expectations that these repositories will interconnect with one another in reliable ways. They’re all part of a … dare I say it … mesh.
What we observe here is distributed production and governance of a shared resource: software code.
I share all of this because what we are actually looking for, yearning for, as a data industry right now is distributed production and governance of a shared resource: knowledge. (View Highlight)
All centralized / top-down modes of organizational control over what constitutes knowledge have failed in predictable ways (velocity, responsiveness, general dysfunction). We lack good end-to-end distributed production / governance / consumption paradigms currently to enable all stakeholders to jointly steward this resource. (View Highlight)
Problem #1: Things Breaking
This manifests in two ways. First, bugs inside Team A’s code can cause unpredictable (and hard-to-trace) downstream bugs in Team B’s code. Second, changes in Team A’s code (that are 100% intentional) can cause similarly unpredictable and hard-to-trace downstream bugs in Team B’s code. (View Highlight)
Note: Unintended side effects from changes in one project to another.
Software engineers deal with these problems in a few ways.
Extensive testing and code coverage metrics. If you can’t show how battle-hardened your code is, others won’t trust it.
Semantic versioning, version-aware package managers, multiple cloud API versions, and upgrade / deprecation procedures. Team B doesn’t allow Team A to just change the functionality of the code Team B depends on without Team B making a proactive, well-considered decision to upgrade to the newest version.
Public / private interfaces. Downstream code is forced / encouraged to use specific integration points / APIs that are intended to be supported in a stable way. Libraries maintain internal logic and state that they can change without fear of breaking downstream dependencies. (View Highlight)
Note: How to handle change and prevent unexpected breaking
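To make the semantic versioning point concrete, here is a minimal sketch of the check a version-aware package manager might apply before letting Team B pick up a new release from Team A. The names `Version`, `parse_version`, and `is_safe_upgrade` are hypothetical and chosen for illustration. The sketch assumes plain `MAJOR.MINOR.PATCH` strings and ignores pre-release tags and the special rules for `0.x` versions.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A plain MAJOR.MINOR.PATCH version (hypothetical helper type)."""
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    # Assumes exactly three dot-separated integers, e.g. "1.4.2".
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def is_safe_upgrade(pinned: str, candidate: str) -> bool:
    """Return True when `candidate` keeps the pinned major version and is not older.

    Under semantic versioning, a major version bump signals a breaking
    change, so it should only be adopted as a deliberate decision.
    """
    current, new = parse_version(pinned), parse_version(candidate)
    # NamedTuple instances compare field by field, like ordinary tuples.
    return new.major == current.major and new >= current
```

With this rule, Team B pinned at `1.4.2` would pick up `1.5.0` automatically, while `2.0.0` would wait for an explicit upgrade decision.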
Problem #2: Infra and Deployment
If Team A maintains a codebase that Team B interfaces with, Team B not only needs to know how to build a test environment for their codebase…they also need to know how to build a test environment for Team A’s codebase. Often this requires Team B to know how to deploy Team A’s code!
This is one of the great benefits of the Docker + Kubernetes combo. In an idealized version of the “infrastructure as code” world, Team A actually ships their code along with a Docker image and Team B can incorporate that into their own build tooling. (View Highlight)
Problem #3: Knowledge about the Code
If Team A’s codebase is used by Teams B through Z, and every time those teams have a question about it they are forced to ask Team A for help…well, Team A is never going to get any work done. Software engineers have done a ton of work to help downstream users of codebases understand how to correctly interact with their work:
Automatic documentation, from things like javadoc to swagger and automated API documentation products that create amazing developer experiences.
Conventions that emphasize writing programs as a mechanism to communicate about their functionality to other humans. (View Highlight)
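As a rough illustration of automatic documentation, the sketch below builds a small reference entry directly from a function's signature and docstring, which is the same basic idea tools like javadoc apply at much larger scale. The functions `convert_currency` and `render_docs` are hypothetical examples, not part of any real documentation tool; only the standard library `inspect` module is used.

```python
import inspect


def convert_currency(amount: float, rate: float) -> float:
    """Convert an amount using a fixed exchange rate.

    The result is rounded to two decimal places.
    """
    return round(amount * rate, 2)


def render_docs(func) -> str:
    """Build a plain-text reference entry from a function's own metadata."""
    signature = inspect.signature(func)
    # getdoc() returns the docstring with indentation cleaned up, or None.
    summary = inspect.getdoc(func) or "(undocumented)"
    return f"{func.__name__}{signature}\n{summary}"


if __name__ == "__main__":
    print(render_docs(convert_currency))
```

Because the entry is generated from the code itself, downstream teams can look up how to call a function without asking Team A, and the documentation cannot drift far from the implementation.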