The Data Team: A Short Story

Metadata

Author: erikbern.com
Full Title: The Data Team: A Short Story
Category: #articles
URL: https://erikbern.com/the-data-team-a-short-story.html

Highlights

This is basically a (somewhat cynical) depiction of things that may happen at a lot of companies early in the data maturity stage:
- Lack of data, and fragmented data
  - The product is poorly instrumented so data often doesn’t exist in the first place
  - A fragmentation of data systems, with data spread out over many different ones
  - Brittle business processes driven by data but with little or no automation
- An unclear expectation of what the data team’s job is supposed to be
  - Data scientists hired to do R&D and figure out some way to deploy AI or whatever, and as a result not having any clear business goal
  - Data team complaining about it being hard to productionize ML, yet the product team doesn’t really seem to care about the feature
  - People coming to the data team wanting them to be “English to SQL translators”
- A product team not trained to be data driven
  - Product managers not thinking about data as a tool for building better features
  - A lack of alignment between what product teams want to build versus what data teams have
- A culture that fundamentally is at odds with being data driven
  - A culture of celebrating shipping, versus celebrating measurable progress and learnings
  - To the extent teams actually use metrics, they are inconsistent, poorly measured, and in some cases in conflict with other teams
- No data leadership
  - A fractured data org with various data people reporting into other functional areas
  - Other departments not getting the help they need, so they work around the data team and hire lots of analysts
  - Lack of standardization of toolchain and best practices
Wow! This is depressing! (View Highlight)
- Note: Hitting a little too close to home
You’re starting to lay the most basic foundation of what is most critically needed: all the important data, in the same place, easily queryable. Opening up SQL access and training other teams to use it means a lot of the “SQL translation” goes away. At this point, the big challenge is organizational. You’re starting to centralize a data team, but other teams still struggle to work with the data team, and will build around it in many cases. This will cause brittle processes elsewhere, and friction with the data team. You need to start paying attention to this before it brings the business to gridlock. One good thing is that, to some extent, your results drive organizational centralization by themselves: the junior data scientist on the marketing team ends up moving into the centralized data team because she wants to work for you. This is the first step towards centralizing the reporting structure while keeping the work management decentralized. Fig 2: Data team with centralized backlog vs decentralized backlog (View Highlight)
bad. The CMO emphasizes that the numbers are “still baking” (View Highlight)
- Note: I like the fallout of terminology in this article. “still baking”
The good news is that the product team is starting to experiment with A/B tests. The bad news is that it’s ignoring the results and that projects seem mostly driven by milestones and artificial deadlines. The excellent news is the CEO is pushing for teams to use data as the truth. Once there is organizational pressure to be more data driven, this is the time to accelerate the way the data team works with other teams. In particular, people at the highest level will start to focus more on metrics, and it’s your responsibility to make sure the data team works with them on it. One simple thing that goes a long way is to work with every team and make sure they have their own dashboard with the top set of metrics they care about. Fig 3: Different services for different layers of the org drive the most progress (View Highlight)
note what’s happening with the customer support team. The journey is roughly:
- That team started out with their own “business analysts” (outside the data team) but needed the data team to run queries for them to get data
- Those business analysts start to run queries themselves with the help of the data team
- They start to build up “shadow tech debt” (in this case monster SQL queries), which at first causes a bunch of friction with the data team
- The data team starts embedding into the team and helping them get to a better place
- Because of the embedding, the need for business analysts goes down and the need for data scientists goes up
- Over time, certain functionality (reports and analytics that aren’t ad hoc) moves to the product engineering team
It’s tempting to try to prevent people outside the data team from doing things by putting very strict guardrails on access to data. I think in almost every case, this is a bad idea (with the exception of any major security concerns). People are generally mostly rational, and do things that generate positive ROI for the business. (View Highlight)
A final note is that you took on a lot of “tech debt” earlier when you started dumping the production database tables straight into the data warehouse. Data consumers downstream will have SQL queries that break a lot. Over time, you’re going to have to add some sort of layer in between, that takes the raw data from the production database and translates it into various derived datasets that are more stable and easier to query. This will be a LOT of work to do right. It’s probably also needed for security reasons: you need to strip out lots of PII in the production data. (View Highlight)
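The "layer in between" described in the highlight above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the article's implementation: the raw `users` rows, the column names, the `COLUMN_MAP` and `PLAN_LABELS` tables, and the salt are all hypothetical. The idea is that downstream queries depend on stable, renamed columns, while PII is either dropped or replaced with a salted hash. Note that hashing is pseudonymization rather than full anonymization.

```python
import hashlib

# Hypothetical raw rows, shaped like a production `users` table dumped as-is.
RAW_USERS = [
    {"id": 1, "email": "ada@example.com", "full_name": "Ada L.",
     "plan_cd": "P", "created_ts": "2021-03-01T10:00:00"},
    {"id": 2, "email": "bob@example.com", "full_name": "Bob K.",
     "plan_cd": "F", "created_ts": "2021-03-02T11:30:00"},
]

# Raw production column -> stable name that downstream queries may rely on.
# Columns not listed here (such as full_name) are dropped entirely.
COLUMN_MAP = {"id": "user_id", "plan_cd": "plan", "created_ts": "signed_up_at"}

# Internal codes translated into readable values (illustrative only).
PLAN_LABELS = {"P": "paid", "F": "free"}


def pseudonymize(value: str, salt: str) -> str:
    """Replace a PII value with a salted SHA-256 hex digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def to_derived_user(raw: dict, salt: str) -> dict:
    """Translate one raw row into the derived shape without raw PII."""
    derived = {new: raw[old] for old, new in COLUMN_MAP.items()}
    derived["plan"] = PLAN_LABELS.get(derived["plan"], "unknown")
    # Keep a stable join key for the email without exposing the address.
    derived["email_hash"] = pseudonymize(raw["email"].lower(), salt)
    return derived


def build_derived_users(rows, salt="example-salt"):
    """Build the derived dataset from a list of raw rows."""
    return [to_derived_user(row, salt) for row in rows]


derived_users = build_derived_users(RAW_USERS)
```

If the production schema later renames `plan_cd`, only `COLUMN_MAP` has to change in this sketch; queries written against `plan` keep working, which is the stability the highlight is asking for.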
Metrics are defined in such a way that people feel a responsibility for generating business value. The data culture is driven both from above (the CEO pushing for it) as well as from below (people in the trenches). It’s OK to fail if at least you learned something from it. (View Highlight)