When you first start learning about data governance, it often seems like a hairball of tightly knit ideas where you can’t understand any one piece until you’ve studied and learned the whole thing. I’m not an expert by any stretch, but I’ve wrestled with learning about data governance long enough to find a way of approaching it that has worked reasonably well for me. In this post, I want to share this approach with anyone else who’s struggling to untangle it.
There are many different definitions of data governance, and even more sources that write about it without ever defining it. I think these are written for readers who are already familiar with the general notion of organizational “governance”, and can just adjust their definition to apply to data. For the rest of us, DAMA’s reference book, the Data Management Body of Knowledge, has the simplest definition I’ve found: data governance “establishes a system of decision rights over data.”
But rather than just thinking about “rights”, I like to think of data governance in terms of accountability: Whose job is it to ensure that your organization carries out each of the activities needed to effectively manage and use data?
Most, though not all, of these activities will be making decisions. And your organization should already have structures in place for assigning accountability for different types of activities, including decision making. Many data governance frameworks come with their own accountability structures specifically for activities related to data that can theoretically be layered on top of this. But I think it’s much more natural to first try to extend your existing governance to as many data-related activities as possible, then look to data governance frameworks for hints on the gaps that remain.
The first difficult part is to figure out what all those activities are. Every person who touches data will have a slightly different view of the processes involved, and thus a different list of activities focused primarily on the ones that immediately impact their ability to do their job. Below, I’ll present a framework of categories that will help identify others that you and your colleagues might not have thought of.
The second difficult part is to decide how accountability should be assigned. For each activity, a single person should be accountable for not only making sure it gets done, but also making sure that the resources and processes are in place to do it. For many types of activities, your organization may already have a clear structure of accountability. For others, there may be natural ways to incorporate them into your existing structures.
The rest of the activities might be things that you could assign to roles called “data steward”, “data owner”, etc. But it will be much easier to define these roles if you have a list of responsibilities that need a name. The standard data governance approach of starting with the name and trying to assign responsibilities will often leave you going in circles. So in this post, I’ll present a framework for identifying data-related activities. The rest is up to you.
The Tragedy of the Commons
This post differs from my last series of posts in two important ways: First, the previous posts focused on design decisions for specific applications and use cases. These decisions fall under data governance, but they’re only one part of it. There are many more decisions that address behind-the-scenes issues you might take for granted when looking at the application itself. Many of these decisions apply across multiple use cases, and often fall victim to the tragedy of the commons, where everyone expects someone else to take care of a shared resource. By explicitly identifying and calling out these activities, you reduce (or ideally eliminate) this problem.
Second, the previous posts addressed the question of how the software should be designed. This post is explicitly focused on the human component, particularly the organizational processes and structures that define how people will use and support the software. For software engineers, it’s easy to forget this side of the equation. But it doesn’t matter how good the software is without organizational structures and processes that ensure the software is used effectively.
In order to identify the data-related activities that someone in your organization needs to be accountable for, we’ll divide them into six categories along two axes. I’ll list some common types of activities in the following sections, but the categories should also help you come up with your own:
Strategic/Tactical/Operational: These levels are fairly common in data governance literature and more generally. The earliest appearance I know of is the Amsterdam Information Model (1997), but they may predate that.
- Strategic activities drive decisions that apply to the entire organization, addressing general policies, direction and priorities, including which types of activities the organization will and won’t pursue.
- Tactical activities drive decisions about individual projects, applying strategic-level decisions and policies to determine how the chosen activities will be implemented.
- Operational activities implement the tactical-level decisions to manage data or carry out projects. This is where the rubber meets the road.
Defensive/Offensive: These columns were defined (in this context) by Leandro DalleMule and Tom Davenport in an HBR article in 2017. They were mostly addressing the strategic layer, but as we’ll see below, the differentiation is useful for all three layers.
- Defensive activities protect the organization by ensuring it uses data within legal, contractual and ethical standards, and in ways that ensure long-term utility. These tend to be related to data sources rather than projects, particularly shared datasets.
- Offensive activities address how to create value by enabling and driving applications of data. These tend to focus on projects and analysis, but they may also address datasets that will primarily be used for a single specific project.
These two dimensions define six categories of activities that someone should be accountable for. As noted above, each individual in an organization may only think about one or two of these categories that impact their daily work. But by explicitly considering all six categories, all these individuals can come to a shared understanding of the activities involved, and agree on who should be accountable for each.
The chart below includes examples of the types of activities you may identify in each of the six categories.
| | Defensive | Offensive |
|---|---|---|
| Strategic | – Propose and drive adoption of internal policies and ethical standards for use and access for all data or particular categories of data.<br>– Propose and drive adoption of common data quality frameworks. | – Identify, determine and communicate the value of datasets or data sources.<br>– Identify, determine and communicate the value of data-enabled projects. |
| Tactical | – Determine rules for approving access to individual datasets.<br>– Ensure that individual projects consistently follow appropriate security and privacy standards.<br>– Define data quality metrics and criteria for individual datasets. | – Ensure that users and individuals are granted access to the data they need with minimal process and overhead.<br>– Define and implement schemas, conventions and processes. |
| Operational | – Grant or deny access for individuals to particular datasets.<br>– Maintain accuracy and completeness of internally generated data.<br>– Measure and mitigate accuracy/completeness issues in externally sourced data. | – Communicate analysis and other results to appropriate stakeholders.<br>– Share data products and applications with stakeholders and users. |
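If you want to keep your activity list somewhere more structured than a whiteboard, here is a minimal Python sketch of the six-category grid. Everything in it is my own illustration rather than part of any data governance framework: the `Activity` class, the `empty_categories` and `unassigned` helpers, and the role names are all hypothetical. It assumes each activity has at most one accountable person or role.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import Optional


class Level(Enum):
    STRATEGIC = "strategic"
    TACTICAL = "tactical"
    OPERATIONAL = "operational"


class Stance(Enum):
    DEFENSIVE = "defensive"
    OFFENSIVE = "offensive"


@dataclass(frozen=True)
class Activity:
    description: str
    level: Level
    stance: Stance
    # A single accountable role or person, or None if nobody is assigned yet.
    accountable: Optional[str] = None


def empty_categories(activities):
    """Return the (level, stance) pairs that have no activity listed yet."""
    covered = {(a.level, a.stance) for a in activities}
    return [pair for pair in product(Level, Stance) if pair not in covered]


def unassigned(activities):
    """Return the activities that nobody is accountable for yet."""
    return [a for a in activities if a.accountable is None]


# Hypothetical inventory; the role names are made up for illustration.
inventory = [
    Activity("Define data quality metrics for the sales dataset",
             Level.TACTICAL, Stance.DEFENSIVE, accountable="Sales data lead"),
    Activity("Grant or deny access for individuals to particular datasets",
             Level.OPERATIONAL, Stance.DEFENSIVE),
    Activity("Identify and communicate the value of data-enabled projects",
             Level.STRATEGIC, Stance.OFFENSIVE, accountable="Head of analytics"),
]

gaps = empty_categories(inventory)  # categories nobody has thought about yet
todo = unassigned(inventory)        # activities that still need an owner
```

The two helpers mirror the two difficult parts described earlier: `empty_categories` points at cells of the grid where you may be missing activities, and `unassigned` points at activities that still need someone accountable.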
As you look through these examples, there are a few things you might notice. The first is that many of the activities in the offensive column are things that your organization probably already has processes for. Since these are value-generating, they’re both easier to identify and easier to prioritize. As a result, most data governance frameworks focus on the defensive column since it’s more likely to get neglected. But approaching both columns in the same framework both reduces the chances of activities falling through the cracks and makes it easier to balance relative priorities between the two.
The second thing to notice is that the defensive column is mostly split between 1) managing access to data and 2) defining data quality. It might make sense to split these out into separate columns. However, there may be other defensive activities that don’t fall naturally into one or the other, so I prefer to leave them in one general category.
The final thing to notice is that you often see a version of each of these general ideas in multiple different categories. For example, for controlling access to data, you have defining organization-wide policies in strategic defensive, defining access policies for individual datasets in tactical defensive, then granting access to individual users in operational defensive. While all these activities ultimately decide who has access to the data, they’re very different types of activities and you’ll probably have different people accountable for each. I’ve seen conversations go in circles when you either don’t differentiate between them, or don’t clarify which level you’re talking about.
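To make the difference between the three levels concrete, here is a rough Python sketch of the access example. It is only an illustration under assumptions I made up: the policy ("personal" data may only be opened to teams with privacy training), the class names `OrgPolicy` and `DatasetRule`, and the team and dataset names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrgPolicy:
    """Strategic defensive: one organization-wide rule (hypothetical)."""
    # Only these teams may ever be approved for datasets classified "personal".
    privacy_trained_teams: frozenset


@dataclass(frozen=True)
class DatasetRule:
    """Tactical defensive: the access rule for one individual dataset."""
    dataset: str
    classification: str
    approved_teams: frozenset


def rule_complies(policy: OrgPolicy, rule: DatasetRule) -> bool:
    """Check a dataset-level rule against the organization-wide policy."""
    if rule.classification == "personal":
        return rule.approved_teams <= policy.privacy_trained_teams
    return True


def decide_access(rule: DatasetRule, user_team: str) -> bool:
    """Operational defensive: grant or deny one individual's request."""
    return user_team in rule.approved_teams


policy = OrgPolicy(privacy_trained_teams=frozenset({"analytics", "finance"}))
orders_rule = DatasetRule("customer_orders", "personal", frozenset({"analytics"}))
leaky_rule = DatasetRule("customer_emails", "personal", frozenset({"marketing"}))

orders_ok = rule_complies(policy, orders_rule)    # rule fits the policy
leaky_ok = rule_complies(policy, leaky_rule)      # rule violates the policy
granted = decide_access(orders_rule, "analytics")
denied = decide_access(orders_rule, "marketing")
```

Each piece changes at a different pace and is likely to have a different accountable person: whoever sets `OrgPolicy`, whoever writes each `DatasetRule`, and whoever runs `decide_access` for incoming requests.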
The chart also distinguishes between preventing inappropriate access on the defensive side and ensuring appropriate access on the offensive side. You can think of these as the angel and demon on the shoulders of the organization – one promoting caution and the other pushing to get things done. Often, having separate people accountable for each of these roles will lead to a better compromise than having one person play both sides.
The goal of the discussion above is to help you start thinking about the activities your organization needs to do to manage and use data. The six categories should help you begin identifying activities that you hadn’t thought about before. Once you have that list, you can begin matching up the activities to processes and accountability structures within your existing organization, or examine data governance frameworks for ways to extend them.