In my last post, I discussed an idea called Requirement Diameters – the distance between all the lines of code that enforce a given software requirement – and the coding principle that these diameters should be kept as small as possible, particularly for requirements that are more likely to change. In this post, I will explore how introducing layers of abstraction into your code can impact requirement diameters. In particular, I’ll argue that layers of abstraction define subsets of requirements and allow code to follow natural boundaries between these subsets. And while the benefits of clean abstractions are well established, this perspective provides a new way of understanding how to design your abstractions, and why it matters.
Example
Let’s say we’re writing a script to process tabular files, which can be either CSV or Excel files. To avoid duplicate code, we want to share as much code as possible between the logic that opens CSV files and the logic that opens Excel files. In an object-oriented language, we can do this by creating a shared parent class called something like TabularFile, with subclasses CsvFile and XlsxFile. This splits our code into two layers of abstraction: a more abstract layer containing TabularFile and a less abstract layer containing both CsvFile and XlsxFile.
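A minimal sketch of this hierarchy might look like the following (the class and method names here are hypothetical, chosen just to illustrate the split, not taken from any particular library):

```python
from abc import ABC, abstractmethod
import csv

class TabularFile(ABC):
    """Abstract layer: enforces only the requirements shared by all tabular files."""

    def __init__(self, path):
        self.path = path

    @abstractmethod
    def read_rows(self):
        """Return the file's contents as a list of rows."""

    def column_names(self):
        # A shared requirement: the first row holds the column names.
        # This logic lives in one place and is inherited by both subclasses.
        return self.read_rows()[0]

class CsvFile(TabularFile):
    """Concrete layer: requirements specific to the CSV format."""

    def read_rows(self):
        with open(self.path, newline="") as f:
            return list(csv.reader(f))

class XlsxFile(TabularFile):
    """Concrete layer: requirements specific to the Excel format."""

    def read_rows(self):
        # Reading .xlsx would require a third-party library such as openpyxl;
        # elided here to keep the sketch self-contained.
        raise NotImplementedError
```

Note that the column-name requirement is enforced entirely in the parent class, while each file-format requirement is enforced entirely in one subclass.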
From the perspective of requirement diameters, each of the three classes enforces a different set of requirements. The two subclasses enforce requirements about the specific file formats they support as well as shared requirements related to the underlying tabular data. The parent class only enforces requirements that apply to both subclasses. Ideally, if we’ve factored things carefully, TabularFile should enforce every requirement that is shared between the two subclasses, while the subclasses inherit the logic for those shared requirements and only address requirements specific to their file type.
Because of the way object-oriented code works, the diameter of any requirement that is enforced in the parent class and inherited by the subclasses will not extend into the subclass code. Similarly, the diameter of any requirement that is enforced in only one of the subclasses will not extend outside of it. So the layers of this abstraction cut between these requirements, limiting the diameter of each requirement to within a single layer. And even though this discussion has been in terms of object-oriented classes, you could implement the same layers of abstraction with functions or other constructs.
We could go a step further and split the TabularFile class into two layers of abstraction by creating a parent class for managing files in general. The code for the parent class File addresses requirements related to opening and closing files, while the code for the subclass TabularFile enforces requirements related to the tabular nature of the data. Each of these two classes will be smaller than the original, so the requirement diameters will be as well.
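One way this further split might look, again as a rough sketch with hypothetical names:

```python
class File:
    """More abstract layer: requirements about opening and closing files."""

    def __init__(self, path):
        self.path = path
        self._handle = None

    def open(self):
        self._handle = open(self.path, newline="")
        return self._handle

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None

class TabularFile(File):
    """More concrete layer: requirements about the tabular nature of the data."""

    def column_names(self):
        # The first line holds the column names. In a fuller version the
        # parsing would be delegated to format-specific subclasses; a plain
        # comma split keeps the sketch short.
        handle = self.open()
        try:
            return handle.readline().strip().split(",")
        finally:
            self.close()
```

Requirements about file lifecycle now have a diameter confined to File, and requirements about tabular structure a diameter confined to TabularFile.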
As in the last post, there will always be high-level requirements that cut across layers of abstraction. In fact, these requirements will often define the structure that the layers of abstraction will conform to. If these high-level requirements change, your chosen scheme of layers may need to change as well, but this is always a risk with requirement diameters.
Layers as Subsets of Requirements
The concept of abstraction is subtle enough that defining it in its full generality is a hopeless goal. But the example and discussion above suggest a definition that will suit the narrow scope of this essay: A layer of abstraction is defined by a subset of requirements, with successive layers defined by adding (or removing) requirements. The most abstract layers have the fewest requirements, and as we add in requirements, we get progressively more concrete until we’ve added in all the requirements for the overall program.
This is very similar to the notion of abstraction in mathematics. But since mathematics is mostly about describing things rather than creating them, layers of abstraction are defined by the set of assumptions that a model makes, rather than the requirements it enforces.
The most basic example of this is numbers. Numbers are an abstraction of any concept that can be counted or measured as a scalar. The assumptions at this level of abstraction have to do with ordering, addition, multiplication, and so on. You can make it more concrete by adding assumptions about what you’re measuring, and there are countless potential layers and assumptions before you get as concrete as apples vs. oranges.
In fact, there’s a fork in the abstractions, as there was between CsvFile and XlsxFile, when you add assumptions about whether you’re counting a discrete quantity or measuring a continuous quantity. The fact that these two subclasses share a common parent, i.e. that discrete and continuous quantities follow the same rules of arithmetic, is one of the most fundamental ideas in mathematics.
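Python’s standard library happens to encode a version of this fork: its numbers module defines an abstract numeric tower in which Integral (discrete) and Real (continuous) both sit under shared parents like Number. As a quick illustration:

```python
import numbers

# Both a discrete and a continuous value satisfy the shared abstract layer...
assert isinstance(3, numbers.Real)
assert isinstance(3.5, numbers.Real)

# ...but only the discrete value satisfies the more concrete Integral layer.
assert isinstance(3, numbers.Integral)
assert not isinstance(3.5, numbers.Integral)

# The layers nest: every Integral is a Real, and every Real is a Number.
assert issubclass(numbers.Integral, numbers.Real)
assert issubclass(numbers.Real, numbers.Number)
```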
Because math deals explicitly with abstractions and assumptions, its authors tend to be very careful to identify which assumptions are being made (axioms), and thus what level of abstraction is being applied. In programming, the requirements associated with each layer of abstraction are often implicit, but the levels of abstraction are still there.
So if we define a level of abstraction in programming as above, as a set of requirements, then we can place a line of code in a layer based on the set of requirements that it enforces plus the set of requirements it relies on from more abstract layers. Since the code in a subclass relies on the code in the parent class, the subclass sits in a more concrete layer of abstraction than its parent class.
Why It Matters
The functions and classes that make up your code define boundaries between components of the logic that address different requirements. When these boundaries cut across requirements, they make the requirement diameters larger. When they cut between requirements, they tend to make the diameters smaller. The TabularFile class and its subclasses are a clean abstraction because each requirement is addressed by exactly one of the three classes. The boundaries of the classes cut between the requirements.
The major boundaries between components follow the boundaries between the layers of abstraction that they define, whether you choose them intentionally or otherwise. An abstraction is clean if most requirements are addressed by a single layer so the boundaries of the abstraction, and thus of the components, cut between requirements, reducing their diameters.
In a leaky abstraction, on the other hand, requirements tend to be stretched across multiple layers. Since abstraction pushes code for different layers apart (as a result of pulling code for each individual layer closer together) this makes the requirement diameter larger.
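To make the leak concrete, here’s a hypothetical variation of the running example in which a CSV-specific requirement escapes its layer:

```python
class TabularFile:
    # Leaky: a CSV-specific requirement (the delimiter) has escaped into
    # the abstract layer, so the delimiter requirement now spans both this
    # class and CsvFile, stretching its diameter across the layer boundary.
    def __init__(self, path, delimiter=","):
        self.path = path
        self.delimiter = delimiter

class CsvFile(TabularFile):
    def read_rows(self):
        with open(self.path, newline="") as f:
            return [line.rstrip("\n").split(self.delimiter) for line in f]

class XlsxFile(TabularFile):
    # The delimiter parameter is meaningless for Excel files, but this
    # class inherits it anyway -- a symptom of the leak.
    pass
```

A change to how delimiters are handled now touches two classes in different layers instead of one.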
So the abstractions you choose play a major role in determining how effectively you can minimize requirement diameters.
Common Coding Practices
As noted above, abstraction is part of the motivation behind object-oriented programming. But it comes up in a number of other places too:
It’s common practice to wrap the I/O pieces of an application with helper functions that separate the details of how the application interacts with its user from the business logic. These functions create separate layers of abstraction for IO and business logic, which are often explicitly called out in architecture documentation, along with other layers.
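A minimal sketch of that separation, with illustrative function names:

```python
def read_order_total(prompt="Order total: "):
    """I/O layer: how we talk to the user lives here, and only here."""
    return float(input(prompt))

def apply_discount(total, rate=0.1):
    """Business-logic layer: knows nothing about where the total came from."""
    return round(total * (1 - rate), 2)

def print_receipt(total):
    """I/O layer again: presentation details stay out of the business logic."""
    print(f"Amount due: ${total:.2f}")
```

Requirements about prompts and formatting are confined to the I/O functions, while requirements about discounts are confined to the business logic, so each can change without touching the other.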
Shared libraries define layers of abstraction that encapsulate the requirements that they address. These requirements are more generic than the requirements that your own code addresses, since the library author can’t know what your specific requirements are. So they force you to define your layers in a particular way. For a well-designed library, this separation will make sense in many situations. But using a library can increase your requirement diameters if the boundary doesn’t fit well for your particular requirements, forcing you to adopt a leaky abstraction.
Some libraries define a Domain Specific Language that adds a layer of abstraction on top of the programming language. In fact every programming language defines a layer of abstraction on top of assembly and machine code. So you can think of these as forming a continuous series of layers from machine code up to the most concrete layer of your code.
Trade-offs
Many refactorings that reduce requirement diameters will tend to increase the overall size of your code, and adding layers of abstraction is no different. Some of this is boilerplate code from new functions and classes. Some is from passing more variables between layers of abstraction.
This is true for clean layers of abstraction as well as leaky ones. The difference is that clean layers will reduce the diameters of the requirements that they separate, while leaky or unnecessary abstractions increase code size without reducing diameters. So it’s important to be thoughtful and careful about defining layers of abstraction. For example, if you find you’re passing too many variables between layers, this often means you have a leaky abstraction.
Finding the best abstractions usually requires an iterative process that allows the layers to evolve. Your views on the most natural way of defining layers will tend to change as your view of the code and your understanding of the requirements grows. When you see a natural separation between requirements, split their layer of abstraction into two. When you notice that requirement diameters are growing because of a leaky abstraction, look for layers that could be merged and re-split.
Caveats
As with any of the principles I’ve written about, defining layers of abstraction before you understand the requirements and how they’re likely to change can be counter-productive. As noted above, the high-level requirements determine the structure that your layers of abstraction should model. If these requirements change, or if you don’t understand them well when you define your layers, the abstraction will become leaky.
For example, the layers we defined for the tabular file reader were based on the requirements that the table would be read from a file and would be small enough to fit in memory. If either of those requirements changes, a different configuration of layers might be cleaner.
In addition to keeping requirement diameters small, abstraction also makes code more understandable by breaking it up into more manageable conceptual chunks. So even for experimental code or code that you’re writing with the goal of learning something, you’ll probably introduce some amount of abstraction. But you should generally avoid creating too much formal, structured abstraction until you have a good understanding of what’s appropriate. This is sometimes called premature generalization.
One anti-pattern I’ve noticed myself falling into on occasion when I don’t know how to approach a problem is to start adding unnecessary layers of abstraction around the issue. This often starts out as working from the outside in, but it can degenerate into a form of procrastination – cutting off smaller and smaller layers from the parts that I understand while avoiding the hard part in the middle that I don’t.
Ultimately, abstractions are more important for helping you and others to understand the code than for minimizing requirement diameters. In most cases, the layers of abstraction that are clean in terms of requirements will tend to be the most understandable, even if the abstractions that initially seem most intuitive turn out to be leaky. But in cases where the abstraction that minimizes requirement diameters is completely incomprehensible, it’s better to go with the layers that you can understand and explain.
Conclusion
Creating abstraction is a fundamentally human activity that we often do without thinking, whether writing code or understanding our daily life. Consciously and deliberately examining the layers of abstraction that you introduce into your code can make it cleaner and more maintainable by helping you choose abstractions that will reduce requirement diameters and recognize when they become leaky.
The next time you’re refactoring some code or trying to understand someone else’s code, you’ll probably be able to identify layers of abstraction, whether they’re deliberate or not. Look at how these layers affect different requirement diameters and how they shape your understanding of the code. Are some of the abstractions leaky? Would a different configuration of layers be more compatible with the requirements for the project?
Once you’ve thought this through for a number of different projects, it will start to become more intuitive and more of a habit. And soon you’ll be able to quickly recognize ways to improve your code and explain the trade-offs involved to teammates and stakeholders.