The Experiment Factory

For a biotech organization to scale its research program, it must balance the flexibility needed to explore a variety of biological hypotheses against the consistency needed to make a collection of observations into more than the sum of its parts. The balance between these opposing forces will shift over time, from nearly complete flexibility while …

The Operational-Analytical Data Cycle

For a biotech organization to effectively scale its research program, its data platform must enable all users to leverage as much data as possible within their decision-making context – the levers that allow them to make decisions and take action. This is easier said than done, and one of the main issues is again …

Designing a Chimera Data Platform

One of the key roles of a software team at a biotech organization is to enable the research program to scale without sacrificing flexibility and innovation. I’ll write more about this in upcoming posts, but today I want to explore how my understanding of the scope of this work has shifted since switching from a …

Building Your Data Governance Toolbox

When you first start learning about data governance, it often seems like a hairball of tightly knit ideas where you can’t understand any one piece until you’ve studied and learned the whole thing. I’m not an expert by any stretch, but I’ve wrestled with learning about data governance long enough to find a way of …

Layers of Data Infrastructure 3: Storage

In my last two posts I’ve explored the high-level design decisions related to two of the three layers that define each pipeline stage of each category of data use cases: Control and Compute. The Control layer defines how the user interacts with the system, while the Compute layer defines how the system does the work.

Data Infrastructure Layers 2: Compute

In my last post I described how you can think of your organization’s data infrastructure as a grid of blocks defined by category of use case and stage of the pipeline. Each block can be further broken down into three layers: Control, Compute and Storage. Last time I briefly described these layers, then discussed different …

Data Infrastructure Layers 1: Control

In my last two posts, I started to break down the types of areas where an organization might need to deploy data tools/infrastructure along two axes: the categories of common use cases and the stages that you’ll encounter in most of these use cases. You can think of these as defining a grid of functionality.

Categories of Data Use Cases

As the head of software engineering at a small startup with ambitions to grow much larger, I think a lot about how to design data infrastructure that will both address our immediate needs and adapt to future needs. I’ve seen what happens at large companies when each team has their own set of data infrastructure: …

Requirement Diameters and Abstraction

In my last post, I discussed an idea called Requirement Diameters – the distance between all the lines of code that enforce a given software requirement – and the coding principle that these diameters should be kept as small as possible, particularly for requirements that are more likely to change. In this post, I will …