For a biotech organization to effectively scale its research program, its data platform must enable all users to leverage as much data as possible within their decision-making context – the levers that allow them to make decisions and take action.
This is easier said than done, and one of the main issues is again how we define the scope of the problem.
When I first began learning about data science and machine learning, I became obsessed with data. But I noticed that half the time, what other people called data was not what I thought of when I used the term.
I was interested in data that could be used to generate summary statistics or train a predictive model. This data was usually a static snapshot in a form that could be read in large batches, but couldn’t be easily updated.
I’m going to call this analytical data.
The other kind of data that I occasionally bumped into was the individual elements of information that allowed people to do their daily jobs. Users needed to interact with only a few elements at once, but would need to update them in real time.
I’ll call this operational data.
The distinction is not about the underlying information, but rather how the data is stored and accessed. The same information will often appear in both forms at different times.
In fact, information tends to follow a cycle between these two types of data:
Operational data is exported as a snapshot from an operational database, becoming analytical data.
Analytical data is processed into descriptive statistics, predictions and insights that become operational data for a decision-making process.
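To make the cycle concrete, here's a minimal sketch using an in-memory SQLite database. The table and column names (`experiments`, `yield_pct`, `insights`) are purely illustrative, not taken from any real ELN schema:

```python
import sqlite3
from statistics import mean

# Hypothetical operational store: an ELN-style table that scientists
# update one row at a time as they work.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiments (id INTEGER PRIMARY KEY, yield_pct REAL)")
conn.executemany("INSERT INTO experiments (yield_pct) VALUES (?)",
                 [(62.0,), (71.5,), (68.0,)])

# Step 1: export a snapshot -- operational data becomes analytical data.
snapshot = conn.execute("SELECT id, yield_pct FROM experiments").fetchall()

# Step 2: process the snapshot in batch into a descriptive statistic.
avg_yield = mean(row[1] for row in snapshot)

# Step 3: the insight is stored where a decision-making process can read
# it -- analytical data becomes operational data again.
conn.execute("CREATE TABLE insights (metric TEXT, value REAL)")
conn.execute("INSERT INTO insights VALUES (?, ?)", ("avg_yield_pct", avg_yield))
```

In practice the snapshot would land in a warehouse and the analysis would be far more involved, but the shape of the loop is the same.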
Data science and machine learning focus on analytical data because this is what drives the technical complexity of these subjects. Analytical data is the raw material that they turn into insights and predictions.
But an organization runs on operational, not analytical data.
The operational data system for a sales team is the Customer Relationship Management (CRM) system. For the biologists at a biotech, it’s an Electronic Lab Notebook (ELN).
Users spend time in these operational data systems because that’s where their levers are.
Most members of your organization aren’t going to open up a Jupyter notebook and run your predictive models. They can only leverage these insights and predictions as operational data, linked to their levers.
And this is where things tend to break down.
The easiest way to turn insights into operational data is to share them as a slide deck, a spreadsheet or even a stand-alone dashboard. But these forms of operational data ignore the existing operational data systems and structures.
They create a solution outside the existing ecosystem of operational data.
To make a decision or take an action, users interact with their operational data system. Before calling a customer, they’ll look up their phone number in the CRM. Before creating a sample, they’ll look up the protocol in the ELN.
It’s possible that they’ll also look somewhere else – a spreadsheet or a dashboard – so they can add your predictions and insights into the mix. But do you want to take that chance?
If you want users to maximally leverage data, you need to make it available within, or as close as possible to, the systems they’re already using: The ideal time to call the customer next to their phone number. The predicted outcome of the experiment next to its protocol.
This applies to insights and predictions from your data science team as well as additional data sources that are useful in raw form.
In other words, build your solution within the existing operational data ecosystem.
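As a sketch of what this looks like in the simplest case, suppose the predictions are keyed by the same IDs the operational system uses. Rather than publishing them to a separate dashboard, you attach them to the records users already consult. The `samples` table and `predicted_outcome` column here are hypothetical stand-ins for whatever your ELN or CRM actually exposes:

```python
import sqlite3

# Hypothetical ELN-style table users consult before creating a sample.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, protocol TEXT)")
conn.execute("INSERT INTO samples VALUES (1, 'PCR v2')")

# Predictions produced offline by the data science team, keyed by the
# same sample IDs the operational system uses.
predictions = {1: "high yield"}

# Put the prediction next to the protocol, inside the existing system,
# instead of in a separate spreadsheet or dashboard.
conn.execute("ALTER TABLE samples ADD COLUMN predicted_outcome TEXT")
for sample_id, outcome in predictions.items():
    conn.execute("UPDATE samples SET predicted_outcome = ? WHERE id = ?",
                 (outcome, sample_id))

row = conn.execute(
    "SELECT protocol, predicted_outcome FROM samples WHERE id = 1"
).fetchone()
```

A real ELN or CRM would be updated through its API rather than direct SQL, but the design principle is the same: the prediction lives in the record the user looks at before acting.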
Designing from the perspective of a chimera data platform both helps identify how analytical data can integrate into operational systems, and creates a foundation for implementing these connections.
In upcoming blog posts, I’ll explore more concrete ways to do this.
Want to read more? Subscribe to my weekly newsletter where I’ll announce new blog posts and explore ideas from earlier posts in more depth.