Scaling a biotech research program requires coordinating the flow of data between teams, functions, and scientific fields, from the people who generate it to the ones who use it.
You may think of data as objective and universal. But in fact, its interpretation depends on the context in which it was captured and in which it’s analyzed.
A measurement that wasn’t written down prevents the analyst from answering a key question. The distinctions that the bench scientist didn’t make limit the comparisons that the data scientist can explore.
Data doesn’t just travel from database to database. It travels from context to context.
This inconvenient fact creates a trade-off: the immediacy of the initial use you collected the data for against the generality of all its other potential uses.
I learned about the idea of making data travel from the work of Sabina Leonelli, who explores how biological ontology projects allow scientists to share data across subfields.
A shared ontology or vocabulary never fits any one context perfectly. But without it the data won’t fit any other context.
In other words, a shared ontology trades immediacy for generality.
Other data collection practices affect this trade-off as well:
Saving lab notes in a table rather than free text limits what you can capture, but makes it accessible for statistics later.
Capturing values that you don’t have an immediate user for slows down data collection but allows others to answer more questions later.
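Both practices above trade a little friction at capture time for easier reuse later. Here's a minimal sketch of the idea in Python with pandas; the column names and values are hypothetical, not any particular lab's schema:

```python
import pandas as pd

# A free-text note: quick to write down, hard to query or summarize later.
note = "Plate 3, well B2 looked cloudy; OD600 around 0.42 after 6h at 30C"

# The same observation captured as structured rows. The columns
# (plate, well, od600, hours, temp_c, observation) are illustrative;
# any consistent schema would do.
records = pd.DataFrame(
    [
        {"plate": 3, "well": "B2", "od600": 0.42, "hours": 6, "temp_c": 30, "observation": "cloudy"},
        {"plate": 3, "well": "B3", "od600": 0.55, "hours": 6, "temp_c": 30, "observation": "clear"},
        # Recording temp_c even with no immediate user slows entry slightly,
        # but lets a later analyst compare runs across temperatures.
    ]
)

# Structured capture pays off the moment someone wants a summary statistic.
print(records.groupby("observation")["od600"].mean())
```

The free-text note captures nuance the table can't, but only the table answers a statistical question a year later without someone re-reading every entry.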
Leonelli identifies two processes that allow data to travel:
Decontextualization enables individuals outside the original context to understand what the data is and whether it’s relevant to their own context.
These are mostly annotations that tell someone what data was collected – species, tissue type, disease, …
Recontextualization enables others to apply the data to questions in their own context.
This is more detailed metadata about how the data was collected – experimental conditions, collection protocols, which samples ended up where, …
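As a rough illustration of the difference, here is what the two kinds of metadata might look like attached to a single dataset. The field names and values are my own placeholders, not drawn from any standard ontology:

```python
# Hypothetical metadata for one sequencing dataset (illustrative only).
dataset_metadata = {
    # Decontextualization: what the data is, so an outsider can judge
    # whether it's relevant to their own question.
    "what": {
        "species": "Mus musculus",
        "tissue": "liver",
        "disease": "fatty liver disease",
        "assay": "bulk RNA-seq",
    },
    # Recontextualization: how the data was collected, so someone else
    # can actually apply it in their own context.
    "how": {
        "collection_protocol": "protocol-v2.1",
        "library_prep": "poly-A selection",
        "conditions": {"diet": "high-fat", "timepoint_weeks": 12},
        "sample_to_file": {"sample_07": "run3_lane2.fastq.gz"},
    },
}
```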
Whenever one team in your organization uses data that was collected by another, these two processes allow the data to travel between them.
For some trips, immediacy is what matters. For others it’s generality.
Usually it’s a bit of both.
Data collectors will naturally favor immediacy because that’s the priority closest to their work.
Some aspects of a platform will enable this, while others will encourage them to support the decontextualization and recontextualization that enable generality.
In other words, every design decision pushes these two processes one way or the other along the immediacy/generality trade-off.
So that raises two important questions:
Which way do you want to push each of the data trips in your organization?
What design decisions will make that happen?
These are not simple questions to answer, but thinking about the immediacy/generality trade-off can give you a place to start.