Qualities of a Data Contract
Off the back of a couple of in person and online interactions, I have encountered the valid question of: what is a data contract? Or more specifically: what makes up a data contract?
Simply put I believe these are the qualities that define a data contract:
It is a structured document
Because: if it can be parsed, it can be validated, and thus it can be executed – so that the document can be used for making “stuff” in your world as well as making sense of your world.
Typical approaches to documenting requirements and sense making tend to favour rapid capture with minimal constraints (i.e. spreadsheets) and rely on after the fact parsing done by people reviewing and revising their own understanding.
This presents two immediate problems: slow feedback loops and the illusion of a shared understanding.
It is not only schema
Because: if context, responsibilities, and limitations are described then we are better equipped to interrogate the ‘correctness’ beyond “have we been able to store the data?”
One of the most common responses observed to the concept of a data contract is “so it’s API documentation”. I understand why that is, given that to the point of “a structured document” most examples of data contracts are a yaml
or json
document which an awful lot like well established approaches to documenting APIs. In addition to this familiarity in form there is the familiarity of function – as technologists the idea of types and their constraints well ingrained.
However schema details the bare minimum for interoperability concerns of the machine, and nowhere near the bare minimum for the interoperability between people.
It evolves as part of an ongoing conversation concerning value creation
Because: agreement is temporary and if the agreement doesn’t serve the new reality people will find ways to work outside of the agreement, thus nullifying any other benefits.
As data is a side effect of things happening – in the parlance of data contracts: data producers producing data. The ways things happen is in a constant state of flux. This change is often the result of a desire to realise some new value and/or mitigate an identified risk. Thus the conversations need to acknowledge this, and not wholly on the data engineering related concerns. Focusing the conversation on the benefits sought by producers and consumers will help foster empathy, which in turn supports shared ownership.
In Summary
The qualities listed above are not exhaustive, however the combination thereof is what differentiates my understanding of data contracts from the similar yet distinctly different concepts that have come before.
I do believe that the conversation around and evolution of data contracts is akin to the early days of the agile manifesto. People are still trying to figure out the ways what has been presented does or does not apply to their situation. With that comes (will come) some missteps such as:
- Looking for a purely technical solution and overlooking the social aspect
- Desire for some kind of blueprint to drag and drop in, while the details of the technical implementations (note the plural) are still emerging
- The greater community building their own things on top of the idea, which will lead to meaningful advances (evolutionary design, inspection and adaption), and some cookie cutter thinking (looking at you scrum), and some downright harmful stuff (scaled agile framework, I said what I said)