A friend who works as a data analyst at a client forwarded me an e-mail chain between two development managers/leads who were in a dispute over data and source systems of record (SSoR). This particular organization has a number of bad habits concerning their data (of which they have many terabytes) and its administration (sorely lacking and deficient in numerous aspects, but that's not a new problem).
One of their worst data practices is that they move data around far too much via replication and an ESB, storing it in multiple data servers specific to particular applications. The obvious data quality problems aside, the concept of SSoRs, while not foreign to them, is not enforced properly, and their overall application quality suffers from it on a routine basis. Their build-out and eventual move to a SOA will amplify these deficiencies if data quality, stewardship, and administration aren't part and parcel of the overall approach.
Anyway, IT Managers A and B are in an e-mail dispute over SSoRs and the retention of data received from a service, as follows:
A, the SOA implementer, argues that application-specific databases have no need to retain SSoR data at all, since applications can invoke services at any time to receive data. His primary argument in the thread is that the SOA approach will eliminate application silos.
B, the applications development manager, is worried that he won't get the 'correct' value from A's services and that he has to retain what he receives from SSoRs to reconcile aggregations and calculated values at any point in time. This is a telling requirement as we will see below.
After reading the thread, my take to my colleague was as follows:
In general, B is wrong, but A did not do a good job of explaining his rationale, and he still needs to meet B's telling requirement. I outlined the approach they should take as follows:
By definition, an SSoR is the final authority on the enterprise value of every piece of data designated to it. Once exceptions to this start being made, the scheme breaks down rapidly into the morass of inconsistent data values and repeated data movement and storage that they're in now. However, a properly implemented SSoR with associated service-enablement provides not only the current values of designated data, but historical values as well.
B's application databases only need to store aggregations and calculations that they make using SSoR data, not the actual data itself. This makes sense in a number of aspects, including the fact that B's apps must always query (by service invocations) to get the current values of SSoR data every time an aggregation or calculation must be performed.
What A missed in his arguments, and what he must offer to B to avoid pack-rat duplication of data across application databases, is historical SSoR values, most likely delimited by time. A's SSoRs must store, and offer through services, every change to SSoR values, which relieves B of the need to store any SSoR data in his databases. This also relieves B of having to write any code to manipulate SSoR data other than asking A for it through services and using the data in the application's code.
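To make the idea concrete, here is a minimal sketch in Python of what "historical values delimited by time" could look like. Everything in it is an assumption for illustration only: the `SsorHistory` class, the `value_as_of` lookup, the `total_as_of` aggregation, and the account keys are hypothetical names, and an in-memory dictionary stands in for whatever store and service layer A would really use. The point is only that B keeps the result and its point in time, and can reconcile it later by asking the SSoR again.

```python
from bisect import bisect_right
from datetime import datetime


class SsorHistory:
    """Hypothetical stand-in for A's SSoR service: keeps every change."""

    def __init__(self):
        # key -> list of (effective_from, value), sorted by effective_from
        self._versions = {}

    def record_change(self, key, effective_from, value):
        """Append a new version instead of overwriting the old value."""
        versions = self._versions.setdefault(key, [])
        versions.append((effective_from, value))
        versions.sort(key=lambda version: version[0])

    def value_as_of(self, key, as_of):
        """Return the value that was in effect at the moment `as_of`."""
        versions = self._versions.get(key, [])
        effective_times = [t for t, _ in versions]
        idx = bisect_right(effective_times, as_of)
        if idx == 0:
            raise LookupError(f"no value for {key!r} as of {as_of}")
        return versions[idx - 1][1]


def total_as_of(ssor, keys, as_of):
    """B's side: an aggregation computed purely from service lookups."""
    return sum(ssor.value_as_of(key, as_of) for key in keys)


# Illustrative data only; these accounts and amounts are made up.
ssor = SsorHistory()
ssor.record_change("acct-1", datetime(2024, 1, 1), 100)
ssor.record_change("acct-2", datetime(2024, 1, 15), 40)
ssor.record_change("acct-1", datetime(2024, 3, 1), 150)

accounts = ["acct-1", "acct-2"]
computed_at = datetime(2024, 2, 1)

# B's database stores only the result and the point in time it refers to.
stored_result = {
    "as_of": computed_at,
    "total": total_as_of(ssor, accounts, computed_at),
}

# Later reconciliation: recompute from SSoR history and compare.
reconciled = (
    total_as_of(ssor, accounts, stored_result["as_of"]) == stored_result["total"]
)
```

Note that the later change to `acct-1` does not disturb the stored February total, because the reconciliation asks for values as of the stored timestamp rather than for current values. That is the guarantee B is really after, and it comes from A's history, not from a copy in B's database.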
There are other morals to this story that enterprise and data architects should pay attention to:
- Old habits die hard in large organizations, and architects are usually the arbiters of disputes like this. Manager B has to be convinced that he gains a lot and loses nothing through these approaches. Make certain that you, as the architect, understand and have appropriate responses to 'telling' requirements such as B's need for historical data values from A's SSoR.
- Related to old habits, if the argument is not convincing enough to managers such as B, you will be asked to make exceptions to fundamentals such as SSoRs. Avoid caving on issues like this at all costs, because if you do, you may regret it later when the approach fails to solve the problems it was supposed to solve.
- Expect managers and leads such as B to make specific counterarguments to the approach. A good one here would be for B to exclaim that the network will become saturated with numerous and sundry requests to A's services for SSoR data. The proper response to that could be "We'll invest more in the network to handle the load," or better, "We'll do this as a proof-of-concept, measure the impact on the network and servers, and make decisions from there." Most of the time, arguments like this in large organizations are disingenuous and usually false, even though they represent the worldview of the person making the argument.
Issues like this make architecture very interesting work...dealing with the human issues as well as the technical ones.