Journal: Rant: Untrusted Data from the Source
While trying to load test data, we found duplicates (based on the unique key) in the provided file. So, the BA (English is not her first language) asked them:
Does the test file present valid business scenarios?
The response:
Test data is never as constrained as production data - we have a lot of [...] users setting up test data every day for a lot of different reasons plus there will be historic test data that has been abandoned after either successful or failed tests - it can never be said that test data is as clean as production data
... but I would expect that comment to apply to most if not all applications
Really?? Test data is not constrained? I understand that test data can be bogus, but unconstrained?? What exactly is the purpose of this test data then? I can supply Lorem Ipsum myself.
Another question:
If combination of [two columns] doesn't provide [main id] uniqueness as it was discussed and stated in use case, what would be additional attribute(s) defining [main id] uniqueness?
A simple question asking how to resolve duplicates when we were not expecting any.
The response:
[The two columns] code combination is unique from a Business perspective - do not re-design your [application] tables
I would however expect you to have exception processing in your load job (as [their application] does for all its inbound feeds) e.g. if you try to load something to a table and it can't load for whatever reason (a duplicate or whatever) you would write it to an exception report
Really?? We need an exception report? If they are the trusted source they are supposed to be, any error in the file should completely reject the file as bad, not just individual records, because any bad data means the entire file is suspect.
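To make the all-or-nothing idea concrete, here is a rough sketch in Python of checking the whole file before loading any of it. The column names (`code_a`, `code_b`, `name`), the `RejectedFile` class, and the sample data are all made up for illustration; the real feed's layout is not shown here.

```python
import csv
import io
from collections import Counter


class RejectedFile(Exception):
    """Raised when a feed fails validation; nothing from it should be loaded."""


def validate_feed(text, key_columns):
    """Return all rows, or raise RejectedFile if any key occurs more than once."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # Count how often each (hypothetical) business key appears in the file.
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    if duplicates:
        # One bad key is enough to treat the entire file as suspect.
        raise RejectedFile(f"{len(duplicates)} duplicated key(s): {duplicates}")
    return rows


# Made-up sample: the key ("1", "X") appears twice.
sample = "code_a,code_b,name\n1,X,first\n2,Y,second\n1,X,third\n"
try:
    validate_feed(sample, ["code_a", "code_b"])
except RejectedFile as err:
    print("file rejected:", err)
```

The point of the design is that the check runs over the complete file before a single row goes anywhere, so there is no partial load to clean up afterwards.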
In general, I am against writing exception code in the database. (See Tom Kyte's posts on the topic for related concerns.) Exceptions, by definition, are unexpected. Handling an exception means it is expected. Only the calling system should handle unexpected errors: because the error is unexpected, we do not know what to do with it. It is then up to the calling system to decide what its output will be.
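As a minimal sketch of what I mean, here is the pattern using Python's built-in `sqlite3` module as a stand-in for the real database (which is not SQLite). The table `target` and its columns are hypothetical. The load routine has no handler of its own; the unique-key violation rolls the transaction back and travels up to the caller.

```python
import sqlite3


def load_rows(conn, rows):
    """Insert every row in one transaction; deliberately no except clause."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT INTO target (code_a, code_b, name) VALUES (?, ?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
# Hypothetical target table with a two-column unique key.
conn.execute(
    "CREATE TABLE target (code_a TEXT, code_b TEXT, name TEXT,"
    " PRIMARY KEY (code_a, code_b))"
)

rows = [("1", "X", "first"), ("2", "Y", "second"), ("1", "X", "third")]
try:
    load_rows(conn, rows)
except sqlite3.IntegrityError as err:
    # The caller, not the load routine, decides what the failure means.
    print("load failed, nothing committed:", err)

# The rollback leaves the table empty, including the rows that inserted fine.
remaining = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print("rows in target:", remaining)
```

Whether the caller retries, alerts someone, or rejects the file is its own business; the database code only guarantees that a failed load leaves nothing behind.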
To be fair, they do not expect duplicates, and it might just be an issue with the test data. But the whole attitude of "exception reports" is absurd. In short, the source system's team doesn't care about their own data.
This happened to me before on a different team, when we were to receive data from another team. I noted the absurdity of some of the dates (the worst offender was a business that started ~400 CE). When I notified their BA, he asked me to fill out a request to have it fixed. IOW, they wanted our team to pay to fix their bad data. That case is embarrassing for me, as I lost my cool with their BA. When he asked me what we wanted in our feed, I told him to give us whatever he wanted, as we would not trust his information (more than we had to).