
Chacham's Journal: Rant: Untrusted Data from the Source

While trying to load test data, we found duplicates (based on the unique key) in the provided file. So, the BA (English is not her first language) asked them:

Does the test file present valid business scenarios?

The response:

Test data is never as constrained as production data - we have a lot of [...] users setting up test data every day for a lot of different reasons plus there will be historic test data that has been abandoned after either successful or failed tests - it can never be said that test data is as clean as production data ... but I would expect that comment to apply to most if not all applications

Really?? Test data is not constrained? I understand that test data can be bogus, but unconstrained?? What exactly is the purpose of this test data then? I can supply Lorem Ipsum myself.

Another question:

If combination of [two columns] doesn't provide [main id] uniqueness as it was discussed and stated in use case, what would be additional attribute(s) defining [main id] uniqueness?

A simple question asking how to resolve duplicates when we were not expecting any.

The response:

[The two columns] code combination is unique from a Business perspective - do not re-design your [application] tables
I would however expect you to have exception processing in your load job (as [their application] does for all its inbound feeds) e.g. if you try to load something to a table and it can't load for whatever reason (a duplicate or whatever) you would write it to an exception report

Really?? We need an exception report? If they are the trusted source they are supposed to be, any error in the file should completely reject the file as bad, not just individual records, because any bad data means the entire file is suspect.
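
As a minimal sketch of that all-or-nothing stance (Python here; the column names `code_a` and `code_b`, the `RejectedFile` exception, and the CSV layout are hypothetical stand-ins for the redacted feed), the loader can check the unique key across the whole file first and refuse to load anything if even one duplicate turns up:

```python
import csv
import io
from collections import Counter


class RejectedFile(Exception):
    """Raised when any key repeats; the caller should load nothing."""


def validate_unique_key(rows, key_columns):
    """Return all rows as a list, or raise RejectedFile if any key repeats."""
    rows = list(rows)
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    duplicates = {key: n for key, n in counts.items() if n > 1}
    if duplicates:
        # One bad key makes the whole file suspect, so reject all of it.
        raise RejectedFile(
            f"{len(duplicates)} duplicated key(s): {sorted(duplicates)}"
        )
    return rows


def read_feed(text, key_columns):
    """Parse CSV text (hypothetical layout) and validate it as a unit."""
    return validate_unique_key(csv.DictReader(io.StringIO(text)), key_columns)
```

The point of the design is that validation runs over the entire file before a single row is handed to the database, so there is never a half-loaded feed to explain.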

In general, i am against writing exception code in the database. (See Tom Kyte's posts on the topic for related concerns.) Exceptions, by definition, are unexpected. Handling an exception means it was expected. Only the calling system should handle unexpected errors: because the error is unexpected, the database code cannot know what to do about it. It's then up to the calling system to decide what its output will be.
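
A rough sketch of what that looks like, using Python's standard `sqlite3` module purely for illustration (the `feed` table, its columns, and both function names are made up): the load routine has no exception handler at all, so a constraint violation rolls back the transaction and propagates to the caller, which alone decides what to report.

```python
import sqlite3


def load_rows(conn, rows):
    """Insert all rows in one transaction; deliberately no except clause."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT INTO feed (code_a, code_b, value) VALUES (?, ?, ?)", rows
        )


def run_job(rows):
    """The calling system: the only place that decides what a failure means."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feed (code_a TEXT, code_b TEXT, value INTEGER, "
        "PRIMARY KEY (code_a, code_b))"
    )
    try:
        load_rows(conn, rows)
        status = "loaded"
    except sqlite3.IntegrityError as exc:
        # The whole file is rejected; nothing from it remains in the table.
        status = f"file rejected: {exc}"
    count = conn.execute("SELECT COUNT(*) FROM feed").fetchone()[0]
    conn.close()
    return status, count
```

Here the unique key is enforced by the table itself, and a duplicate anywhere in the file leaves the table exactly as it was before the load started.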

To be fair, they do not expect duplicates, and it might just be an issue with the test data. But the whole attitude of "exception reports" is absurd. In short, the source system's team doesn't care about their own data.

This happened to me before on a different team, when we were to receive data from another team. I noted the absurdity of some of the dates (the worst offender was a business that started ~400 CE). When i notified their BA, he asked me to fill out a request to have it fixed. IOW, they wanted our team to pay to fix their bad data. That case is embarrassing for me, as i lost my cool with their BA. When he asked me what we wanted in our feed, i told him to give whatever he wanted, as we would not trust his information (more than we had to).

  • Turns out they didn't define their business case properly. I had to add the time field in to achieve a unique logical key. And even then, I still had to rewrite my loader (tossing XML files in C# to call stored procedures) to report exceptions properly so that I could figure out when they lied to me once again.

    Data integrity? We don't need no stinkin' data integrity!

    • by Chacham (981)

      I had to add the time field in to achieve a unique logical key.


      • This was not entirely unexpected, given the nature of the data (floating inventory moving through a factory and a warehouse; sometimes it returns to the same location in three-dimensional space, but is unique in four dimensions). But it sure would have been nice if they could have given me a scan sequence number as well; a date time field is nice for reporting but should NEVER be part of a logical primary key.

        • by Chacham (981)

          but should NEVER be part of a logical primary key.

          I wouldn't say never. The odd case would be if the table held data by the minute or second or the like.

          But, i share your adamant stand against it in nearly all cases.
