Monday, December 14, 2015

Data Lineage - Credibility for ETL History

Lack of credibility is perhaps the biggest problem with data warehouses. So now I'm noticing some tools that help people trace data in a data warehouse back to its source: where it came from, how it was changed, and so on. All of this is after the fact. It is perhaps valuable, but a well-designed ETL process already records this type of information. All of this follows from the premise that the data in a data warehouse is 100% accurate. It rarely has been, but it should be close.
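To make "a well-designed ETL process already records this" a bit more concrete, here is a minimal sketch of what I mean. It is not taken from any particular ETL tool. The names (`LineageRecord`, `load_row`, `apply_step`, the `crm_export` source, the sample fields) are all hypothetical, made up for illustration. The idea is simply that each row carries a note of where it came from and which steps changed it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LineageRecord:
    """Hypothetical lineage metadata carried alongside one warehouse row."""
    source_system: str   # where the row came from
    source_key: str      # key of the row in the source system
    loaded_at: str       # when the ETL process picked it up (UTC, ISO 8601)
    transformations: list = field(default_factory=list)  # how it was changed


def load_row(raw, source_system, key_field):
    """Extract step: copy the raw row and start its lineage record."""
    lineage = LineageRecord(
        source_system=source_system,
        source_key=str(raw[key_field]),
        loaded_at=datetime.now(timezone.utc).isoformat(),
    )
    return dict(raw), lineage


def apply_step(row, lineage, name, func):
    """Transform step: run func on a copy and log which fields it touched."""
    new_row = func(dict(row))
    keys = set(row) | set(new_row)
    changed = sorted(k for k in keys if row.get(k) != new_row.get(k))
    lineage.transformations.append({"step": name, "changed": changed})
    return new_row


def trim_name(row):
    row["name"] = row["name"].strip()
    return row


def to_cents(row):
    row["amount_cents"] = round(float(row.pop("amount")) * 100)
    return row


# Sample data, invented for this sketch.
raw = {"id": 42, "name": "  Acme ", "amount": "19.99"}
row, lineage = load_row(raw, "crm_export", "id")
row = apply_step(row, lineage, "trim_name", trim_name)
row = apply_step(row, lineage, "to_cents", to_cents)
```

With something like this in place, answering "where did this number come from and what touched it" is a lookup on `lineage` rather than a forensic exercise after the fact. In a real warehouse the same information would more likely live in audit columns or a separate lineage table, but that is a design choice, not something this sketch settles.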

Big data, Hadoop, web searches, and much of the ongoing world of statistics are approximations: best guesses that can change once you adjust the incoming data or the rules used for the query.
