Lack of credibility is perhaps the biggest problem with data warehouses. So now I'm noticing tools that help people trace data in a data warehouse back to its source: where it came from, how it was changed, and so on. All of this happens after the fact. It may be valuable, but a well-designed ETL process already captures this type of information. The demand for such tools stems from the premise that the data in a data warehouse is 100% accurate. It rarely has been, but it should be close.
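To make the point about ETL concrete, here is a minimal Python sketch of a transform step that records where a value came from and how it was changed as it goes. Everything in it is hypothetical and for illustration only: the `TracedValue` class, the rule names, and the `orders.csv:row 17` source label are assumptions, not part of any particular ETL tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TracedValue:
    """A value plus the lineage an ETL step could record for it (hypothetical)."""
    value: object
    source: str                                # where the value came from
    steps: list = field(default_factory=list)  # how it was changed, in order

    def apply(self, rule_name, func):
        """Return a new TracedValue with the rule applied and logged."""
        new_value = func(self.value)
        entry = {
            "rule": rule_name,
            "before": self.value,
            "after": new_value,
            "at": datetime.now(timezone.utc).isoformat(),
        }
        return TracedValue(new_value, self.source, self.steps + [entry])


# Hypothetical raw field pulled from a source file.
raw = TracedValue(" 1,250.50 ", source="orders.csv:row 17")
clean = raw.apply("strip_whitespace", str.strip)
amount = clean.apply("parse_decimal", lambda s: float(s.replace(",", "")))

# The lineage travels with the value, so no after-the-fact tracing is needed.
for step in amount.steps:
    print(step["rule"], repr(step["before"]), "->", repr(step["after"]))
```

In a real warehouse this log would more likely live in audit columns or a separate lineage table, but the idea is the same: the history is written at load time rather than reconstructed later.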
Big data, Hadoop, web searches, and much of the ongoing world of statistics are approximations: best guesses that can change once you adjust the incoming data or the rules used for the query.