Setting up the polling query for our real-time ROLAP cube
against an EXASOL database, we used an easy, but very bad practice approach. We set our polling query directly against a
fact table. Nice and easy for testing and proof of concept. Production? don't
consider using it.
Polling a count against a small or medium size fact table might
won't create much of a problem. When the dataset is huge it might take more
resources. So, when the data set is
really large, and our goal is very near real time, why not poll a very small
table.
Data Change Table
Let's call it TrackDataChange. A minimalist version will have just a few
columns: the source system that loaded
the data, the load date and a primary key ID. The only difficult part is you
have to configure your ETL process(es) to insert a single row to your new
TrackDataChange table. The new record simply marks that the ETL process has
completed.
Pros
the polling query avoids data tables used for data analysis
the TrackDataChange table is narrow and holds few records
the TrackDataChange table is quick to query
the TrackDataChange table is quick to insert into
Cons
the ETL process is marginally more complex
No comments:
Post a Comment