For this post, we are building a real-time ROLAP cube using
EXASOL. For us, the concept of ROLAP,
and the goal of a real time multi-dimensional data warehouse provided an ideal
design. Use an analytics & columnar
based relational database for both the structured data warehouse and as the
back end to our ROLAP cube.
As technology advances, real-time, or near real-time,
analysis is becoming increasingly important. And while Hadoop, Big Data, and
Data Science have captured most of the attention - cubes, or more specifically,
multi-dimensional cubes are still a great tool. Upgrade the cube to real-time
using ROLAP and it just gets better. On their website, EXASOL indicates support
for use as the database backend to an Analysis Services (SSAS) Real-time ROLAP
cube. We decided to take them up on it.
Overall, the process was straight forward. First, implement a traditional star schema
data warehouse in EXASOL. Load the data warehouse with SSIS, plus setup some
test incremental data adds. Next, use SQL Server Data Tools to design and
deploy a ROLAP cube. And finally, use EXCEL to connect with and test out our
cube.
What we are looking for in this process:
- Ease of data population, updates and data maintenance- using
SSIS
- Design and configuration of SSAS ROLAP cube using EXASOL
- Usability - the Analyst experience
Data population, updates and data maintenance
Our primary focus is the ability to use an ETL tool to easily
add supplemental records to the fact table. To simulate a near real-time data
warehouse environment. And once added to the database, are the new records visible
in our ROLAP cube?
EXASOL supports both the legacy ODBC protocol as well as an ADO.NET
connection. With both of these protocols, we were able to easily create and run
SSIS ETL packages from our source data into the EXASOL data warehouse. For this
exercise, we used the SQL Server Data Tools 2016 for Visual Studio 2015,
without any problems. The version of the EXASOL ADO.NET Data Provider was for
version 5.0.15.
Design and configuration of ROLAP cubes
Here we had our first speed bump. The current version of SSDT 2016 for Visual
Studio 2015 had multiple issues and would not properly connect to EXASOL to create a new SSAS cube - so we pulled
up an older version (1). Using SQL Server
Data Tools 2014 for Visual Studio 2013 - for an analysis services multi-dimensional
project, we were able to easily setup a new Data Source connection to our
EXASOL data warehouse using the ADO.NET data provider. Once connected, we could
easily create a new Data Source View.
With our data source view in place and adjusted to our
satisfaction, we could start cube design. We were pleased that the overall
process flowed seamlessly. With no special adjustments required to support
EXASOL. Since ours was a ROLAP cube, rather than a more common MOLAP,
partitions required more attention.
(1) I have been informed by Mathias Golombek, CTO, that they are working on an updated ADO.NET driver to resolve these issues.
Real-time ROLAP, with Options
In Partition Storage Settings, we chose the standard Real-time
ROLAP setting. Our only adjustment beyond this was to set up the Scheduled Polling
settings in the Notifications section. Here, the polling interval was set to
one (1) minute, with the polling query set to do a count against our fact
table. One minute is good for near real time.
If needed, you can shorten the polling interval granularity to a matter
of seconds. With our interval of one minute, our scheduled polling worked as expected. After running an incremental fact table update, the latest records were visible in the ROLAP cube after the interval wait time. For more detail on the partition settings used see:
Partition Settings for our SSAS ROLAP cube using EXASOL.
Usability
Every analyst I know uses Excel. And as a free form tool,
it's great. It also connects to SSAS cubes, so it makes a good tool to test a
user's experience. Sure Tableau might be better, but at a very high cost.
First off, the overall analyst experience was good and really
no different than if the tool was connected to an SSAS MOLAP cube. For this
test we are working with EXASOL's community edition VM, which limits our configuration
to a single node. Our single node VM
configuration has four (4) virtual 3.07GHz Xeon processors and 15 GB of memory running
on a dedicated spindle. Our fact table has slightly more than 11 million
records, plus five dimension tables. While
11 million records is on the smaller side for a data warehouse, it's not an unreasonable
size for medium sized companies and divisions. I know quite a few companies that would love
to have over a million sales transactions a year.
Warming the Indexes
Looking over the user manual, there are no discussions on
setting up indexes. And that is by design. EXASOL manages all indexes and
optimizations internally. As users query the database, the optimizer determines
what indexes and statistics it needs. And we found this out with our single
node VM. As we simulated an analyst querying the database via Excel, the
response improved rapidly. Where a first query might take several seconds, the
follow on queries dropped to milliseconds. Essentially, user demand drove the
optimizer.
But I need my hands on the Optimizer control
I understand. DBA's need to have the ability to make
adjustments. But since most of you work with SQL Server, you already benefit
from some internal optimization features. That's why there are fewer controls
compared to Oracle. It's not that they don't exist. It's just that behind the scenes, a few very bright database guru's worked out algorithms that respond in real time. That's progress. A 1960's Jaguar E-type needs a full time mechanic. It is a great museum
piece, but your daily car is probably more modern. With both cars and databases, less is often
more.
Summary
With advancements in columnar relational databases, ROLAP should be back on the list of tools that support day to day operational business intelligence, analytics and reporting.