Several years ago, Microsoft introduced PolyBase as part of
the Parallel Data Warehouse, now called the Analytics Platform System.
Unfortunately, because it was available only to the few who could afford the
Analytics Platform System, not much has been written about PolyBase. With SQL Server 2016's Community Technology
Preview 2.3 (CTP 2.3), released this summer, PolyBase will now be available to a
much wider audience. As a premium feature, it will require the Enterprise
Edition license. So just what is
PolyBase, and should anyone be interested?
Update: On June 1, 2016, the GA (general availability) version was released. See SQL Server 2016 Now Available - June 1, 2016 for more details.
PolyBase acts as a real-time query engine that bridges the traditional relational database structures held in SQL Server with the massive amounts of data held in Hadoop. It is real time in that once you have set up your Hadoop data set within SQL Server, you can query it using T-SQL, even to the point of joining relational tables with Hadoop tables.
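As a rough illustration of what joining relational and Hadoop data looks like, here is a minimal sketch. It assumes an external table has already been defined over the Hadoop data; the table and column names (dbo.Customer, dbo.WebClicks, CustomerId, CustomerName) are hypothetical and not taken from any real environment.

```sql
-- Hypothetical names: dbo.Customer is an ordinary SQL Server table,
-- dbo.WebClicks is assumed to be an external table backed by files in Hadoop.
SELECT c.CustomerName,
       COUNT(*) AS ClickCount
FROM dbo.Customer AS c
JOIN dbo.WebClicks AS w
    ON w.CustomerId = c.CustomerId
GROUP BY c.CustomerName
ORDER BY ClickCount DESC;
```

From the point of view of the query, the external table is referenced like any other table, which is what makes the T-SQL experience feel seamless.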
Perhaps the hardest part about PolyBase is the setup and
configuration required to connect to your Hadoop environment. It is a
multi-step process: first, establish the connection from SQL Server to Hadoop;
then configure your External Data Source, External File Format, and
finally the External Table.
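The steps above might be sketched in T-SQL roughly as follows. This is only an outline under stated assumptions: the connectivity value, the name node address, the HDFS path, and every object name (HadoopCluster, CsvFormat, dbo.WebClicks) are placeholders, and the correct 'hadoop connectivity' value depends on your Hadoop distribution, so check the documentation before using it.

```sql
-- Step 1: tell SQL Server which Hadoop distribution it will talk to.
-- The value 7 is an assumption; look up the value for your distribution.
-- A restart of the SQL Server services is needed after changing it.
EXEC sp_configure 'hadoop connectivity', 7;
RECONFIGURE;

-- Step 2: External Data Source pointing at the cluster (placeholder address).
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://namenode.example.com:8020'
);

-- Step 3: External File Format describing how the files are laid out.
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);

-- Step 4: External Table mapping columns onto a folder in HDFS (placeholder path).
CREATE EXTERNAL TABLE dbo.WebClicks (
    CustomerId INT,
    Url        NVARCHAR(400)
)
WITH (
    LOCATION = '/data/webclicks/',
    DATA_SOURCE = HadoopCluster,
    FILE_FORMAT = CsvFormat
);
```

The order matters: the External Table refers to both the data source and the file format, so those two have to exist first.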
The first step is to enable the PolyBase feature, which
is part of the standard setup process built into SQL Server. I've installed the CTP 2.3 version of SQL
Server 2016 several times, and I've found that for PolyBase it is best to
install SQL Server first, ensure that it is installed properly, and then
install the PolyBase feature. The screen snapshot below shows that PolyBase is
just another feature selection.
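Once setup has finished, one quick way to confirm that the feature actually landed on the instance is to ask the server directly. This is a small sketch rather than a required step:

```sql
-- Returns 1 if the PolyBase feature is installed on this instance, 0 if not.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
```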
As of CTP 3.3, the default port range for PolyBase is 16450-16460.
The next step is to actually install and initially configure PolyBase, which is discussed in the next post: Setting up PolyBase in SQL Server 2016
If you are interested in reading more about PolyBase, a good
place to start is James Serra's blog post on the subject: Polybase Explained - James Serra