Tuesday, September 1, 2015

Getting Started with PolyBase in SQL Server 2016

Several years ago, Microsoft introduced PolyBase as part of the Parallel Data Warehouse, now called the Analytics Platform System. Unfortunately, since it was only available to the few who could afford the Analytics Platform System, not much has been written about PolyBase. With SQL Server 2016's Community Technology Preview 2.3 (CTP 2.3), released this summer, PolyBase will now be available to a much wider audience. As a premium feature, it will require the Enterprise Edition license. So just what is PolyBase, and should anyone be interested?

Update: On June 1, 2016, the GA (general availability) version was released. See SQL Server 2016 Now Available - June 1, 2016 for more details.

PolyBase acts as a real-time query engine that bridges the traditional relational database structures held in SQL Server with the massive amounts of data held in Hadoop. It is real time in that once you have set up your Hadoop data set within SQL Server, you can query it using T-SQL, even to the point of joining relational tables with Hadoop tables.
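To make the idea of joining relational and Hadoop data concrete, here is a minimal sketch of what such a query might look like. It assumes an external table has already been defined over files in Hadoop; the table and column names (dbo.Customers, dbo.WebLogs_External, CustomerId, and so on) are hypothetical and used only for illustration.

```sql
-- Hypothetical names throughout: dbo.Customers is an ordinary SQL Server table,
-- dbo.WebLogs_External is assumed to be an external table backed by Hadoop files.
SELECT  c.CustomerId,
        c.CustomerName,
        COUNT(*) AS PageViews
FROM    dbo.Customers        AS c
JOIN    dbo.WebLogs_External AS w
        ON w.CustomerId = c.CustomerId
WHERE   w.EventDate >= '2015-08-01'
GROUP BY c.CustomerId, c.CustomerName
ORDER BY PageViews DESC;
```

From the query's point of view the external table behaves like any other table, which is what makes PolyBase approachable for anyone who already knows T-SQL.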

Perhaps the hardest part about PolyBase is the setup and configuration required to connect to your Hadoop environment. It is a multi-step process: first, establish the SQL Server to Hadoop connection; then configure your External Data Source, External File Format, and finally the External Table.
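As a rough sketch of those steps, the T-SQL below walks through the sequence in order. Every name, host, port, path, and column here is an assumption made for illustration (HadoopCluster, namenode.example.com, /data/weblogs/, and so on), and the 'hadoop connectivity' value depends on your Hadoop distribution, so treat the number shown as a placeholder rather than a recommendation.

```sql
-- Step 1: tell SQL Server which Hadoop distribution it will talk to.
-- The value 7 is only an example; look up the value for your distribution.
-- The SQL Server and PolyBase services need a restart after this change.
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
RECONFIGURE;

-- Step 2: external data source pointing at the Hadoop name node
-- (hypothetical host and port).
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://namenode.example.com:8020'
);

-- Step 3: external file format describing how the files are laid out.
CREATE EXTERNAL FILE FORMAT CommaDelimitedText
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);

-- Step 4: external table that maps columns onto the files
-- (hypothetical columns and HDFS path).
CREATE EXTERNAL TABLE dbo.WebLogs_External (
    CustomerId INT,
    EventDate  DATE,
    Url        VARCHAR(400)
)
WITH (
    LOCATION = '/data/weblogs/',
    DATA_SOURCE = HadoopCluster,
    FILE_FORMAT = CommaDelimitedText
);
```

The order matters because each object depends on the one before it: the external table references both the data source and the file format by name.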

The first step is to enable the PolyBase feature, which is part of the standard setup process built into SQL Server. I've installed the CTP 2.3 version of SQL Server 2016 several times, and I've found that for PolyBase it is best to install SQL Server first, ensure that it is installed properly, and then install the PolyBase feature. The screenshot below shows that PolyBase is just another feature selection.


As of CTP 3.3, the default port range for PolyBase is 16450-16460.
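Once setup has finished, one quick way to confirm that the feature actually landed on the instance is to ask the server itself. This is a small sketch that assumes a build where the IsPolyBaseInstalled server property is available (it is in the released version of SQL Server 2016; earlier previews may differ).

```sql
-- Returns 1 if the PolyBase feature is installed on this instance, 0 if not.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
```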


The next step is to actually install and initially configure PolyBase, which is discussed in the next post: Setting up PolyBase in SQL Server 2016

If you are interested in reading more about PolyBase, a good place to start is James Serra's blog post on the subject: Polybase Explained - James Serra
