Tuesday, September 1, 2015

Setting up PolyBase in SQL Server 2016

Installing and completing the initial setup for PolyBase on SQL Server 2016 is straight forward. However, PolyBase does have a few key requirements, so I've outlined these below.  
They include:


  • Oracle Java SE RunTime Environment (JRE) version 7.51 or higher. The installer will fail if JRE is not present. For my installation, I used the file:  jre-8u51-windows-x64.exe. You can find the latest JRE here Java SE Runtime download.
  • Microsoft .NET Framework 4.0
  • 64-bit SQL Server Evaluation Edition (CTP 2.2)
  • Minimum memory of 4 GB
  • Minimum hard disk space of 2 GB

The easiest way to install the PolyBase feature is to use the SQL Server Installation Center installation wizard.  When you get to the feature selection page, simply select the PolyBase feature. Then, in the server configuration page, setup both the PolyBase Engine Service and the PolyBase Data Movement Service to run under the same account. With the PolyBase feature installed, it's time to move on to the next step of configuring the database server for PolyBase.



Server Configuration
The next step is to configure SQL Server for your specific Hadoop server. For my initial test, I was running an older version of Hortonworks 2.0 on Linux, so my configuration value was 5.  
Configure the server for Hortonworks 2.0.

       

sp_configure @configname = 'hadoop connectivity', @configvalue = 5;   
go
reconfigure
go
       
 


The configuration value is specific to the target Hadoop environment you plan to use, so be sure to review the full documentation for setting up the server configuration. MSDN-PolyBase Configuration Options

The current listing for the Hadoop configuration options is as follows:

·         Option 0: Disable Hadoop connectivity
·         Option 1: Hortonworks HDP 1.3 on Windows Server
·         Option 1: Azure blob storage (WASB[S])
·         Option 2: Hortonworks HDP 1.3 on Linux
·         Option 3: Cloudera CDH 4.3 on Linux
·         Option 4: Hortonworks HDP 2.0 on Windows Server
·         Option 4: Azure blob storage (WASB[S])
·         Option 5: Hortonworks HDP 2.0 on Linux
·         Option 6: Cloudera 5.1 on Linux
·         Option 7: Hortonworks 2.1 and 2.2 on Linux  ( used this for Hortonworks 2.3 )
Note: Hortonworks 2.3 contains Hadoop version 2.7.1. I have found references where Option 7 works with the generic Apache 2.7.x installs.
·         Option 7: Hortonworks 2.2 on Windows Server
·         Option 7: Azure blob storage (WASB[S])

Which Azure Option:  1,4 or 7?
If you are planning on setting up a connection to an Azure Blob, it is unclear which option to choose. Is it Option 1, Option 4, or Option 7. As of September 2, 2015, I could not find any documentation on which one to use.

One more unknown, is that when you get to setting up an External Data Source, one of the options is a connection to an Azure SQL Database ({type =  SHARD_MAP_MANAGER}. As of September 2, 2015, it was not clear which configuration option you would choose if you wanted to connect to an Azure SQL Database.

If your target Hadoop is a YARN based cluster, you will need to update the yarn-site.xml file. You can find the current setup information for PolyBase in MSDN here: MSDN Getting Started with PolyBase

Confirm Successful Install

Use SERVERPROPERTY to confirm if your installation was successful.
Successful returns 1.  Otherwise you get a big 0.

       
        SELECT SERVERPROPERTY ('IsPolybaseInstalled') AS IsPolybaseInstalled; 

        Results
        IsPolybaseInstalled
        ------------------- 
        1

       (1 row(s) affected)
       
 

Security and User Considerations

Our test system used an open Hortonworks Sandbox. However, we did find one posting that suggested that the HadoopUserName value be updated. Thanks to Hilmar Buchta for pointing out this hidden user name setting.


See also:



No comments: