Tuesday, October 27, 2015

PolyBase Configuration for Cloudera

Cloudera is perhaps the biggest player in Hadoop, so it makes sense to understand what's needed to connect SQL Server 2016 to Cloudera. To get started, we downloaded the latest virtual server image from Cloudera, which for our purposes was version 5.4.2.0. You can get a Cloudera QuickStart VM here.

Once we had Cloudera up and running, we could move on to configuring our SQL Server instance for Cloudera. The PolyBase documentation does not specifically list our version, Cloudera 5.4.2.0, so we chose the closest listing, which was Option 6: Cloudera 5.1 on Linux. With this, we need to update our server configuration accordingly:

-- Option 6 = Cloudera 5.1 on Linux
EXEC sp_configure 'hadoop connectivity', 6;
RECONFIGURE;
 
One thing to keep in mind is that PolyBase can only connect to one Hadoop installation at a time, because 'hadoop connectivity' is a single server-wide setting. For a more detailed walkthrough of how to set up PolyBase in SQL Server 2016, see our post: Setting Up PolyBase in SQL Server 2016.


Hadoop YARN - Locating the yarn.application.classpath


Cloudera 5.4.2.0 is a YARN-based Hadoop distribution, so we'll also need to locate the yarn.application.classpath value in Hadoop and then update SQL Server 2016 with that value. Fortunately, the Cloudera VM starts up with a typical CentOS desktop UI, so locating the yarn-site.xml file is a lot easier than in Hortonworks. And you only need to do it once.

In Cloudera, the easiest way to find the file using the UI tools is with the File Browser. At the top left of the UI, open the Applications menu and navigate to Applications | System Tools | File Browser.




Next, in File Browser, use the search ("binoculars") button to locate yarn-site.xml. Then open the yarn-site.xml file that is in the /etc/hadoop/conf.empty folder.
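
If you would rather work from a terminal than the File Browser, the same search can be scripted. This is only a minimal Python sketch and not part of the original walkthrough: the function name find_yarn_site and the default search root /etc/hadoop are assumptions chosen for illustration.

```python
import os

def find_yarn_site(root="/etc/hadoop", filename="yarn-site.xml"):
    """Return the sorted full paths of every file named `filename` under `root`."""
    matches = []
    # os.walk visits each directory below root; symlinked directories are not followed
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)

# Print every match; on the VM described here, look for the one in conf.empty
for path in find_yarn_site():
    print(path)
```

If the search root does not exist, the function simply returns an empty list, so it is safe to try on any machine.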




With the file open, locate the yarn.application.classpath property.




For our Cloudera installation, the yarn.application.classpath is:

       <property>
          <description>Classpath for typical applications.</description>
          <name>yarn.application.classpath</name>
          <value>
             $HADOOP_CONF_DIR,
             $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
             $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
             $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
             $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
          </value>
       </property>
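
Rather than copying the value by hand, you could pull it out of the file with a few lines of Python. The sketch below is an illustration under assumptions, not the method used in this post: read_property is a hypothetical helper, and SAMPLE is a shortened stand-in for the real file. It flattens the multi-line value into a single comma-separated string so it is easier to paste elsewhere.

```python
import xml.etree.ElementTree as ET

def read_property(root, name):
    """Return the <value> of the Hadoop <property> called `name`, or None if absent."""
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            raw = prop.findtext("value", default="")
            # Strip line breaks and indentation around each comma-separated entry
            return ",".join(part.strip() for part in raw.split(",") if part.strip())
    return None

# Shortened sample standing in for the real yarn-site.xml (assumption for the demo)
SAMPLE = """<configuration>
  <property>
    <name>yarn.application.classpath</name>
    <value>
       $HADOOP_CONF_DIR,
       $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*
    </value>
  </property>
</configuration>"""

print(read_property(ET.fromstring(SAMPLE), "yarn.application.classpath"))
```

To read the actual file instead of the sample, pass ET.parse("/etc/hadoop/conf.empty/yarn-site.xml").getroot() as the first argument.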

Use this value to update the corresponding yarn-site.xml file on your SQL Server 2016 installation. Typically, you can find it here:

         C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Binn\Polybase\Hadoop\conf\
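
Editing the file in a text editor works fine, but the update can also be scripted. The following Python sketch is one possible approach under stated assumptions: set_property is a hypothetical helper, and the in-memory document stands in for the yarn-site.xml in the folder above. Note that ElementTree discards XML comments when it parses a file, so back up the original before writing anything back.

```python
import xml.etree.ElementTree as ET

def set_property(root, name, value):
    """Set the <value> of <property> `name` under `root`, creating it if missing."""
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            node = prop.find("value")
            if node is None:
                node = ET.SubElement(prop, "value")
            node.text = value
            return prop
    # No matching property found: append a new one to the configuration
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop

# Demo on an in-memory document (stand-in for the SQL Server side yarn-site.xml)
doc = ET.fromstring("<configuration></configuration>")
set_property(doc, "yarn.application.classpath",
             "$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*")
print(ET.tostring(doc, encoding="unicode"))
```

For the real file, you would load it with tree = ET.parse(path), call set_property(tree.getroot(), ...), and save with tree.write(path, encoding="utf-8", xml_declaration=True), most likely from an elevated prompt since the file lives under Program Files.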

Our post on Setting up PolyBase for YARN in SQL Server 2016 has additional details on configuring SQL Server for a YARN server.  Once you have the server configured for Cloudera and YARN, you should be all set to connect and use Hadoop via the PolyBase engine. 
