Once we had Cloudera up and running, we could move on to the next step: configuring SQL Server to connect to it. The PolyBase documentation does not list our exact version, Cloudera 5.4.2.0, so we chose the closest listing, which was Option 6: Cloudera 5.1 on Linux. With this, we need to update our server configuration accordingly:
EXEC sp_configure 'hadoop connectivity', 6;
RECONFIGURE;
One thing to keep in mind is that PolyBase can only connect to one Hadoop installation at a time. For a more detailed walkthrough of how to set up PolyBase in SQL Server 2016, see our post on Setting Up PolyBase in SQL Server 2016.
Hadoop YARN - Locating the yarn.application.classpath
Cloudera 5.4.2.0 is a YARN-based Hadoop server, so we'll also need to locate the yarn.application.classpath value in Hadoop and then update SQL Server 2016 with that value. Fortunately, the Cloudera VM starts up with a typical CentOS UI, so locating the 'yarn-site.xml' file is a lot easier than in Hortonworks, and you only need to do it once.
In Cloudera, the easiest way to find the file with the UI tools is the File Browser. At the top left of the UI, locate the Applications tab, navigate to Applications | System Tools | File Browser, and open the File Browser.
Next, in File Browser, use the search "binoculars" to locate 'yarn-site.xml'.
Then open the yarn-site.xml file that is in the /etc/hadoop/conf.empty folder and locate the yarn.application.classpath property.
For my Cloudera installation the yarn.application.classpath is:
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
</value>
</property>
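If you would rather not read the value out of the file by eye, a short script can pull it out for you. The following is only a minimal sketch: the function name read_yarn_classpath and the SAMPLE string are made up for illustration, and the sample wraps the property shown above in a <configuration> root element, which is an assumption about the rest of the file. For a real file you would pass in the contents of /etc/hadoop/conf.empty/yarn-site.xml instead.

```python
import xml.etree.ElementTree as ET


def read_yarn_classpath(xml_text):
    """Return the yarn.application.classpath entries as a list ([] if absent)."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name == "yarn.application.classpath":
            raw = prop.findtext("value", default="")
            # The value is a comma-separated list that may span several lines.
            return [entry.strip() for entry in raw.split(",") if entry.strip()]
    return []


# Hypothetical sample: the property from this post inside an assumed root element.
SAMPLE = """<configuration>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>
      $HADOOP_CONF_DIR,
      $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
      $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
      $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
      $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
    </value>
  </property>
</configuration>"""

entries = read_yarn_classpath(SAMPLE)
# Join the entries back into a single line, ready to paste elsewhere.
print(",".join(entries))
```

Collapsing the value to a single comma-separated line is a convenience choice here; it makes the value easier to compare against the copy on the SQL Server side.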
Use this value to update the corresponding yarn-site.xml file on your SQL Server 2016 installation. Typically, you can find it here:
C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Binn\Polybase\Hadoop\conf\
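Editing the file by hand in a text editor works fine, but the update can also be scripted. The sketch below is an illustration under stated assumptions, not the official procedure: set_yarn_classpath is a hypothetical helper, LOCAL is a made-up stand-in for the contents of the SQL Server-side yarn-site.xml, and the classpath string is shortened for readability. Note that ElementTree drops XML comments and the XML declaration when it re-serializes, so keep a backup of the original file before writing anything back.

```python
import xml.etree.ElementTree as ET

PROPERTY_NAME = "yarn.application.classpath"


def set_yarn_classpath(xml_text, classpath):
    """Return xml_text with yarn.application.classpath set to classpath."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == PROPERTY_NAME:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = classpath
            break
    else:
        # No such property yet: append a new one to the root element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = PROPERTY_NAME
        ET.SubElement(prop, "value").text = classpath
    return ET.tostring(root, encoding="unicode")


# Hypothetical stand-in for the yarn-site.xml on the SQL Server machine.
LOCAL = (
    "<configuration><property>"
    "<name>yarn.application.classpath</name><value></value>"
    "</property></configuration>"
)

updated = set_yarn_classpath(LOCAL, "$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*")
print(updated)
```

In practice you would read the real file, pass in the full value taken from the Cloudera side, and write the result back to the conf folder above.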
Our post on Setting up PolyBase for YARN in SQL Server 2016 has additional details on configuring SQL Server for a YARN server.
Once you have the server configured for Cloudera and YARN, you should be all set to connect and use Hadoop via the PolyBase engine. See our posts on:
- Getting started with PolyBase
- Creating an External Data Source
- Creating an External File Format
- Creating an External Table