Monday, December 7, 2015

SSIS and Hadoop

Coming with the release of SQL Server 2016 are several new SSIS control flow and data flow components specifically for Hadoop.

Earlier Versions

But what if you need something now? Microsoft has a technical article for just this situation: "Leveraging a Hadoop cluster from SQL Server Integration Services (SSIS)" provides sample scripts and describes connection approaches such as WebHDFS and Sqoop.
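WebHDFS is plain HTTP plus JSON, so a script can call it directly. The following is a minimal sketch in Python to show the shape of the REST calls; an SSIS Script Task could issue the same requests. It assumes a hypothetical NameNode host, the Hadoop 2.x default NameNode HTTP port 50070, and simple `user.name` authentication rather than Kerberos. It builds a `LISTSTATUS` request and returns the directory entries.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: Hadoop 2.x default NameNode HTTP port; your cluster may differ.
DEFAULT_PORT = 50070


def webhdfs_url(host, path, op, port=DEFAULT_PORT, user=None):
    """Build http://<host>:<port>/webhdfs/v1/<path>?op=<op> for an absolute HDFS path."""
    params = {"op": op}
    if user:
        # Simple (pseudo) authentication; a Kerberized cluster needs SPNEGO instead.
        params["user.name"] = user
    return "http://{}:{}/webhdfs/v1{}?{}".format(host, port, quote(path), urlencode(params))


def list_status(host, path, port=DEFAULT_PORT, user=None):
    """Return the FileStatus entries for an HDFS directory (makes a network call)."""
    with urlopen(webhdfs_url(host, path, "LISTSTATUS", port, user)) as resp:
        body = json.load(resp)
    # WebHDFS wraps the listing as {"FileStatuses": {"FileStatus": [...]}}.
    return body["FileStatuses"]["FileStatus"]
```

For example, `list_status("namenode.example.com", "/data/sales", user="hdfs")` (hypothetical host and path) would return a list of dicts with keys such as `pathSuffix`, `type` and `length`. Other operations follow the same URL pattern, although reading and writing file contents involve a redirect to a DataNode.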

For Azure HDInsight, consider the Microsoft Hive ODBC Driver.

New for SQL Server 2016 SSIS

Control Flow:

  • Hadoop File System Task
  • Hadoop Hive Task
  • Hadoop Pig Task

Data Flow:

  • HDFS File Destination
  • HDFS File Source

The new SSIS components connect to Hadoop through either WebHCat or WebHDFS, both of which must be set up by the Hadoop administrator. For Hortonworks you can find setup information here:
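Before configuring the new components, it can help to confirm that the WebHCat endpoint actually answers. Below is a minimal sketch in Python, assuming a hypothetical host name and WebHCat's default port 50111. Its `status` resource should return JSON like `{"status":"ok","version":"v1"}` when the server is healthy.

```python
import json
from urllib.request import urlopen

# Assumption: WebHCat (Templeton) default port; check your cluster's configuration.
DEFAULT_WEBHCAT_PORT = 50111


def webhcat_status_url(host, port=DEFAULT_WEBHCAT_PORT):
    """URL of the WebHCat status resource."""
    return "http://{}:{}/templeton/v1/status".format(host, port)


def parse_status(body):
    """True if a WebHCat status response body reports the server as healthy."""
    return json.loads(body).get("status") == "ok"


def webhcat_is_up(host, port=DEFAULT_WEBHCAT_PORT, timeout=10):
    """Pre-flight check: False (instead of an exception) if unreachable or unhealthy."""
    try:
        with urlopen(webhcat_status_url(host, port), timeout=timeout) as resp:
            return parse_status(resp.read().decode("utf-8"))
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

For example, `webhcat_is_up("hadoop-edge.example.com")` (hypothetical host) returns `False` rather than raising when the host cannot be reached, which keeps it usable as a quick check before setting up the connection in SSIS.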
