Sunday, September 6, 2015

PolyBase Resources

Just what is PolyBase, and how do I find out more about it? Here is a list of PolyBase resources.

PolyBase Explained, by James Serra: an excellent overview of PolyBase, what it does, and why you might want to use it.

ZDNet's "Microsoft's PolyBase mashes up SQL Server and Hadoop" is a bit old (it was published on November 15, 2012), but it is a nice summary of PolyBase and other MPP mash-ups.

Gray Systems Lab at Microsoft: background on PolyBase and its origins, including links to publications by David DeWitt.

Split Query Processing in PolyBase: another paper by David DeWitt.

MSDN: Getting Started with PolyBase, SQL Server 2016. MSDN is always a good reference.

PolyBase vs. Sqoop, by Ginger Grant: Sqoop also provides a link between the relational world and Hadoop, but it is very different from PolyBase. Ginger Grant provides a nice comparison.

"PolyBase in APS - Yet another SQL over Hadoop solution?", from the SQL Server team: commentary on how the team views PolyBase.

SQLPass 2014 Event - The Role of PolyBase in the MDW (PDF), by Brian Mitchell. It is based on APS, but it is still relevant to the upcoming release in SQL Server 2016 and gives a good overview of PolyBase. The link downloads the file 24HOP_BMitchell_The_Role_Polybase_in_the_MDW.pdf.

SQLPass: Big Data on the Microsoft Platform (PDF), by Andrew Brust. The link downloads the file 24HOP_session3_Brust.pdf. It covers nothing on PolyBase, but it is a nice summary of other Hadoop-related items and how they might work with SQL Server.

APS documentation: PolyBase was first introduced with APS (Analytics Platform System, formerly Parallel Data Warehouse). Although these documents were written for APS, some of them describe how the database, PolyBase, and Hadoop interact. They may not map one-to-one onto SQL Server 2016, but they provide insight. A good place to start is the document Optimizing Distributed Database Design for Analytics Platform System.docx. If you download the APS help file (aps.au3.chm) and it appears empty, open its Properties and, at the bottom of the General tab, click Unblock.

Hive DDL Language Manual: a first resource for Hive data definition language (DDL) syntax. Scroll down to the section on External Tables. When you create an external table in Hive, you are only creating a schema over data located elsewhere, similar to creating a linked table (or perhaps a view). Apparently the PolyBase designers decided to continue with this format.

Polybase in SQL Server 2016 CTP 2: Hilmar Buchta gives a brief overview of setting up PolyBase in SQL Server 2016 CTP 2. His post is the only reference I have found that discusses the Hadoop.config file on the SQL Server machine and updating its HadoopUserName setting.
