Monday, October 19, 2015

PolyBase - Inserts, Updates now supported (DML)

With SQL Server 2016 CTP2.x PolyBase connected to Hadoop, it's a great way to access all the data stored in your Hadoop installation using familiar T-SQL commands.  And if you have been reading some of the latest comments on NoSQL, your company may have the urge to start using Hadoop as your sole data warehouse and repository. All you have to do is migrate your data warehouse data, and with the PolyBase connection, it should be fairly simple - right?

[ Take me directly to PolyBase - Insert data into a Hadoop Hue Directory, where we outline just how to do it.]

For the early CTP releases INSERT was not supported. I have now tested it ( see PolyBase - Insert data into a Hadoop Hue Directory ), and you can now INSERT INTO an external table via PolyBase. Once you have an external table setup the next step is to run an INSERT command. 

Note: The following was written before INSERT was fully supported.

Perhaps something like this:

       insert into [dbo].InsertTest_1 ( firstcol, secondcol )
       values ('firstcolvalue1','secondcolvalue1');

In the earlier CTP versions, after running the insert command, you were greeted with an error message similar to the following:

       Msg 46519, Level 16, State 16, Line 68
       DML Operations are not supported with external tables.

With SQL Server 2016 released, we decided to cycle back and test the INSERT function. To keep it simple, we used the following command against a three column, all nvarchar table in Hadoop:


   insert into [dbo].AWDW_CSV_3String ( FullName, AddressLine1, DataSrc )
   values ( 'Ziggy Stardust','1 Moonshot Drive','Manual' );

The message returned was:

   Msg 46914, Level 16, State 1, Line 61
   INSERT into external table is disabled. Turn on the configuration
   option 'allow polybase export' to enable.


OK, update the configuration option.

   sp_configure 'allow polybase export', 1;
   reconfigure

Close, but still have issues.  PolyBase denies the existence of my table:

   Msg 7320, Level 16, State 102, Line 64
   Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11"
   for linked server "SQLNCLI11". EXTERNAL TABLE access failed because the
   specified path name 'hdfs://192.168.1.120:8020/user/hue/AWDW_CSV_3String.csv'
   does not exist. Enter a valid path and try again.



Switching to a Hive table, we were successful in inserting data into our table. But not initially.
The full file address for our test Hive table was

     LOCATION='/apps/hive/warehouse/sample_08/sample_08'

This did not work. Using only the relative Hive table location (below) did allow us to insert new data:


     LOCATION='/apps/hive/warehouse/sample_08'



Ok, now take me directly to PolyBase - Insert data into a Hadoop Hue Directory, where we outline just how to do it.




1 comment:

Anonymous said...

Thank you

Same issue.. I could not insert into my directory only into Hive/Warehouse

I wish the error message was better than invalid directory.. howbout a permission display
Thanks again !!!

Rafael