Wednesday, October 5, 2016

Best Practice for PolyBase Table Location - Use a Folder

Use Folders!  For production, create a new folder for each file type.  Then point your LOCATION= parameter to just the folder, and not the specific file.

Why?  


1) Add more files to the directory, and Polybase External table will automagically read them.
2) Do INSERTS and UPDATES from PolyBase back to your files in Hadoop.
    ( See PolyBase - Insert data into a Hadoop Hue Directory ,
             PolyBase - Insert data into new Hadoop Directory    ).
3) It's cleaner.


Here is a typical data folder in Hortonworks:





And here is the corresponding Create External Table:

       
  CREATE EXTERNAL TABLE [dbo].AWDW_CSV_Sales_Date (
     ModelName  nvarchar(200) NULL ,
     ShipDate  datetime      NULL ,
     ExtendedAmt  money         NULL ,
     OrderQty  smallint   NULL 
 )
 WITH (LOCATION='/user/hue/AWDW2012_SalesData',
    DATA_SOURCE = hdp23 ,
    FILE_FORMAT = CSVfile  
    );       
 


No comments: