Friday, October 2, 2015

PolyBase – error connecting to Hadoop file

PolyBase – error connecting to Hadoop file

At least with the current SQL Server 2016 CTP 2.3, PolyBase errors out trying to connect to certain text files in Hortonworks Hadoop. You typically get an error message similar to the following:

Update - June 14, 2016:  MSFT published a Polybase focused white paper for the Azure Data Warehouse, which supports Polybase.  This out lined - at least for Azure DW, the file types supported by Polybase.  For Azure DW, Polybase only supports UTF-8 files. This could explain some of the failed load issues experienced.  


Msg 107090, Level 16, State 1, Line 79
Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed.
OLE DB provider "SQLNCLI11" for linked server "(null)" returned message "Unspecified error".
Msg 7421, Level 16, State 2, Line 79
Cannot fetch the rowset from OLE DB provider "SQLNCLI11" for linked server "(null)". 

I’ve investigated what might be causing these errors, trying to discover if there are certain traits that cause the error.



Could not find or load main class path.....

Dennes found on Hortonworks a possible solution for the following error.

         
      Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster  
 

This may be beneficial for other issues, such as Could not obtain block.
       
Answer by Montrial Harrell · Jun 11 at 04:19 AM
Got it!! I added the below property to the mapred-site.xml file and the query ran successfully.

<property> <name>mapreduce.app-submission.cross-platform</name> <value>True</value> </property>   
 

(this was provided by Montrial Harrell at the Yarn ClassPath Value for Polybase Pushdown - Hortonworks Community Connection    -- review the entire post, as there may be other suggestons that might be of assistance.)

...cannot be cast...

If you're error message includes the phrase:  "...cannot be cast..." see page PolyBase wants a strongly typed ORC Hive Table.

Could not obtain block:

Dennes found that the error message Could not obtain block:  "...usually means the port 50010 (data node port) is not responding to SQL Server."  See also adjustments noted above for "Could not find or load main class"

EOF does not match expected format:

From my investigation, if EOF (end of file) has a different format from that declared in the CREATE EXTERNAL TABLE command, you will get an error. If, for example, the file has a trailing line with row count, or other meta data.

Data Type Mismatch

If the declared data type does not match with the incoming data type, you’ll get an error. In setting up a new table, it might be useful to first declare all columns as varchar or nvarchar to minimize any potential errors.  Once you establish a working connection, create a new table with the correctly typed columns. Loading dates into PolyBase has a very specific protocol, and not all date formats are supported. 

Data Size insufficient

If the size of the declared data type is not sufficiently large for incoming data type, you’ll get an error. In setting up a new table, it might be useful to first declare all columns as long varchar or nvarchar to minimize any potential errors. An example might be where you declare the column to be varchar(10), but the maximum size found exceeds 10.  No auto-truncate.

Line Delimiter

At least so far, I’ve been able to connect to files with the three ‘standard’ approaches to establishing the EOL: 
1)  CR/LF   0D 0A     (DOS/Windows)
2)  LF                 0A    (UNIX)
3)  CR        0D           (Mac)

File Format

The UTF-8 format seems to be the most common format, PolyBase works with the UTF-8 format. I have had trouble with files in the older ANSI/ASCII format.  My single attempt with UTF-16 did not work, but that may not be due to the UTF-16 format.  As noted above, Azure Polybase only supports UTF-8.

Other – still to be discovered problems


Several Hadoop files I have are not connecting to PolyBase, and I have not resolved what the problem. As I get them resolved, I’ll add those discoveries here.

10 comments:

kiran kumar said...

I am getting the following while executing polybase query to read hive file

Msg 106000, Level 16, State 1, Line 1
Could not obtain block: BP-601678901-10.0.2.15-1439987491556:blk_1073742288_1485 file=/apps/hive/warehouse/xademo.db/call_detail_records/cdrs.txt
OLE DB provider "SQLNCLI11" for linked server "(null)" returned message "Unspecified error".
Msg 7421, Level 16, State 2, Line 2
Cannot fetch the rowset from OLE DB provider "SQLNCLI11" for linked server "(null)". .

Any help appreciated. Thanks in advance !!!

Andrew Peterson said...

Kiran, I have not seen that error before. PolyBase errors messages are generally little help. I am sure that this point, you have verified that you can connect to a basic text file. One potential is that the Hive table is setup in a different format, such as an RCFile. If you do work out the prolem, please let me know so I can add it to the listing. Andrew

Dennes said...

Hi,

Finally it worked !

I figured that the message "could not obtain block" usually means the port 50010 (data node port) is not responding to SQL Server.

HDP 2.3 sandbox in marketplace has this behavior, doesn't answer the SQL Server request in 50010, even when telnet to this port is working.

Cheers,

Andrew Peterson said...

Awesome Dennes. Can you list out how you fixed, or links to support for others. Thanks

Dennes said...

Hi, Andrew,

In HDP 2.3 I didn't yet. Event with telnet working, Polybase can't connect to HDP.

In a local 2.1 sandbox it worked and I could query HDP, but when I try to do a query pushdown two errors happens: The job inside HDP runs forever, never getting over 5%; Polybase try to connect to job history, but uses a wrong address (0.0.0.0:10020, if I didn't mistake the port).

I tried to change the job history configuration in polybase xml and in hdp and nothing worked, it still try to connect to 0.0.0.0

Unknown said...

Do we have a solution to this error "Could not obtain block“ ?
I am also getting this error while trying to query a table after setting up Polybase, I am using HortonWorks Hadoop 2.5 and Sql Server 2016 SP1.
Any response will be highly appreciated.

Thanks in advance.
Aman

Andrew Peterson said...

Dennes found that the error message Could not obtain block: "...usually means the port 50010 (data node port) is not responding to SQL Server." That's all we know at this time.

Dennes said...

I had success connecting polybase with hadoop copying several hadoop configuration files to the SQL Server machine and customizing a few configurations.

I found this information described here https://community.hortonworks.com/questions/38806/yarn-classpath-value-for-polybase-pushdown.html


Cheers,

Unknown said...

Thanks Andrew, Dennes also mentioned he didnt have this problem in 2.1 sandbox
,i tried to find older version but couldnt get hold of anything, do you by any chance know where i could find an 2.1 or 2.0 sandbox? Again, thanks for the help.

Unknown said...

Hi Dennes, I think I followed the steps for copying and customizing hadoop configurations but I keep getting the same error, I might keep looking on the same lines , I do have a question though, did you get it working in Hadoop 2.3 or later versions ?