Skip to main content

Polybase - Installation and Implementation

This part is the continuation of the previous blog Polybase Introduction. If you guys are new, I recommend you to please go through that once for better understanding. In this part we will see how we can Install Polybase feature and how it is being implemented. So, let’s get started.

For the demonstration purpose we are using SQL Server 2019(Developer Edition). Before installing the Polybase feature let us look at the pre-requisites.

Required Resources:      

1. Minimum requirements:

1.1   64-bit SQL Server.

1.2   Microsoft .NET Framework 4.5.

1.3   Minimum memory: 4 GB.

1.4   Minimum hard-disk space: 2 GB.

1.5   Microsoft .NET Framework 4.5.

1.6   Recommended: Minimum of 16-GB RAM.

2. SQL Server 2019 (Edition: Enterprise, Standard or Developer)

3. Domain Account to run Polybase services. 

Steps to implement: 

                  1. Install SQL Server 2019 with Polybase Feature using a Domain User Account.

            2. Create a Database.

                  3. Create Master Key Encryption.

                  4. Create Database Scoped Credentials.

5. Create External Data Source.

6. Create External Tables.

7. Start Querying the original Table using External Tables. 

Polybase Installation: 

1. Run the SQL Server setup.exe.

2. Select Installation, and then select New standalone SQL Server installation or add features.

3. On the Feature Selection page, select PolyBase Query Service for External Data and option Java connector for HDFS data sources.

Polybase Installation, Polybase Feature Selection Page

Note: You need to install Oracle JRE 7 update 51 or higher to install Polybase. If it is not installed, you will get below error message while checking the rules for installation.

Polybase Oracle JRE Error

4.On the Server Configuration page, configure the SQL Server PolyBase Engine Service and SQL Server PolyBase Data Movement Service to run under the same domain account.

5.On the PolyBase Configuration page, Use the SQL Server instance as a standalone PolyBase-enabled instance. And proceed with the rest of the installation process. 

Post Installation Steps:

1. DWConfiguration, DWDiagnostics, and DWQueue databases will be create once you install Polybase. These databases are for PolyBase use. Do not make any changes to them. Keep them as it is.

2. After installation, we need to confirm whether the Polybase feature is installed. The T-SQL, SELECT SERVERPROPERTY ('IsPolyBaseInstalled') AS IsPolyBaseInstalled; will return 1 if Polybase is installed and 0 if it is not installed.

3. After installation, PolyBase must be enabled, but still we can do so using the T-SQL: 

        exec sp_configure @configname = 'polybase enabled', @configvalue = 1;RECONFIGURE; 

Polybase Configuration: 

You may have read BOL or went through some articles regarding Polybase Implementation and must have seen the examples quoting the use of Polybase with Hadoop or Azure as External Data Sources. But, here we will be using another SQL Server as an External Data Source, which will give you a different insight of how we can use Polybase to connect to a different SQL Server which can be used as an External Data Source. We have two servers AMLGO-01 and EX-SQL-01. We need to follow the below steps in order to use Polybase.

1. Connect to the Server AMLGO-01 and create a Database.

Execute T-SQL, Create Database Polybase Test

2. Create Master Key Encryption.

Execute T-SQL, Use Polybase Test;

Create Master Key Encryption By Password = ‘Polybase@123’ (Can be any key)

3. Create Database Scoped Credentials. Here Identity is the User who has access to the other SQL Server and Secret is its password.

Execute T-SQL, Use Polybase Test;

Create Database Scoped Credential User_Credential

With Identity = 'Polybase User', Secret = 'Polybase@123';

4. Create External Data Source. Here EX-SQL-01 is the other SQL Server which will be treated as an External Data Source.

Execute T-SQL, Use Polybase Test;

Create External Data Source EX_Source

With (Location = 'sqlserver://EX-SQL-01:1433', Credential = User_Credential, Connection_Options = 'UseDefaultEncryptionOptions=false')

The Connection_Options parameter is used for the sources below Sql Server 2016, in order to fetch entire data, else it will give below TCP Error.

[Cannot execute the query "Remote Query" against OLE DB provider "MSOLEDBSQL" for linked server "(null)". 105082;Generic ODBC error: [Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: An existing connection was forcibly closed by the remote host.]

Polybase External Data Sources
5. We have AdventureWorksDW2012 database on our External Source EX-SQL-01, and we  want to access the DimAccount Table from AMLGO-01. For that we need to create an External Table with the same structure on AMLGO-01 Server.

Table Structure for Polybase External Table

Before we create an External Table we need to keep in mind certain Polybase Limitations.

  1. External Tables cannot be created using User Defined Data Types.
  2. Identity is Not Supported with External Tables.
  3. Unable to use Table Hints with T-SQL on External Table.
  4. Cannot add Constraints to the Columns on External Tables.
  5. Cannot perform DML Operations on External Table.
Polybase Limitations and Polybase External Table

We can now see the External Table Ex_DimAccount in Polybase Test Database on AMLGO-01

Polybase External Table

After all this is done, we can now access the data of DimAccount in AdventureWorksDW2012 on EX-SQL-01 using External Table created in AMLGO-01.

Fetching Data From Polybase External Table

As we have created and External Table to access the data of a Table located in an External Source. We can similarly create External Tables for Data Sources like Hadoop, Azure, Oracle etc and Polybase helps you to access them using T-SQL only. You do not need to learn any new language or technology for the same. This is how Polybase helps us to connect to different External Data Sources and access them using T-SQL.

Hope this blog helps you to get a better understanding of how the Polybase works. Till now you have the knowledge of What is Polybase, How to install it, How to Configure it And How to Implement it? Going forward we will learn what is Polybase Scale-Out Group and why it is being used. 

About Amlgo Labs : Amlgo Labs is an advanced data analytics and decision sciences company based out in Gurgaon and Bangalore, India. We help our clients in different areas of data solutions includes design/development of end to end solutions (Cloud, Big Data, UI/UX, Data Engineering, Advanced Analytics and Data Sciences) with a focus on improving businesses and providing insights to make intelligent data-driven decisions across verticals. We have another vertical of business that we call - Financial Regulatory Reporting for (MASAPRAHKMAEBAFEDRBI etc) all major regulators in the world and our team is specialized in commonly used regulatory tools across the globe (AxiomSL Controller ViewOneSumX DevelopmentMoody’s RiskIBM Open Pages etc).We build innovative concepts and then solutions to give an extra edge to the business outcomes and help to visualize and execute effective decision strategies. We are among top 10 Data Analytics Start-ups in India, 2019 and 2020.

Please feel free to comment or share your views and thoughts. You can always reach out to us by sending an email at or filling a contact form at the end of the page.



More Popular Posts

Amlgo Blog - Experience The Experiments

Amlgo Labs Blog  is a step towards our vision to share knowledge and experiences, Amlgoites accept every challenge very enthusiastically. We do experiments, we fail but we learn and build complex solutions to help our clients solve their problems in Data, Analytics, Prediction, Forecasting, Reporting, Designing and Development area. During this process we enjoy immense learning everyday and we have decided to share our thoughts, learnings, experiments and experiences so that we don't work in silos and contribute the best of our knowledge towards community and learn more by views and reviews. This website is maintained and brough to you by  Amlgo Labs Professionals .   Our Strong Basics -  1)   KISS (Keep It Simple and Straightforward) :  We believe most of the problems can be solved by keeping things simple and straight. This is the learning we had in past, sometimes we try to solve technical problems using high end algorithms and complex codes but this results into complications.

Power BI Blog - What is Power BI and Why we use Power BI.

This Power BI blog is an introduction to the Power BI. In this Power BI tutorial, you will learn Power BI basics - what is Power BI, Power BI Desktop and Power BI Service. Power BI tool is a combination of Business Intelligence and Visualization. Before we dig deep let’s understand what is Business Intelligence. Business Intelligence is a broad spectrum which consists of using different business strategies and techniques to gather data, refine it and process it for the purpose of data analysis which gives us some meaningful information which helps in data driven decision making. When we present this information in the form of some charts, graphs it takes the form of Visualization.

Power BI Card Visual With Conditional Formatting

In this blog we will take you through the Power BI Card Visual and a workaround to present card with conditional formatting. To know more about what is Power BI please do check Power BI Blog . The conditional formatting, we do will not be on the Card Visual, but the output will certainly look like Card, which is a requirement in some cases. Many Power BI users have used Card Visual to represent some value on the Power BI Reports separately or on above of any other visual, but sometimes clients ask which is deemed difficult to achieve, but not impossible. So, let’s dive into the practical to know more about it.

A simplified view of a QlikView tool:

In our last blog related to Qlikview, we discussed the use of Qlikview for Financial Data Analytics . Qlikview is a Business Intelligence tool that consists of a front end to visualize the processed data and a back end to provide the security and publication of the mechanism for QlikView user documents.

Polybase Installation for Scale-Out process

This part is the continuation of the previous blog about the introduction of  Polybase Scale-Out Group . As we have discussed in our earlier blog PolyBase enables your SQL Server instance to process Transact-SQL queries that read data from external data sources. SQL Server 2016 and higher can access external data in Hadoop and Azure Blob Storage. Starting in SQL Server 2019, PolyBase can be used to access external data in SQL Server, Oracle, Teradata, and MongoDB.

JavaScript: The Important Basics

In this weird world of technology people often get confused how does a website work? What all things should I be aware of? So we are here to help you. Most of the content on web is developed with the help of JavaScript. JavaScript is really becoming popular these days with the coming of ECMAScript2015 and ECMAScript 2016, for this reason, some beginners learning React and Angular and are also trying to tackle more modern JavaScript syntax at the same time. If you're new to both, it can be confusing as to "what is JavaScript and what are its important features". This document should serve as a primer to help you get up-to-speed on JavaScript syntax that we feel matters the most for learning React as well as Angular.

How to Connect Windows with Ansible?

Hello Techies, this blog is the continuation of the last blog Install Ansible on Windows which is the second part of the series Ansible with Windows . Before you proceed, I recommend you to please take a look at the introductory blog of this series so to have an understanding of this series.