Skip to main content

Polybase Installation for Scale-Out process

This part is the continuation of the previous blog about the introduction of Polybase Scale-Out Group. As we have discussed in our earlier blog PolyBase enables your SQL Server instance to process Transact-SQL queries that read data from external data sources. SQL Server 2016 and higher can access external data in Hadoop and Azure Blob Storage. Starting in SQL Server 2019, PolyBase can be used to access external data in SQL Server, Oracle, Teradata, and MongoDB.

To improve query performance, we can use SQL Server PolyBase scale-out groups. This enables parallel data transfer and it adds compute resources for operating on the external data. PolyBase Scale-out Groups, a group of SQL Server instances, enable you to process large external data sets in a parallel processing architecture. Data loading and query performance can increase linearly as you add more SQL Server instances to the group.

In the scenario, we need to add computation power to the existing SQL Server instance to process the T-SQL queries and this can be achieved by implementing the Polybase Scale-Out Group. At least two machines are required to implement the Polybase Scale-Out Group. One can be the Head Node and the other be Compute Node.

Source: Microsoft.com

PolyBase queries are submitted to the SQL Server on the head node. The part of the query that refers to external tables is handed-off to the PolyBase engine.

The Polybase engine is the key component behind PolyBase queries. It parses the query on external data generates the query plan and distributes the work to the data movement service on the compute nodes for execution. After completion of the work, it receives the results from the compute nodes and submits them to SQL Server for processing and returning to the client.

The PolyBase data movement service receives instructions from the PolyBase engine and transfers data between HDFS and SQL Server, and between SQL Server instances on the head and compute nodes.

Polybase implementations steps for the Scale-out process is as follows:

Required Resources:

1.       At least two machines on the same domain to create a Polybase Scale-Out Group with the minimum requirements:

1.1   64-bit SQL Server.

1.2   Microsoft .NET Framework 4.5.

1.3   Minimum memory: 4 GB.

1.4   Minimum hard-disk space: 2 GB.

1.5   Microsoft .NET Framework 4.5.

1.6   Recommended: Minimum of 16-GB RAM.

2.       SQL Server 2019 (Edition: Enterprise, Standard or Developer)

3.       Domain Account to run Polybase services.

Available Resources:

1.       Machines: SQLCONTROL and SQLCOMPUTE2

2.       SQL Server 2019

3.       Domain Account: SVC-Polybase

Steps to implement:

1.       Install SQL Server 2019 with Polybase Feature using a Domain User Account.

2.       Configure one machine as Head Node(SQLCONTROL).

3.       Join the other machine as Compute Node(SQLCOMPUTE2) in the group.

4.       Create a Database in the Head Node(SQLCONTROL).

5.       Create Master Key Encryption.

6.       Create Database Scoped Credentials.

7.       Create an External Data Source.

8.       Create External Tables.

9.       Start Querying the original Table using External Tables.

Limitations:

1.       Unable to use Table Hints with T-SQL on External Table.

2.       Cannot add Constraints to the Columns on External Tables.

3.       Cannot perform DML Operations on External Table.

Steps to implement: Detailed View

1.       Install SQL Server 2019 with Polybase Feature using a Domain User Account.

1.1   Run the SQL Server setup.exe.

1.2   Select Installation, and then select New standalone SQL Server installation or add features.

1.3   On the Feature Selection page, select PolyBase Query Service for External Data.

1.4 On the Server Configuration page, configure the SQL Server PolyBase Engine Service and SQL Server PolyBase Data Movement Service to run under the same domain account SVC-Polybase.

Source: Mssqltips.com

1.5   Use the SQL Server instance as part of a PolyBase scale-out group. This option opens the firewall to allow incoming connections. Connections are allowed for the SQL Server Database Engine, SQL Server PolyBase Engine, SQL Server PolyBase Data Movement service, and the SQL browser. The firewall also allows incoming connections from other nodes in a PolyBase scale-out group.

1.6   On the PolyBase Configuration page, specify a port range with at least six ports. SQL     Server setup allocates the first six available ports from the range.

1.7   After these options for the Polybase Scale-Out

 

Group are selected, you can proceed with the installation of SQL Server 2019.

1.8   If the installation is with Windows Authentication instead of Mixed, please enable SQL Server Authentication from the Server Properties.

1.9   Create a Generic SQL Server Login in SQLCONTROL, SQLCOMPUTE2. We created a Login SVCPolybase as SQL Server Login with the sysadmin role. You can also access the server with the Domain Account SVC-Polybase in Windows Authentication mode which was created.

To join the Nodes in the scale-out group please follow the blog Polybase Scale-Out Group. Also to create external tables and check the performance of the Polybase scale-out group over standalone structure please refer blog Polybase Installation Guide.

About Amlgo Labs : Amlgo Labs is an advanced data analytics and decision sciences company based out in Gurgaon and Bangalore, India. We help our clients in different areas of data solutions includes design/development of end to end solutions (Cloud, Big Data, UI/UX, Data Engineering, Advanced Analytics and Data Sciences) with a focus on improving businesses and providing insights to make intelligent data-driven decisions across verticals. We have another vertical of business that we call - Financial Regulatory Reporting for (MASAPRAHKMAEBAFEDRBI etc) all major regulators in the world and our team is specialized in commonly used regulatory tools across the globe (AxiomSL Controller ViewOneSumX DevelopmentMoody’s RiskIBM Open Pages etc).We build innovative concepts and then solutions to give an extra edge to the business outcomes and help to visualize and execute an effective decision strategies. We are among top 10 Data Analytics Start-ups in India, 2019 and 2020.

Please feel free to comment or share your views and thoughts. You can always reach out to us by sending an email at info@amlgolabs.com or filling a contact form at the end of the page.


Comments

  1. This is a smart blog. I mean it. You have an excellent knowledge about this topic. Thanks for sharing such a great blogs to us. Vietnam Export Data

    ReplyDelete
  2. Hi, the information that you have shared is very helpful. The detailing in resources and implentation process is great. keep Sharing! SQL Server Load Rest API

    ReplyDelete

Post a Comment

More Popular Posts

JavaScript: The Important Basics

In this weird world of technology people often get confused how does a website work? What all things should I be aware of? So we are here to help you. Most of the content on web is developed with the help of JavaScript. JavaScript is really becoming popular these days with the coming of ECMAScript2015 and ECMAScript 2016, for this reason, some beginners learning React and Angular and are also trying to tackle more modern JavaScript syntax at the same time. If you're new to both, it can be confusing as to "what is JavaScript and what are its important features". This document should serve as a primer to help you get up-to-speed on JavaScript syntax that we feel matters the most for learning React as well as Angular.

Amlgo Blog - Experience The Experiments

Amlgo Labs Blog  is a step towards our vision to share knowledge and experiences, Amlgoites accept every challenge very enthusiastically. We do experiments, we fail but we learn and build complex solutions to help our clients solve their problems in Data, Analytics, Prediction, Forecasting, Reporting, Designing and Development area. During this process we enjoy immense learning everyday and we have decided to share our thoughts, learnings, experiments and experiences so that we don't work in silos and contribute the best of our knowledge towards community and learn more by views and reviews. This website is maintained and brough to you by  Amlgo Labs Professionals .   Our Strong Basics -  1)   KISS (Keep It Simple and Straightforward) :  We believe most of the problems can be solved by keeping things simple and straight. This is the learning we had in past, sometimes we try to solve technical problems using high end algorithms and complex codes but this results into complications.

Power BI Blog - What is Power BI and Why we use Power BI.

This Power BI blog is an introduction to the Power BI. In this Power BI tutorial, you will learn Power BI basics - what is Power BI, Power BI Desktop and Power BI Service. Power BI tool is a combination of Business Intelligence and Visualization. Before we dig deep let’s understand what is Business Intelligence. Business Intelligence is a broad spectrum which consists of using different business strategies and techniques to gather data, refine it and process it for the purpose of data analysis which gives us some meaningful information which helps in data driven decision making. When we present this information in the form of some charts, graphs it takes the form of Visualization.

Polybase Blog - Introduction

Overview: This Polybase blog series is all about the use of Polybase Technology in today’s era to be able to take advantage of the Data(Relational and Non-Relational) by using T-SQL only. Data whether Big or not is the lifeline to many different sectors to cope up with Production, Maintenance, Predictions, Taking Precautionary Measures, Customer Satisfaction, Customer Retention, Sales, Revenue Generation and many more.

A simplified view of a QlikView tool:

In our last blog related to Qlikview, we discussed the use of Qlikview for Financial Data Analytics . Qlikview is a Business Intelligence tool that consists of a front end to visualize the processed data and a back end to provide the security and publication of the mechanism for QlikView user documents.

Stages of New Regulatory Reporting Implementation

In the last blog, we discussed what is  Financial Regulatory Reporting  and its importance in the banking industry. Regulatory Reporting comprises the submission of business numbers as required by regulators to evaluate and track the financial and operational status of financial institutions and their compliance with required regulatory provisions.