This part is the continuation of the previous blog about the introduction of Polybase Scale-Out Group. As we have discussed in our earlier blog PolyBase enables your SQL Server instance to process Transact-SQL queries that read data from external data sources. SQL Server 2016 and higher can access external data in Hadoop and Azure Blob Storage. Starting in SQL Server 2019, PolyBase can be used to access external data in SQL Server, Oracle, Teradata, and MongoDB.
To improve query performance, we can use SQL Server PolyBase scale-out
groups. This enables parallel data transfer and it adds compute resources for
operating on the external data. PolyBase Scale-out Groups, a group of SQL
Server instances, enable you to process large external data sets in a parallel
processing architecture. Data loading and query performance can increase
linearly as you add more SQL Server instances to the group.
In the scenario, we need to add computation power to the existing SQL Server instance to process the T-SQL queries and this can be achieved by implementing the Polybase Scale-Out Group. At least two machines are required to implement the Polybase Scale-Out Group. One can be the Head Node and the other be Compute Node.
PolyBase queries
are submitted to the SQL Server on the head node. The part of the query that
refers to external tables is handed-off to the PolyBase engine.
The Polybase engine
is the key component behind PolyBase queries. It parses the query on external
data generates the query plan and distributes the work to the data movement
service on the compute nodes for execution. After completion of the work, it
receives the results from the compute nodes and submits them to SQL Server for
processing and returning to the client.
The PolyBase data
movement service receives instructions from the PolyBase engine and transfers
data between HDFS and SQL Server, and between SQL Server instances on the head
and compute nodes.
Polybase
implementations steps for the Scale-out process is as follows:
Required
Resources:
1. At least two machines
on the same domain to create a Polybase Scale-Out Group with the minimum
requirements:
1.1 64-bit SQL Server.
1.2 Microsoft .NET Framework 4.5.
1.3 Minimum memory: 4 GB.
1.4 Minimum hard-disk space: 2 GB.
1.5 Microsoft .NET Framework 4.5.
1.6 Recommended: Minimum of 16-GB
RAM.
2. SQL
Server 2019 (Edition: Enterprise, Standard or Developer)
3. Domain
Account to run Polybase services.
Available
Resources:
1. Machines: SQLCONTROL and SQLCOMPUTE2
2. SQL
Server 2019
3. Domain
Account: SVC-Polybase
Steps
to implement:
1. Install
SQL Server 2019 with Polybase Feature using a Domain User Account.
2. Configure
one machine as Head Node(SQLCONTROL).
3. Join
the other machine as Compute Node(SQLCOMPUTE2) in the group.
4. Create
a Database in the Head Node(SQLCONTROL).
5. Create
Master Key Encryption.
6. Create
Database Scoped Credentials.
7. Create
an External Data Source.
8. Create
External Tables.
9. Start
Querying the original Table using External Tables.
Limitations:
1. Unable
to use Table Hints with T-SQL on External Table.
2. Cannot
add Constraints to the Columns on External Tables.
3. Cannot
perform DML Operations on External Table.
Steps to implement: Detailed View
1. Install
SQL Server 2019 with Polybase Feature using a Domain User Account.
1.1 Run the SQL
Server setup.exe.
1.2 Select Installation,
and then select New standalone SQL Server installation or add features.
1.3 On the Feature
Selection page, select PolyBase Query Service for External Data.
1.4 On the Server Configuration page, configure the SQL Server PolyBase Engine Service and SQL Server PolyBase Data Movement Service to run under the same domain account SVC-Polybase.
1.5 Use the SQL Server
instance as part of a PolyBase scale-out group. This option opens the firewall
to allow incoming connections. Connections are allowed for the SQL Server
Database Engine, SQL Server PolyBase Engine, SQL Server PolyBase Data Movement
service, and the SQL browser. The firewall also allows incoming connections
from other nodes in a PolyBase scale-out group.
1.6 On the PolyBase
Configuration page, specify a port range with at least six ports. SQL
Server setup allocates the first six available ports from the
range.
1.7 After these options for the
Polybase Scale-Out
Group are selected, you can proceed
with the installation of SQL Server 2019.
1.8 If the installation is with Windows
Authentication instead of Mixed, please enable SQL Server Authentication from
the Server Properties.
1.9 Create a Generic SQL Server Login
in SQLCONTROL, SQLCOMPUTE2. We created a Login SVCPolybase as SQL Server Login
with the sysadmin role. You can also access the server with the Domain Account
SVC-Polybase in Windows Authentication mode which was created.
To join the Nodes
in the scale-out group please follow the blog Polybase Scale-Out Group.
Also to create external tables and check the performance of the Polybase
scale-out group over standalone structure please refer blog Polybase Installation Guide.
About
Amlgo Labs : Amlgo Labs is
an advanced data analytics and decision sciences company based out in Gurgaon
and Bangalore, India. We help our clients in different areas of
data solutions includes design/development of end to end
solutions (Cloud, Big Data, UI/UX, Data Engineering, Advanced Analytics
and Data Sciences) with a focus on improving businesses and providing
insights to make intelligent data-driven decisions across verticals. We have
another vertical of business that we call - Financial Regulatory Reporting for
(MAS, APRA, HKMA, EBA, FED, RBI etc) all
major regulators in the world and our team is specialized in commonly used
regulatory tools across the globe (AxiomSL Controller
View, OneSumX Development, Moody’s
Risk, IBM Open Pages etc).We
build innovative concepts and then solutions to give an extra edge to the
business outcomes and help to visualize and execute an effective decision
strategies.
Please feel free to comment or share your views and thoughts. You can always reach out to us by sending an email at info@amlgolabs.com or filling a contact form at the end of the page.
This is a smart blog. I mean it. You have an excellent knowledge about this topic. Thanks for sharing such a great blogs to us. Vietnam Export Data
ReplyDeleteHi, the information that you have shared is very helpful. The detailing in resources and implentation process is great. keep Sharing! SQL Server Load Rest API
ReplyDeleteVietnam import data best import export data provider Vietnam trade data
ReplyDelete