This part is the continuation of the previous
blog Polybase Introduction. If you guys are new, I recommend you to please go
through that once for better understanding. In this part we will see how we can
Install Polybase feature and how it is being implemented. So, let’s get
started.
For the demonstration purpose we are using SQL Server 2019(Developer Edition). Before installing the Polybase feature let us look at the pre-requisites.
Required Resources:
1. Minimum requirements:
1.1 64-bit SQL Server.
1.2 Microsoft .NET Framework 4.5.
1.3 Minimum memory: 4 GB.
1.4 Minimum hard-disk space: 2 GB.
1.5 Microsoft .NET Framework 4.5.
1.6 Recommended: Minimum of 16-GB
RAM.
2. SQL Server 2019 (Edition:
Enterprise, Standard or Developer)
3. Domain Account to run Polybase services.
Steps to implement:
1. Install SQL Server 2019
with Polybase Feature using a Domain User Account.
2. Create
a Database.
3. Create Master Key
Encryption.
4. Create Database Scoped
Credentials.
5. Create External Data Source.
6. Create External Tables.
7. Start Querying the original Table using External Tables.
Polybase Installation:
1. Run the SQL Server setup.exe.
2. Select Installation, and then
select New standalone SQL Server installation or add features.
3. On the Feature Selection page, select PolyBase
Query Service for External Data and option Java connector for
HDFS data sources.
|
Note: You need to install Oracle JRE 7 update 51 or
higher to install Polybase. If it is not installed, you will get below error
message while checking the rules for installation.
4.On the Server Configuration page, configure
the SQL Server PolyBase Engine Service and SQL Server
PolyBase Data Movement Service to run under the same domain account.
5.On the PolyBase Configuration page, Use the SQL
Server instance as a standalone PolyBase-enabled instance. And proceed with the
rest of the installation process.
Post Installation Steps:
1. DWConfiguration, DWDiagnostics, and DWQueue databases will be create
once you install Polybase. These databases are for PolyBase use. Do not make
any changes to them. Keep them as it is.
2. After installation, we need to confirm whether
the Polybase feature is installed. The T-SQL, SELECT SERVERPROPERTY
('IsPolyBaseInstalled') AS IsPolyBaseInstalled; will return 1 if Polybase is
installed and 0 if it is not installed.
3. After installation, PolyBase must be enabled,
but still we can do so using the T-SQL:
exec sp_configure @configname =
'polybase enabled', @configvalue = 1;RECONFIGURE;
Polybase Configuration:
You may have read BOL or went through some articles
regarding Polybase Implementation and must have seen the examples quoting the
use of Polybase with Hadoop or Azure as External Data Sources. But, here we
will be using another SQL Server as an External Data Source, which will give
you a different insight of how we can use Polybase to connect to a different
SQL Server which can be used as an External Data Source. We have two servers
AMLGO-01 and EX-SQL-01. We need to follow the below steps in order to use
Polybase.
1. Connect to the Server AMLGO-01 and create a
Database.
Execute T-SQL, Create Database Polybase Test
2. Create Master Key Encryption.
Execute T-SQL, Use Polybase Test;
Create Master Key Encryption By Password =
‘Polybase@123’ (Can be any key)
3. Create Database Scoped Credentials. Here
Identity is the User who has access to the other SQL Server and
Secret is its password.
Execute T-SQL, Use Polybase Test;
Create Database Scoped Credential User_Credential
With Identity = 'Polybase User', Secret =
'Polybase@123';
4. Create External Data Source. Here EX-SQL-01
is the other SQL Server which will be treated as an External Data Source.
Execute T-SQL, Use Polybase Test;
Create External Data Source EX_Source
With (Location = 'sqlserver://EX-SQL-01:1433',
Credential = User_Credential, Connection_Options =
'UseDefaultEncryptionOptions=false')
The Connection_Options parameter is used for the
sources below Sql Server 2016, in order to fetch entire data, else it will give
below TCP Error.
[Cannot execute the query "Remote Query"
against OLE DB provider "MSOLEDBSQL" for linked server
"(null)". 105082;Generic ODBC error: [Microsoft][ODBC Driver 17 for
SQL Server]TCP Provider: An existing connection was forcibly closed by the
remote host.]
5. We have AdventureWorksDW2012 database on our
External Source EX-SQL-01, and we want to access the DimAccount
Table from AMLGO-01. For that we need to create an External Table with the same
structure on AMLGO-01 Server. |
Before we create an External Table we need to keep
in mind certain Polybase Limitations.
- External Tables cannot be created using User Defined Data Types.
- Identity is Not Supported with External Tables.
- Unable to use Table Hints with T-SQL on External Table.
- Cannot add Constraints to the Columns on External Tables.
- Cannot perform DML Operations on External Table.
We can now see the External Table Ex_DimAccount in
Polybase Test Database on AMLGO-01
After all this is done, we can now access the data
of DimAccount in AdventureWorksDW2012 on EX-SQL-01 using External Table created
in AMLGO-01.
As we have created and External Table to access the
data of a Table located in an External Source. We can similarly create External
Tables for Data Sources like Hadoop, Azure, Oracle etc and Polybase helps you
to access them using T-SQL only. You do not need to learn any new language or
technology for the same. This is how Polybase helps us to connect to different
External Data Sources and access them using T-SQL.
Hope this blog helps you to get a better
understanding of how the Polybase works. Till now you have the knowledge of
What is Polybase, How to install it, How to Configure it And How to Implement
it? Going forward we will learn what is Polybase Scale-Out Group and why it is
being used.
About Amlgo Labs : Amlgo Labs is
an advanced data analytics and decision sciences company based out in Gurgaon
and Bangalore, India. We help our clients in different areas of
data solutions includes design/development of end to end
solutions (Cloud, Big Data, UI/UX, Data Engineering, Advanced Analytics
and Data Sciences) with a focus on improving businesses and providing
insights to make intelligent data-driven decisions across verticals. We have
another vertical of business that we call - Financial Regulatory Reporting for
(MAS, APRA, HKMA, EBA, FED, RBI etc) all
major regulators in the world and our team is specialized in commonly used
regulatory tools across the globe (AxiomSL Controller
View, OneSumX Development, Moody’s
Risk, IBM Open Pages etc).We build innovative concepts and then
solutions to give an extra edge to the business outcomes and help to visualize
and execute effective decision strategies.
Please feel free to comment or share
your views and thoughts. You can always reach out to us by sending an
email at info@amlgolabs.com or filling a contact form at the end of the
page.
Comments
Post a Comment