One of the new features introduced with SQL Server 2022 is the ability to connect to any S3-compatible object storage: SQL Server supports both backup/restore to S3 and data lake virtualization through PolyBase integration. In this blog, we will demonstrate both of these features for SQL Server 2022 containers running on Kubernetes. As usual, I will use Azure Kubernetes Service (AKS) as my Kubernetes environment :).
To learn more about these features, refer to SQL Server backup to URL for S3-compatible object storage - SQL Server | Microsoft Learn and Access external data: S3-compatible object storage - PolyBase - SQL Server | Microsoft Learn.
Before we enable PolyBase for SQL Server containers, let's go over the prerequisites for this demo:
You need an S3-compatible object storage endpoint that you can connect to. For this blog, I have configured a MinIO-based S3 object storage server on Ubuntu with TLS enabled. For details on how to configure this, refer to Deploy MinIO: Single-Node Single-Drive — MinIO Object Storage for Linux and Network Encryption (TLS) — MinIO Object Storage for Linux.
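For reference, here is a minimal sketch of what that MinIO setup looks like once the TLS key pair is generated; the credentials and data directory are illustrative, and MinIO picks up public.crt and private.key from ~/.minio/certs by default:
# Place the TLS key pair where MinIO looks for it
mkdir -p ~/.minio/certs
cp public.crt private.key ~/.minio/certs/
# Start a single-node MinIO server with illustrative root credentials
MINIO_ROOT_USER=amit MINIO_ROOT_PASSWORD=mypass12345 minio server /mnt/data --console-address ":9001"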
## Step 1: Create a Dockerfile with the below content
# Base image: latest SQL Server 2022 image (Ubuntu-based)
FROM mcr.microsoft.com/mssql/server:2022-latest
USER root
# Install prerequisites needed to add the repo config for SQL Server
RUN apt-get update && \
apt-get install -y software-properties-common && \
rm -rf /var/lib/apt/lists/*
# Add the SQL Server 2022 repository and install the PolyBase package
RUN add-apt-repository "$(wget -qO- https://packages.microsoft.com/config/ubuntu/20.04/mssql-server-2022.list)" && \
apt-get install -y mssql-server-polybase
EXPOSE 1433
USER mssql
# Run SQL Server process
CMD ["/opt/mssql/bin/sqlservr"]
## Step 2: Build the container image
sudo docker build -t mssql-pb-update .
## Step 3: Tag the image and push it to your container registry; in this case I am using an Azure Container Registry (ACR) that I have
sudo docker tag mssql-pb-update amvinacr.azurecr.io/sql-pb-update:latest
# To learn more about ACR, refer to: https://learn.microsoft.com/en-us/azure/container-registry/container-registry-tutorial-quick-task
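To complete this step, log in to the registry and push the tagged image; a minimal sketch assuming the Azure CLI is installed and the registry is named amvinacr:
# Authenticate Docker against the registry, then push the image
az acr login --name amvinacr
sudo docker push amvinacr.azurecr.io/sql-pb-update:latest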
With the prerequisites done, we can now start deploying SQL Server containers on our AKS cluster using the custom image we created, which has PolyBase installed. Before we deploy the SQL Server containers, we need two configmaps and one secret object in the AKS cluster, with the details shown below. I am running all these commands from my Windows client, which has kubectl installed and can connect to the AKS cluster 'sqlk8s' that I created in previous steps.
# Use the below command to create the configmap using the file "public.crt"
kubectl create configmap mssql-tls --from-file="C:\pbdemo\certs\public.crt"
# You can verify that the configmap is created using the command below; the sample output is shown with some lines obfuscated.
C:\>kubectl describe configmap mssql-tls
Name: mssql-tls
Namespace: default
Labels: <none>
Annotations: <none>
Data
====
public.crt:
----
-----BEGIN CERTIFICATE-----
MIIDjjCCAnagAwIBAgIUf/RP+g+9uFAKD4TB0R2ot6jZMJAwDQYJKoZIhvcNAQEL
BQAwZjELMAkGA1UEBhMCVVMxCzAJBgNVBAgMAlZBMRIwEAYDVQQHDAlTb21ld2hl
cmUxDjAMBgNVBAoMBU15T3JnMQ0wCwYDVQQLDARNeU9VMRcwFQYDVQQDDA5taW5p
b3NlcnZlcnVidTAeFw0yMzAxMDYxNTA0MjhaFw0zMzAxMDMxNTA0MjhaMGYxCzAJ
BgNVBAYTAlVTMQswCQYDVQQIDAJWQTESMBAGA1UEBwwJU29tZXdoZXJlMQ4wDAYD
VQQKDAVNeU9yZzENMAsGA1UECwwETXlPVTEXMBUGA1UEAwwObWluaW9zZXJ2ZXJ1
YnUwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCcC0irpVFJQaVqiuKP
LuRQUdYxehTvFCuCJLpGhS03XczCWKzemd0a3W/UC01N+tbFzpmCi0/l+R5SFsVC
3tRPKlngtDOICnYPpKSaWz03o3BgwNqtuKtucboXp3Qo+5DSd7XDi1qY+plY6LwC
vbjLtjNBfyWlwS6nhTjDebvUY6w0xNmG8w1H9fPZcq/gsSGUnPyvywEENYmk5R6X
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1J/rLOJUVeKRvpbLraN25VoEoS8hz07DT3gmtVyJgpEK1oJ+mEs94ae76ujr5MGm
M4HgV4+o9wxqjWdPyFDny/1EPsv30VqbcwmbkuPE7UHG4JH/5gxqxnXJcPnr3BQy
4GU=
-----END CERTIFICATE-----
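If you want to confirm you mounted the right certificate, you can inspect it on the client before (or after) creating the configmap; this assumes openssl is available on your machine:
# Print the subject and validity window of the MinIO certificate
openssl x509 -in "C:\pbdemo\certs\public.crt" -noout -subject -dates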
# Create a yaml file called mssql_config.yaml and save it to "C:\pbdemo\certs\mssql_config.yaml"
kind: ConfigMap
apiVersion: v1
metadata:
  name: mssql
data:
  mssql.conf: |
    [EULA]
    accepteula = Y
    accepteulaml = Y
    [coredump]
    captureminiandfull = true
    coredumptype = full
    [language]
    lcid = 1033
    [filelocation]
    defaultdatadir = /var/opt/mssql/userdata
    defaultlogdir = /var/opt/mssql/userlog
    [traceflag]
    traceflag0 = 13702
# We are ready to create the configmap using the command as shown below:
kubectl apply -f "C:\pbdemo\certs\mssql_config.yaml"
# You can inspect the configmap using the command shown below with the sample output as well
C:\>kubectl describe configmap mssql
Name: mssql
Namespace: default
Labels: <none>
Annotations: <none>
Data
====
mssql.conf:
----
[EULA]
accepteula = Y
accepteulaml = Y
[coredump]
captureminiandfull = true
coredumptype = full
[language]
lcid = 1033
[filelocation]
defaultdatadir = /var/opt/mssql/userdata
defaultlogdir = /var/opt/mssql/userlog
[traceflag]
traceflag0 = 13702
Events: <none>
# Create the secret that holds the SA password
kubectl create secret generic mssql --from-literal=MSSQL_SA_PASSWORD="MyC0m9l&xP@ssw0rd"
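If you want to double-check the secret before using it, kubectl can describe it; the value itself stays base64-encoded and is not printed:
kubectl describe secret mssql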
We are now ready to deploy SQL Server containers. Below is the deployment YAML, called "sqldeploy.yaml", that deploys one SQL Server container as a StatefulSet using Azure disk as the storage class. Using this YAML we deploy a StorageClass, a StatefulSet with a single SQL Server replica, and a LoadBalancer service for the pod.
# Create the sqldeploy.yaml file at the location "C:\pbdemo\certs\sqldeploy.yaml" and paste the content below; you can modify it according to your requirements.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azure-disk
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: Standard_LRS
  kind: Managed
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mssql
  labels:
    app: mssql
spec:
  serviceName: "mssql"
  replicas: 1
  selector:
    matchLabels:
      app: mssql
  template:
    metadata:
      labels:
        app: mssql
    spec:
      securityContext:
        fsGroup: 10001
      containers:
        - name: mssql
          command:
            - /bin/bash
            - -c
            - cp /var/opt/config/mssql.conf /var/opt/mssql/mssql.conf && /opt/mssql/bin/sqlservr
          image: amvinacr.azurecr.io/sql-pb-update:latest
          ports:
            - containerPort: 1433
          env:
            - name: ACCEPT_EULA
              value: "Y"
            - name: MSSQL_ENABLE_HADR
              value: "1"
            - name: MSSQL_SA_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mssql
                  key: MSSQL_SA_PASSWORD
          volumeMounts:
            - name: mssql
              mountPath: "/var/opt/mssql"
            - name: userdata
              mountPath: "/var/opt/mssql/userdata"
            - name: userlog
              mountPath: "/var/opt/mssql/userlog"
            - name: tempdb
              mountPath: "/var/opt/mssql/tempdb"
            - name: mssql-config-volume
              mountPath: "/var/opt/config"
            - name: mssql-tls-configmap
              mountPath: "/var/opt/mssql/security/ca-certificates/"
      volumes:
        - name: mssql-config-volume
          configMap:
            name: mssql
        - name: mssql-tls-configmap
          configMap:
            name: mssql-tls
  volumeClaimTemplates:
    - metadata:
        name: mssql
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
    - metadata:
        name: userdata
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
    - metadata:
        name: userlog
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
    - metadata:
        name: tempdb
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
---
apiVersion: v1
kind: Service
metadata:
  name: mssql-0
spec:
  type: LoadBalancer
  selector:
    statefulset.kubernetes.io/pod-name: mssql-0
  ports:
    - protocol: TCP
      port: 1433
      targetPort: 1433
      name: tcpsql
# You can run the below command to deploy the SQL Server container based on the deployment yaml file called sqldeploy.yaml
kubectl apply -f "C:\pbdemo\certs\sqldeploy.yaml"
# You can run the kubectl get command to see all the SQL Server containers deployed; a sample command and output are shown below for reference
C:\>kubectl get all
NAME READY STATUS RESTARTS AGE
pod/mssql-0 1/1 Running 0 92m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.0.0.1 <none> 443/TCP 16h
service/mssql-0 LoadBalancer 10.0.252.15 20.235.55.7 1433:31551/TCP 13s
NAME READY AGE
statefulset.apps/mssql 1/1 97m
We are ready to connect to SQL Server. Feel free to use your favorite SQL Server tool, viz. SQL Server Management Studio (SSMS) or Azure Data Studio (ADS), connect to the SQL Server instance, and run the commands below to virtualize data and access CSV or parquet files, and/or to take backups of the databases on the S3-based object storage.
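If you prefer the command line, a quick connectivity check with sqlcmd against the EXTERNAL-IP of the mssql-0 service (shown in the output above) might look like this; adjust the IP and password to your environment:
sqlcmd -S 20.235.55.7,1433 -U sa -P "MyC0m9l&xP@ssw0rd" -Q "SELECT @@VERSION;"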
I'm using SSMS and running the queries below. I connect to the SQL Server instance and confirm that the TLS certificate public.crt is loaded by SQL Server; I can do that from the errorlog using sp_readerrorlog in a T-SQL session, and the sample output below shows the certificate loaded:
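Rather than scrolling through the whole errorlog, you can filter it, since sp_readerrorlog accepts a search string as its third parameter:
-- 0 = current log, 1 = SQL Server errorlog, filter entries containing 'certificate'
EXEC sp_readerrorlog 0, 1, 'certificate';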
To take S3-based backups, I first need to create a bucket on the storage. In this case, I connect to the MinIO portal and create a bucket called "sqltest" as shown below, and then create a credential to access the storage.
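If you prefer the command line over the portal, the same bucket can be created with the MinIO client (mc); the alias name 'minio' is illustrative, and the access key and secret match the credential used in the T-SQL below (add --insecure if your client does not trust the self-signed CA):
mc alias set minio https://10.0.0.82:9000 amit mypass12345
mc mb minio/sqltest
mc ls minio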
Now, use the below sample T-SQL script to create the credential and run the backup; the backup should be successful as shown below:
CREATE CREDENTIAL [s3://10.0.0.82:9000/sqltest]
WITH
IDENTITY = 'S3 Access Key',
SECRET = 'amit:mypass12345';
BACKUP DATABASE S3_backup_demo
TO URL = 's3://10.0.0.82:9000/sqltest/S3_backup_demo_db.bak'
WITH STATS = 10
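Since backup and restore to S3-compatible storage work in both directions, you should be able to sanity-check the new backup file directly from the URL; a minimal sketch, reusing the server-level credential created above:
-- Read the backup header straight from the S3 bucket
RESTORE HEADERONLY
FROM URL = 's3://10.0.0.82:9000/sqltest/S3_backup_demo_db.bak';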
You can view the backup file created on the MinIO console as shown below:
Here are the steps to configure data virtualization; the sample T-SQL script below walks through them:
-- Verify the Polybase feature is installed:
SELECT SERVERPROPERTY ('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
-- Enable the PolyBase feature using the commands below:
exec sp_configure @configname = 'polybase enabled', @configvalue = 1;
reconfigure with override
exec sp_configure @configname = 'polybase enabled'
--Next, let’s create a database and database scoped credential to access the object storage.
create database pb_demo
USE pb_demo
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'mypass123@';
CREATE DATABASE SCOPED CREDENTIAL s3_dc WITH IDENTITY = 'S3 Access Key', SECRET = 'amit:mypass12345' ;
-- To verify that the credential is created, you can run the below command:
SELECT * FROM sys.database_scoped_credentials;
-- Now go ahead and try creating the External data source:
CREATE EXTERNAL DATA SOURCE s3_ds WITH (LOCATION = 's3://10.0.0.82:9000/',CREDENTIAL = s3_dc)
-- When creating the external data source, if you see the below error, restart the container and run the same command again; that should resolve the error
Msg 46530, Level 16, State 11, Line 20
External data sources are not supported with type GENERIC.
-- We now have the credential and the external data source created. To query a CSV file that is already present on the object storage, run the command shown below
SELECT * FROM OPENROWSET( BULK '/sqltest/test.csv', FORMAT = 'CSV', DATA_SOURCE = 's3_ds') WITH ( c1 varchar(50) ) AS [Test1]
-- To query a parquet file that is available on the object storage, run the below query. I copied a sample parquet file available here: https://github.com/Teradata/kylo/tree/master/samples/sample-data/parquet to my MinIO object storage.
SELECT *
FROM OPENROWSET
( BULK '/sqltest/userdata1.parquet'
, FORMAT = 'PARQUET'
, DATA_SOURCE = 's3_ds'
)
AS [Test12]
Note: You can also use CREATE EXTERNAL TABLE AS SELECT (CETAS) to access both parquet and CSV files for SQL Server on Linux.
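As a rough sketch of that approach (the file format and table names here are illustrative, and CETAS additionally requires the 'allow polybase export' configuration option):
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;
-- Define a parquet file format, then materialize query results back into the bucket
CREATE EXTERNAL FILE FORMAT parquet_ff WITH (FORMAT_TYPE = PARQUET);
CREATE EXTERNAL TABLE ext_userdata
WITH (LOCATION = '/sqltest/export/', DATA_SOURCE = s3_ds, FILE_FORMAT = parquet_ff)
AS
SELECT * FROM OPENROWSET(BULK '/sqltest/userdata1.parquet', FORMAT = 'PARQUET', DATA_SOURCE = 's3_ds') AS src;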
I hope you find this helpful to get started with PolyBase and S3 storage integration for SQL Server 2022 containers. Special thanks to my colleagues from the PolyBase team for helping me with this demo. Happy learning!