One of the new features introduced with SQL Server 2022 is the ability to connect to any S3-compatible object storage: SQL Server supports both backup/restore to S3 and data lake virtualization through PolyBase integration. In this blog, we will demonstrate both of these features for SQL Server 2022 containers running on Kubernetes. As usual, I will use Azure Kubernetes Service (AKS) as my Kubernetes environment :).
To learn more about these features, refer to SQL Server backup to URL for S3-compatible object storage - SQL Server | Microsoft Learn and Access external data: S3-compatible object storage - PolyBase - SQL Server | Microsoft Learn.
Before we enable PolyBase for SQL Server containers, let's go over the prerequisites for this demo:
You need an S3-compatible object storage endpoint that you can connect to. For this blog, I have configured a MinIO-based S3 object storage server on Ubuntu with TLS enabled. For details on how to configure this, refer to Deploy MinIO: Single-Node Single-Drive — MinIO Object Storage for Linux and Network Encryption (TLS) — MinIO Object Storage for Linux.
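For reference, here is a minimal sketch of what that MinIO setup looks like once the TLS key pair is generated; the credentials and data directory are illustrative, and MinIO picks up public.crt and private.key from ~/.minio/certs by default:
# Place the TLS key pair where MinIO looks for it
mkdir -p ~/.minio/certs
cp public.crt private.key ~/.minio/certs/
# Start a single-node MinIO server with illustrative root credentials
MINIO_ROOT_USER=amit MINIO_ROOT_PASSWORD=mypass12345 minio server /mnt/data --console-address ":9001"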
## Step 1: Create a Dockerfile with the below content
# Base image: latest SQL Server 2022 image (Ubuntu-based)
FROM mcr.microsoft.com/mssql/server:2022-latest
USER root
# Install prerequisites needed to add the repo config for SQL Server
RUN apt-get update && \
apt-get install -y software-properties-common && \
rm -rf /var/lib/apt/lists/*
# Add the SQL Server 2022 repository and install the PolyBase package
RUN add-apt-repository "$(wget -qO- https://packages.microsoft.com/config/ubuntu/20.04/mssql-server-2022.list)" && \
apt-get install -y mssql-server-polybase
EXPOSE 1433
USER mssql
# Run SQL Server process
CMD ["/opt/mssql/bin/sqlservr"]
## Step 2: Build the container image
sudo docker build -t mssql-pb-update .
## Step 3: Tag the image and push it to your container registry; in this case I am using an Azure Container Registry (ACR) that I have
sudo docker tag mssql-pb-update amvinacr.azurecr.io/sql-pb-update:latest
# To learn more about ACR, refer to: https://learn.microsoft.com/en-us/azure/container-registry/container-registry-tutorial-quick-task
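To complete this step, log in to the registry and push the tagged image; a minimal sketch assuming the Azure CLI is installed and the registry is named amvinacr:
# Authenticate Docker against the registry, then push the image
az acr login --name amvinacr
sudo docker push amvinacr.azurecr.io/sql-pb-update:latest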
With the prerequisites done, we can now start deploying SQL Server containers on our AKS cluster using the custom image we created, which has PolyBase installed. Before we deploy the SQL Server containers, we need two configmaps and one secret object in the AKS cluster, with the details shown below. I am running all these commands from my Windows client, which has kubectl installed and can connect to the AKS cluster 'sqlk8s' that I created in previous steps.
# Use the below command to create the configmap using the file "public.crt"
kubectl create configmap mssql-tls --from-file="C:\pbdemo\certs\public.crt"
# You can verify that the configmap is created using the command below; the sample output is shown with some lines obfuscated.
C:\>kubectl describe configmap mssql-tls
Name: mssql-tls
Namespace: default
Labels: <none>
Annotations: <none>
Data
====
public.crt:
----
-----BEGIN CERTIFICATE-----
MIIDjjCCAnagAwIBAgIUf/RP+g+9uFAKD4TB0R2ot6jZMJAwDQYJKoZIhvcNAQEL
BQAwZjELMAkGA1UEBhMCVVMxCzAJBgNVBAgMAlZBMRIwEAYDVQQHDAlTb21ld2hl
cmUxDjAMBgNVBAoMBU15T3JnMQ0wCwYDVQQLDARNeU9VMRcwFQYDVQQDDA5taW5p
b3NlcnZlcnVidTAeFw0yMzAxMDYxNTA0MjhaFw0zMzAxMDMxNTA0MjhaMGYxCzAJ
BgNVBAYTAlVTMQswCQYDVQQIDAJWQTESMBAGA1UEBwwJU29tZXdoZXJlMQ4wDAYD
VQQKDAVNeU9yZzENMAsGA1UECwwETXlPVTEXMBUGA1UEAwwObWluaW9zZXJ2ZXJ1
YnUwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCcC0irpVFJQaVqiuKP
LuRQUdYxehTvFCuCJLpGhS03XczCWKzemd0a3W/UC01N+tbFzpmCi0/l+R5SFsVC
3tRPKlngtDOICnYPpKSaWz03o3BgwNqtuKtucboXp3Qo+5DSd7XDi1qY+plY6LwC
vbjLtjNBfyWlwS6nhTjDebvUY6w0xNmG8w1H9fPZcq/gsSGUnPyvywEENYmk5R6X
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1J/rLOJUVeKRvpbLraN25VoEoS8hz07DT3gmtVyJgpEK1oJ+mEs94ae76ujr5MGm
M4HgV4+o9wxqjWdPyFDny/1EPsv30VqbcwmbkuPE7UHG4JH/5gxqxnXJcPnr3BQy
4GU=
-----END CERTIFICATE-----
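If you want to confirm you mounted the right certificate, you can inspect it on the client before (or after) creating the configmap; this assumes openssl is available on your machine:
# Print the subject and validity window of the MinIO certificate
openssl x509 -in "C:\pbdemo\certs\public.crt" -noout -subject -dates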
# Create a yaml file called mssql_config.yaml and save it to "C:\pbdemo\certs\mssql_config.yaml"
kind: ConfigMap
apiVersion: v1
metadata:
  name: mssql
data:
  mssql.conf: |
    [EULA]
    accepteula = Y
    accepteulaml = Y
    [coredump]
    captureminiandfull = true
    coredumptype = full
    [language]
    lcid = 1033
    [filelocation]
    defaultdatadir = /var/opt/mssql/userdata
    defaultlogdir = /var/opt/mssql/userlog
    [traceflag]
    traceflag0 = 13702
# We are ready to create the configmap using the command as shown below:
kubectl apply -f "C:\pbdemo\certs\mssql_config.yaml"
# You can inspect the configmap using the command shown below with the sample output as well
C:\>kubectl describe configmap mssql
Name: mssql
Namespace: default
Labels: <none>
Annotations: <none>
Data
====
mssql.conf:
----
[EULA]
accepteula = Y
accepteulaml = Y
[coredump]
captureminiandfull = true
coredumptype = full
[language]
lcid = 1033
[filelocation]
defaultdatadir = /var/opt/mssql/userdata
defaultlogdir = /var/opt/mssql/userlog
[traceflag]
traceflag0 = 13702
Events: <none>
# Create the secret that holds the SA password
kubectl create secret generic mssql --from-literal=MSSQL_SA_PASSWORD="MyC0m9l&xP@ssw0rd"
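If you want to double-check the secret before using it, kubectl can describe it; the value itself stays base64-encoded and is not printed:
kubectl describe secret mssql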
We are now ready to deploy SQL Server containers. Below is the deployment YAML, called "sqldeploy.yaml", that deploys one SQL Server container as a StatefulSet using Azure disk as the storage class. Using this YAML we deploy a StorageClass, a StatefulSet with a single SQL Server replica, and a LoadBalancer service for the pod.
# Create the sqldeploy.yaml file at the location "C:\pbdemo\certs\sqldeploy.yaml" and paste the content below; you can modify it according to your requirements.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azure-disk
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: Standard_LRS
  kind: Managed
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mssql
  labels:
    app: mssql
spec:
  serviceName: "mssql"
  replicas: 1
  selector:
    matchLabels:
      app: mssql
  template:
    metadata:
      labels:
        app: mssql
    spec:
      securityContext:
        fsGroup: 10001
      containers:
        - name: mssql
          command:
            - /bin/bash
            - -c
            - cp /var/opt/config/mssql.conf /var/opt/mssql/mssql.conf && /opt/mssql/bin/sqlservr
          image: amvinacr.azurecr.io/sql-pb-update:latest
          ports:
            - containerPort: 1433
          env:
            - name: ACCEPT_EULA
              value: "Y"
            - name: MSSQL_ENABLE_HADR
              value: "1"
            - name: MSSQL_SA_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mssql
                  key: MSSQL_SA_PASSWORD
          volumeMounts:
            - name: mssql
              mountPath: "/var/opt/mssql"
            - name: userdata
              mountPath: "/var/opt/mssql/userdata"
            - name: userlog
              mountPath: "/var/opt/mssql/userlog"
            - name: tempdb
              mountPath: "/var/opt/mssql/tempdb"
            - name: mssql-config-volume
              mountPath: "/var/opt/config"
            - name: mssql-tls-configmap
              mountPath: "/var/opt/mssql/security/ca-certificates/"
      volumes:
        - name: mssql-config-volume
          configMap:
            name: mssql
        - name: mssql-tls-configmap
          configMap:
            name: mssql-tls
  volumeClaimTemplates:
    - metadata:
        name: mssql
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
    - metadata:
        name: userdata
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
    - metadata:
        name: userlog
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
    - metadata:
        name: tempdb
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
---
apiVersion: v1
kind: Service
metadata:
  name: mssql-0
spec:
  type: LoadBalancer
  selector:
    statefulset.kubernetes.io/pod-name: mssql-0
  ports:
    - protocol: TCP
      port: 1433
      targetPort: 1433
      name: tcpsql
# You can run the below command to deploy the SQL Server container based on the deployment yaml file called sqldeploy.yaml
kubectl apply -f "C:\pbdemo\certs\sqldeploy.yaml"
# You can run the kubectl get command to see all the SQL Server containers deployed; a sample command and output are shown below for reference
C:\>kubectl get all
NAME READY STATUS RESTARTS AGE
pod/mssql-0 1/1 Running 0 92m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.0.0.1 <none> 443/TCP 16h
service/mssql-0 LoadBalancer 10.0.252.15 20.235.55.7 1433:31551/TCP 13s
NAME READY AGE
statefulset.apps/mssql 1/1 97m
We are ready to connect to SQL Server. Feel free to use your favorite SQL Server tool, viz. SQL Server Management Studio (SSMS) or Azure Data Studio (ADS), connect to the SQL Server instance, and run the commands below to virtualize data and access CSV or parquet files, and/or to take backups of the databases on the S3-based object storage.
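If you prefer the command line, a quick connectivity check with sqlcmd against the EXTERNAL-IP of the mssql-0 service (shown in the output above) might look like this; adjust the IP and password to your environment:
sqlcmd -S 20.235.55.7,1433 -U sa -P "MyC0m9l&xP@ssw0rd" -Q "SELECT @@VERSION;"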
I'm using SSMS and running the queries below. I connect to the SQL Server instance and confirm that the TLS certificate public.crt is loaded by SQL Server; I can do that from the errorlog using sp_readerrorlog in a T-SQL session, and the sample output below shows the certificate loaded:
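Rather than scrolling through the whole errorlog, you can filter it, since sp_readerrorlog accepts a search string as its third parameter:
-- 0 = current log, 1 = SQL Server errorlog, filter entries containing 'certificate'
EXEC sp_readerrorlog 0, 1, 'certificate';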
To take S3-based backups, I first need to create a bucket on the storage. In this case, I connect to the MinIO portal and create a bucket called "sqltest" as shown below, and then create a credential to access the storage.
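If you prefer the command line over the portal, the same bucket can be created with the MinIO client (mc); the alias name 'minio' is illustrative, and the access key and secret match the credential used in the T-SQL below (add --insecure if your client does not trust the self-signed CA):
mc alias set minio https://10.0.0.82:9000 amit mypass12345
mc mb minio/sqltest
mc ls minio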
Now, use the below sample T-SQL script to create the credential and run the backup; the backup should be successful as shown below:
CREATE CREDENTIAL [s3://10.0.0.82:9000/sqltest]
WITH
IDENTITY = 'S3 Access Key',
SECRET = 'amit:mypass12345';
BACKUP DATABASE S3_backup_demo
TO URL = 's3://10.0.0.82:9000/sqltest/S3_backup_demo_db.bak'
WITH STATS = 10
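Since backup and restore to S3-compatible storage work in both directions, you should be able to sanity-check the new backup file directly from the URL; a minimal sketch, reusing the server-level credential created above:
-- Read the backup header straight from the S3 bucket
RESTORE HEADERONLY
FROM URL = 's3://10.0.0.82:9000/sqltest/S3_backup_demo_db.bak';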
You can view the backup file created on the MinIO console as shown below:
Here are the steps to configure data virtualization; the sample T-SQL script below walks through them:
-- Verify the Polybase feature is installed:
SELECT SERVERPROPERTY ('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
-- Enable the PolyBase feature using the commands below:
exec sp_configure @configname = 'polybase enabled', @configvalue = 1;
reconfigure with override
exec sp_configure @configname = 'polybase enabled'
--Next, let’s create a database and database scoped credential to access the object storage.
create database pb_demo
USE pb_demo
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'mypass123@';
CREATE DATABASE SCOPED CREDENTIAL s3_dc WITH IDENTITY = 'S3 Access Key', SECRET = 'amit:mypass12345' ;
-- To verify that the credential is created, you can run the below command:
SELECT * FROM sys.database_scoped_credentials;
-- Now go ahead and try creating the External data source:
CREATE EXTERNAL DATA SOURCE s3_ds WITH (LOCATION = 's3://10.0.0.82:9000/',CREDENTIAL = s3_dc)
-- When creating the external data source, if you see the below error, restart the container and run the same command again; that should resolve the error
Msg 46530, Level 16, State 11, Line 20
External data sources are not supported with type GENERIC.
-- We now have the credential and the external data source created. To query a CSV file that is already present on the object storage, run the command shown below
SELECT * FROM OPENROWSET( BULK '/sqltest/test.csv', FORMAT = 'CSV', DATA_SOURCE = 's3_ds') WITH ( c1 varchar(50) ) AS [Test1]
-- To query a parquet file that is available on the object storage, run the below query. I copied a sample parquet file available here: https://github.com/Teradata/kylo/tree/master/samples/sample-data/parquet to my MinIO object storage.
SELECT *
FROM OPENROWSET
( BULK '/sqltest/userdata1.parquet'
, FORMAT = 'PARQUET'
, DATA_SOURCE = 's3_ds'
)
AS [Test12]
Note: You can also use CREATE EXTERNAL TABLE AS SELECT (CETAS) to access both parquet and CSV files for SQL Server on Linux.
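As a rough sketch of that approach (the file format and table names here are illustrative, and CETAS additionally requires the 'allow polybase export' configuration option):
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;
-- Define a parquet file format, then materialize query results back into the bucket
CREATE EXTERNAL FILE FORMAT parquet_ff WITH (FORMAT_TYPE = PARQUET);
CREATE EXTERNAL TABLE ext_userdata
WITH (LOCATION = '/sqltest/export/', DATA_SOURCE = s3_ds, FILE_FORMAT = parquet_ff)
AS
SELECT * FROM OPENROWSET(BULK '/sqltest/userdata1.parquet', FORMAT = 'PARQUET', DATA_SOURCE = 's3_ds') AS src;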
I hope you find this helpful to get started with PolyBase and S3 storage integration for SQL Server 2022 containers. Special thanks to my colleagues from the PolyBase team for helping me with this demo. Happy learning!