SQL Server containers on Kubernetes with S3-compatible object storage - Getting started
Published Jan 16 2023 10:44 AM
Microsoft

One of the new features introduced in SQL Server 2022 is the ability to connect to any S3-compatible object storage: SQL Server supports both backup/restore to S3 and data lake virtualization through PolyBase integration. In this blog, we will demonstrate both of these features for SQL Server 2022 containers running on Kubernetes. As usual, I will use Azure Kubernetes Service (AKS) as my Kubernetes environment :).

 

To learn more about these features, refer to SQL Server backup to URL for S3-compatible object storage - SQL Server | Microsoft Learn and Access external data: S3-compatible object storage - PolyBase - SQL Server | Microsoft Learn.

 

Before we enable PolyBase for SQL Server containers, let's go over the prerequisites for this demo:

 

Prerequisites:

You need an S3-compatible object storage endpoint that you can connect to. For this blog, I have configured a MinIO-based S3 object storage server on Ubuntu with TLS enabled. For details on how to configure this, refer to Deploy MinIO: Single-Node Single-Drive — MinIO Object Storage for Linux and Network Encryption (TLS) — MinIO Object Storage for Linux.

  • MinIO setting details:
    • The MinIO console is configured at https://10.0.0.82:9001/ and the server endpoint at https://10.0.0.82:9000/.
    • On the MinIO server, I have opened ports 9000 and 9001 in the firewall.
    • The public certificate used to configure TLS on the MinIO server is called "public.crt", and I have downloaded and saved it on my Windows client.
  • Customized SQL Server Container:
    • Let's create the customized SQL Server container image that will be used to deploy the SQL Server containers on the Kubernetes cluster. Here are the steps for creating it; for this demonstration, I built the custom image on the same Ubuntu-based Azure VM that I used to configure the MinIO server.
      ## Step 1: Create a Dockerfile with the below content
      # Base image: official SQL Server 2022 image (Ubuntu-based)
      FROM mcr.microsoft.com/mssql/server:2022-latest
      USER root
      # Install prerequisites needed to add the SQL Server package repository
      RUN apt-get update && \
          apt-get install -y software-properties-common && \
          rm -rf /var/lib/apt/lists/*
      # Add the SQL Server 2022 repository and install the PolyBase package
      RUN add-apt-repository "$(wget -qO- https://packages.microsoft.com/config/ubuntu/20.04/mssql-server-2022.list)" && \
          apt-get update && \
          apt-get install -y mssql-server-polybase && \
          rm -rf /var/lib/apt/lists/*
      EXPOSE 1433
      USER mssql
      # Run SQL Server process
      CMD ["/opt/mssql/bin/sqlservr"]
      
      # Step 2: Build the container image
       sudo docker build -t mssql-pb-update .
      
      # Step 3: Tag the image and push it to your container registry; in this case I am using an Azure Container Registry (ACR) that I have.
       sudo docker tag mssql-pb-update amvinacr.azurecr.io/sql-pb-update:latest
       sudo docker push amvinacr.azurecr.io/sql-pb-update:latest
      
      # To know more about ACR refer: https://learn.microsoft.com/en-us/azure/container-registry/container-registry-tutorial-quick-task
  • Create an Azure Kubernetes Service (AKS) based cluster and attach it to the Azure Container Registry (ACR).
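Since SQL Server will validate the MinIO endpoint against public.crt, it can save debugging time to inspect the certificate's subject and expiry before loading it into the cluster. The sketch below generates a throwaway self-signed certificate (with the demo cert's CN, minioserverubu) purely so the commands are runnable as-is; in practice, point the second command at your real public.crt.

```shell
# Generate a throwaway self-signed cert only to make this example self-contained;
# run the x509 command against your actual public.crt instead.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/throwaway.key -out /tmp/throwaway.crt \
  -subj '/CN=minioserverubu' 2>/dev/null
# Print the subject CN and expiry date of the certificate
openssl x509 -in /tmp/throwaway.crt -noout -subject -enddate
```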

Let's get started

With the prerequisites done, we can now start deploying SQL Server containers on our AKS cluster using the custom image we created, which has PolyBase installed. Before deploying, we need two configmaps and one secret object in the AKS cluster, with the details shown below. I am running all these commands from my Windows client, which has kubectl installed and can connect to the AKS cluster 'sqlk8s' that I created in the previous steps.

  • Because SQL Server validates the endpoint against the certificate, we use one configmap object to load the public.crt certificate required for the secure channel between SQL Server and the MinIO storage. You can use the command below to generate the configmap from the public.crt file, which I have copied to "C:\pbdemo\certs\public.crt" on my Windows machine.
    # Use the below command to create the configmap from the file "public.crt"
    kubectl create configmap mssql-tls --from-file="C:\pbdemo\certs\public.crt"
    
    # You can verify that the configmap was created using the command below; the sample output is shown with some lines obfuscated.
    
    C:\>kubectl describe configmap mssql-tls
    Name:         mssql-tls
    Namespace:    default
    Labels:       <none>
    Annotations:  <none>
     
    Data
    ====
    public.crt:
    ----
    -----BEGIN CERTIFICATE-----
    MIIDjjCCAnagAwIBAgIUf/RP+g+9uFAKD4TB0R2ot6jZMJAwDQYJKoZIhvcNAQEL
    BQAwZjELMAkGA1UEBhMCVVMxCzAJBgNVBAgMAlZBMRIwEAYDVQQHDAlTb21ld2hl
    cmUxDjAMBgNVBAoMBU15T3JnMQ0wCwYDVQQLDARNeU9VMRcwFQYDVQQDDA5taW5p
    b3NlcnZlcnVidTAeFw0yMzAxMDYxNTA0MjhaFw0zMzAxMDMxNTA0MjhaMGYxCzAJ
    BgNVBAYTAlVTMQswCQYDVQQIDAJWQTESMBAGA1UEBwwJU29tZXdoZXJlMQ4wDAYD
    VQQKDAVNeU9yZzENMAsGA1UECwwETXlPVTEXMBUGA1UEAwwObWluaW9zZXJ2ZXJ1
    YnUwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCcC0irpVFJQaVqiuKP
    LuRQUdYxehTvFCuCJLpGhS03XczCWKzemd0a3W/UC01N+tbFzpmCi0/l+R5SFsVC
    3tRPKlngtDOICnYPpKSaWz03o3BgwNqtuKtucboXp3Qo+5DSd7XDi1qY+plY6LwC
    vbjLtjNBfyWlwS6nhTjDebvUY6w0xNmG8w1H9fPZcq/gsSGUnPyvywEENYmk5R6X
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    1J/rLOJUVeKRvpbLraN25VoEoS8hz07DT3gmtVyJgpEK1oJ+mEs94ae76ujr5MGm
    M4HgV4+o9wxqjWdPyFDny/1EPsv30VqbcwmbkuPE7UHG4JH/5gxqxnXJcPnr3BQy
    4GU=
    -----END CERTIFICATE-----
  • The second configmap object is used to load mssql.conf, which contains PolyBase trace flag 13702, required for SQL Server on Linux. This configmap is generated from the YAML below:
    # Create a yaml file called: mssql_config.yaml and save it to the location "C:\pbdemo\certs\mssql_config.yaml"
    
    kind: ConfigMap 
    apiVersion: v1 
    metadata:
      name: mssql
    data:
      mssql.conf: |
        [EULA]
        accepteula = Y
        accepteulaml = Y
        
        [coredump]
        captureminiandfull = true
        coredumptype = full
        
        [language]
        lcid = 1033
     
        [filelocation]
        defaultdatadir = /var/opt/mssql/userdata
        defaultlogdir = /var/opt/mssql/userlog
        
        [traceflag]
        traceflag0 = 13702
    
    # We are ready to create the configmap using the command as shown below:
    kubectl apply -f "C:\pbdemo\certs\mssql_config.yaml"
    
    # You can inspect the configmap using the command shown below; sample output follows
    C:\>kubectl describe configmap mssql
    Name:         mssql
    Namespace:    default
    Labels:       <none>
    Annotations:  <none>
     
    Data
    ====
    mssql.conf:
    ----
    [EULA]
    accepteula = Y
    accepteulaml = Y
     
    [coredump]
    captureminiandfull = true
    coredumptype = full
     
    [language]
    lcid = 1033
     
    [filelocation]
    defaultdatadir = /var/opt/mssql/userdata
    defaultlogdir = /var/opt/mssql/userlog
     
    [traceflag]
    traceflag0 = 13702
     
    Events:  <none>
    
  • The secret object in the cluster stores the sa password for SQL Server; the command to create it is shown below:
    kubectl create secret generic mssql --from-literal=MSSQL_SA_PASSWORD="MyC0m9l&xP@ssw0rd"
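A note on what `kubectl create secret` actually stores: Secret values in Kubernetes are base64-encoded, not encrypted, so anyone with read access to the Secret can recover the sa password. The round trip below, using the demo password from the command above, shows the encoding is reversible:

```shell
# Kubernetes stores Secret data base64-encoded, not encrypted;
# the encoding is trivially reversible, so protect Secrets with RBAC.
encoded=$(printf '%s' 'MyC0m9l&xP@ssw0rd' | base64)
printf '%s' "$encoded" | base64 -d   # prints the original password
```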

Deploy SQL Server Containers 

We are now ready to deploy SQL Server containers. Below is the deployment YAML, "sqldeploy.yaml", which deploys one SQL Server container as a StatefulSet using Azure Disk as the storage class.

Using this YAML we deploy:

  • One SQL Server instance, with separate persistent volumes (PVs) and persistent volume claims (PVCs) for the user database, user log, and tempdb files.
  • The public.crt certificate is mounted into the folder /var/opt/mssql/security/ca-certificates inside the container.
  • The mssql.conf file is loaded with the various parameters and the required trace flag 13702.
  • Finally, we create an external load balancer service so we can connect to SQL Server.
    # Create sqldeploy.yaml at the location "C:\pbdemo\certs\sqldeploy.yaml" and paste in the content below; you can modify it according to your requirements.
    
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
         name: azure-disk
    provisioner: kubernetes.io/azure-disk
    parameters:
      storageaccounttype: Standard_LRS
      kind: Managed
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
     name: mssql
     labels:
      app: mssql
    spec:
     serviceName: "mssql"
     replicas: 1
     selector:
      matchLabels:
       app: mssql
     template:
      metadata:
       labels:
        app: mssql
      spec:
       securityContext:
         fsGroup: 10001
       containers:
       - name: mssql
         command:
           - /bin/bash
           - -c
           - cp /var/opt/config/mssql.conf /var/opt/mssql/mssql.conf && /opt/mssql/bin/sqlservr
         image: amvinacr.azurecr.io/sql-pb-update:latest
         ports:
         - containerPort: 1433
         env:
         - name: ACCEPT_EULA
           value: "Y"
         - name: MSSQL_ENABLE_HADR
           value: "1"
         - name: MSSQL_SA_PASSWORD
           valueFrom:
             secretKeyRef:
              name: mssql
              key: MSSQL_SA_PASSWORD
         volumeMounts:
         - name: mssql
           mountPath: "/var/opt/mssql"
         - name: userdata
           mountPath: "/var/opt/mssql/userdata"
         - name: userlog
           mountPath: "/var/opt/mssql/userlog"
         - name: tempdb
           mountPath: "/var/opt/mssql/tempdb"
         - name: mssql-config-volume
           mountPath: "/var/opt/config"
         - name: mssql-tls-configmap
           mountPath: "/var/opt/mssql/security/ca-certificates/"
       volumes: 
         - name: mssql-config-volume
           configMap:
            name: mssql
         - name: mssql-tls-configmap
           configMap:
            name: mssql-tls
     volumeClaimTemplates:
       - metadata:
          name: mssql
         spec:
          accessModes:
          - ReadWriteOnce
          resources:
           requests:
            storage: 8Gi
       - metadata:
          name: userdata
         spec:
          accessModes:
          - ReadWriteOnce
          resources:
           requests:
            storage: 8Gi
       - metadata:
          name: userlog
         spec:
          accessModes:
          - ReadWriteOnce
          resources:
           requests:
            storage: 8Gi
       - metadata:
          name: tempdb
         spec:
          accessModes:
          - ReadWriteOnce
          resources:
           requests:
            storage: 8Gi        
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: mssql-0
    spec:
      type: LoadBalancer
      selector:
        statefulset.kubernetes.io/pod-name: mssql-0
      ports:
      - protocol: TCP
        port: 1433
        targetPort: 1433
        name: tcpsql
    
    # Run the below command to deploy the SQL Server container from the deployment yaml file sqldeploy.yaml
    
    kubectl apply -f "C:\pbdemo\certs\sqldeploy.yaml"
    
    # You can run kubectl get to see the SQL Server containers deployed; a sample command and output are shown below for reference
    
    C:\>kubectl get all
    NAME          READY   STATUS    RESTARTS   AGE
    pod/mssql-0   1/1     Running   0          92m
     
    NAME                 TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
    service/kubernetes   ClusterIP      10.0.0.1      <none>        443/TCP          16h
    service/mssql-0      LoadBalancer   10.0.252.15   20.235.55.7   1433:31551/TCP   13s
     
    NAME                     READY   AGE
    statefulset.apps/mssql   1/1     97m
    
    

 

Connect to SQL Server - Demo time

We are ready to connect to SQL Server. Feel free to use your favorite SQL Server tool, viz. SQL Server Management Studio (SSMS) or Azure Data Studio (ADS), connect to the SQL Server, and run the commands below to virtualize data and access CSV or parquet files and/or to take backups of the databases on the S3-based object storage.

 

I'm using SSMS and running the queries below. I connect to the SQL Server and confirm that the TLS certificate public.crt is loaded by SQL Server; I can check this in the errorlog by running sp_readerrorlog from a T-SQL query window. The sample output below shows the certificate loaded:

[Screenshot: SQL Server errorlog showing the public.crt certificate loaded]

 

Backup/Restore to S3-based object storage:

To take S3-based backups, I first need to create a bucket on the storage. I connect to the MinIO portal, create a bucket called "sqltest" as shown below, and then create a credential to access the storage.

[Screenshot: Creating the "sqltest" bucket in the MinIO console]

 

Now use the sample T-SQL script below to create the credential and run the backup; the backup should succeed as shown below.

 

CREATE CREDENTIAL [s3://10.0.0.82:9000/sqltest]
WITH
        IDENTITY    = 'S3 Access Key',
        SECRET      = 'amit:mypass12345';
 
BACKUP DATABASE S3_backup_demo
TO      URL = 's3://10.0.0.82:9000/sqltest/S3_backup_demo_db.bak'
WITH     STATS = 10
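Per the backup-to-S3 documentation, the credential's IDENTITY must be the literal string 'S3 Access Key', and the SECRET is the access key ID and secret key joined by a colon. A quick shell sanity check, using the demo's placeholder values rather than real keys, that the string splits into the two expected parts:

```shell
# SECRET format for S3 credentials: '<AccessKeyID>:<SecretKeyID>'.
# Values below are the demo's placeholders, not real credentials.
secret='amit:mypass12345'
echo "access key id: ${secret%%:*}"   # everything before the first ':'
echo "secret key: ${secret#*:}"       # everything after the first ':'
```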

 

[Screenshot: Backup of S3_backup_demo completing successfully in SSMS]

You can view the backup file created on the MinIO console as shown below:

[Screenshot: The backup file S3_backup_demo_db.bak visible in the MinIO console]

 

Data Virtualization - PolyBase

Here are the steps to configure data virtualization; the T-SQL sample script below performs the following:

  • Verify that the PolyBase feature is installed, then configure and enable it for SQL Server.
  • Create the database scoped credential to access the object storage.
  • Create the external data source and link it to the credential created in the preceding step.
  • Query the data from the external data source.

 

-- Verify the Polybase feature is installed:
SELECT SERVERPROPERTY ('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
 
-- Enable the PolyBase feature using the commands below:
 
exec sp_configure @configname = 'polybase enabled', @configvalue = 1;
reconfigure with override
exec sp_configure @configname = 'polybase enabled'

-- Next, let's create a database and a database scoped credential to access the object storage.
 
create database pb_demo
 
USE pb_demo
 
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'mypass123@';  
 
CREATE DATABASE SCOPED CREDENTIAL s3_dc WITH IDENTITY = 'S3 Access Key', SECRET = 'amit:mypass12345' ;
 
-- To verify that the credential was created, run the below command:
SELECT * FROM sys.database_scoped_credentials;
 
-- Now go ahead and try creating the External data source:
 
CREATE EXTERNAL DATA SOURCE s3_ds WITH (LOCATION = 's3://10.0.0.82:9000/',CREDENTIAL = s3_dc)
 
-- When creating the external data source, if you see the below error, restart the container and run the same command again; that should resolve it:
-- Msg 46530, Level 16, State 11, Line 20
-- External data sources are not supported with type GENERIC.

-- We now have the credential and the external data source created to query files on the object storage. To query a CSV file that is already present on the object storage, run the command below:

SELECT * FROM OPENROWSET( BULK '/sqltest/test.csv', FORMAT = 'CSV', DATA_SOURCE  = 's3_ds') WITH ( c1 varchar(50) ) AS   [Test1]

-- To query a parquet file available on the object storage, run the below query. I copied a sample parquet file from https://github.com/Teradata/kylo/tree/master/samples/sample-data/parquet to my MinIO object storage.
 
SELECT  * 
FROM OPENROWSET
(    BULK '/sqltest/userdata1.parquet'
,    FORMAT       = 'PARQUET'
,    DATA_SOURCE  = 's3_ds'
)           
AS   [Test12]
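The contents of the demo's test.csv aren't shown above; any single-column file satisfies the WITH (c1 varchar(50)) schema used in the CSV query. A minimal sketch of a file you could create and upload to the sqltest bucket (via the MinIO console, for example):

```shell
# Create a minimal single-column CSV matching WITH (c1 varchar(50));
# upload it to the bucket before running the OPENROWSET query.
printf 'alpha\nbeta\ngamma\n' > /tmp/test.csv
cat /tmp/test.csv
```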

 

[Screenshot: Query results from the parquet file in SSMS]

Note: You can also use CREATE EXTERNAL TABLE AS SELECT (CETAS) to access both parquet and CSV files with SQL Server on Linux.

 

I hope this helps you get started with PolyBase and S3 storage integration for SQL Server 2022 containers. Special thanks to my colleagues from the PolyBase team for helping me with this demo. Happy learning!

 
