Technical Walkthrough: Deploying a SQL DB like it's Terraform
Introduction

This post is a union of multiple topics. It is part of the SQL CI/CD series and builds upon Deploying .dacpacs to Multiple Environments via ADO Pipelines | Microsoft Community Hub and Managed SQL Deployments Like Terraform | Microsoft Community Hub, while also crossing over with the YAML Pipeline series. This is an advanced topic with regard to both Azure DevOps YAML and SQL CI/CD. If either of these concepts is new to you, please refer to the links above, as this is not designed to be a beginner's introduction to either domain.

Assumptions

To get the most out of this post and follow along, we are going to assume that you are:

1. On board with templating your Azure DevOps YAML Pipelines. By doing this we will see the benefit of quickly onboarding new pipelines, standardizing our deployment steps, and increasing our security.
2. On board with Managed SQL Deployments Like Terraform | Microsoft Community Hub for deploying your SQL Projects. By adopting this we can increase our data security, improve confidence in source control, and speed up our time to deployment.

For this post we will continue to leverage the example cicd-adventureWorks repository as the source of our SQL Project and the home of the pipeline definition.

Road Mapping the Templates

Just like my other YAML posts, let's outline the pieces required in each stage and then break down each job.

Build Stage
- Build .dacpac job
  - run `dotnet build` and pass in the appropriate arguments
  - execute a Deploy Report from the .dacpac produced by the build and the target environment
  - copy the Deploy Report to the build output directory
  - publish the pipeline artifact

Deploy Stage
- Deploy .dacpac job
  - run a Deploy Report from the .dacpac artifact (optional)
  - deploy the .dacpac, including pre/post scripts

Build Stage

For the purposes of this stage, we should think of building our .dacpac similar to a Terraform or single-page application build. That is, we will produce an artifact per environment, all generated from the same codebase. Additionally, we will run a 'plan' which will be the proposed result of deploying our .dacpac file.

Build Job

We will have one instance of the build job for each environment. Each instance will produce a different artifact, as each passes a different build configuration, which in turn results in a different .dacpac per environment. If you are familiar with YAML templating, feel free to jump to the finished job template.

One of the key differences in this job structure, compared to the one outlined in Deploying .dacpacs to Multiple Environments via ADO Pipelines, is the need for a Deploy Report. This is the key to unlocking the CI/CD approach which aligns with Terraform. The Deploy Report detects our changes at build time, similar to running a terraform plan. Creating a Deploy Report is achieved by setting the DeploymentAction input on the SqlAzureDacpacDeployment@1 task to 'DeployReport'.

Now there is one minor "bug" in the Microsoft SqlAzureDacpacDeployment task, which I have raised against the ADO task: the output paths for the Deploy Report and the Drift Report appear to be hardcoded to the same location. To work around this I had to find out where the Deploy Report was being published and, for our purposes, add a task to copy the Deploy Report to the same location as the .dacpac so that both can be published together as a single folder.
Here is the code for a single environment to build the associated .dacpac and produce the Deploy Report:

- stage: adventureworksentra_build
  variables:
    - name: solutionPath
      value: $(Build.SourcesDirectory)//
  jobs:
    - job: build_publish_sql_sqlmoveme_dev_dev
      steps:
        - task: UseDotNet@2
          displayName: Use .NET SDK vlatest
          inputs:
            packageType: 'sdk'
            version: ''
            includePreviewVersions: true
        - task: NuGetAuthenticate@1
          displayName: 'NuGet Authenticate'
        - task: DotNetCoreCLI@2
          displayName: dotnet build
          inputs:
            command: build
            projects: $(Build.SourcesDirectory)/src/sqlmoveme/*.sqlproj
            arguments: --configuration dev /p:NetCoreBuild=true /p:DacVersion=1.0.1
        - task: SqlAzureDacpacDeployment@1
          displayName: DeployReport sqlmoveme on sql-adventureworksentra-dev-cus.database.windows.net
          inputs:
            DeploymentAction: DeployReport
            azureSubscription: AzureDevServiceConnection
            AuthenticationType: servicePrincipal
            ServerName: sql-adventureworksentra-dev-cus.database.windows.net
            DatabaseName: sqlmoveme
            deployType: DacpacTask
            DacpacFile: $(Agent.BuildDirectory)\s/src/sqlmoveme/bin/dev/sqlmoveme.dacpac
            AdditionalArguments: ''
            DeleteFirewallRule: True
        - task: CopyFiles@2
          inputs:
            SourceFolder: GeneratedOutputFiles
            Contents: '**'
            TargetFolder: $(Build.SourcesDirectory)/src/sqlmoveme/bin/dev/cus
        - task: PublishPipelineArtifact@1
          displayName: 'Publish Pipeline Artifact sqlmoveme_dev_dev'
          inputs:
            targetPath: $(Build.SourcesDirectory)/src/sqlmoveme/bin/dev
            artifact: sqlmoveme_dev_dev
            properties: ''

The end result will be similar to the following (I have two environments in the screenshot below):

One can see I have configured this to run a Deploy Report across each regional instance of the SQL DB, hence the `cus` folder. I do this to identify and catch any potential schema and data issues. The Deploy Reports are the key to tying this approach to the idea of deploying and managing SQL databases like Terraform. These reports will execute when a pull request is created, as part of the build, and again at deployment, to surface any changes that may have occurred between the PR and the deployment. For the purposes of this blog, here is a deployment report indicating a schema change:

This is an important artifact for organizations whose auditing policy requires documentation around deployments. This information is also available in the ADO job logs:

This experience should feel similar to Terraform CI/CD...THAT'S A GOOD THING! It means we are developing and refining practices and principles across our tech stacks when it comes to the SDLC. If this feels new to you then please read Terraform, CI/CD, Azure DevOps, and YAML Templates - John Folberth.

Deploy Stage

We will have a deploy stage for each environment, and within that stage will be a job for each region and/or database we are deploying our .dacpac to. This job can be a template because, in theory, our deployment process across environments is identical. We will run a deployment report and deploy the .dacpac which was built for the specific environment, including any and all associated pre/post scripts. Again, this process has already been walked through in Deploying .dacpacs to Multiple Environments via ADO Pipelines | Microsoft Community Hub.

Deploy Job

The deploy job will take what we built in the deployment process in Deploying .dacpacs to Multiple Environments via ADO Pipelines | Microsoft Community Hub and add a prerequisite step to create a second Deployment Report.
This process is to ensure we are aware of any changes in the deployed SQL Database that may have occurred after the original .dacpac and Deployment Report were created at the time of the Pull Request. By doing this we now have a tight audit trail identifying any changes that were made right before we deployed the code.

Next, we need to override the default arguments of the .dacpac publish command in order to automatically deploy changes that may result in data loss. A complete list of all the available properties can be found at SqlPackage Publish - SQL Server | Microsoft Learn. The ones we are most interested in are DropObjectsNotInSource and BlockOnPossibleDataLoss.

DropObjectsNotInSource is defined as:

Specifies whether objects that do not exist in the database snapshot (.dacpac) file will be dropped from the target database when you publish to a database. This value takes precedence over DropExtendedProperties.

This is important as it will drop and delete objects that are not defined in our source code. As I've written about previously, this will drop all those instances of "shadow data", or copies of tables we were storing. This value, by default, is set to false as a safeguard against a destructive data action. Our intention, though, is to ensure our deployed database objects match our definitions in source control, so we want to enable this.

BlockOnPossibleDataLoss is defined as:

Specifies that the operation will be terminated during the schema validation step if the resulting schema changes could incur a loss of data, including due to data precision reduction or a data type change that requires a cast operation. The default (True) value causes the operation to terminate regardless if the target database contains data. An execution with a False value for BlockOnPossibleDataLoss can still fail during deployment plan execution if data is present on the target that cannot be converted to the new column type.

This is another safeguard that has been put in place to ensure data isn't lost in the situation of a type conversion or a schema change such as dropping a column. We want this set to `false` so that our deployment will actually deploy in an automated fashion. If this is left at the default of `true` and we want to update schemas/columns, then we would be creating an anti-pattern of manual deployments to accommodate those changes. When possible, we want to automate our deployments, and in this specific case we have already mitigated unintentional data loss through our implementation of a Deploy Report. Again, we should have confidence in our deployment, and if we have this then we should be able to automate it.
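For reference, here is a minimal sketch of the equivalent standalone SqlPackage publish call with these two properties overridden. It reuses the server, database, and .dacpac path from the example pipeline; the authentication settings are omitted and would need to be supplied for a real run.

# Publish the dev .dacpac, dropping objects missing from source and allowing
# changes that SqlPackage flags as potential data loss (authentication omitted).
sqlpackage /Action:Publish \
  /SourceFile:"src/sqlmoveme/bin/dev/sqlmoveme.dacpac" \
  /TargetServerName:"sql-adventureworksentra-dev-cus.database.windows.net" \
  /TargetDatabaseName:"sqlmoveme" \
  /p:DropObjectsNotInSource=True \
  /p:BlockOnPossibleDataLoss=False

In the pipeline below, these same two properties are passed through the AdditionalArguments input of the Publish step rather than invoked directly.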
Here is that same deployment process, now including the Deploy Report steps:

- stage: adventureworksentra_dev_cus_dacpac_deploy
  jobs:
    - deployment: adventureworksentra_app_dev_cus
      environment:
        name: dev
      dependsOn: []
      strategy:
        runOnce:
          deploy:
            steps:
              - task: SqlAzureDacpacDeployment@1
                displayName: DeployReport sqlmoveme on sql-adventureworksentra-dev-cus.database.windows.net
                inputs:
                  DeploymentAction: DeployReport
                  azureSubscription: AzureDevServiceConnection
                  AuthenticationType: servicePrincipal
                  ServerName: sql-adventureworksentra-dev-cus.database.windows.net
                  DatabaseName: sqlmoveme
                  deployType: DacpacTask
                  DacpacFile: $(Agent.BuildDirectory)\sqlmoveme_dev_dev\**\*.dacpac
                  AdditionalArguments: ''
                  DeleteFirewallRule: False
              - task: CopyFiles@2
                inputs:
                  SourceFolder: GeneratedOutputFiles
                  Contents: '**'
                  TargetFolder: postDeploy/sql-adventureworksentra-dev-cus.database.windows.net/sqlmoveme
              - task: SqlAzureDacpacDeployment@1
                displayName: Publish sqlmoveme on sql-adventureworksentra-dev-cus.database.windows.net
                inputs:
                  DeploymentAction: Publish
                  azureSubscription: AzureDevServiceConnection
                  AuthenticationType: servicePrincipal
                  ServerName: sql-adventureworksentra-dev-cus.database.windows.net
                  DatabaseName: sqlmoveme
                  deployType: DacpacTask
                  DacpacFile: $(Agent.BuildDirectory)\sqlmoveme_dev_dev\**\*.dacpac
                  AdditionalArguments: /p:DropObjectsNotInSource=true /p:BlockOnPossibleDataLoss=false
                  DeleteFirewallRule: True

Putting it Together

Let's put all these pieces together. This example shows an expanded pipeline that has the following stages and jobs:

- Build stage
  - Build dev job
  - Build tst job
- Deploy dev stage
  - Deploy dev job
- Deploy tst stage
  - Deploy tst job

And here is the code:

resources:
  repositories:
    - repository: templates
      type: github
      name: JFolberth/TheYAMLPipelineOne
      endpoint: JFolberth

trigger:
  branches:
    include:
      - none

pool:
  vmImage: 'windows-latest'

parameters:
  - name: projectNamesConfigurations
    type: object
    default:
      - projectName: 'sqlmoveme'
        environmentName: 'dev'
        regionAbrvs:
          - 'cus'
        projectExtension: '.sqlproj'
        buildArguments: '/p:NetCoreBuild=true /p:DacVersion=1.0.1'
        sqlServerName: 'adventureworksentra'
        sqlDatabaseName: 'moveme'
        resourceGroupName: adventureworksentra
        ipDetectionMethod: 'AutoDetect'
        deployType: 'DacpacTask'
        authenticationType: 'servicePrincipal'
        buildConfiguration: 'dev'
        dacpacAdditionalArguments: '/p:DropObjectsNotInSource=true /p:BlockOnPossibleDataLoss=false'
      - projectName: 'sqlmoveme'
        environmentName: 'tst'
        regionAbrvs:
          - 'cus'
        projectExtension: '.sqlproj'
        buildArguments: '/p:NetCoreBuild=true /p:DacVersion=1.0'
        sqlServerName: 'adventureworksentra'
        sqlDatabaseName: 'moveme'
        resourceGroupName: adventureworksentra
        ipDetectionMethod: 'AutoDetect'
        deployType: 'DacpacTask'
        authenticationType: 'servicePrincipal'
        buildConfiguration: 'tst'
        dacpacAdditionalArguments: '/p:DropObjectsNotInSource=true /p:BlockOnPossibleDataLoss=false'
  - name: serviceName
    type: string
    default: 'adventureworksentra'

stages:
  - stage: adventureworksentra_build
    variables:
      - name: solutionPath
        value: $(Build.SourcesDirectory)//
    jobs:
      - job: build_publish_sql_sqlmoveme_dev_dev
        steps:
          - task: UseDotNet@2
            displayName: Use .NET SDK vlatest
            inputs:
              packageType: 'sdk'
              version: ''
              includePreviewVersions: true
          - task: NuGetAuthenticate@1
            displayName: 'NuGet Authenticate'
          - task: DotNetCoreCLI@2
            displayName: dotnet build
            inputs:
              command: build
              projects: $(Build.SourcesDirectory)/src/sqlmoveme/*.sqlproj
              arguments: --configuration dev /p:NetCoreBuild=true /p:DacVersion=1.0.1
          - task: SqlAzureDacpacDeployment@1
            displayName: DeployReport sqlmoveme on sql-adventureworksentra-dev-cus.database.windows.net
            inputs:
              DeploymentAction: DeployReport
              azureSubscription: AzureDevServiceConnection
              AuthenticationType: servicePrincipal
              ServerName: sql-adventureworksentra-dev-cus.database.windows.net
              DatabaseName: sqlmoveme
              deployType: DacpacTask
              DacpacFile: $(Agent.BuildDirectory)\s/src/sqlmoveme/bin/dev/sqlmoveme.dacpac
              AdditionalArguments: ''
              DeleteFirewallRule: True
          - task: CopyFiles@2
            inputs:
              SourceFolder: GeneratedOutputFiles
              Contents: '**'
              TargetFolder: $(Build.SourcesDirectory)/src/sqlmoveme/bin/dev/cus
          - task: PublishPipelineArtifact@1
            displayName: 'Publish Pipeline Artifact sqlmoveme_dev_dev'
            inputs:
              targetPath: $(Build.SourcesDirectory)/src/sqlmoveme/bin/dev
              artifact: sqlmoveme_dev_dev
              properties: ''
      - job: build_publish_sql_sqlmoveme_tst_tst
        steps:
          - task: UseDotNet@2
            displayName: Use .NET SDK vlatest
            inputs:
              packageType: 'sdk'
              version: ''
              includePreviewVersions: true
          - task: NuGetAuthenticate@1
            displayName: 'NuGet Authenticate'
          - task: DotNetCoreCLI@2
            displayName: dotnet build
            inputs:
              command: build
              projects: $(Build.SourcesDirectory)/src/sqlmoveme/*.sqlproj
              arguments: --configuration tst /p:NetCoreBuild=true /p:DacVersion=1.0
          - task: SqlAzureDacpacDeployment@1
            displayName: DeployReport sqlmoveme on sql-adventureworksentra-tst-cus.database.windows.net
            inputs:
              DeploymentAction: DeployReport
              azureSubscription: AzureTstServiceConnection
              AuthenticationType: servicePrincipal
              ServerName: sql-adventureworksentra-tst-cus.database.windows.net
              DatabaseName: sqlmoveme
              deployType: DacpacTask
              DacpacFile: $(Agent.BuildDirectory)\s/src/sqlmoveme/bin/tst/sqlmoveme.dacpac
              AdditionalArguments: ''
              DeleteFirewallRule: True
          - task: CopyFiles@2
            inputs:
              SourceFolder: GeneratedOutputFiles
              Contents: '**'
              TargetFolder: $(Build.SourcesDirectory)/src/sqlmoveme/bin/tst/cus
          - task: PublishPipelineArtifact@1
            displayName: 'Publish Pipeline Artifact sqlmoveme_tst_tst'
            inputs:
              targetPath: $(Build.SourcesDirectory)/src/sqlmoveme/bin/tst
              artifact: sqlmoveme_tst_tst
              properties: ''
  - stage: adventureworksentra_dev_cus_dacpac_deploy
    jobs:
      - deployment: adventureworksentra_app_dev_cus
        environment:
          name: dev
        dependsOn: []
        strategy:
          runOnce:
            deploy:
              steps:
                - task: SqlAzureDacpacDeployment@1
                  displayName: DeployReport sqlmoveme on sql-adventureworksentra-dev-cus.database.windows.net
                  inputs:
                    DeploymentAction: DeployReport
                    azureSubscription: AzureDevServiceConnection
                    AuthenticationType: servicePrincipal
                    ServerName: sql-adventureworksentra-dev-cus.database.windows.net
                    DatabaseName: sqlmoveme
                    deployType: DacpacTask
                    DacpacFile: $(Agent.BuildDirectory)\sqlmoveme_dev_dev\**\*.dacpac
                    AdditionalArguments: ''
                    DeleteFirewallRule: False
                - task: CopyFiles@2
                  inputs:
                    SourceFolder: GeneratedOutputFiles
                    Contents: '**'
                    TargetFolder: postDeploy/sql-adventureworksentra-dev-cus.database.windows.net/sqlmoveme
                - task: SqlAzureDacpacDeployment@1
                  displayName: Publish sqlmoveme on sql-adventureworksentra-dev-cus.database.windows.net
                  inputs:
                    DeploymentAction: Publish
                    azureSubscription: AzureDevServiceConnection
                    AuthenticationType: servicePrincipal
                    ServerName: sql-adventureworksentra-dev-cus.database.windows.net
                    DatabaseName: sqlmoveme
                    deployType: DacpacTask
                    DacpacFile: $(Agent.BuildDirectory)\sqlmoveme_dev_dev\**\*.dacpac
                    AdditionalArguments: /p:DropObjectsNotInSource=true /p:BlockOnPossibleDataLoss=false
                    DeleteFirewallRule: True
  - stage: adventureworksentra_tst_cus_dacpac_deploy
    jobs:
      - deployment: adventureworksentra_app_tst_cus
        environment:
          name: tst
        dependsOn: []
        strategy:
          runOnce:
            deploy:
              steps:
                - task: SqlAzureDacpacDeployment@1
                  displayName: DeployReport sqlmoveme on sql-adventureworksentra-tst-cus.database.windows.net
                  inputs:
                    DeploymentAction: DeployReport
                    azureSubscription: AzureTstServiceConnection
                    AuthenticationType: servicePrincipal
                    ServerName: sql-adventureworksentra-tst-cus.database.windows.net
                    DatabaseName: sqlmoveme
                    deployType: DacpacTask
                    DacpacFile: $(Agent.BuildDirectory)\sqlmoveme_tst_tst\**\*.dacpac
                    AdditionalArguments: ''
                    DeleteFirewallRule: False
                - task: CopyFiles@2
                  inputs:
                    SourceFolder: GeneratedOutputFiles
                    Contents: '**'
                    TargetFolder: postDeploy/sql-adventureworksentra-tst-cus.database.windows.net/sqlmoveme
                - task: SqlAzureDacpacDeployment@1
                  displayName: Publish sqlmoveme on sql-adventureworksentra-tst-cus.database.windows.net
                  inputs:
                    DeploymentAction: Publish
                    azureSubscription: AzureTstServiceConnection
                    AuthenticationType: servicePrincipal
                    ServerName: sql-adventureworksentra-tst-cus.database.windows.net
                    DatabaseName: sqlmoveme
                    deployType: DacpacTask
                    DacpacFile: $(Agent.BuildDirectory)\sqlmoveme_tst_tst\**\*.dacpac
                    AdditionalArguments: /p:DropObjectsNotInSource=true /p:BlockOnPossibleDataLoss=false
                    DeleteFirewallRule: True

In ADO it will look like:

We can see the important Deploy Report being created and can confirm that there are Deploy Reports for each environment/region combination:

Conclusion

With the inclusion of Deploy Reports we now have the ability to create Azure SQL deployments that adhere to modern DevOps approaches. We can ensure our environments stay in sync with how we have defined them in source control. By doing this we achieve a higher level of security, confidence in our code, and a reduction in shadow data. To learn more about these approaches to SQL deployments, be sure to check out my other blog articles on the topic in the "SQL Database Series" in the "Healthcare and Life Sciences Blog" | Microsoft Community Hub, and be sure to follow me on LinkedIn.

Managed SQL Deployments Like Terraform
Introduction

This is the next post in our series on CI/CD for SQL projects. In this post we will challenge some long-held beliefs on how we should manage SQL deployments. Traditionally we've had the notion that we should never drop data in any environment, and that deployments should almost exclusively be done via SQL scripts, run manually to ensure completion and to prevent any type of data loss. We will challenge this and propose a solution that falls more in line with other modern DevOps tooling and practices. If this sounds appealing to you, then let's dive into it.

Why

We've always approached the data behind our applications as the differentiating factor when it comes to Intellectual Property (IP). No one wants to hear that we've lost data or that the data is unrecoverable. Let me be clear and throw in a disclaimer on what I am going to propose: this is not a substitute for proper data management techniques to prevent data loss. Rather, we are going to look at a way to thread the needle on keeping the data that we need while removing the data that we don't.

Shadow Data

We've all heard about "shadow IT", so what about "shadow data"? I think every developer has been there. For example, taking a backup of a table/database to ensure we don't inadvertently drop it during a deployment. Heck, sometimes we may even go a step further and back this up into a lower environment. The caveat is that we very rarely go back and clean up that backup. We've effectively created a snapshot of data which we kept for our own comfort. This copy is now ungoverned, unmanaged, and potentially insecure. The issue then gets compounded if we have automated backup or restore-to-QA operations. Now we keep amplifying and spreading our shadow data.

Shouldn't we focus on improving the Software Delivery Lifecycle (SDLC), ensuring confidence in our data deployments? Let's take it a step further: shouldn't we invest in our data protection practice? Why should we be keeping these copies when we have technology that backs up our SQL schemas and databases?

Another consideration: what about those quick "hot fixes" that we applied in production? The ones where we changed a varchar() column length to accommodate the size of a field in production. I am not advocating for making these changes in production...but when your CIO or VP is escalating because this is holding up your business's Data Warehouse, and you happen to have the SQL admin login credentials...stuff happens. Wouldn't it be nice if SQL had a way to report back that this change needs to be accommodated in the source schema? Again, the answer is in our SDLC process.

So, where is the book of record for our SQL schemas? If this is your first read in this series, or if you are unfamiliar with source control, I'd encourage you to read Leveraging DotNet for SQL Builds via YAML | Microsoft Community Hub, where I talk about the importance of placing your database projects under source control. The TL;DR: your database schema definitions should be defined under source control, ideally as a .sqlproj.

Where Terraform Comes In

At this point I've already pointed out a few instances of how our production database instance can differ from what we have defined in our source project. This certainly isn't anything new in software development. So how do other software development tools and technologies account for this?
Generally, application code simply gets overwritten, and we keep backup versions via release branches, git tags, or other artifacts. Cloud infrastructure can be defined as Infrastructure as Code (IaC) and as such still follows something similar to our application code workflow.

There are two main flavors of IaC for Azure: Bicep/ARM and Terraform. Bicep/ARM adheres to an incremental deployment model, which has its pros and cons. The quick version is that Azure Resource Manager (ARM) deployments will not delete resources that are not defined in the template. Part of this has led to Azure Deployment Stacks, which can help enforce resource deletion when a resource has been removed from a template. If you are interested in understanding a Terraform workflow, I will point you to one of my other posts on the topic. At a high level, Terraform evaluates your IaC definition and determines what properties need to be updated and, more importantly, what resources need to be removed.

Now how does Terraform do this, and more importantly, how can we tell what properties will be updated and/or removed? Terraform has a concept known as a plan. A plan runs your proposed deployment against what is known as the state file (the closest Bicep/ARM analogue is the Deployment Stack) and produces a summary of the changes that will occur. This includes new resources to be created, modifications to existing resources, and deletion of resources previously deployed to the same state file.

Typically, I recommend running a Terraform plan across all environments at CI. This ensures one can evaluate the changes being proposed across all potential environments and summarize them at the time of the Pull Request (PR). I then advise re-executing this plan prior to deployment as a way to confirm/re-evaluate whether anything has been updated since the original plan ran. Some will argue the previous plan can be "approved" to deploy to the next environment; however, there is little overhead in running a second plan, and I prefer this option. Here's the thing...SQL actually has this same functionality.

Deploy Reports

Via SqlPackage there is additional functionality we can leverage with our .dacpacs. Here we are going to dive a little deeper into Deploy Reports. If you have followed this series, you may know we use the SqlPackage Publish command wrapped behind the SqlAzureDacpacDeployment@1 task. More information on this can be found at Deploying .dacpacs to Azure SQL via Azure DevOps Pipelines | Microsoft Community Hub.

So, what is a Deploy Report? A Deploy Report is the XML representation of the changes your .dacpac will make to a database. Here is an example of one denoting that there is a risk of potential data loss:

This report is the key to our whole argument for modeling a SQL Continuous Integration/Continuous Delivery process after the one Terraform uses. When leveraging pre/post scripts, we already have a separate .dacpac file, built from the same .sqlproj, for each environment, as we saw in Deploying .dacpacs to Multiple Environments via ADO Pipelines | Microsoft Community Hub. So now we need to take each one of those and run a Deploy Report against the appropriate target. This is effectively the same as running a `tf plan` with a different variable file against each environment to determine what actions a Terraform `apply` would execute. These Deploy Reports are then what we will include in our PR approval to validate and approve any changes we will make to our SQL database.
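To make the analogy concrete, here is a minimal sketch of generating a Deploy Report with the SqlPackage CLI directly, assuming SqlPackage is installed locally; the server name is a placeholder and authentication is omitted. The pipeline task discussed in this series wraps this same action.

# Produce the XML Deploy Report describing what publishing the .dacpac
# would change on the target database, without deploying anything.
sqlpackage /Action:DeployReport \
  /SourceFile:"bin/dev/sqlmoveme.dacpac" \
  /TargetServerName:"<your-sql-server>.database.windows.net" \
  /TargetDatabaseName:"sqlmoveme" \
  /OutputPath:"DeployReport.xml"

This is the SQL counterpart of saving a terraform plan output for review.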
Dropping What's Not in Source Control

This is the controversial part and the biggest sell in our adoption of a Terraform-like approach to SQL deployments. It has long been considered a best practice to have whatever is deployed match what is under source control. This provides a consistent experience when developing and then deploying across multiple environments. Within IaC, we have our cloud infrastructure defined in source control and deployed across environments, and it is typically seen as good practice to delete resources which have been removed from source control. This helps simplify the environment, reduces cost, and reduces the potential security surface area.

So why not do the same for databases? Typically, it is because we fear losing data. To prevent this, we should have proper data protection and recovery processes in place. Again, I am not addressing that aspect here. If we have those accounted for, then by all means our source control version of our databases should match our deployed environments.

What about security and indexing? Again, this can be accounted for as shown in Deploying .dacpacs to Multiple Environments via ADO Pipelines | Microsoft Community Hub, where we have two different post-deployment security scripts, and these scripts are under source control! How can we see if data loss will occur? Refer back to the Deploy Reports for this!

There is potentially some natural hesitation, as the default method for deploying a .dacpac has safeguards to prevent deployments in the event of potential data loss. This is not a bad thing, as it prevents a destructive activity from occurring automatically; however, we by no means need to accept the default behavior. We will need to refer to SqlPackage Publish - SQL Server | Microsoft Learn. From this list we will be able to identify and explicitly set the values for various parameters. These will enable our package to deploy even in the event of potential data loss.

Conclusion

This post hopefully challenges the mindset we have when it comes to database deployments. By taking an approach that more closely relates to modern DevOps practices, we gain confidence that our source control and database match, increased reliability and speed in our deployments, and we close potential security gaps in our database deployment lifecycle. This content was not designed to be technical. In our next post we will demo, provide examples, and talk through how to leverage YAML Pipelines to accomplish what we have outlined here. Be sure to follow me on LinkedIn for the latest publications. For those who are technically sound and want to skip ahead, feel free to check out my code on my GitHub: https://github.com/JFolberth/cicd-adventureWorks and https://github.com/JFolberth/TheYAMLPipelineOne

Deploy Kaito on AKS using Terraform
The Kubernetes AI toolchain operator (Kaito) is a Kubernetes operator that simplifies the experience of running OSS AI models like Falcon and Llama 2 on your AKS cluster. You can deploy Kaito on your AKS cluster as a managed add-on for Azure Kubernetes Service (AKS). Kaito uses Karpenter to automatically provision the necessary GPU nodes based on a specification provided in the Workspace custom resource definition (CRD) and sets up the inference server as an endpoint for your AI models. This add-on reduces onboarding time and allows you to focus on AI model usage and development rather than infrastructure setup.

In this project, I will show you how to:

- Deploy the Kubernetes AI Toolchain Operator (Kaito) and a Workspace on Azure Kubernetes Service (AKS) using Terraform.
- Utilize Kaito to create an AKS-hosted inference environment for the Falcon 7B Instruct model.
- Develop a chat application using Python and Chainlit that interacts with the inference endpoint exposed by the AKS-hosted model.

By following this guide, you will be able to easily set up and use the powerful capabilities of Kaito, Python, and Chainlit to enhance your AI model deployment and create dynamic chat applications. For more information on Kaito, see the following resources:

- Kubernetes AI Toolchain Operator (Kaito)
- Deploy an AI model on Azure Kubernetes Service (AKS) with the AI toolchain operator
- Intelligent Apps on AKS Ep02: Bring Your Own AI Models to Intelligent Apps on AKS with Kaito
- Open Source Models on AKS with Kaito

The companion code for this article can be found in this GitHub repository.

NOTE: This article provides information on the Kubernetes AI Toolchain (Kaito) operator, which is currently in the early stages of development and undergoing frequent updates. The content of this article is applicable to Kaito version 0.2.0. It is advised to regularly check for the latest updates and changes in subsequent versions of Kaito.

NOTE: You can find the architecture.vsdx file used for the diagram under the visio folder.

Prerequisites

- An active Azure subscription. If you don't have one, create a free Azure account before you begin.
- Visual Studio Code installed on one of the supported platforms, along with the HashiCorp Terraform extension.
- Azure CLI version 2.59.0 or later installed. To install or upgrade, see Install Azure CLI.
- aks-preview Azure CLI extension version 2.0.0b8 or later installed.
- Terraform v1.7.5 or later.
- The deployment must be started by a user who has sufficient permissions to assign roles, such as a User Access Administrator or Owner. Your Azure account also needs Microsoft.Resources/deployments/write permissions at the subscription level.

Architecture

The following diagram shows the architecture and network topology deployed by the sample:

This project provides a set of Terraform modules to deploy the following resources:

- Azure Kubernetes Service: a public or private Azure Kubernetes Service (AKS) cluster composed of:
  - A system node pool in a dedicated subnet. The default node pool hosts only critical system pods and services. The worker nodes have a node taint which prevents application pods from being scheduled on this node pool.
  - A user node pool hosting user workloads and artifacts in a dedicated subnet.
- User-defined Managed Identity: a user-defined managed identity used by the AKS cluster to create additional resources like load balancers and managed disks in Azure.
- Azure Virtual Machine: the Terraform modules can optionally create a jump-box virtual machine to manage the private AKS cluster.
- Azure Bastion Host: a separate Azure Bastion is deployed in the AKS cluster virtual network to provide SSH connectivity to both agent nodes and virtual machines.
- Azure NAT Gateway: a bring-your-own (BYO) Azure NAT Gateway to manage outbound connections initiated by AKS-hosted workloads. The NAT Gateway is associated with the SystemSubnet, UserSubnet, and PodSubnet subnets. The outboundType property of the cluster is set to userAssignedNatGateway to specify that a BYO NAT Gateway is used for outbound connections. NOTE: you can update the outboundType after cluster creation, and this will deploy or remove resources as required to put the cluster into the new egress configuration. For more information, see Updating outboundType after cluster creation.
- Azure Storage Account: this storage account is used to store the boot diagnostics logs of both the service provider and service consumer virtual machines. Boot Diagnostics is a debugging feature that allows you to view console output and screenshots to diagnose virtual machine status.
- Azure Container Registry: an Azure Container Registry (ACR) to build, store, and manage container images and artifacts in a private registry for all container deployments.
- Azure Key Vault: an Azure Key Vault used to store secrets, certificates, and keys that can be mounted as files by pods using the Azure Key Vault Provider for Secrets Store CSI Driver. For more information, see Use the Azure Key Vault Provider for Secrets Store CSI Driver in an AKS cluster and Provide an identity to access the Azure Key Vault Provider for Secrets Store CSI Driver.
- Azure Private Endpoints: an Azure Private Endpoint is created for each of the following resources:
  - Azure Container Registry
  - Azure Key Vault
  - Azure Storage Account
  - API Server, when deploying a private AKS cluster.
- Azure Private DNS Zones: an Azure Private DNS Zone is created for each of the following resources:
  - Azure Container Registry
  - Azure Key Vault
  - Azure Storage Account
  - API Server, when deploying a private AKS cluster.
- Azure Network Security Groups: subnets hosting virtual machines and Azure Bastion Hosts are protected by Azure Network Security Groups that are used to filter inbound and outbound traffic.
- Azure Log Analytics Workspace: a centralized Azure Log Analytics workspace is used to collect the diagnostics logs and metrics from all the Azure resources:
  - Azure Kubernetes Service cluster
  - Azure Key Vault
  - Azure Network Security Group
  - Azure Container Registry
  - Azure Storage Account
  - Azure jump-box virtual machine
- Azure Monitor workspace: an Azure Monitor workspace is a unique environment for data collected by Azure Monitor. Each workspace has its own data repository, configuration, and permissions. Log Analytics workspaces contain logs and metrics data from multiple Azure resources, whereas Azure Monitor workspaces currently contain only metrics related to Prometheus. Azure Monitor managed service for Prometheus allows you to collect and analyze metrics at scale using a Prometheus-compatible monitoring solution based on Prometheus. This fully managed service allows you to use the Prometheus query language (PromQL) to analyze and alert on the performance of monitored infrastructure and workloads without having to operate the underlying infrastructure. The primary method for visualizing Prometheus metrics is Azure Managed Grafana.
  You can connect your Azure Monitor workspace to an Azure Managed Grafana instance to visualize Prometheus metrics using a set of built-in and custom Grafana dashboards.
- Azure Managed Grafana: an Azure Managed Grafana instance used to visualize the Prometheus metrics generated by the Azure Kubernetes Service (AKS) cluster deployed by the Terraform modules. Azure Managed Grafana is a fully managed service for analytics and monitoring solutions. It's supported by Grafana Enterprise, which provides extensible data visualizations. This managed service allows you to quickly and easily deploy Grafana dashboards with built-in high availability and to control access with Azure security.
- NGINX Ingress Controller: this sample compares the managed and unmanaged NGINX Ingress Controller. While the managed version is installed using the application routing add-on, the unmanaged version is deployed using the Helm Terraform Provider. You can use the Helm provider to deploy software packages in Kubernetes. The provider needs to be configured with the proper credentials before it can be used.
- Cert-Manager: the cert-manager package and the Let's Encrypt certificate authority are used to issue a TLS/SSL certificate to the chat applications.
- Prometheus: the AKS cluster is configured to send metrics to the Azure Monitor workspace and Azure Managed Grafana. Nonetheless, the kube-prometheus-stack Helm chart is used to install Prometheus and Grafana on the AKS cluster.
- Kaito Workspace: a Kaito workspace is used to provision a GPU node and deploy the Falcon 7B Instruct model.
- Workload namespace and service account: the Kubectl Terraform Provider and Kubernetes Terraform Provider are used to create the namespace and service account used by the chat applications.
- Azure Monitor ConfigMaps for Azure Monitor managed service for Prometheus and the cert-manager Cluster Issuer are deployed using the Kubectl Terraform Provider and Kubernetes Terraform Provider.

The architecture of the kaito-chat application can be seen in the image below. The application calls the inference endpoint created by the Kaito workspace for the Falcon-7B-Instruct model.

Kaito

The Kubernetes AI toolchain operator (Kaito) is a managed add-on for AKS that simplifies the experience of running OSS AI models on your AKS clusters. The AI toolchain operator automatically provisions the necessary GPU nodes and sets up the associated inference server as an endpoint server for your AI models. Using this add-on reduces your onboarding time and enables you to focus on AI model usage and development rather than infrastructure setup.

Key Features

- Container Image Management: Kaito allows you to manage large language models using container images. It provides an HTTP server to perform inference calls using the model library.
- GPU Hardware Configuration: Kaito eliminates the need for manual tuning of deployment parameters to fit GPU hardware. It provides preset configurations that are automatically applied based on the model requirements.
- Auto-provisioning of GPU Nodes: Kaito automatically provisions GPU nodes based on the requirements of your models. This ensures that your AI inference workloads have the necessary resources to run efficiently.
- Integration with Microsoft Container Registry: if the license allows, Kaito can host large language model images in the public Microsoft Container Registry (MCR). This simplifies the process of accessing and deploying the models.

Architecture Overview

Kaito follows the classic Kubernetes Custom Resource Definition (CRD)/controller design pattern.
The user manages a workspace custom resource that describes the GPU requirements and the inference specification, and Kaito controllers automate the deployment by reconciling the workspace custom resource. The major components of Kaito include:

- Workspace Controller: this controller reconciles the workspace custom resource, creates machine custom resources to trigger node auto-provisioning, and creates the inference workload (deployment or statefulset) based on the model preset configurations.
- Node Provisioner Controller: this controller, named gpu-provisioner in the Kaito Helm chart, interacts with the workspace controller using the machine CRD from Karpenter. It integrates with Azure Kubernetes Service (AKS) APIs to add new GPU nodes to the AKS cluster. Note that the gpu-provisioner is an open-source component maintained in the Kaito repository and can be replaced by other controllers supporting the Karpenter-core APIs.

Using Kaito greatly simplifies the workflow of onboarding large AI inference models onto Kubernetes, allowing you to focus on AI model usage and development without the hassle of infrastructure setup.

Benefits

There are some significant benefits to running open-source LLMs with Kaito. Some advantages include:

- Automated GPU node provisioning and configuration: Kaito will automatically provision and configure GPU nodes for you. This can help reduce the operational burden of managing GPU nodes, configuring them for Kubernetes, and tuning model deployment parameters to fit GPU profiles.
- Reduced cost: Kaito can help you save money by splitting inferencing across lower-end GPU nodes, which may also be more readily available and cost less than high-end GPU nodes.
- Support for popular open-source LLMs: Kaito offers preset configurations for popular open-source LLMs. This can help you deploy and manage open-source LLMs on AKS and integrate them with your intelligent applications.
- Fine-grained control: you have full control over data security and privacy, model development and configuration transparency, and the ability to fine-tune the model to fit your specific use case.
- Network and data security: you can ensure these models are ring-fenced within your organization's network and/or ensure the data never leaves the Kubernetes cluster.

Models

At the time of this writing, Kaito supports the following models.

Llama 2

Meta released Llama 2, a set of pretrained and fine-tuned LLMs, along with Llama 2-Chat, a version of Llama 2 optimized for dialogue. These models scale up to 70 billion parameters. After extensive testing on safety- and helpfulness-focused benchmarks, it was found that Llama 2-Chat models perform better than most current open-source models, and human evaluations have shown that they align well with several closed-source models. The researchers also took a number of steps to improve the safety of these models, including annotating data specifically for safety, conducting red-teaming exercises, fine-tuning models with an emphasis on safety issues, and iteratively and continuously reviewing the models. Variants of Llama 2 with 7 billion, 13 billion, and 70 billion parameters have been released, and Llama 2-Chat, optimized for dialogue scenarios, has also been released in variants with the same parameter scales.
For more information, see the following resources:

- Llama 2: Open Foundation and Fine-Tuned Chat Models
- Llama 2 Project

Falcon

Researchers from the Technology Innovation Institute in Abu Dhabi introduced the Falcon series, which includes models with 7 billion, 40 billion, and 180 billion parameters. These are causal decoder-only models trained on a high-quality, varied corpus mostly obtained from web data. Falcon-180B, the largest model in the series, was trained on a dataset of more than 3.5 trillion text tokens, one of the largest publicly documented pretraining runs. The researchers found that Falcon-180B shows great advancement over other models such as PaLM or Chinchilla, and that it outperforms models developed concurrently, such as LLaMA 2 or Inflection-1. Falcon-180B achieves performance close to PaLM-2-Large, which is noteworthy given its lower pretraining and inference costs. With this ranking, Falcon-180B joins GPT-4 and PaLM-2-Large among the leading language models in the world.

For more information, see the following resources:

- The Falcon Series of Open Language Models
- Falcon-40B-Instruct
- Falcon-180B
- Falcon-7B
- Falcon-7B-Instruct

Mistral

Mistral 7B v0.1 is a cutting-edge 7-billion-parameter language model developed for remarkable effectiveness and performance. Mistral 7B outperforms Llama 2 13B across all benchmarks and even Llama 1 34B in crucial domains like reasoning, math, and coding. State-of-the-art methods like grouped-query attention (GQA) are used to accelerate inference, and sliding window attention (SWA) is used to efficiently handle sequences of different lengths while reducing compute overhead. A customized version, Mistral 7B Instruct, has also been provided and optimized to perform exceptionally well in tasks that require following instructions.

For more information, see the following resources:

- Mistral-7B-Instruct
- Mistral-7B

Phi-2

Microsoft introduced Phi-2, a Transformer model with 2.7 billion parameters. It was trained using a combination of data sources similar to Phi-1.5, and it integrates a new data source consisting of NLP synthetic texts and filtered websites that are considered instructional and safe. Evaluating Phi-2 against benchmarks measuring logical reasoning, language comprehension, and common sense showed that it performs almost at the state-of-the-art level among models with fewer than 13 billion parameters.

For more information, see the following resources:

- Phi-2

Chainlit

Chainlit is an open-source Python package specifically designed to create user interfaces (UIs) for AI applications. It simplifies the process of building interactive chats and interfaces, making the development of AI-powered applications faster and more efficient. While Streamlit is a general-purpose UI library, Chainlit is purpose-built for AI applications and integrates seamlessly with other AI technologies such as LangChain, LlamaIndex, and LangFlow.

With Chainlit, developers can easily create intuitive UIs for their AI models, including ChatGPT-like applications. It provides a user-friendly interface for users to interact with AI models, enabling conversational experiences and information retrieval. Chainlit also offers unique features, such as the ability to display the Chain of Thought, which allows users to explore the reasoning process directly within the UI.
This feature enhances transparency and enables users to understand how the AI arrives at its responses or recommendations.

For more information, see the following resources:

- Documentation
- Examples
- API Reference
- Cookbook

Deploy Kaito using Azure CLI

As stated in the documentation, enabling the Kubernetes AI toolchain operator add-on in AKS creates a managed identity named ai-toolchain-operator-<aks-cluster-name>. This managed identity is utilized by the GPU provisioner controller to provision GPU node pools within the managed AKS cluster via Karpenter. To ensure proper functionality, manual configuration of the necessary permissions is required. Follow the steps outlined below to install Kaito through the AKS add-on.

Register the AIToolchainOperatorPreview feature flag using the az feature register command. It takes a few minutes for the registration to complete.

az feature register --namespace "Microsoft.ContainerService" --name "AIToolchainOperatorPreview"

Verify the registration using the az feature show command.

az feature show --namespace "Microsoft.ContainerService" --name "AIToolchainOperatorPreview"

Create an Azure resource group using the az group create command.

az group create --name ${AZURE_RESOURCE_GROUP} --location $AZURE_LOCATION

Create an AKS cluster with the AI toolchain operator add-on enabled using the az aks create command with the --enable-ai-toolchain-operator and --enable-oidc-issuer flags. AI toolchain operator enablement requires the OIDC issuer to be enabled.

az aks create --location $AZURE_LOCATION \
  --resource-group $AZURE_RESOURCE_GROUP \
  --name ${CLUSTER_NAME} \
  --enable-oidc-issuer \
  --enable-ai-toolchain-operator

On an existing AKS cluster, you can enable the AI toolchain operator add-on using the az aks update command as follows:

az aks update --name ${CLUSTER_NAME} \
  --resource-group ${AZURE_RESOURCE_GROUP} \
  --enable-oidc-issuer \
  --enable-ai-toolchain-operator

Configure kubectl to connect to your cluster using the az aks get-credentials command.

az aks get-credentials --resource-group $AZURE_RESOURCE_GROUP --name $CLUSTER_NAME

Export environment variables for the MC resource group, the principal ID of the Kaito identity, and the Kaito identity name using the following commands:

export MC_RESOURCE_GROUP=$(az aks show --resource-group $AZURE_RESOURCE_GROUP \
  --name $CLUSTER_NAME \
  --query nodeResourceGroup \
  -o tsv)
export PRINCIPAL_ID=$(az identity show --name "ai-toolchain-operator-$CLUSTER_NAME" \
  --resource-group $MC_RESOURCE_GROUP \
  --query 'principalId' \
  -o tsv)
export KAITO_IDENTITY_NAME="ai-toolchain-operator-${CLUSTER_NAME,,}"

Get the AKS OIDC issuer URL and export it as an environment variable:

export AKS_OIDC_ISSUER=$(az aks show --resource-group "${AZURE_RESOURCE_GROUP}" \
  --name "${CLUSTER_NAME}" \
  --query "oidcIssuerProfile.issuerUrl" \
  -o tsv)

Create a new role assignment for the managed identity using the az role assignment create command. The Kaito user-assigned managed identity needs the Contributor role on the resource group containing the AKS cluster.

az role assignment create --role "Contributor" \
  --assignee $PRINCIPAL_ID \
  --scope "/subscriptions/$AZURE_SUBSCRIPTION_ID/resourcegroups/$AZURE_RESOURCE_GROUP"

Create a federated identity credential between the Kaito managed identity and the service account used by the Kaito controllers using the az identity federated-credential create command.
az identity federated-credential create --name "Kaito-federated-identity" \
  --identity-name "${KAITO_IDENTITY_NAME}" \
  -g "${MC_RESOURCE_GROUP}" \
  --issuer "${AKS_OIDC_ISSUER}" \
  --subject system:serviceaccount:"kube-system:kaito-gpu-provisioner" \
  --audience api://AzureADTokenExchange

Verify that the deployment is running using the kubectl get command:

kubectl get deployment -n kube-system | grep kaito

Deploy the Falcon 7B Instruct model from the Kaito model repository using the kubectl apply command.

kubectl apply -f https://raw.githubusercontent.com/Azure/kaito/main/examples/kaito_workspace_falcon_7b-instruct.yaml

Track the live resource changes in your workspace using the kubectl get command.

kubectl get workspace workspace-falcon-7b-instruct -w

Check your service and get the cluster IP address of the inference endpoint using the kubectl get svc command.

export SERVICE_IP=$(kubectl get svc workspace-falcon-7b-instruct -o jsonpath='{.spec.clusterIP}')

Run the Falcon 7B Instruct model with a sample input of your choice using the following curl command:

kubectl run -it --rm -n $namespace --restart=Never curl --image=curlimages/curl -- \
  curl -X POST http://$SERVICE_IP/chat \
    -H "accept: application/json" \
    -H "Content-Type: application/json" \
    -d "{\"prompt\":\"Tell me about Tuscany and its cities.\", \"return_full_text\": false, \"generate_kwargs\": {\"max_length\":4096}}"

NOTE: As you track the live resource changes in your workspace, machine readiness can take up to 10 minutes, and workspace readiness up to 20 minutes.

Deploy Kaito using Terraform

At the time of this writing, the azurerm_kubernetes_cluster resource in the AzureRM Terraform provider for Azure does not have a property to enable the add-on and install the Kubernetes AI toolchain operator (Kaito) on your AKS cluster. However, you can use the AzAPI Provider to deploy Kaito on your AKS cluster. The AzAPI provider is a thin layer on top of the Azure ARM REST APIs. It complements the AzureRM provider by enabling the management of Azure resources that are not yet, or may never be, supported in the AzureRM provider, such as private/public preview services and features.

The following resources replicate the actions performed by the Azure CLI commands mentioned in the previous section.

data "azurerm_resource_group" "node_resource_group" {
  count      = var.Kaito_enabled ? 1 : 0
  name       = module.aks_cluster.node_resource_group
  depends_on = [module.node_pool]
}

resource "azapi_update_resource" "enable_Kaito" {
  count       = var.Kaito_enabled ? 1 : 0
  type        = "Microsoft.ContainerService/managedClusters@2024-02-02-preview"
  resource_id = module.aks_cluster.id

  body = jsonencode({
    properties = {
      aiToolchainOperatorProfile = {
        enabled = var.Kaito_enabled
      }
    }
  })

  depends_on = [module.node_pool]
}

data "azurerm_user_assigned_identity" "Kaito_identity" {
  count               = var.Kaito_enabled ? 1 : 0
  name                = local.KAITO_IDENTITY_NAME
  resource_group_name = data.azurerm_resource_group.node_resource_group.0.name
  depends_on          = [azapi_update_resource.enable_Kaito]
}

resource "azurerm_federated_identity_credential" "Kaito_federated_identity_credential" {
  count               = var.Kaito_enabled ? 1 : 0
  name                = "Kaito-federated-identity"
  resource_group_name = data.azurerm_resource_group.node_resource_group.0.name
  audience            = ["api://AzureADTokenExchange"]
  issuer              = module.aks_cluster.oidc_issuer_url
  parent_id           = data.azurerm_user_assigned_identity.Kaito_identity.0.id
  subject             = "system:serviceaccount:kube-system:kaito-gpu-provisioner"
  depends_on          = [azapi_update_resource.enable_Kaito, module.aks_cluster, data.azurerm_user_assigned_identity.Kaito_identity]
}

resource "azurerm_role_assignment" "Kaito_identity_contributor_assignment" {
  count                            = var.Kaito_enabled ? 1 : 0
  scope                            = azurerm_resource_group.rg.id
  role_definition_name             = "Contributor"
  principal_id                     = data.azurerm_user_assigned_identity.Kaito_identity.0.principal_id
  skip_service_principal_aad_check = true
  depends_on                       = [azurerm_federated_identity_credential.Kaito_federated_identity_credential]
}

Here is a description of the code above:

- azurerm_resource_group.node_resource_group: retrieves the properties of the node resource group of the current AKS cluster.
- azapi_update_resource.enable_Kaito: enables the Kaito add-on. This operation installs the Kaito operator on the AKS cluster and creates the related user-assigned managed identity in the node resource group.
- azurerm_user_assigned_identity.Kaito_identity: retrieves the properties of the Kaito user-assigned managed identity located in the node resource group.
- azurerm_federated_identity_credential.Kaito_federated_identity_credential: creates the federated identity credential between the Kaito managed identity and the service account used by the Kaito controllers in the kube-system namespace, particularly the kaito-gpu-provisioner controller.
- azurerm_role_assignment.Kaito_identity_contributor_assignment: assigns the Contributor role to the Kaito managed identity with the AKS resource group as the scope.

Create the Kaito Workspace using Terraform

To create the Kaito workspace, you can utilize the kubectl_manifest resource from the Kubectl Provider in the following manner.

resource "kubectl_manifest" "Kaito_workspace" {
  count = var.Kaito_enabled ? 1 : 0

  yaml_body = <<-EOF
    apiVersion: kaito.sh/v1alpha1
    kind: Workspace
    metadata:
      name: workspace-falcon-7b-instruct
      namespace: ${var.namespace}
      annotations:
        kaito.sh/enablelb: "False"
    resource:
      count: 1
      instanceType: "${var.instance_type}"
      labelSelector:
        matchLabels:
          apps: falcon-7b-instruct
    inference:
      preset:
        name: "falcon-7b-instruct"
  EOF

  depends_on = [kubectl_manifest.service_account]
}

To access the OpenAPI schema of the Workspace custom resource definition, execute the following command:

kubectl get crd workspaces.kaito.sh -o jsonpath="{.spec.versions[0].schema}" | jq -r

Kaito Workspace Inference Endpoint

Kaito creates a Kubernetes service with the same name as the workspace and inside the same namespace. This service exposes an inference endpoint that AI applications can use to call the API exposed by the AKS-hosted model.
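Once the workspace is ready, one quick way to smoke-test this endpoint from a workstation is to port-forward the workspace service and call it locally. This is a minimal sketch: the namespace value (kaito-demo) mirrors the terraform.tfvars example later in this article, and the local port choice is an assumption.

# Forward the ClusterIP service created by the Kaito workspace to localhost.
kubectl port-forward svc/workspace-falcon-7b-instruct 8080:80 -n kaito-demo

# In another terminal, send a prompt to the inference endpoint.
curl -X POST http://localhost:8080/chat \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Tell me about Tuscany and its cities.","return_full_text":false,"generate_kwargs":{"max_length":200}}'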
Here is an example of a call to the inference endpoint for a Falcon model, from the Kaito documentation:

curl -X POST \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt":"YOUR_PROMPT_HERE",
    "return_full_text": false,
    "clean_up_tokenization_spaces": false,
    "prefix": null,
    "handle_long_generation": null,
    "generate_kwargs": {
      "max_length":200,
      "min_length":0,
      "do_sample":true,
      "early_stopping":false,
      "num_beams":1,
      "num_beam_groups":1,
      "diversity_penalty":0.0,
      "temperature":1.0,
      "top_k":10,
      "top_p":1,
      "typical_p":1,
      "repetition_penalty":1,
      "length_penalty":1,
      "no_repeat_ngram_size":0,
      "encoder_no_repeat_ngram_size":0,
      "bad_words_ids":null,
      "num_return_sequences":1,
      "output_scores":false,
      "return_dict_in_generate":false,
      "forced_bos_token_id":null,
      "forced_eos_token_id":null,
      "remove_invalid_values":null
    }
  }' \
  "http://<SERVICE>:80/chat"

Here are the parameters you can use in a call:

- prompt: The initial text provided by the user, from which the model will continue generating text.
- return_full_text: If False, only generated text is returned; otherwise the full text is returned.
- clean_up_tokenization_spaces: True/False, determines whether to remove potential extra spaces in the text output.
- prefix: Prefix added to the prompt.
- handle_long_generation: Provides strategies to address generations beyond the model's maximum length capacity.
- max_length: The maximum total number of tokens in the generated text.
- min_length: The minimum total number of tokens that should be generated.
- do_sample: If True, sampling methods will be used for text generation, which can introduce randomness and variation.
- early_stopping: If True, the generation will stop early if certain conditions are met, for example, when a satisfactory number of candidates have been found in beam search.
- num_beams: The number of beams to be used in beam search. More beams can lead to better results but are more computationally expensive.
- num_beam_groups: Divides the number of beams into groups to promote diversity in the generated results.
- diversity_penalty: Penalizes the score of tokens that make the current generation too similar to other groups, encouraging diverse outputs.
- temperature: Controls the randomness of the output by scaling the logits before sampling.
- top_k: Restricts sampling to the k most likely next tokens.
- top_p: Uses nucleus sampling to restrict the sampling pool to tokens comprising the top p probability mass.
- typical_p: Adjusts the probability distribution to favor tokens that are "typically" likely, given the context.
- repetition_penalty: Penalizes tokens that have been generated previously, aiming to reduce repetition.
- length_penalty: Modifies scores based on sequence length to encourage shorter or longer outputs.
- no_repeat_ngram_size: Prevents the generation of any n-gram more than once.
- encoder_no_repeat_ngram_size: Similar to no_repeat_ngram_size but applies to the encoder part of encoder-decoder models.
- bad_words_ids: A list of token IDs that should not be generated.
- num_return_sequences: The number of different sequences to generate.
- output_scores: Whether to output the prediction scores.
- return_dict_in_generate: If True, the method will return a dictionary containing additional information.
- pad_token_id: The token ID used for padding sequences to the same length.
- eos_token_id: The token ID that signifies the end of a sequence.
- forced_bos_token_id: The token ID that is forcibly used as the beginning of a sequence token.
forced_eos_token_id : The token ID that is forcibly used as the end of a sequence when max_length is reached. remove_invalid_values : If True, filters out invalid values like NaNs or infs from model outputs to prevent crashes. Deploy the Terraform modules Before deploying the Terraform modules in the project, specify a value for the following variables in the terraform.tfvars variable definitions file. name_prefix = "Anubi" location = "westeurope" domain = "babosbird.com" kubernetes_version = "1.29.2" network_plugin = "azure" network_plugin_mode = "overlay" network_policy = "azure" system_node_pool_vm_size = "Standard_D4ads_v5" user_node_pool_vm_size = "Standard_D4ads_v5" ssh_public_key = "ssh-rsa XXXXXXXXXXXXXXXXXXXXXXXXXXXXX" vm_enabled = true admin_group_object_ids = ["XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"] web_app_routing_enabled = true dns_zone_name = "babosbird.com" dns_zone_resource_group_name = "DnsResourceGroup" namespace = "Kaito-demo" service_account_name = "Kaito-sa" grafana_admin_user_object_id = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" vnet_integration_enabled = true openai_enabled = false Kaito_enabled = true instance_type = "Standard_NC12s_v3" This is the description of the parameters: name_prefix : Specifies a prefix for all the Azure resources. location : Specifies the region (e.g., westeurope) where deploying the Azure resources. domain : Specifies the domain part (e.g., subdomain.domain) of the hostname of the ingress object used to expose the chatbot via the NGINX Ingress Controller. kubernetes_version : Specifies the Kubernetes version installed on the AKS cluster. network_plugin : Specifies the network plugin of the AKS cluster. network_plugin_mode : Specifies the network plugin mode used for building the Kubernetes network. Possible value is overlay. network_policy : Specifies the network policy of the AKS cluster. Currently supported values are calico, azure and cilium. system_node_pool_vm_size : Specifies the virtual machine size of the system-mode node pool. user_node_pool_vm_size : Specifies the virtual machine size of the user-mode node pool. ssh_public_key : Specifies the SSH public key used for the AKS nodes and jumpbox virtual machine. vm_enabled : a boleean value that specifies whether deploying or not a jumpbox virtual machine in the same virtual network of the AKS cluster. admin_group_object_ids : when deploying an AKS cluster with Microsoft Entra ID and Azure RBAC integration, this array parameter contains the list of Microsoft Entra ID group object IDs that will have the admin role of the cluster. web_app_routing_enabled : Specifies whether the application routing add-on is enabled. When enabled, this add-on installs a managed instance of the NGINX Ingress Controller on the AKS cluster. dns_zone_name : Specifies the name of the Azure Public DNS zone used by the application routing add-on. dns_zone_resource_group_name : Specifies the resource group name of the Azure Public DNS zone used by the application routing add-on. namespace : Specifies the namespace of the workload application. service_account_name : Specifies the name of the service account of the workload application. grafana_admin_user_object_id : Specifies the object id of the Azure Managed Grafana administrator user account. vnet_integration_enabled : Specifies whether API Server VNet Integration is enabled. openai_enabled : Specifies whether to deploy Azure OpenAI Service or not. This sample does not require the deployment of Azure OpenAI Service. 
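For a quick smoke test from your workstation, you can temporarily forward the inference service to a local port and send a short prompt. The sketch below assumes the workspace name and namespace from the sample configuration and uses arbitrary generate_kwargs values; adjust them to your needs:

# Forward the inference service locally (keep this running in one terminal)
kubectl port-forward -n kaito-demo svc/workspace-falcon-7b-instruct 8080:80

# In a second terminal, send a minimal prompt to the chat endpoint
curl -X POST http://localhost:8080/chat \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is Kubernetes?", "return_full_text": false, "generate_kwargs": {"max_length": 200, "temperature": 0.7, "top_k": 10}}'

Deploy the Terraform modules

Before deploying the Terraform modules in the project, specify a value for the following variables in the terraform.tfvars variable definitions file.

name_prefix                  = "Anubi"
location                     = "westeurope"
domain                       = "babosbird.com"
kubernetes_version           = "1.29.2"
network_plugin               = "azure"
network_plugin_mode          = "overlay"
network_policy               = "azure"
system_node_pool_vm_size     = "Standard_D4ads_v5"
user_node_pool_vm_size       = "Standard_D4ads_v5"
ssh_public_key               = "ssh-rsa XXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
vm_enabled                   = true
admin_group_object_ids       = ["XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"]
web_app_routing_enabled      = true
dns_zone_name                = "babosbird.com"
dns_zone_resource_group_name = "DnsResourceGroup"
namespace                    = "kaito-demo"
service_account_name         = "kaito-sa"
grafana_admin_user_object_id = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
vnet_integration_enabled     = true
openai_enabled               = false
Kaito_enabled                = true
instance_type                = "Standard_NC12s_v3"

This is the description of the parameters:

- name_prefix: Specifies a prefix for all the Azure resources.
- location: Specifies the region (e.g., westeurope) where the Azure resources are deployed.
- domain: Specifies the domain part (e.g., subdomain.domain) of the hostname of the ingress object used to expose the chatbot via the NGINX Ingress Controller.
- kubernetes_version: Specifies the Kubernetes version installed on the AKS cluster.
- network_plugin: Specifies the network plugin of the AKS cluster.
- network_plugin_mode: Specifies the network plugin mode used for building the Kubernetes network. The possible value is overlay.
- network_policy: Specifies the network policy of the AKS cluster. Currently supported values are calico, azure, and cilium.
- system_node_pool_vm_size: Specifies the virtual machine size of the system-mode node pool.
- user_node_pool_vm_size: Specifies the virtual machine size of the user-mode node pool.
- ssh_public_key: Specifies the SSH public key used for the AKS nodes and the jumpbox virtual machine.
- vm_enabled: A boolean value that specifies whether to deploy a jumpbox virtual machine in the same virtual network as the AKS cluster.
- admin_group_object_ids: When deploying an AKS cluster with Microsoft Entra ID and Azure RBAC integration, this array parameter contains the list of Microsoft Entra ID group object IDs that will have the admin role on the cluster.
- web_app_routing_enabled: Specifies whether the application routing add-on is enabled. When enabled, this add-on installs a managed instance of the NGINX Ingress Controller on the AKS cluster.
- dns_zone_name: Specifies the name of the Azure Public DNS zone used by the application routing add-on.
- dns_zone_resource_group_name: Specifies the resource group name of the Azure Public DNS zone used by the application routing add-on.
- namespace: Specifies the namespace of the workload application.
- service_account_name: Specifies the name of the service account of the workload application.
- grafana_admin_user_object_id: Specifies the object ID of the Azure Managed Grafana administrator user account.
- vnet_integration_enabled: Specifies whether API Server VNet Integration is enabled.
- openai_enabled: Specifies whether to deploy Azure OpenAI Service. This sample does not require the deployment of Azure OpenAI Service.
- Kaito_enabled: Specifies whether to deploy the Kubernetes AI Toolchain Operator (Kaito).
- instance_type: Specifies the GPU node SKU (e.g., Standard_NC12s_v3) to use in the Kaito workspace.

NOTE: We suggest reading sensitive configuration data such as passwords or SSH keys from a pre-existing Azure Key Vault resource. For more information, see Referencing Azure Key Vault secrets in Terraform. Before proceeding, also make sure to run the register-preview-features.sh Bash script in the terraform folder to register any preview feature used by the AKS cluster.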
GPU VM-family vCPU quotas

Before installing the Terraform module, make sure you have enough vCPU quota in the selected region for the GPU VM family specified in the instance_type parameter. If you don't have enough quota, follow the instructions described in Increase VM-family vCPU quotas. The steps for requesting a quota increase vary based on whether the quota is adjustable or non-adjustable.

- Adjustable quotas: Quotas for which you can request a quota increase fall into this category. Each subscription has a default quota value for each VM family and region. You can request an increase for an adjustable quota from the Azure Portal My quotas page, providing an amount or usage percentage for a given VM family in a specified region and submitting it directly. This is the quickest way to increase quotas.
- Non-adjustable quotas: These are quotas with a hard limit, usually determined by the scope of the subscription. To make changes, you must submit a support request, and the Azure support team will help provide solutions.

If you don't have enough vCPU quota for the selected instance type, the Kaito workspace creation will fail. You can check the error description using the Azure Monitor Activity Log. To read the logs of the Kaito GPU provisioner pod in the kube-system namespace, you can use the following command.

kubectl logs -n kube-system $(kubectl get pods -n kube-system | grep kaito-gpu-provisioner | awk '{print $1; exit}')

If you exceeded the quota for the selected instance type, you will see an error message similar to the following:

{"level":"INFO","time":"2024-04-04T08:42:40.398Z","logger":"controller","message":"Create","machine":{"name":"ws560b34aa2"}}
{"level":"INFO","time":"2024-04-04T08:42:40.398Z","logger":"controller","message":"Instance.Create","machine":{"name":"ws560b34aa2"}}
{"level":"INFO","time":"2024-04-04T08:42:40.398Z","logger":"controller","message":"createAgentPool","agentpool":"ws560b34aa2"}
{"level":"ERROR","time":"2024-04-04T08:42:48.010Z","logger":"controller","message":"Reconciler error","controller":"machine.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"Machine","Machine":{"name":"ws560b34aa2"},"namespace":"","name":"ws560b34aa2","reconcileID":"b6f56170-ae31-4b05-80a6-019d3f716acc","error":"creating machine, creating instance, agentPool.BeginCreateOrUpdate for \"ws560b34aa2\" failed: PUT https://management.azure.com/subscriptions/1a45a694-af23-4650-9774-89a981c462f6/resourceGroups/AtumRG/providers/Microsoft.ContainerService/managedClusters/AtumAks/agentPools/ws560b34aa2\n--------------------------------------------------------------------------------\nRESPONSE 400: 400 Bad Request\nERROR CODE: PreconditionFailed\n--------------------------------------------------------------------------------\n{\n \"code\": \"PreconditionFailed\",\n \"details\": null,\n \"message\": \"Provisioning of resource(s) for Agent Pool ws560b34aa2 failed. Error: {\\n \\\"code\\\": \\\"InvalidTemplateDeployment\\\",\\n \\\"message\\\": \\\"The template deployment '490396b4-1191-4768-a421-3b6eda930287' is not valid according to the validation procedure. The tracking id is '1634a570-53d2-4a7f-af13-5ac157edbb9d'. See inner errors for details.\\\",\\n \\\"details\\\": [\\n {\\n \\\"code\\\": \\\"QuotaExceeded\\\",\\n \\\"message\\\": \\\"Operation could not be completed as it results in exceeding approved standardNVSv3Family Cores quota. Additional details - Deployment Model: Resource Manager, Location: eastus, Current Limit: 0, Current Usage: 0, Additional Required: 24, (Minimum) New Limit Required: 24. Submit a request for Quota increase at https://aka.ms/ProdportalCRP/#blade/Microsoft_Azure_Capacity/UsageAndQuota.ReactView/Parameters/%7B%22subscriptionId%22:%221a45a694-af23-4650-9774-89a981c462f6%22,%22command%22:%22openQuotaApprovalBlade%22,%22quotas%22:[%7B%22location%22:%22eastus%22,%22providerId%22:%22Microsoft.Compute%22,%22resourceName%22:%22standardNVSv3Family%22,%22quotaRequest%22:%7B%22properties%22:%7B%22limit%22:24,%22unit%22:%22Count%22,%22name%22:%7B%22value%22:%22standardNVSv3Family%22%7D%7D%7D%7D]%7D by specifying parameters listed in the ‘Details’ section for deployment to succeed. Please read more about quota limits at https://docs.microsoft.com/en-us/azure/azure-supportability/per-vm-quota-requests\\\"\\n }\\n ]\\n }\",\n \"subcode\": \"\"\n}\n--------------------------------------------------------------------------------\n"}
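With the variables set and the preview features registered, the deployment follows the standard Terraform workflow. A minimal sketch, assuming you run the commands from the terraform folder of the companion repository:

# Initialize the working directory and download the required providers
terraform init

# Review the planned changes, then apply them
terraform plan -out main.tfplan
terraform apply main.tfplan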
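To avoid hitting this error, you can check the available vCPU quota for the target GPU VM family up front with the Azure CLI. This is only a quick sketch; adjust the location and the family filter to match your region and instance_type (for example, Standard_NC12s_v3 belongs to the NCSv3 family):

# List compute usage and limits in the target region and filter for the GPU family
az vm list-usage --location westeurope --output table | grep -i "NC"

If the current limit minus the current usage is smaller than the number of vCPUs required by the selected instance type, request a quota increase before deploying the workspace.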
Kaito Chat Application

The project provides the code of a chat application built with Python and Chainlit that interacts with the inference endpoint exposed by the AKS-hosted model. As an alternative, the chat application can be configured to call the REST API of Azure OpenAI Service. For more information about how to configure the chat application with Azure OpenAI Service, see the following articles:

- Create an Azure OpenAI, LangChain, ChromaDB, and Chainlit chat app in AKS using Terraform (Azure Samples)(My GitHub)(Tech Community)
- Deploy an OpenAI, LangChain, ChromaDB, and Chainlit chat app in Azure Container Apps using Terraform (Azure Samples)(My GitHub)(Tech Community)

This is the code of the sample application.

# Import packages
import os
import sys
import requests
import json
from openai import AsyncAzureOpenAI
import logging
import chainlit as cl
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from dotenv import load_dotenv
from dotenv import dotenv_values

# Load environment variables from .env file
if os.path.exists(".env"):
    load_dotenv(override=True)
    config = dotenv_values(".env")

# Read environment variables
temperature = float(os.environ.get("TEMPERATURE", 0.9))
top_p = float(os.environ.get("TOP_P", 1))
top_k = float(os.environ.get("TOP_K", 10))
max_length = int(os.environ.get("MAX_LENGTH", 4096))
api_base = os.getenv("AZURE_OPENAI_BASE")
api_key = os.getenv("AZURE_OPENAI_KEY")
api_type = os.environ.get("AZURE_OPENAI_TYPE", "azure")
api_version = os.environ.get("AZURE_OPENAI_VERSION", "2023-12-01-preview")
engine = os.getenv("AZURE_OPENAI_DEPLOYMENT")
model = os.getenv("AZURE_OPENAI_MODEL")
system_content = os.getenv(
    "AZURE_OPENAI_SYSTEM_MESSAGE", "You are a helpful assistant."
)
max_retries = int(os.getenv("MAX_RETRIES", 5))
timeout = int(os.getenv("TIMEOUT", 30))
debug = os.getenv("DEBUG", "False").lower() in ("true", "1", "t")
useLocalLLM = os.getenv("USE_LOCAL_LLM", "False").lower() in ("true", "1", "t")
aiEndpoint = os.getenv("AI_ENDPOINT", "")

if not useLocalLLM:
    # Create Token Provider
    token_provider = get_bearer_token_provider(
        DefaultAzureCredential(),
        "https://cognitiveservices.azure.com/.default",
    )

    # Configure OpenAI
    if api_type == "azure":
        openai = AsyncAzureOpenAI(
            api_version=api_version,
            api_key=api_key,
            azure_endpoint=api_base,
            max_retries=max_retries,
            timeout=timeout,
        )
    else:
        openai = AsyncAzureOpenAI(
            api_version=api_version,
            azure_endpoint=api_base,
            azure_ad_token_provider=token_provider,
            max_retries=max_retries,
            timeout=timeout,
        )

# Configure a logger
logging.basicConfig(
    stream=sys.stdout,
    format="[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s",
    level=logging.INFO,
)
logger = logging.getLogger(__name__)


@cl.on_chat_start
async def start_chat():
    await cl.Avatar(
        name="Chatbot",
        url="https://cdn-icons-png.flaticon.com/512/8649/8649595.png",
    ).send()
    await cl.Avatar(
        name="Error",
        url="https://cdn-icons-png.flaticon.com/512/8649/8649595.png",
    ).send()
    await cl.Avatar(
        name="You",
        url="https://media.architecturaldigest.com/photos/5f241de2c850b2a36b415024/master/w_1600%2Cc_limit/Luke-logo.png",
    ).send()
    if not useLocalLLM:
        cl.user_session.set(
            "message_history",
            [{"role": "system", "content": system_content}],
        )


@cl.on_message
async def on_message(message: cl.Message):
    # Create the Chainlit response message
    msg = cl.Message(content="")

    if useLocalLLM:
        # Call the inference endpoint exposed by the Kaito workspace
        payload = {
            "prompt": f"{message.content} answer:",
            "return_full_text": False,
            "clean_up_tokenization_spaces": False,
            "prefix": None,
            "handle_long_generation": None,
            "generate_kwargs": {
                "max_length": max_length,
                "min_length": 0,
                "do_sample": True,
                "early_stopping": False,
                "num_beams": 1,
                "num_beam_groups": 1,
                "diversity_penalty": 0.0,
                "temperature": temperature,
                "top_k": top_k,
                "top_p": top_p,
                "typical_p": 1,
                "repetition_penalty": 1,
                "length_penalty": 1,
                "no_repeat_ngram_size": 0,
                "encoder_no_repeat_ngram_size": 0,
                "bad_words_ids": None,
                "num_return_sequences": 1,
                "output_scores": False,
                "return_dict_in_generate": False,
                "forced_bos_token_id": None,
                "forced_eos_token_id": None,
                "remove_invalid_values": True,
            },
        }

        headers = {"Content-Type": "application/json", "accept": "application/json"}
        response = requests.request(
            method="POST", url=aiEndpoint, headers=headers, json=payload
        )

        # Convert response.text to JSON
        result = json.loads(response.text)
        result = result["Result"]

        # Remove all double quotes
        if '"' in result:
            result = result.replace('"', "")
        msg.content = result
    else:
        # Call the Azure OpenAI chat completions API and stream the answer
        message_history = cl.user_session.get("message_history")
        message_history.append({"role": "user", "content": message.content})
        logger.info("Question: [%s]", message.content)

        async for stream_resp in await openai.chat.completions.create(
            model=model,
            messages=message_history,
            temperature=temperature,
            stream=True,
        ):
            if stream_resp and len(stream_resp.choices) > 0:
                token = stream_resp.choices[0].delta.content or ""
                await msg.stream_token(token)

        if debug:
            logger.info("Answer: [%s]", msg.content)

        message_history.append({"role": "assistant", "content": msg.content})

    await msg.send()

Here's a brief explanation of each variable and related environment variable:

- temperature: A float value representing the temperature passed to the create chat completion method of the OpenAI API. It is fetched from the environment variables with a default value of 0.9.
- top_p: A float value representing the top_p parameter, which uses nucleus sampling to restrict the sampling pool to tokens comprising the top p probability mass.
- top_k: A float value representing the top_k parameter, which restricts sampling to the k most likely next tokens.
- api_base: The base URL for the OpenAI API.
- api_key: The API key for the OpenAI API. The value of this variable can be null when using a user-assigned managed identity to acquire a security token to access Azure OpenAI.
- api_type: A string representing the type of the OpenAI API.
- api_version: A string representing the version of the OpenAI API.
- engine: The engine used for OpenAI API calls.
- model: The model used for OpenAI API calls.
- system_content: The content of the system message used for OpenAI API calls.
- max_retries: The maximum number of retries for OpenAI API calls.
- timeout: The timeout in seconds.
- debug: When debug is equal to true, t, or 1, the logger writes the chat completion answers.
- useLocalLLM: When set to true, the chat application calls the inference endpoint of the local (AKS-hosted) model.
- aiEndpoint: The URL of the inference endpoint.

The application calls the inference endpoint using the requests.request method when the useLocalLLM environment variable is set to true.

You can run the application locally using the following command. The -w flag enables auto-reload, so the application restarts whenever you make changes to the code.

chainlit run app.py -w

NOTE: To locally debug your application, you have two options to expose the AKS-hosted inference endpoint service: you can either use the kubectl port-forward command or use an ingress controller to expose the endpoint publicly.

Deployment Scripts and YAML manifests

You can locate the Dockerfile, Bash scripts, and YAML manifests for deploying the chat application to your AKS cluster in the companion sample, under the scripts folder.
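The behavior of the chat application is controlled entirely by the environment variables described above. For a quick local run against the Kaito-hosted model, a minimal configuration might look like the following sketch; it assumes you forwarded the inference service to localhost:8080 with kubectl port-forward, as mentioned in the note above:

# Point the app at the locally forwarded Kaito inference endpoint
export USE_LOCAL_LLM=True
export AI_ENDPOINT=http://localhost:8080/chat
export TEMPERATURE=0.7
export MAX_LENGTH=2048

# Start the app with auto-reload
chainlit run app.py -w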
Conclusions

In conclusion, while it is possible to manually create GPU-enabled agent nodes and then deploy and tune open-source large language models (LLMs) like Falcon, Mistral, or Llama 2 on Azure Kubernetes Service (AKS), using the Kubernetes AI Toolchain Operator (Kaito) automates these steps for you. Kaito simplifies the experience of running OSS AI models on your AKS clusters by automatically provisioning the necessary GPU nodes and setting up the inference server as an endpoint for your models.

By utilizing Kaito, you can reduce the time spent on infrastructure setup and focus more on AI model usage and development. Additionally, Kaito has just been released, and new features are expected to follow, providing even more capabilities for managing and deploying AI models on AKS.

Deploying Flask Apps to Azure Web App via Docker Hub
Learn the technical intricacies of deploying Python-based Flask apps to Azure Web App using Docker Hub. Follow the step-by-step guide, covering Flask development, Docker containerization, and Azure infrastructure setup with Terraform. Gain insights into Azure CLI installation and Web App deployment, ensuring optimal performance. Explore continuous deployment options, allowing automatic redeployment upon Docker image updates. Master the seamless integration of technologies for a live and accessible book recommendation system, 'BookBuddy,' while optimizing for Azure's robust features.

Create Your Azure Infrastructure with Cloud Development Kit for Terraform - LIVE SESSION 15th Sept
This series will teach you how to use the Cloud Development Kit for Terraform (CDK-TF) together with Microsoft Azure serverless services. You will learn how to build everything as code and how to solve several real-world problems with just Azure Functions and short-lived Azure Container Instances. All projects covered in the series are open source.
