Creating a containerized build agent for Azure DevOps and Azure DevOps Server

In this article, we'll go over creating a containerized build agent for Azure DevOps and Azure DevOps Server. This ask came from a customer who was looking to retire their VM-based build agents in favor of something that required less manual patching and maintenance. The build agent needed to be injected into a VNet so it could communicate with the customer's Azure DevOps Server (though this works perfectly well with the Azure DevOps service) and deploy into an App Service on their VNet. The build agent needed to be able to build both the customer's .NET and JavaScript projects and then deploy them to Azure. The customer was also using an Artifacts feed in Azure DevOps for their NuGet and npm packages, so the build agent needed access to those feeds.

 

Attempt #1: Windows Container

Because the customer was more familiar with Windows, we decided to use a Windows-based container image, specifically Windows Server 2022 LTSC. To this base image, I added the .NET 8 SDK, PowerShell 7, the Az PowerShell module, Node.js/npm, and AzCopy in the dockerfile. My first observation was that the Windows Server 2022 container image started at 3.24 GB in size, and by the time we added the various packages it had ballooned to 8.5 GB.
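The dockerfile followed the usual download-and-install pattern for each tool. A condensed sketch, assuming the Server Core base image (installer URLs and versions here are illustrative, not the exact file from the repo linked at the end of this post):

# escape=`
FROM mcr.microsoft.com/windows/servercore:ltsc2022
SHELL ["powershell", "-Command"]

# .NET 8 SDK via Microsoft's dotnet-install script
RUN Invoke-WebRequest https://dot.net/v1/dotnet-install.ps1 -OutFile dotnet-install.ps1; `
    .\dotnet-install.ps1 -Channel 8.0 -InstallDir C:\dotnet; `
    Remove-Item dotnet-install.ps1
RUN setx /M PATH ($Env:PATH + ';C:\dotnet')

# PowerShell 7, Node.js/npm, the Az module, and AzCopy follow the same
# pattern -- and each layer adds to that 8.5 GB total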

The next step was to upload this image to an Azure Container Registry, which took quite some time since, as previously noted, the image was so large.
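With Docker installed locally, the push looks something like this (registry and image names are placeholders):

az acr login --name registryname
docker tag windows-build-agent:latest registryname.azurecr.io/windows-build-agent:latest
docker push registryname.azurecr.io/windows-build-agent:latest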

 

*NOTE: If you do not have Docker installed, you can use the "az acr build" command to build the container image from your dockerfile and push it to your Azure Container Registry, as I'll show in a later step.

 

I chose to host this in an Azure App Service, as it supports VNet Integration and is a fairly simple hosting platform for my container. I added the following four environment variables (a CLI example follows the list):

  • AZP_URL - the URL of the Azure DevOps Server plus the project collection (or organization for the Azure DevOps service), e.g. https://devops.contoso.com/myProjectCollection
  • AZP_POOL - the name of the Agent Pool where the build agent will live
  • AZP_TOKEN - a personal access token (PAT) that authorizes your build agent to interact with Azure DevOps. Be very careful to treat this value as a secret (consider storing it in Azure Key Vault), as it has full access to your DevOps organization or collection.
  • AZP_AGENT_NAME - a friendly name that will identify this build agent.
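A sketch of setting these with the Azure CLI (resource names are placeholders; the Key Vault reference for AZP_TOKEN is one way to keep the PAT out of plain-text app settings):

# Placeholder resource names; AZP_TOKEN shown as an App Service Key Vault reference,
# which requires the app's identity to have access to the vault
az webapp config appsettings set \
  --resource-group rg-build-agents --name app-build-agent \
  --settings \
    AZP_URL=https://devops.contoso.com/myProjectCollection \
    AZP_POOL=Default \
    AZP_AGENT_NAME=container-agent-01 \
    'AZP_TOKEN=@Microsoft.KeyVault(SecretUri=https://myvault.vault.azure.net/secrets/azp-pat/)'

# Restart so the container picks up the new settings
az webapp restart --resource-group rg-build-agents --name app-build-agent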

I restarted the App Service so my container could pick up the environment variables, and, checking in Azure DevOps Server, I could see my build agent was registered in my agent pool. I created a sample .NET application and a sample Node application to test the build pipelines. Both applications built successfully with my new containerized build agent.

 

Success!!!  Or so I thought...

I turned the build agent over to my customer and they tried building their (larger and more complex) projects with the containerized build agent. The .NET project restored and built without issue, but their Node application was dying on the "npm install" step with the following error: "FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory." I tried several things to fix this.

 

  • Many articles recommended adjusting Node's max-old-space-size parameter (i.e., how much memory Node allocates to old objects on the heap before garbage collecting), typically via the NODE_OPTIONS environment variable.
  • There's also a default memory limit for Windows Containers running on Azure App Service, which is tied to the App Service Plan SKU. You can raise this limit with the WEBSITE_MEMORY_LIMIT_MB app setting, up to the limit of the App Service Plan. A sketch of both settings follows this list.
  • Finally, when all else fails, scale up the App Service Plan to the maximum (these go to eleven).
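For reference, the first two adjustments expressed as App Service settings (the values are illustrative; tune them to your workload and SKU):

# Illustrative values, not a recommendation
az webapp config appsettings set \
  --resource-group rg-build-agents --name app-build-agent \
  --settings \
    NODE_OPTIONS='--max-old-space-size=4096' \
    WEBSITE_MEMORY_LIMIT_MB=8192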

While these steps seemed to lessen the effects of the problem, we were still seeing intermittent pipeline failures with the "JavaScript heap out of memory" exception. Plus, running on the highest SKU available cost more than the customer really wanted to spend. Back to the drawing board.


Attempt #2: Linux Container

My next thought was to try Linux. The Ubuntu 22.04 image is only 77.86 MB, a fraction of the Windows size, and even after we install PowerShell Core, the .NET 8 SDK, the Azure CLI, and Node.js, the whole package is still barely 2 GB, roughly a quarter of the size the Windows container had ballooned to.
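A condensed sketch of the Linux dockerfile (the full version is in the repo linked at the end of this post; package choices and versions here are assumptions):

FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

# Base tooling the Azure DevOps agent needs
RUN apt-get update && apt-get install -y curl git jq libicu70 && rm -rf /var/lib/apt/lists/*

# PowerShell Core and the .NET 8 SDK from the Microsoft package repository
RUN curl -sSL https://packages.microsoft.com/config/ubuntu/22.04/packages-microsoft-prod.deb -o packages-microsoft-prod.deb \
    && dpkg -i packages-microsoft-prod.deb && rm packages-microsoft-prod.deb \
    && apt-get update && apt-get install -y powershell dotnet-sdk-8.0 \
    && rm -rf /var/lib/apt/lists/*

# Azure CLI and Node.js 20 via their official install scripts
RUN curl -sL https://aka.ms/InstallAzureCLIDeb | bash
RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - && apt-get install -y nodejs

# start.sh (the bash script in the repo) downloads the Azure DevOps agent,
# registers it using AZP_URL/AZP_POOL/AZP_TOKEN/AZP_AGENT_NAME, and runs it
WORKDIR /azp
COPY start.sh .
RUN chmod +x start.sh
ENTRYPOINT ["./start.sh"]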

After I'd created my dockerfile, built it, and pushed the container image to my Azure Container Registry, I tried running it in an Azure App Service, but noticed that the container kept failing shortly after startup with an error indicating the service was not responding to health probes. This made a certain amount of sense: there is no front end to the container; rather, it's an agent listening for work from Azure DevOps. Luckily, Azure has lots of container hosting options, so I opted to switch over to Azure Container Instances instead.

Networking and Container Instances

One thing I immediately noticed, however, was that while my test container running against the Azure DevOps service worked just fine, my network-injected container was throwing a DNS lookup error while trying to resolve the name of the Azure DevOps Server. Typically, Azure services injected into a VNet inherit the DNS settings of the VNet itself. I verified the DNS settings and found the VNet had custom DNS servers specified, so what in the container is going on here?

It turns out that for Container Instances to use custom DNS, those custom DNS servers have to be specified at the time the Container Instance is created. Unfortunately, the portal is somewhat limited as to what you can specify during creation, so I wrote a little bicep script to build the Container Instance (sketched below). In addition to setting custom DNS, I was also able to create and assign a User Assigned Managed Identity to the Container Instance for accessing our Container Registry securely.
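A condensed version of that bicep (the full file is in the repo linked at the end of this post; names, the DNS address, and the image are placeholders):

// Condensed sketch; names, addresses, and image are placeholders
param location string = resourceGroup().location
param subnetId string    // the delegated ACI subnet in the VNet
param identityId string  // resource ID of the pre-created User Assigned identity

resource buildAgent 'Microsoft.ContainerInstance/containerGroups@2023-05-01' = {
  name: 'aci-build-agent'
  location: location
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '${identityId}': {}
    }
  }
  properties: {
    osType: 'Linux'
    restartPolicy: 'Always'
    subnetIds: [
      {
        id: subnetId
      }
    ]
    // Custom DNS must be supplied at creation time
    dnsConfig: {
      nameServers: [
        '10.0.0.4'
      ]
    }
    // Pull from the registry with the User Assigned identity, not admin credentials
    imageRegistryCredentials: [
      {
        server: 'registryname.azurecr.io'
        identity: identityId
      }
    ]
    containers: [
      {
        name: 'build-agent'
        properties: {
          image: 'registryname.azurecr.io/imagename:latest'
          resources: {
            requests: {
              cpu: 2
              memoryInGB: 4
            }
          }
          // AZP_URL, AZP_POOL, and AZP_AGENT_NAME go in environmentVariables;
          // pass AZP_TOKEN as a secureValue
        }
      }
    ]
  }
}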

*As an aside, you MUST use a User Assigned, vs. System Assigned, Managed Identity here if you are restricting access to your Container Registry. The reason is a bit of a chicken-and-egg problem: with a User Assigned identity, you can create it and grant it access BEFORE the Container Instance is created. With a System Assigned identity, the Container Instance attempts to pull the image as part of the deployment process and fails before the Container Instance, and therefore its System Assigned identity, exists.

Once the Container Instance was deployed and running the build agent code, we were able to successfully run our build pipelines. We started out very small, with a single CPU and 1.5 GB of RAM, and did occasionally hit a "JavaScript heap out of memory" exception, but increasing the RAM eliminated the issue altogether.

 


 

Microsoft Defender for Containers and self-updating the container registry

One nice thing about having our build agents as containers is that we can configure Microsoft Defender for Containers to scan the Container Registry for vulnerabilities. While we can also scan our VM-based build agents with Microsoft Defender for Servers, running in a containerized fashion gives us the opportunity to actually "self-heal" our container images by periodically re-running our dockerfile and pulling updated versions of the base OS and the various packages (assuming we're not pinned to specific versions of software). This can be accomplished with a couple of simple Azure CLI commands in a pipeline:

 

# Rebuild the image so the base OS layers and packages are pulled fresh
az acr build . -r registryname -t imagename:latest --platform linux --file dockerfile
# Trim the number of manifests to 1 to clean up Defender for Containers results
az acr run --registry registryname --cmd 'acr purge --filter "imagename:.*" --keep 1 --untagged --ago 1d' /dev/null
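To make the rebuild truly hands-off, these commands can run on a schedule. A minimal pipeline sketch (the pool and service connection names are placeholders):

# Hypothetical weekly refresh pipeline; pool and service connection are placeholders
schedules:
  - cron: "0 3 * * 0"   # Sundays at 03:00 UTC
    displayName: Weekly image refresh
    branches:
      include: [ main ]
    always: true        # run even if nothing changed in the repo

trigger: none

pool:
  name: MyAgentPool

steps:
  - task: AzureCLI@2
    inputs:
      azureSubscription: 'my-azure-service-connection'
      scriptType: bash
      scriptLocation: inlineScript
      inlineScript: |
        az acr build . -r registryname -t imagename:latest --platform linux --file dockerfile
        az acr run --registry registryname --cmd 'acr purge --filter "imagename:.*" --keep 1 --untagged --ago 1d' /dev/null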

 

Wrapping Up

I have placed the scripts and dockerfiles used in this blog in our GitHub repo here. This includes the dockerfile to build the Linux agent, the bash script that installs the Azure DevOps agent code, my (failed) Windows version of the container, and the Container Instance bicep code to deploy a Container Instance with custom DNS and a Managed Identity. I hope this is helpful, and please let me know if you run into any issues.
