Managing Container Images for Private/Restricted Azure Kubernetes Clusters
Published Jun 09 2023 09:32 AM
Microsoft

Introduction 

 

Recently I have been having an increasing number of conversations with customers about how best to manage container images in disconnected or highly restricted environments. Cluster security is becoming ever more important: a 2023 report by Red Hat showed that 90% of enterprises surveyed had experienced a security incident in their Kubernetes environments in the previous 12 months.

 

There is certainly a recent trend of customers hardening environments by making the most of Azure services such as Azure Firewall to manage egress traffic, or network security groups to restrict traffic within an Azure subscription. When doing this, a problem arises: how do you manage the public container images you may be utilising, such as NGINX? In this article I will discuss three ways to help secure your deployment and manage your images.

 

Azure Firewall for Cluster Egress

 

With Azure Kubernetes Service you can control all cluster egress traffic by configuring user-defined routing (UDR). Outbound requests originating from any agent node then follow the UDR set on the subnet the cluster is deployed into, overriding the default routing for all traffic. For internal traffic, a UDR can forward traffic to an internal load balancer; for outbound traffic, a next hop to your Azure Firewall instance can be created.
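As a rough sketch of the cluster side of this setup (the resource names and $SUBNET_ID are placeholders, not from this article), a cluster configured for user-defined routing might be created like this:

```shell
# Hypothetical sketch: create an AKS cluster whose outbound traffic
# follows the route table on its subnet rather than a default public
# load balancer. Resource names and $SUBNET_ID are placeholders.
az aks create \
  --resource-group aks-rg \
  --name private-aks \
  --vnet-subnet-id "$SUBNET_ID" \
  --outbound-type userDefinedRouting
```

The subnet's route table would then carry a 0.0.0.0/0 route with the firewall's private IP as the next hop.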

 

I won't go through the steps required to configure and set up Azure Firewall for AKS egress in this post, as they are laid out step by step in our documentation here. If your organisation implements strict firewall policies, you may experience an error when deploying your Kubernetes resources.

 

The error will state that the image could not be copied. It will start with something along the lines of: 

failed to copy: httpReadSeeker

This very nondescript error is the result of your firewall not allowing the outbound request to be processed. At this point we have two options, depending on the degree of security required for your deployment: we could pull all images into our own Azure Container Registry, which has multiple benefits covered later in this post, or we could create a firewall policy rule collection to allowlist some container registries.

 

Although the former is certainly advisable best practice for customers running production workloads, for cost-conscious users running small-scale clusters or isolated development environments, allowlisting the registries may be a valid approach. We can do this by creating a new firewall policy rule collection and associating it with the firewall, specifying that traffic originating from our AKS nodes' subnet is allowed to travel to certain IP addresses. We also have the ability to enable DNS Proxy on our firewall and create firewall rules for fully qualified domain names.
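As a hedged sketch, such a rule collection might be added to an existing firewall policy as below. The policy, rule collection group, source range and destination IPs are all placeholders of my own, not values from this article:

```shell
# Hypothetical sketch: allowlist specific registry IPs for the AKS
# nodes subnet in an existing firewall policy. The resource names,
# source range and destination IPs are placeholders.
az network firewall policy rule-collection-group collection add-filter-collection \
  --resource-group aks-rg \
  --policy-name aks-fw-policy \
  --rule-collection-group-name aks-egress \
  --name allow-registry-ips \
  --collection-priority 200 \
  --action Allow \
  --rule-name quay-ips \
  --rule-type NetworkRule \
  --ip-protocols TCP \
  --source-addresses 10.240.0.0/16 \
  --destination-addresses 52.206.40.42 3.87.166.194 \
  --destination-ports 443
```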

 

Some container registries, such as GitHub, take the liberty of providing the IPs required for allowlisting upfront. Other container registries are less forthcoming and require some investigative work. This can be done using nslookup with the domain name you would like the IP addresses for:

nslookup quay.io
Server:  UnKnown
Address:  192.168.1.1

Non-authoritative answer:
Name:    quay.io
Addresses:  2600:1f18:483:cf01:33f6:d7d4:a9b1:e3ce
          2600:1f18:483:cf01:6456:d087:d0c8:2d01
          2600:1f18:483:cf01:d5f4:94ce:f190:aeaf
          2600:1f18:483:cf00:521a:3ba6:1b12:e130
          2600:1f18:483:cf02:e10b:8bbe:7d4c:1871
          2600:1f18:483:cf02:68bb:2416:361c:5b8a
          2600:1f18:483:cf00:1735:7dca:f93a:907f
          2600:1f18:483:cf00:3003:c77b:dfa8:94fc
          52.206.40.42
          3.87.166.194
          54.166.80.25
          34.237.27.205
          3.210.148.47
          52.44.162.2
          3.224.204.235
          34.237.31.230

Two important things to point out here are:

1. IPv6 addresses are not yet supported in Azure Firewall.

2. Using this method you must be aware of every possible URL for your registries; for example, Quay has four separate CDN URLs that also have to be allowlisted (see below for an example).

nslookup cdn01.quay.io
Server:  UnKnown
Address:  192.168.1.1

Non-authoritative answer:
Name:    cdn01.quay.io
Addresses:  2600:9000:225c:fa00:e:ac7b:9cc0:93a1
          2600:9000:225c:a000:e:ac7b:9cc0:93a1
          2600:9000:225c:5000:e:ac7b:9cc0:93a1
          2600:9000:225c:cc00:e:ac7b:9cc0:93a1
          2600:9000:225c:7800:e:ac7b:9cc0:93a1
          2600:9000:225c:5a00:e:ac7b:9cc0:93a1
          2600:9000:225c:3000:e:ac7b:9cc0:93a1
          2600:9000:225c:2000:e:ac7b:9cc0:93a1
          216.137.44.65
          216.137.44.98
          216.137.44.45
          216.137.44.105
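Since Azure Firewall network rules cannot use the IPv6 addresses in these lookups, a small helper can filter saved nslookup output down to the IPv4 addresses only. This is a minimal sketch; the `extract_ipv4` function and the capture file name are my own, not part of any tooling:

```shell
# Extract the unique IPv4 addresses from a file of saved nslookup
# output, discarding the IPv6 entries Azure Firewall cannot use.
# Note: this also matches the resolver's own "Address:" line, so
# trim that entry if it is present in your capture.
extract_ipv4() {
  grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' "$1" | sort -u
}

# Example capture step (requires network access):
#   for host in quay.io cdn01.quay.io cdn02.quay.io cdn03.quay.io cdn04.quay.io; do
#     nslookup "$host" >> quay-lookups.txt
#   done
#   extract_ipv4 quay-lookups.txt
```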

As your pull operations are TLS based, it may be easier to use application rules in Azure Firewall. Application rules are evaluated after network rules but allow for easier allowlisting over both IP addresses and DNS Proxy if the protocol is HTTP, HTTPS or MSSQL.


See the screenshot IMG_0248.png for an example of using an application rule to allow outbound traffic to Google.
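The same idea can be sketched in the CLI, assuming a rule collection group already exists; the resource names, source range and target FQDN below are placeholders of my own:

```shell
# Hypothetical sketch: an application rule allowing HTTPS from the
# AKS subnet to a specific FQDN. Names and addresses are placeholders.
az network firewall policy rule-collection-group collection rule add \
  --resource-group aks-rg \
  --policy-name aks-fw-policy \
  --rule-collection-group-name aks-egress \
  --collection-name allow-registries \
  --name allow-google \
  --rule-type ApplicationRule \
  --protocols Https=443 \
  --source-addresses 10.240.0.0/16 \
  --target-fqdns www.google.com
```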

Limitations

 

For some use cases this method may be suitable; however, for some registries it is not possible. For registry.k8s.io (where the NGINX ingress images are hosted, among many others), allowlisting IP addresses is not advisable, for two reasons. The first is that it is a free, volunteer-managed registry available on a best-effort basis. The second is that, as a result, it is explicitly stated that "API endpoints, IP addresses, and backing services used are subject to change at anytime as new resources become available or as otherwise necessary."


Even when using application rules in Azure Firewall, we are still relying directly on the public registry.

 

So for production environments where these types of risks are not acceptable, the question becomes: how do we mitigate these issues?

 

Azure Container Registry

 

For those unfamiliar, Azure Container Registry (ACR) allows you to build, store, scan, replicate and manage container images and artifacts. ACR comes with all of the great features you would expect of an Azure service, such as private network integration with Private Link, granular permissions and access through managed identities, Azure RBAC, and geo-replication for significant resilience and performance benefits. However, it also includes some key features specifically for the problem faced in restricted environments.

 

1. Azure Container Registry Tasks - ACR has a feature called Tasks: a suite of features that enables users to build, push and run images through the command line. ACR Tasks also make it incredibly easy to import images from other registries into your own ACR. For example, below we use the import command to pull the NGINX ingress controller image into our own private ACR.

az acr import --name croowzerotrustfpbldpbu6w6bu --source registry.k8s.io/ingress-nginx/controller:v1.5.1 --image ingress-nginx-controller:latest 

This is a great way to simplify copying images from public registries into our own managed registries. We can also add authentication to these calls should we want to pull from a private registry.
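For instance, a sketch of an authenticated import from a private source registry; the registry names, image path and credential variables below are placeholders of my own:

```shell
# Hypothetical sketch: import an image from a private registry by
# supplying credentials for the source. All names are placeholders.
az acr import \
  --name myprivateacr \
  --source private.example.com/team/app:1.0.0 \
  --image app:1.0.0 \
  --username "$SOURCE_USER" \
  --password "$SOURCE_PASSWORD"
```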

 

Naturally, one of your first thoughts as a security-conscious Kubernetes user may be "How can I trust where these import or build jobs are actually running?!" We now have, in public preview, another layer of security that allows ACR Tasks to be run in your own dedicated agent pools, allowing for full control and autonomy over your image build lifecycle while maintaining the convenience of ACR Tasks.

 

We can even schedule these tasks, treating them like a cron job to periodically import the latest version of an image.
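A timer-triggered task might be sketched as below; the names are placeholders, and the scheduled container would carry out the actual sync work (for example by running an image that contains the Azure CLI, or via a multi-step task definition rather than --cmd):

```shell
# Hypothetical sketch: an ACR task that fires on a cron schedule
# (01:00 UTC nightly). The container it runs would perform the
# periodic import; all names here are placeholders.
az acr task create \
  --registry myprivateacr \
  --name nightly-sync \
  --context /dev/null \
  --schedule "0 1 * * *" \
  --cmd mcr.microsoft.com/azure-cli
```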

 

This also means we can continue to tightly lock down our cluster egress without massive allowlists, as the ACR agent pool is making all the public outbound requests for our images.

 

A drawback, though, is that we are then continually polling registries for updates that may only occur every week or month. Is there another way to keep our images synced?

 

2. Azure Container Registry - Cache (Preview)

 

Although in production environments you will likely want to pin your image versions in your deployment manifests for reliability, there is still overhead associated with upgrading, importing and managing images as new versions are released. Now, with Cache for Azure Container Registry in preview, we can avoid scheduled jobs and leverage the concept of a pull-through cache. A pull-through cache provides the benefits mentioned previously, such as high-speed pulls and private networking, as well as the ability to ensure upstream content is delivered to your registry. This means no more scheduled tasks, as we can create caching rules for each image and even include credentials. Currently this is limited to Microsoft Artifact Registry and Docker Hub, but capabilities will soon be expanded, allowing even easier image management for restricted environments.
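As a sketch, a cache rule mapping an upstream Docker Hub repository into the registry might look like this; the registry and rule names are placeholders of my own:

```shell
# Hypothetical sketch: a pull-through cache rule that serves
# docker.io/library/nginx through the registry as library/nginx.
# An optional credential set can be attached for authenticated pulls.
az acr cache create \
  --registry myprivateacr \
  --name nginx-cache-rule \
  --source-repo docker.io/library/nginx \
  --target-repo library/nginx
```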

 

As a side note, for truly disconnected environments using Azure Stack Hub, ACR is now supported there too!

 

Conclusion

 

In conclusion, there are multiple approaches to managing the use of public container images in restricted environments, with investments continually being made to ensure the security and availability of the images your critical workloads rely on.

 

What additional security features would you like to see added to Azure Container Registry? Feel free to comment below.

Last update: Jun 10 2023 05:13 AM