ITOps Talk Blog

Infrastructure as Code (IaC): Comparing the Tools

Microsoft

Feb 24, 2022

When you deploy a server or any other part of your infrastructure manually, how long does it take you? Can you do a manual deployment end to end without any mistakes? Now, how do you scale that? This is where automation comes in, and more specifically Infrastructure as Code (IaC).

In many of the companies I've worked for, it would take days for a server to be deployed. Why? Because there was a ‘process’ and a physical paper checklist that had to be followed, signed off, and checked again. Each person had to complete their task(s) and get them signed off. To get a server deployed, you'd have to configure the VM and the host (networking, storage, etc.), deploy an image, patch the OS, harden the deployment, then install and configure an application. Once that was all done, the server was ready for sign-off and handed over to the customer. That took 3 days.

In some of the environments I managed, I could automate most of a complete server/infrastructure deployment and finish it in a few hours, but it was still a very manual process and mistakes were often made. This is when I discovered Infrastructure as Code. Many ask: where do I begin? With all the various tools to choose from, which one is best for the job?

Let’s begin by defining Infrastructure as Code. Infrastructure as Code (IaC) is the management of infrastructure (networks, virtual machines, load balancers, and connection topology) in a descriptive model, using version control to store the files. You can also watch this awesome one-minute video from the great Abel Wang, What is Infrastructure as Code?

There are a huge number of benefits to using IaC, to name just a few:

Your infrastructure can be stored in a source code repository (GitHub, Azure Repos, etc.), adding governance and versioning and increasing collaboration.
Infrastructure becomes reproducible, and you can introduce life cycling into your deployments by implementing CI/CD (Continuous Integration/Continuous Deployment).
Improves scalability.
Reduces human error.
Increases the speed and consistency of your infrastructure deployments, lowering infrastructure administration costs.
Increases the productivity of your teams.

Declarative vs Imperative Methods

When writing your infrastructure as code, it is important to understand the difference between these two methods, because it determines the types of templates you can write and the way in which you will write them.

Declarative languages define the desired state of the target, and the system works out what needs to happen to achieve that state. Effectively, you define the end state of the infrastructure by adding the resources that you need along with their configuration, and the IaC tool will figure the rest out.

Imperative languages define the specific commands that must be executed, and the specific order in which they must run, to achieve the desired state.

A declarative example would be: ‘Can I have a cup of coffee on my desk after lunch?’

Whereas an imperative example would be: ‘Go to the coffee machine, add 1 scoop of freshly ground beans and 400ml of water into the correct reservoir, press the start button, allow the coffee to fill the cup. Add in 50ml of fresh 2% milk to the cup and then deliver to my desk at precisely 1pm...’ You get the idea.

An imperative language requires more specific input and can fail during the process if one of the steps is not fulfilled properly for any reason.

A declarative style is great when you need to update your infrastructure or make changes to it. The imperative style is good for a deploy-and-forget model, but that isn’t always great if you’re looking to be an agile organization or have a changing infrastructure. The choice really comes down to personal preference and which situation fits best for your team.
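
To make the contrast concrete, here is a minimal sketch using Terraform's HCL (one of the tools covered later). The resource names, region and address range are placeholders chosen for illustration, and the Azure CLI commands in the comments are only there to show what a roughly equivalent imperative script might look like.

```hcl
# Imperative style (Azure CLI), shown as comments for comparison.
# You spell out each step and the order in which it runs:
#   az group create --name rg-iac-demo --location westeurope
#   az network vnet create --resource-group rg-iac-demo --name demo-vnet --address-prefixes 10.0.0.0/16

# Declarative style: describe the end state and let the tool work out the steps.
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~>2.0" # assumption: same constraint as the sample later in this post
    }
  }
}

provider "azurerm" {
  features {}
}

# Hypothetical resource group name and region.
resource "azurerm_resource_group" "demo" {
  name     = "rg-iac-demo"
  location = "westeurope"
}

# The references to the resource group tell the tool what has to exist first.
resource "azurerm_virtual_network" "demo" {
  name                = "demo-vnet"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.demo.location
  resource_group_name = azurerm_resource_group.demo.name
}
```

Running the declarative version a second time should result in no changes, because the desired state already matches what is deployed. The imperative commands simply run again, and whether that is harmless depends on each command.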

IaC Tooling: So many Choices!

There are numerous tools that can be used for IaC. Here are some questions that I would ask yourself and your team:

What skillsets are already present in the team around specific languages (e.g., C#, Golang, JSON, TypeScript, or none of the above, which is also a valid answer)?
What platform are we deploying onto (on-prem, Azure, AWS, etc.)?

Does an imperative or declarative language make sense?
Are you looking to provision and manage configuration? Or just provision infrastructure?

I’ve listed some of the tools below. I’ll go through each one and describe some pros and cons, hopefully leading you to pick the one that suits you and your team best.

Azure Resource Manager (ARM) Templates

ARM Templates are designed specifically for deployments into Microsoft Azure. If you are looking for a tool for on-premises environments or multiple cloud providers, this isn’t it. ARM is the native IaC templating option for Azure. You can deploy a resource in Azure using the Azure Portal, then download your template so that you can do it again and repeat the process. That is an easier way to get started, but there are some drawbacks.

First, you need to learn JSON, which could be your first hurdle. Also, when you export an ARM template there is quite a bit of boilerplate code in it. ARM, for many people, can be difficult to learn. Previewing a deployment relies on the separate ARM ‘what-if’ operation, since the template alone does not tell you what will change in your environment. ARM has other limitations when it comes to writing IaC: when you get a validation or syntax error, for example, it can be painful to troubleshoot. ARM templates can also grow to be very large and sometimes unwieldy, which can cause issues in an environment that needs repeatability and scalability.

On the other hand, there are some great learning resources for ARM templates if that is the path you choose:

ARM QuickStart Templates

Microsoft Learn – Create and deploy ARM templates

Pros:

Easy to export a working template from something that has been deployed from the Azure Portal
Fantastic (free) learning resources and QuickStart templates
Azure native so it supports all Azure services from day 0

Cons:

Learning curve around JSON
Templates can get long and unruly
Doesn’t manage the state of your infrastructure, so changes can be breaking

Bicep

Bicep is a domain-specific language (DSL) that allows for declarative deployment of Azure resources, so yes, this is an IaC tool that is native to Azure. Anything that you can do with an ARM template you can do with Bicep (and more!). As soon as a new resource is added into Azure, it is immediately supported by Bicep. Bicep requires a lot less syntax than ARM templates; you can compare the template syntax differences here.

Bicep vs ARM Templates

Bicep allows for the use of modules, which means you create a module for each grouping of resources, creating much more manageable and readable files. It keeps your IaC from getting too big and unruly. Bicep is integrated into the Azure CLI, making the Azure deployment experience really seamless.

One of my favorite features of Bicep is the ‘What-if’ operation. When you run it, it compares your template with your current deployment and shows which changes would be applied, allowing you to confirm those changes before they are made. Knowing what you’re about to deploy before you push the button is a great way to validate your results without having to deploy first.

Pros:

Syntax improvements, much simpler than ARM for writing templates
Modules – allows you to create more complex templates much more easily
Resource dependencies are easier to manage with Bicep, because it automatically detects them.

Cons:

Used only with deployments to Azure

Limitations still exist in its capability compared to other tools

Great learning resources with Bicep:

Get started with Bicep

Write your first Bicep Module with Microsoft Learn (and other free learning paths around Bicep)

Barbara Forbes' Blog for Bicep Learnings

Terraform

Terraform is an open-source tool, written in Go, that uses HCL (HashiCorp Configuration Language), which many people find one of the most easily learned IaC languages. Terraform comes with a lot of benefits that make it a popular choice.

Terraform can be used with any cloud and with on-prem resources. While each environment requires its own template, you can use the same language and formatting to deliver IaC to any of them. The reality is that most organizations are multi-cloud and configured in a hybrid model, and this is where Terraform shines.

terraform {
  required_version = ">=0.12"
  
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      version = "~>2.0"
    }
  }
}

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "vmss" {
 name     = var.resource_group_name
 location = var.location
 tags     = var.tags
}

resource "random_string" "fqdn" {
 length  = 6
 special = false
 upper   = false
 number  = false
}

resource "azurerm_virtual_network" "vmss" {
 name                = "vmss-vnet"
 address_space       = ["10.0.0.0/16"]
 location            = var.location
 resource_group_name = azurerm_resource_group.vmss.name
 tags                = var.tags
}

resource "azurerm_subnet" "vmss" {
 name                 = "vmss-subnet"
 resource_group_name  = azurerm_resource_group.vmss.name
 virtual_network_name = azurerm_virtual_network.vmss.name
 address_prefixes     = ["10.0.2.0/24"]
}

resource "azurerm_public_ip" "vmss" {
 name                         = "vmss-public-ip"
 location                     = var.location
 resource_group_name          = azurerm_resource_group.vmss.name
 allocation_method            = "Static"
 domain_name_label            = random_string.fqdn.result
 tags                         = var.tags
}

resource "azurerm_lb" "vmss" {
 name                = "vmss-lb"
 location            = var.location
 resource_group_name = azurerm_resource_group.vmss.name

 frontend_ip_configuration {
   name                 = "PublicIPAddress"
   public_ip_address_id = azurerm_public_ip.vmss.id
 }

 tags = var.tags
}

resource "azurerm_lb_backend_address_pool" "bpepool" {
 loadbalancer_id     = azurerm_lb.vmss.id
 name                = "BackEndAddressPool"
}

resource "azurerm_lb_probe" "vmss" {
 resource_group_name = azurerm_resource_group.vmss.name
 loadbalancer_id     = azurerm_lb.vmss.id
 name                = "ssh-running-probe"
 port                = var.application_port
}

resource "azurerm_lb_rule" "lbnatrule" {
 resource_group_name            = azurerm_resource_group.vmss.name
 loadbalancer_id                = azurerm_lb.vmss.id
 name                           = "http"
 protocol                       = "Tcp"
 frontend_port                  = var.application_port
 backend_port                   = var.application_port
 backend_address_pool_id        = azurerm_lb_backend_address_pool.bpepool.id
 frontend_ip_configuration_name = "PublicIPAddress"
 probe_id                       = azurerm_lb_probe.vmss.id
}

resource "azurerm_virtual_machine_scale_set" "vmss" {
 name                = "vmscaleset"
 location            = var.location
 resource_group_name = azurerm_resource_group.vmss.name
 upgrade_policy_mode = "Manual"

 sku {
   name     = "Standard_DS1_v2"
   tier     = "Standard"
   capacity = 2
 }

 storage_profile_image_reference {
   publisher = "Canonical"
   offer     = "UbuntuServer"
   sku       = "16.04-LTS"
   version   = "latest"
 }

 storage_profile_os_disk {
   name              = ""
   caching           = "ReadWrite"
   create_option     = "FromImage"
   managed_disk_type = "Standard_LRS"
 }

 storage_profile_data_disk {
   lun           = 0
   caching       = "ReadWrite"
   create_option = "Empty"
   disk_size_gb  = 10
 }

 os_profile {
   computer_name_prefix = "vmlab"
   admin_username       = var.admin_user
   admin_password       = var.admin_password
   custom_data          = file("web.conf")
 }

 os_profile_linux_config {
   disable_password_authentication = false
 }

 network_profile {
   name    = "terraformnetworkprofile"
   primary = true

   ip_configuration {
     name                                   = "IPConfiguration"
     subnet_id                              = azurerm_subnet.vmss.id
     load_balancer_backend_address_pool_ids = [azurerm_lb_backend_address_pool.bpepool.id]
     primary                                = true
   }
 }

 tags = var.tags
}

resource "azurerm_public_ip" "jumpbox" {
 name                         = "jumpbox-public-ip"
 location                     = var.location
 resource_group_name          = azurerm_resource_group.vmss.name
 allocation_method            = "Static"
 domain_name_label            = "${random_string.fqdn.result}-ssh"
 tags                         = var.tags
}

resource "azurerm_network_interface" "jumpbox" {
 name                = "jumpbox-nic"
 location            = var.location
 resource_group_name = azurerm_resource_group.vmss.name

 ip_configuration {
   name                          = "IPConfiguration"
   subnet_id                     = azurerm_subnet.vmss.id
   private_ip_address_allocation = "Dynamic"
   public_ip_address_id          = azurerm_public_ip.jumpbox.id
 }

 tags = var.tags
}

resource "azurerm_virtual_machine" "jumpbox" {
 name                  = "jumpbox"
 location              = var.location
 resource_group_name   = azurerm_resource_group.vmss.name
 network_interface_ids = [azurerm_network_interface.jumpbox.id]
 vm_size               = "Standard_DS1_v2"

 storage_image_reference {
   publisher = "Canonical"
   offer     = "UbuntuServer"
   sku       = "16.04-LTS"
   version   = "latest"
 }

 storage_os_disk {
   name              = "jumpbox-osdisk"
   caching           = "ReadWrite"
   create_option     = "FromImage"
   managed_disk_type = "Standard_LRS"
 }

 os_profile {
   computer_name  = "jumpbox"
   admin_username = var.admin_user
   admin_password = var.admin_password
 }

 os_profile_linux_config {
   disable_password_authentication = false
 }

 tags = var.tags
}
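
The template above references several input variables (var.resource_group_name, var.location, var.tags, var.application_port, var.admin_user and var.admin_password) that are not declared in the listing. A minimal sketch of a matching variables file could look like the following. The variable names come from the template, but every type, description and default value here is an assumption made for illustration.

```hcl
# variables.tf: declarations for the variables used by the template above.
# All defaults are illustrative assumptions; adjust them for your environment.

variable "resource_group_name" {
  description = "Name of the resource group that holds all resources"
  type        = string
}

variable "location" {
  description = "Azure region to deploy into"
  type        = string
  default     = "eastus"
}

variable "tags" {
  description = "Tags applied to every resource"
  type        = map(string)
  default = {
    environment = "demo"
  }
}

variable "application_port" {
  description = "Port exposed on the load balancer and used by the health probe"
  type        = number
  default     = 80
}

variable "admin_user" {
  description = "Administrator user name for the virtual machines"
  type        = string
}

variable "admin_password" {
  description = "Administrator password; supply it at run time, never commit it"
  type        = string
  sensitive   = true # requires Terraform 0.14 or later; remove on older versions
}
```

Variables without a default are prompted for at run time, or can be supplied through a .tfvars file or TF_VAR_ environment variables.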

Terraform builds resources, makes changes and can reference existing resources using a state file. Terraform is easily readable and uses modules to organize your code and reuse your resources. While Terraform is a declarative language, it relies on the state file to compare what you have defined with what has already been deployed. Managing the state file does introduce other topics (security, access, etc.), but these are well covered in the documentation. Learn more about Terraform state files here.
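
One common way to address the security and access questions around state is to keep the state file in a remote backend rather than on a laptop. The block below is a minimal sketch of the azurerm backend, which stores state in an Azure Storage container. The resource group, storage account, container and key names are all hypothetical, and the storage account is assumed to exist before you run terraform init.

```hcl
terraform {
  # Store the state file in Azure Blob Storage instead of on local disk.
  # Every name below is a placeholder, not a value from this article.
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "sttfstatedemo001"
    container_name       = "tfstate"
    key                  = "vmss-demo.terraform.tfstate"
  }
}
```

Because the state can contain sensitive values, access to the storage account should be restricted, and credentials for the backend are better supplied through the environment than written into the file.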

Terraform has great built-in features to validate your code, run a ‘plan’ so you know exactly which elements are going to change before they change, and trace what was deployed. Terraform shines when you want to continuously deploy your infrastructure, and it can even deploy to different environments using workspaces.
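
As a small illustration of workspaces, the sketch below uses the built-in terraform.workspace value to vary settings per environment. The workspace name "prod", the capacities and the tag are assumptions made for the example; nothing here is specific to Azure.

```hcl
locals {
  # terraform.workspace holds the name of the currently selected workspace.
  environment = terraform.workspace

  # Hypothetical sizing rule: more instances in "prod", fewer everywhere else.
  vm_capacity = terraform.workspace == "prod" ? 4 : 2

  common_tags = {
    environment = local.environment
  }
}

output "deployment_summary" {
  description = "Shows which settings the selected workspace resolves to"
  value       = "workspace=${local.environment}, capacity=${local.vm_capacity}"
}
```

A workspace is created with terraform workspace new and switched with terraform workspace select, and each workspace keeps its own state. Running terraform plan in each one then previews the changes for that environment only.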

Pros:

Multi-cloud capability
Syntax that is easy to write and understand, and a tool that is easy to set up and deploy
Built-in features to show what will be deployed before it is deployed, as well as validation and formatting

Cons:

New services in Azure aren’t always available to deploy using Terraform
Some deployments need explicit dependency mapping with ‘depends_on’ (example: a VM deployed before its networking is ready will error out)
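
On the dependency point: Terraform works out the deployment order on its own whenever one resource references another, as the subnet in the earlier template does when it references the virtual network. An explicit depends_on is only needed when there is no such reference to follow. The sketch below assumes, purely for illustration, that a network interface should not be created until a network security group has been attached to its subnet. All resource names are hypothetical.

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~>2.0" # assumption: same constraint as the template above
    }
  }
}

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "demo" {
  name     = "rg-depends-demo"
  location = "westeurope"
}

# Implicit dependency: the reference to the resource group orders this after it.
resource "azurerm_virtual_network" "demo" {
  name                = "demo-vnet"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.demo.location
  resource_group_name = azurerm_resource_group.demo.name
}

resource "azurerm_subnet" "demo" {
  name                 = "demo-subnet"
  resource_group_name  = azurerm_resource_group.demo.name
  virtual_network_name = azurerm_virtual_network.demo.name
  address_prefixes     = ["10.0.1.0/24"]
}

resource "azurerm_network_security_group" "demo" {
  name                = "demo-nsg"
  location            = azurerm_resource_group.demo.location
  resource_group_name = azurerm_resource_group.demo.name
}

resource "azurerm_subnet_network_security_group_association" "demo" {
  subnet_id                 = azurerm_subnet.demo.id
  network_security_group_id = azurerm_network_security_group.demo.id
}

resource "azurerm_network_interface" "demo" {
  name                = "demo-nic"
  location            = azurerm_resource_group.demo.location
  resource_group_name = azurerm_resource_group.demo.name

  ip_configuration {
    name                          = "internal"
    subnet_id                     = azurerm_subnet.demo.id
    private_ip_address_allocation = "Dynamic"
  }

  # Explicit dependency: nothing in this block references the association,
  # so Terraform cannot infer that it should be created first.
  depends_on = [azurerm_subnet_network_security_group_association.demo]
}
```

The terraform graph command prints the dependency graph Terraform has built, which is a quick way to check whether an ordering was inferred or needs to be declared.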

Terraform on Azure Video

Terraform on Azure Blog - covering the basics into modules and state files

Generate your first Terraform template with NubesGen

Terraform on Azure Documentation

Pulumi

Pulumi is another IaC tool that uses a declarative format to deploy your infrastructure. The biggest differentiator with Pulumi is that it allows you to write your IaC in the language that your organization or team knows best. Pulumi supports TypeScript, JavaScript, Python, Go and C#, which means that you write your templates in the language that you are comfortable with.

Adding in another bonus, you can use the testing tools native to that language to test your code. Testing is crucial. We not only want to deploy our infrastructure as code to automate tasks and increase our velocity, but we also need to reduce our human error. This is where testing is a crucial part of the development and deployment lifecycle.

Pulumi, like Terraform, supports ANY cloud. It has another huge benefit: it can coexist with your existing templates from Terraform, ARM, Helm/YAML, etc., or convert them into Pulumi.

Pros:

Pulumi allows for easy adoption with a more familiar language and allows the conversion of existing templates

Allows for IaC adoption in a language that works for your team

Cons:

If you don’t have ANY experience in any languages, you will need to choose one and skill up.

Video on deploying to Azure using Pulumi

Ansible

Ansible is an imperative IaC tool that not only provisions your infrastructure but also manages the configuration of your services. The other tools above do not; they would need another third-party tool for configuration. Ansible itself is written in Python and relies heavily on YAML files, in the form of Ansible Playbooks, to define your infrastructure. These describe your automation tasks from deployment to ongoing state, making it an all-in-one solution.

Ansible does not maintain state, so it does not keep track of dependencies. Ansible is fairly easy to get started with but does have less of a community feel when looking for troubleshooting tips or self-help.

Pros:

Simple to learn: the Playbooks are written in YAML, and Ansible itself is written in Python, which many people find easy to understand
Ansible is agentless, decreasing maintenance and performance degradations

Cons:

Lack of state meaning it doesn’t track dependencies. It will execute tasks sequentially, stopping when the task finishes, fails or encounters an error.
Lack of enterprise support and community feel for troubleshooting.

Chef

Chef is an open source IaC tool that can run on multiple platforms (Windows, Linux, AWS, Azure, etc.) and uses cookbooks and recipes to define not only your deployment templates, but also the configuration of your environment. Chef uses a Ruby-based DSL, requiring a dedicated set of programming skills to learn the language. Chef requires an infrastructure to run on, so there is a licensing and infrastructure cost to consider when looking at it. This also means that Chef runs in a dedicated environment and requires an agent on every machine that you are deploying to.

Because Chef requires a lot of other considerations beyond the capability of the product, I am only going to list the pros and cons here. Choosing it involves much more than infrastructure as code.

Pros:

Scalable, easily handles a large infrastructure

Extensive collections of configuration and module recipes

Cons:

Requirement to learn Ruby, be ready for a steep learning curve
Complexity and overhead management, difficult to install

Puppet

Puppet and Chef often get roped together when comparing IaC tools, as they’ve both been around for some time. Puppet uses its own declarative language, PuppetDSL, to deploy and maintain system configuration in the form of manifests and modules.

Puppet also requires an infrastructure to run on, deploying agents on every machine that you are deploying and managing. As Puppet also requires a lot of other considerations outside of just the capability of the product, it’s not one that is as popular in Azure when there are more cost-effective options.

Pros:

Scalable, easily handles a large infrastructure
Well-established support community
Powerful reporting capabilities

Cons:

Requirement to learn PuppetDSL
Complexity and overhead management, difficult to install

In Summary

Choosing an Infrastructure as Code tool is a decision that requires thought, along with comparing the pros and cons for your organization. There is no one-size-fits-all solution for any person or any company. Take your time, read through the options and find the best solution for you. Once you choose your preferred IaC tool, make sure you start looking at how to automate not only your infrastructure, but also your delivery process with a solid continuous integration/continuous delivery (CI/CD) tool.

Happy coding!

Updated Mar 24, 2022

Version 3.0

April Edwards

AprilYoho

17 Comments

AprilYoho
Microsoft
Feb 01, 2023
Aaron_Cutchin your point is valid and I think my wording is incorrect. Terraform does in fact deploy it properly, it's the Azure API that is unreliable. I so so often have to rely on 'depends_on' for dependencies, especially around networking. Azure itself requires networking to be deployed before the VM, if you do a deployment with Terraform you can obtain inconsistent deployments. It's happened to me on SO many projects. This is down to the way Azure handles infra builds. Even deploying a simple VM with specific networking requirements can cause a failure in deployment.

I understand you work in AWS, but from an Azure standpoint there are very Azure specific issues we have to face, that aren't always down to the design. I wrote a further blog about 'depends_on' because it's been a requirement in a lot of projects that I deploy onto Azure. https://azapril.dev/2020/05/12/terraform-depends_on/
Aaron_Cutchin
Copper Contributor
Jan 31, 2023
CloudMakerCEO :

All bets are off if you are dealing with poorly supported software. I have found Terraform providers to be very well supported, and bugs are fixed quickly, but I have never used an Azure provider. The Terraform AWS provider handles dependencies between CloudFront distributions and their related resources (cache policies, for example) without manual configuration.

As I understand it, modern microservice-oriented architecture should obviate the need for specific VM build or service deployment order. However, there are edge cases and legacy architectures, so sometimes you might need to manually create a dependency. In the thousands of resources we manage with Terraform, I think there are a small handful of such cases. I make heavy use of Terraform's "-target=" option to manage groups of resources in Terraform modules independently.

Terraform states can indeed become large and slow. If they do, they should be broken up, which requires more complexity and different tooling. However, we have not experienced difficulty "managing" them. We put them in S3, keep confidential secrets out of them, and they just work for years, even with frequent additions and changes and Terraform and provider version updates.

I have never encountered a circular dependency. It seems to me that if you did, you should redesign your infra until it goes away.

If your infrastructure is of any significant size (or significant importance), you should certainly have it designed and managed by an experienced cloud/infra engineer, or software developer with similar experience. I don't see how you could or why you should desire to escape that requirement. I have an intermediate software engineering skillset, and it has never been seriously challenged by the lightweight conditional/iteration syntax available in HCL/Terraform. Pulumi, OTOH & IME, requires a deep understanding of complex programming concepts (such as asynchronous programming) to manage even simple resources.
CloudMakerCEO
Copper Contributor
Jan 31, 2023
Aaron_Cutchin - Until dependencies can't be inferred whether due to a poor resource provider (Origin Groups/Origins in Azure Front Door requiring a manual dependency for instance) or where you have specific build orders for VMs etc. Also circular dependencies can be tricky too as your code evolves. Let's not even get into how complex TF state can become to manage.

IaC is super powerful but in most cases you need to be a very experienced cloud engineer with a software engineering skillset to leverage IaC to its maximum potential. The unicorns of the IT industry.
Aaron_Cutchin
Copper Contributor
Jan 31, 2023
Your second "Con" point under the Terraform section is false.

"Declarative languages require the use of dependency mapping when deploying (example: deploying a VM without networking first, will error out)"

Dependency mapping is automatic in Terraform and does not need to be manually configured. For example, deploying a VM (EC2 instance) requires assigning the VM to a VPC subnet. Presuming the subnet is configured in the same Terraform state as the VM, Terraform will implicitly create the subnet before creating the VM, or delete the VM before deleting the subnet, even if these resources are configured in different modules.

This extends even into very complex environments with thousands of resources. Terraform will automatically figure out the entire dependency tree (you can even print it out), and create|modify|delete resources in the order required.
rbrownSQLBuild
Copper Contributor
Aug 11, 2022
Thanks AprilYoho I think IaC tools are currently in their infancy, it has only been a few years really since they have taken off and I agree the initial push is to get people to just use it. I think there are 2 areas where all the current IaC tools lack:

1. there is a huge market for IaC tools for people who don't have the skill or time to code, low code or even nocode versions of IaC. For large enterprises that have developer teams its easy, but even for small companies that have engineers who could do some code. I call it the need for Infrastructure by Description. If you aren't a coder then you want to use yaml or sql or maybe appconfig or something else other than code to describe your infrastructure. Things like Terraform, Bicep, etc are great, but they still have a pretty big learning curve. For many they just dont have the time or capacity to take on these new skills.

2. people don't just want tools that do exactly they say, because then you need to know all the details to tell it. Many people want the IaC tools to be more intelligent, they want inherent knowledge in the product. For example, people don't want a method in an Iac tool to deploy a SQL box of a certain version, they want a tool that deploys a SQL box which is installed and configured to industry standards, may CIS security levels or another level. (That maybe 100 different configuration changes and settings). If you arent a DBA the only option is a vanilla install.

I think over the years a new breed of lowcode/nocode IaC tool will emerge that contain a lot of built in knowledge to ease cloud deployment and migrations.
AprilYoho
Microsoft
Aug 11, 2022
Hi rbrownSQLBuild, I appreciate the feedback. The article focused mainly on deployment and the types of IaC available. Configuration is huge, the first is getting folks to deploy using IaC and 'where to start', which is usually the biggest hurdle. I'd absolutely write a separate article about maintaining state and configuration. Maintaining configuration (and especially data) is a whole other element. Your article on database maintenance is great, keep it going 🙂

Yes, there is an expectation to write code. Looking at most sysadmins and ITPro's there are huge use cases for using Bash and PowerShell, so whenever there is scope to use those it helps. But yes, we're back to everything as code 🙂
rbrownSQLBuild
Copper Contributor
Aug 11, 2022
This is a great article, however provisioning and configuration is a big part of deployment, depending on who you talk too, some people include CaC as part of IaC, others treat it separately. Shame this wasn't included. Often many tools are used together to achieve both, e.g. Terraform and Ansible or Bicep and DAC. Over my many years in this area, I just found that there were no real products for non developers and all the tools expect a level of coding knowledge. Also databases are just different and I found tools didn't really support the type of database activities I needed in post deployment. So for those 2 main reasons and a few others, I decided to write something, it is work in progress (it will never be finished) project can be found at https://sqlbuild.com and I welcome any feedback/requests.
alexandrenedelec
Brass Contributor
May 08, 2022
Nice article. Interesting to give the pros and cons of each solution. For those hesitating between Terraform and Pulumi, I recently wrote an article "Why will I choose Pulumi over Terraform for my next project" https://www.techwatching.dev/posts/pulumi-vs-terraform.
AprilYoho
Microsoft
Mar 16, 2022
erudinsky Many of the other tools offer a great deal of other features. Bicep is an awesome tool and Bicep is still being developed into, so there are more features to come. Other tools offer a greater ability to test your code before deploying. Bicep does have linting capability, there isn't a way to test what you're deploying, before it deploys. It is also Azure specific, so it means learning a language that won't translate to another cloud or potentially another tool. They have published a list of known limitations: https://github.com/Azure/bicep#known-limitations. It's a great tool, it will be exciting to watch it develop over the next few years
erudinsky
MCT
Mar 16, 2022
Great article AprilYoho! Is it possible to share "Limitations still exist in its capability compared to other tools"?