Terraform & Azure DevOps – Configuring Jobs & Tasks

With this post, we continue and conclude the series on managing Azure infrastructure with Terraform, now covering in detail the steps for implementing security scanners, approval gates and the resource deployments themselves, among other things. After covering the fundamentals of an Azure DevOps pipeline in the last post, we now show the concrete task and job configurations needed to run the pipeline. As a recap, there are three main jobs in our demo:

  • Validation
  • Approval gate
  • Deployment

Pipeline Jobs and Tasks

Job Terraform prepare

The preparation job involves six steps. The first task, TerraformInstaller from the DevOps extension, is not strictly necessary on managed agents, since Terraform is already installed on them. However, it makes sense if you want to ensure that you always use the latest version, as an update may not yet be present on the agent images. It is important to handle this consistently across all jobs rather than mixing them as in the example: otherwise the first job works with version 1.6.4 while the second fails with the pre-installed 1.6.3.
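
A minimal sketch of such an installer step (the task's major version depends on the Marketplace extension installed; the pinned version number is only an example):

```yaml
# Pin the Terraform version explicitly at the start of every job that runs
# Terraform, so all jobs use the same binary (version number illustrative)
- task: TerraformInstaller@1
  displayName: Install Terraform
  inputs:
    terraformVersion: '1.6.4'
```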

The next two tasks check the Terraform configuration for security vulnerabilities – one of which, in our example, is that the http protocol is still allowed for the storage account. This is done using Prisma Cloud’s Checkov scanner (formerly Bridgecrew), a common choice for static code analysis. Checkov checks the Terraform source code against built-in best practices and the security requirements of international standards. The scanner can be configured individually through a number of options and can also process ARM templates, for example; the JSON output of a Terraform plan can be checked as well.

To execute it, a simple Bash script issues a docker run command: a ready-made Checkov image is pulled from Docker Hub [1] and applied to the working directory of the agent VM containing the code. Setting the output to junitxml format and following up with a PublishTestResults task ensures that the scan results are published in the “Tests” tab of Azure DevOps. There you can see, among other things, the recommendation to allow only https. It is also plausible to move this step to a pull request pipeline, or to run it there in addition. In the spirit of the shift-left paradigm, this should of course happen as early as possible: the later a security vulnerability is discovered, the more complex and expensive it becomes to fix.
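
A sketch of how these two steps could look, assuming the Checkov image from Docker Hub [1]; file names and display names are illustrative:

```yaml
# Run Checkov against the agent's working directory and write JUnit XML
- bash: |
    docker run --volume $(System.DefaultWorkingDirectory):/tf bridgecrew/checkov \
      --directory /tf --output junitxml > $(System.DefaultWorkingDirectory)/checkov-report.xml
  displayName: Checkov scan
  continueOnError: true   # set to false if failed checks should break the pipeline

# Publish the results to the "Tests" tab, even if the scan step failed
- task: PublishTestResults@2
  condition: succeededOrFailed()
  inputs:
    testResultsFormat: JUnit
    testResultsFiles: checkov-report.xml
    searchFolder: $(System.DefaultWorkingDirectory)
```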

In terms of the flow of our pipeline, it makes sense to run the check only after terraform init – that way, downloaded external modules are scanned as well. If Checkov tests fail, then depending on the configuration of the DevOps task, either the pipeline fails completely or the result is merely used for audit purposes. As an alternative to the Docker image, Checkov can also be installed directly and executed with commands in the pipeline [2].

Checkov can be configured quite flexibly, for example by scanning only selected checks or by skipping some of the available policies.
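
For illustration, the --check and --skip-check options restrict a scan to, or exclude, specific policies (the policy IDs below are examples, not recommendations):

```yaml
# Scan only the listed policies; --skip-check would exclude them instead
- bash: |
    docker run --volume $(System.DefaultWorkingDirectory):/tf bridgecrew/checkov \
      --directory /tf --check CKV_AZURE_3,CKV_AZURE_35
  displayName: Checkov with policy filter
```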

(Screenshot: Checkov test results overview in DevOps)

Terraform is then initialized. For commands such as init, plan, etc., the TerraformCLI task, also from the Marketplace extension, is used. The code directory, the service connection and the remote backend configuration for the statefile are required as inputs so that the statefile can be created or edited by the pipeline.

The plan step follows. Again, the directory and service connection are needed as input, along with values for all variables that have no default. We will go into more detail about the options for passing variables later. Furthermore, the commandOptions of the task can be used to pass the out parameter with a file name so that the plan is saved separately and can be referenced later. A speculative plan without this step is not advisable in the DevOps context, to avoid unwanted surprises. The publishPlanResults attribute populates another tab (Terraform Plan) in the Azure DevOps pipeline overview so that the planned changes can be viewed easily.
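
A sketch of the init and plan tasks; the backend values, service connection name and paths are placeholders, and the input names follow the Marketplace extension used in this series (the complete file is at [5]):

```yaml
- task: TerraformCLI@1
  displayName: terraform init
  inputs:
    command: init
    workingDirectory: $(System.DefaultWorkingDirectory)/terraform
    backendType: azurerm
    backendServiceArm: tf-service-connection        # placeholder
    backendAzureRmResourceGroupName: rg-tfstate
    backendAzureRmStorageAccountName: sttfstatedemo
    backendAzureRmContainerName: tfstate
    backendAzureRmKey: demo.tfstate

- task: TerraformCLI@1
  displayName: terraform plan
  inputs:
    command: plan
    workingDirectory: $(System.DefaultWorkingDirectory)/terraform
    environmentServiceName: tf-service-connection
    publishPlanResults: demo-plan                   # feeds the "Terraform Plan" tab
    commandOptions: -out=tfplan
```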

The last three tasks in this job prepare the next job, the approval gate, and secure artifacts. The background is as follows: one result of the plan step may well be that resources are to be deleted. This is often unintentional, and it is definitely important to catch it. Some approaches stop the deployment or pipeline outright in that case. However, since deletion can also be intentional, it is advisable to add an intermediate step in the form of a gate where the continuation of the pipeline must be confirmed manually. To do this, we first run the show command in a TerraformCLI task, pointing it at the location of the plan file. This sets the pipeline variable TERRAFORM_PLAN_HAS_DESTROY_CHANGES to true if the task finds a destroy operation in the plan. The next step, again a simple Bash script, is needed to make the variable available across jobs: without the addition isOutput=true this would not be the case [3], and we want to use the value in the gate.
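
Sketched, these two steps might look as follows; the step name setvar and the variable name are our own choices, while TERRAFORM_PLAN_HAS_DESTROY_CHANGES is set by the task itself:

```yaml
# terraform show on the saved plan; the task sets
# TERRAFORM_PLAN_HAS_DESTROY_CHANGES when it detects destroy operations
- task: TerraformCLI@1
  displayName: terraform show
  inputs:
    command: show
    workingDirectory: $(System.DefaultWorkingDirectory)/terraform
    inputTargetPlanOrStateFilePath: $(System.DefaultWorkingDirectory)/terraform/tfplan

# Re-publish the flag with isOutput=true so that later jobs can read it [3]
- bash: |
    echo "##vso[task.setvariable variable=HAS_DESTROY_CHANGES;isOutput=true]$(TERRAFORM_PLAN_HAS_DESTROY_CHANGES)"
  name: setvar
  displayName: Export destroy flag
```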

For the same reason, a publish task is then used to save the created plan file as an artifact so that it can be loaded again later. Otherwise it would be lost on managed agents, since the agent VM is discarded once the job finishes.
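
A sketch of the publish step (artifact name illustrative):

```yaml
- task: PublishPipelineArtifact@1
  displayName: Publish plan file
  inputs:
    targetPath: $(System.DefaultWorkingDirectory)/terraform/tfplan
    artifact: tfplan
```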

Job Approval Gate

Now let’s move on to the next job, which represents the gate and actually contains only one step: a ManualValidation task [4] whose purpose is to stop the pipeline on a terraform destroy and notify users, who can then either confirm or cancel. The dependsOn setting ensures that the gate waits for the first job to complete; otherwise the jobs would run in parallel, which would be wrong here. We also use a condition to check the destroy variable: if it is false, the pipeline simply continues. A timeout in minutes ensures that the task does not wait forever if nobody attends to it. One small detail of this job is the reason why several jobs – and the additional tasks for passing on intermediate results – are needed in the pipeline at all: the ManualValidation task does not run on an agent like the others, but on the Azure DevOps server, and it cannot be mixed with agent tasks within a job. Without the gate, a single job would suffice, and tasks such as publishing the plan would not be needed.
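
A sketch of the whole gate job; the job names, the notified address and the timeout are illustrative, and the variable reference assumes the setvar step from the preparation job above:

```yaml
- job: waitForApproval
  displayName: Approval gate
  dependsOn: terraformPrepare
  pool: server   # ManualValidation only runs in an agentless (server) job
  timeoutInMinutes: 60
  # Run the gate only if the plan contains destroy operations
  condition: eq(dependencies.terraformPrepare.outputs['setvar.HAS_DESTROY_CHANGES'], 'true')
  steps:
  - task: ManualValidation@0
    inputs:
      notifyUsers: someone@example.com
      instructions: The plan contains destroy operations - please review, then resume or reject.
      onTimeout: reject
```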

Job Deploy

The third and last job is dedicated to the actual deployment of the Terraform infrastructure. Because of the job boundaries, a little more is needed here, as the agent runs on a fresh VM again. The dependencies on the previous jobs are configured, as is a condition covering the gate, which was either manually approved or skipped. We fetch the published plan file from the preparation job via a download task, and CopyFiles moves the artifact into the working directory of the agent. For consistency of the Terraform version, there is again an installation step. The initialization of Terraform must also be repeated with the same configuration as in the first job so that the appropriate provider modules etc. are available. Finally, the resources are deployed via another TerraformCLI task and terraform apply. Besides the service connection and the directory, all that is needed is to specify that the deployment is based on the plan file. With that, the (hopefully green) pipeline is complete. The full sample pipeline can be viewed as a YAML file at [5].
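
A sketch of the deploy job under the same assumptions as above; note the condition, which lets the job run whether the gate succeeded or was skipped:

```yaml
- job: terraformDeploy
  displayName: Deploy infrastructure
  dependsOn:
  - terraformPrepare
  - waitForApproval
  condition: and(succeeded('terraformPrepare'), in(dependencies.waitForApproval.result, 'Succeeded', 'Skipped'))
  steps:
  - task: DownloadPipelineArtifact@2
    inputs:
      artifact: tfplan
      path: $(Pipeline.Workspace)
  - task: CopyFiles@2
    inputs:
      sourceFolder: $(Pipeline.Workspace)
      contents: tfplan
      targetFolder: $(System.DefaultWorkingDirectory)/terraform
  # ... TerraformInstaller and terraform init exactly as in the first job ...
  - task: TerraformCLI@1
    displayName: terraform apply
    inputs:
      command: apply
      workingDirectory: $(System.DefaultWorkingDirectory)/terraform
      environmentServiceName: tf-service-connection
      commandOptions: tfplan   # apply exactly what was planned
```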

Variable options for the pipeline

Variables and parameters are a central part of DevOps pipelines and can be used in various forms for Terraform deployments. The TF variables are declared in the HCL files as usual; the question is how their values are supplied. As a fallback, default values in the Terraform code of course continue to work. From the outside, it basically works exactly as in local development: within the plan step, individual variable values or the path to a tfvars file can be passed as parameters of the command, or a file following the auto naming convention is picked up automatically. The order of evaluation is the same as usual. In our case, the values are again passed via the commandOptions of the task:
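
For example (values and file names illustrative):

```yaml
- task: TerraformCLI@1
  displayName: terraform plan
  inputs:
    command: plan
    workingDirectory: $(System.DefaultWorkingDirectory)/terraform
    environmentServiceName: tf-service-connection
    commandOptions: -var="location=westeurope" -var-file="dev.tfvars" -out=tfplan
```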

A somewhat special construct also works in DevOps pipelines if the naming is right: if environment variables on the agent follow the pattern TF_VAR_<VARIABLENAME>, Terraform injects them directly without any explicit assignment; the Terraform variable in the HCL simply carries the name without the prefix. This can be set up, for example, in the YAML file via the env: mapping or via the graphical pipeline editor in DevOps as a variable declaration. But be careful: variables entered in the editor end up as environment variables in capital letters, so in that case the Terraform variable would also have to be capitalized for the injection to work.
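
A sketch of the env: variant; the variable name location is illustrative and would be declared in HCL simply as variable "location" {}:

```yaml
- task: TerraformCLI@1
  displayName: terraform plan
  inputs:
    command: plan
    workingDirectory: $(System.DefaultWorkingDirectory)/terraform
    environmentServiceName: tf-service-connection
  env:
    TF_VAR_location: westeurope   # injected into the Terraform variable "location"
```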

Testing with Terratest

There are also ways to test the created infrastructure by code, e.g. with Terratest from Gruntwork, a testing framework written in Go. In principle, the well-known steps from init to destroy are run through, and after the apply step, for example, helper methods defined in _test.go files are used. Similar to unit tests, conditions and results are checked with asserts – be it the presence of a resource group, a specific property or a simulated http call, tests can be defined for all of this. Gruntwork provides a large module library of test methods for this purpose [6], not only for Azure.

In the pipeline, all you really need is a simple Bash task; the Go tooling needed to run the tests is now installed on the managed agents as well.
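
A sketch of such a task, assuming the Go tests live in a tests subfolder:

```yaml
- bash: |
    cd $(System.DefaultWorkingDirectory)/tests
    go test -v -timeout 30m
  displayName: Run Terratest
```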

As for the benefits seen in practice, in my own experience the added value is rather limited. But if you want additional automated assurance about the deployed infrastructure and its configuration, Terratest is a comprehensive framework to support that.

Small pitfalls

Finally, a short description of two issues that occur from time to time and may raise a few question marks.

An occasional error related to variable usage occurs when the input for a variable has not been set correctly: the pipeline hangs and runs without progress, because Terraform is waiting for input via the console. On managed agents, which run headless, this is not possible and usually not wanted either. The pipeline must then be aborted and the variable assignment fixed. Passing -input=false in the commandOptions makes Terraform fail with an error instead of prompting, which surfaces the problem immediately.

The other problem concerns statefile handling. To prevent several deployments from writing to the file at the same time, the remote backend in Azure Storage locks it automatically; technically, this works via a blob lease. An “Error acquiring the state lock” message caused by a failed lock attempt can therefore be normal and correct when deployments run close together in time – the first deployment then releases the statefile again automatically. However, this state can also arise from connection problems between the Terraform CLI and the file: the pipeline fails, and the lock on the statefile remains. As a rule, this is quickly remedied by manually releasing the lease on the file, e.g. via the Azure portal in the corresponding storage container or with the Azure CLI (az storage blob lease break).

Summary

For IaC provisioning with Terraform, there is ultimately no way around automation through CI/CD pipelines. To a certain extent, it is legitimate to test and deploy the HCL configuration locally. But once a project reaches an advanced stage, resources should be deployed through pipelines, whether via Azure DevOps, GitHub Actions or other tools. Above all, this increases the security and reliability of the environments, even if it takes a little more time.

Azure DevOps Pipelines provide a mature way to automate infrastructure provisioning with Terraform. Azure DevOps is strongly represented in the enterprise context and will most likely remain so. This blog series – which now comes to an end – has used concrete examples and background information to show what such a pipeline can look like, how the individual tasks work and what is needed in the context of Azure DevOps to set it all up.

Happy Coding!


[1] https://hub.docker.com/r/bridgecrew/checkov

[2] https://www.checkov.io/7.Scan%20Examples/Terraform%20Plan%20Scanning.html

[3] https://learn.microsoft.com/en-us/azure/devops/pipelines/process/set-variables-scripts?view=azure-devops&tabs=bash

[4] https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/reference/manual-validation-v0?view=azure-pipelines

[5] https://github.com/thomash0815/wd-tfseries-v2

[6] https://pkg.go.dev/github.com/gruntwork-io/terratest/modules/azure
