Terraform: A Gentle Introduction

By Sebastian Günther

—

11th May, 2023

—

Terraform is an infrastructure configuration language. It supports the declarative, stateful definition of abstractions ranging from compute resources and network components to server configuration, user accounts, and permissions. With a wide range of providers (plugins that wrap the APIs of platforms and services and make these abstractions available), it has a strong position in DevOps.

This article starts a blog series introducing Terraform. In this post, you will learn about Terraform's role among infrastructure as code tools, and then see how to install it and create a project. The remainder of the article covers the core constructs of Terraform projects: providers, variables, resources, data sources, and outputs.

The technical context for this article is Terraform v1.4.6, but it is applicable to newer versions as well.

Infrastructure as Code Tools Overview

Infrastructure as Code tools can be categorized into the following groups:

Infrastructure Provisioning: Provisioning of infrastructure encompasses the initial creation and configuration of infrastructure objects, including bare-metal servers or VMs, networks, storage, and specialized domains such as databases, firewalls, and other appliances. Example tools are Terraform and CloudFormation.
Configuration Management: Server operating systems and their applications need to be configured properly to work correctly. Configuration management tools provide the capabilities to set up and reconfigure systems, system software, applications, and containerized applications coherently and reliably. Example tools are Ansible, Puppet, Chef, and SaltStack. Typically, these tools aim for idempotency: whatever the current state of the system or application is, it will be converged to the defined state.
Application Containers: Fully functional applications packaged with all required libraries and dependencies. Example tools are Docker, Packer, and Vagrant. Like the infrastructure provisioning tools, these tools also aim for immutability of the provided infrastructure.

Terraform Tool & Installation

Terraform is an infrastructure provisioning tool at its core, yet it offers some capabilities for configuration management and application deployment as well.

Terraform uses the HashiCorp Configuration Language (HCL). This language is declarative in nature, and it provides types, numerical and boolean expressions, as well as string, mathematical, and application-specific functions (filesystem access, IP networking, crypto).
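To give a feel for these language features, here is a small sketch using a locals block, which assigns names to expressions. The local names and the file id_rsa.pub are hypothetical; note that file() fails during planning if the file does not exist next to the configuration.

```hcl
locals {
  # String function
  project_name = upper("terraform-intro")

  # IP network function: the second /24 subnet of 10.0.0.0/16, i.e. "10.0.1.0/24"
  subnet_cidr = cidrsubnet("10.0.0.0/16", 8, 1)

  # Filesystem function (assumes a hypothetical id_rsa.pub in this directory)
  ssh_public_key = file("${path.module}/id_rsa.pub")

  # Crypto function: SHA-256 hash of the key contents
  key_hash = sha256(local.ssh_public_key)

  # Numeric and boolean expressions
  instance_count = 2 * 3
  is_large_setup = local.instance_count > 4
}
```

Locals are referenced as local.NAME, for example local.subnet_cidr.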

To start using Terraform, you just need a single binary compatible with your operating system. The central download page lists all the options.

On a Linux system, execute the following commands to download the binary and move it to a directory that is included in your PATH:

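cd ~
wget https://releases.hashicorp.com/terraform/1.4.6/terraform_1.4.6_linux_amd64.zip
unzip terraform_1.4.6_linux_amd64.zip
chmod +x terraform
# Writing to /usr/local/bin usually requires root privileges
sudo mv terraform /usr/local/bin/
# /usr/local/bin is on the PATH by default on most distributions; add it only if it is missing
export PATH=$PATH:/usr/local/bin
# Verify the installation
terraform -version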

Now you can invoke terraform and start exploring its operations.

Project Initialization

When you invoke a terraform command in a directory, Terraform reads all *.tf files in that directory and then executes the command. So technically, you can put everything into a single file, but it's better to separate the aspects of a Terraform project into different files.

I recommend creating a basic file layout, following common best practices, that separates the Terraform entities into dedicated files. Run the following commands:

$> mkdir terraform
$> tffiles=('terraform' 'variables' 'main' 'data' 'outputs'); for file in "${tffiles[@]}" ; do touch terraform/"$file".tf; done

These files serve the following purpose:

terraform.tf: Core configuration of a Terraform project, including which provider versions to use and their initial configuration.
variables.tf: Variables represent any information that you repeatedly reference in other resource files. Declaring them in one central place keeps them easy to overview.
main.tf: Contains the resources, the managed abstractions, that are defined and configured.
data.tf: Holds the data sources that read information about existing, external objects into the configuration.
outputs.tf: When resources are created, updated, or deleted, and some of their attributes need to be accessible from outside Terraform, for example by other programs, you declare them here as outputs.

To better understand these individual files, the following sections give brief examples for creating AWS compute instances.

Providers

Providers implement abstractions from different domains, like managing compute resources, user accounts, and access rights, as well as for different environments, like cloud computing on Google Cloud or AWS. Technically, a provider encapsulates an upstream API and makes it accessible to the Terraform configuration language. Using Terraform commands to manage resources translates to a series of API calls. This is one of the strengths of Terraform: you do not need to remember the details of an API, but use a uniform interface, the configuration language.

There is a multitude of providers for all kinds of resources, including cloud computing resources (AWS, Google, Azure), network resource management (DNS, certificates), user management (user accounts, policies, SSH keys), and much more. All providers are documented at registry.terraform.io, where they are categorized into three tiers: official, partner (formerly called verified), and community.

When you want to use a provider, you do not strictly need any configuration: Terraform detects which providers you use and downloads their latest versions (this automatic detection works for providers in the hashicorp namespace). However, it is best practice to pin versions in the terraform block as shown here:

terraform {
  required_providers {
    # Provider requirements are assigned with "=" as a map value
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.33.0"
    }
  }
}

Version constraints follow semantic versioning. The following constraints can be expressed:

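= 4.29.0 - exact match
!= 4.29.0 - exclude this exact version
> 4.29.0 - range match (also >=, <, <=); Terraform selects the newest version that satisfies the constraint
~> 4.29.0 - pessimistic constraint: 4.29.0 and newer patch versions (4.29.x), but not 4.30.0
>= 4.30.0, <= 4.33.0 - combined range match: any version from 4.30.0 up to and including 4.33.0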
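The same constraint syntax also applies to the Terraform CLI itself via required_version. A minimal sketch, shown as an alternative to the pinned terraform block above rather than an addition to it; the concrete version bounds are only illustrative:

```hcl
terraform {
  # Accept any Terraform CLI release from 1.4.0 up to, but excluding, 2.0.0
  required_version = ">= 1.4.0, < 2.0.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Combined range: 4.30.0 up to and including 4.33.0
      version = ">= 4.30.0, <= 4.33.0"
    }
  }
}
```

Once terraform init has selected a version, it is recorded in the dependency lock file .terraform.lock.hcl, so later runs keep using it until you upgrade explicitly.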

In addition to the version specification, you need to configure the provider itself. For the running example on AWS, we define the region in which resources should be created, as well as the credentials for accessing the AWS API.

provider "aws" {
  region = "us-east-1"
  access_key = "REDACTED"
  secret_key = "REDACTED"
}
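Hard-coding access keys in the provider block stores secrets in your configuration files. As an alternative, the AWS provider also reads the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY when no keys are set in the block, or it can use a named profile from the shared credentials file. A minimal sketch of the profile variant, where the profile name terraform-intro is hypothetical:

```hcl
provider "aws" {
  region = "us-east-1"

  # Read credentials from a named profile in ~/.aws/credentials
  # ("terraform-intro" is a hypothetical profile name)
  profile = "terraform-intro"
}
```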

Variables

When you manage a large number of resources, repeating the same value over and over in different places is tedious and makes changes brittle. To avoid this, you can use variables.

Variables, more specifically input variables, can have one of three primitive types (string, number, bool) or a complex type: the collection types list, map, and set, or the structural types object and tuple.

The following examples show a variable of type string that stores the ID of a concrete AMI, and an object with three fields that map workload sizes to EC2 instance types.

variable "debian11-ami" {
    default = "ami-06e2e69dffd648fa2"
    type = string
}

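variable "workload-type" {
    # An object type must declare the type of each of its attributes
    type = object({
        small  = string
        medium = string
        large  = string
    })
    default = {
        small  = "t2.small"
        medium = "t2.medium"
        large  = "t2.large"
    }

    description = "Default EC2 Instance categories"
}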
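The remaining types are declared the same way. The following sketch shows one variable per type; all names and default values are hypothetical:

```hcl
variable "availability_zones" {
  type    = list(string)
  default = ["us-east-1a", "us-east-1b"]
}

variable "common_tags" {
  type = map(string)
  default = {
    project = "terraform-intro"
    owner   = "devops"
  }
}

variable "allowed_ports" {
  # Sets are unordered and contain no duplicates
  type    = set(number)
  default = [22, 443]
}

variable "enable_monitoring" {
  type    = bool
  default = false
}

variable "root_volume" {
  # A tuple has a fixed length and one type per position
  type    = tuple([string, number])
  default = ["gp3", 20]
}
```

Lists and tuples are indexed by position (var.availability_zones[0]), maps by key (var.common_tags["project"]); sets cannot be indexed and are typically iterated instead.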

Variables are referenced inside resources with reference expressions. To reference a variable inside a configuration block, use the notation var.VARIABLE_NAME for a value assignment, or embed the reference inside a string as an interpolation expression like this: tag = "This AMI ID is ${var.debian11-ami}".

Here is the definition of an AWS server resource that uses variables for its values:

resource "aws_instance" "controller" {
  ami = var.debian11-ami
  instance_type = var.workload-type.medium
}
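Both reference styles can be combined in one resource. The following sketch uses a plain reference for the instance type and string interpolation inside the tags map; the resource name web and the tag values are hypothetical:

```hcl
resource "aws_instance" "web" {
  ami           = var.debian11-ami
  instance_type = var.workload-type.small

  tags = {
    Name = "web"
    # Interpolation embeds the variable value in a larger string
    Description = "This AMI ID is ${var.debian11-ami}"
  }
}
```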

Resources

Resources are concrete, managed entities that represent infrastructure objects: servers, cloud computing services (IP, DNS, storage), applications (database, webserver), and configuration files.

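Resources are defined with the syntax resource "<provider_resourcetype>" "<name>" { <configuration_block> }. Type and name together, such as aws_instance.controller, form the address of the resource within the configuration.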

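Here is an example for creating an AWS server instance with literal values instead of variables. Since it uses the same name as the previous example, keep only one of the two in your configuration.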

resource "aws_instance" "controller" {
  ami = "ami-06e2e69dffd648fa2"
  instance_type = "t2.medium"
}

Configuration blocks define the properties of a resource. Syntactically, they are expressed as attributes and values. The names and meanings of the attributes are entirely specific to the resource type that you are creating. All available options are listed in the provider's documentation, for example in the documentation of the aws_instance resource.

In configuration blocks, several meta-arguments are available, such as depends_on for declaring explicit dependencies, or lifecycle to influence the resource's behavior during creation, updates, and deletion. These more advanced topics will be covered in a future article.
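As a brief preview, here is a sketch of both meta-arguments in one resource. The resource name worker is hypothetical, and the chosen lifecycle settings only illustrate the syntax:

```hcl
resource "aws_instance" "worker" {
  ami           = "ami-06e2e69dffd648fa2"
  instance_type = "t2.small"

  # Explicit dependency: Terraform creates the controller before this instance
  depends_on = [aws_instance.controller]

  lifecycle {
    # Create the replacement before destroying the old instance
    create_before_destroy = true
    # Do not treat changes to tags made outside Terraform as drift
    ignore_changes = [tags]
  }
}
```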

Data Sources

Data sources are used to read information about existing infrastructure objects, typically ones that are not managed by the current Terraform configuration.

Their syntax is similar to that of resources: data "<provider_resourcetype>" "<name>" { <configuration_block> }. As with resources, the concrete names and meanings of the attributes depend on the provider. For example, the documentation for an AWS key pair shows that the object has a name, a key type, and a public key.

The following snippet shows how to define a data source for an AWS key pair, and then expose its public key as an output value.

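data "aws_key_pair" "server_key" {
    key_name           = "primary-key"
    # The public key material is only returned when explicitly requested
    include_public_key = true
}

output "aws_key_public_key" {
  # Data sources are referenced with the data. prefix
  value = data.aws_key_pair.server_key.public_key
}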
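Data source attributes can also feed into resources. A minimal sketch that attaches the looked-up key pair to a new instance; the resource name bastion is hypothetical:

```hcl
resource "aws_instance" "bastion" {
  ami           = "ami-06e2e69dffd648fa2"
  instance_type = "t2.small"

  # Reference the data source attribute with the data.<TYPE>.<NAME> prefix
  key_name = data.aws_key_pair.server_key.key_name
}
```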

When many objects of a type exist and you need to select a very specific one, you can use a filter block instead of the name:

data "aws_key_pair" "server-key" {
  filter {
    name   = "tag:project"
    # values expects a list of strings
    values = ["primary-key"]
  }
}
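Filters are especially common for looking up machine images, which avoids hard-coding an AMI ID like the one in the debian11-ami variable. A sketch under stated assumptions: the owner account ID and the name pattern below are assumed values for Debian's official images and should be verified before use:

```hcl
data "aws_ami" "debian11" {
  # Pick the newest image among all matches
  most_recent = true
  # Assumption: AWS account ID that publishes the official Debian AMIs
  owners = ["136693071363"]

  filter {
    name = "name"
    # Assumption: naming pattern of the Debian 11 amd64 images
    values = ["debian-11-amd64-*"]
  }
}

resource "aws_instance" "debian" {
  ami           = data.aws_ami.debian11.id
  instance_type = "t2.micro"
}
```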

Output

The final Terraform concept in this article is outputs, or more specifically output values. When a Terraform command creates or updates infrastructure, output values capture object attributes for various reasons: to record the results of an operation similar to a log file, to expose specific information to other tools or configurations, or to give data a convenient alias.

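Output values are declared with the syntax output "NAME" { value = ... }. Unlike input variables, they are not referenced inside the same configuration. Instead, they are printed by the CLI, can be consumed by a calling module as module.MODULE_NAME.OUTPUT_NAME, and can be read by other configurations through remote state.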

For example, once the AWS server instance is created, you can record its public IP address:

output "controller_public_ip" {
    # Output blocks take no type argument; the type follows from the value
    value = aws_instance.controller.public_ip
}

Outputs are printed after terraform apply completes, shown as planned changes during terraform plan, and can be queried at any time by running terraform output.
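To illustrate how outputs are consumed by other configurations, here is a sketch that assumes the files above live in a hypothetical child module directory ./compute, which declares the controller_public_ip output. A root configuration can then read and re-export that value:

```hcl
# Root configuration; ./compute is a hypothetical local module directory
module "compute" {
  source = "./compute"
}

output "controller_public_ip" {
  description = "Public IP address of the controller, re-exported from the compute module"
  value       = module.compute.controller_public_ip
}
```

Marking an output with sensitive = true hides its value in plan and apply output, which is useful for secrets.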

Conclusion

Terraform is a stateful infrastructure configuration language supporting a wide range of providers for different computing environments such as concrete servers, storage buckets or databases, user accounts, certificates, and application configuration. With Terraform, you provide declarative specifications of resources in the Terraform configuration language. Applying a Terraform command means that these expressions will be turned into API calls for the specific cloud provider.

This introductory article explained the installation and setup of a Terraform project as well as the core concepts: a) providers implement abstractions from different domains to work with, b) variables eliminate the tedious repetition of the same information, c) resources are abstractions over infrastructure objects that you want to provision, d) data sources reference upstream information about existing objects, and e) output values capture specific resource attributes as dedicated data.
