How to Manage Infrastructure as Code with Terraform
Manual infrastructure management breaks at scale. You document your server configuration in a wiki, but the wiki is outdated within a week because three people made emergency changes that never got documented. You try to replicate production for a staging environment, but subtle differences creep in—different security group rules, slightly different instance types, a database parameter that someone changed months ago and nobody remembers why.
Infrastructure as Code solves this by treating infrastructure like software: you write configuration files describing your desired state, version them in git, review changes through pull requests, and automatically apply them to create actual infrastructure. Terraform is the tool that makes this practical across multiple cloud providers. This article walks through the complete Terraform workflow from writing your first configuration through managing multi-environment production infrastructure.
You'll learn Terraform's core concepts (resources, providers, state), the workflow commands (init, plan, apply, destroy), how to structure projects for real-world use, and the patterns that prevent the most common operational issues. These aren't abstract examples. They're configurations you can adapt and deploy once you substitute account-specific values such as AMI IDs, which differ by region.
Why Terraform for Infrastructure as Code
Infrastructure as Code isn't a new concept—tools like AWS CloudFormation and Azure Resource Manager existed before Terraform. Terraform's advantage is multi-cloud support through a provider plugin system. You write configuration in HCL (HashiCorp Configuration Language), and providers translate that into API calls for AWS, Azure, GCP, Kubernetes, and hundreds of other platforms.
This matters more than it initially sounds. Modern applications rarely use one provider exclusively. Your core application runs on AWS, but you use Cloudflare for DNS, Datadog for monitoring, and PagerDuty for on-call management. CloudFormation is built around AWS resources and can't natively manage Cloudflare DNS records. Terraform manages all of these with one language and one workflow, and resources from different providers can reference each other, giving you a complete view of your infrastructure.
The second advantage is declarative configuration. You describe what you want (a VPC with these CIDR blocks, an RDS instance with these parameters), not how to create it. Terraform figures out the steps: create the VPC, wait for it to be ready, create subnets in the VPC, create the RDS instance in those subnets. If a resource Terraform already manages (one recorded in its state) exists, Terraform updates it to match your configuration rather than creating a duplicate.
Core Terraform Concepts
Terraform operates on three core concepts: configuration (what you want), state (what Terraform last recorded as existing), and providers (how to create what you want). Understanding these three explains most of Terraform's behavior.
Configuration: Defining Desired State
Terraform configuration files use HCL, a language designed for infrastructure. Here's a complete example that creates an EC2 instance:
# main.tf
terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
provider "aws" {
  region = "us-east-1"
}
resource "aws_instance" "web" {
  # Example AMI ID. AMI IDs are region-specific; substitute one that exists in your region.
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
  tags = {
    Name = "web-server"
  }
}
This configuration defines one resource: an EC2 instance of type t3.micro using a specific AMI. The resource type is aws_instance (from the AWS provider), and "web" is the local name Terraform uses to refer to this particular instance, so its address is aws_instance.web. The tags argument adds metadata visible in the AWS console.
State: Tracking Real Resources
When you run terraform apply, Terraform creates the EC2 instance and records its details in a state file (terraform.tfstate). This file maps the resource "aws_instance" "web" in your configuration to the actual instance ID (i-1234567890abcdef0) in AWS.
State is critical because Terraform must know what it created previously to calculate what changes are needed. If you change instance_type from t3.micro to t3.small, Terraform compares configuration to state, sees the instance type differs, and modifies the existing instance rather than creating a new one. Without state, Terraform would try to create a duplicate instance on every apply.
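To make the role of state concrete, here is a deliberately simplified Python model of the comparison a plan performs. It assumes configuration and state are plain dictionaries keyed by resource address (a hypothetical representation, not Terraform's real state format), and it ignores refreshing state from the cloud API and the provider's decision between in-place updates and replacement.

```python
from typing import Dict

# Hypothetical representation: resource address -> attribute map.
Resources = Dict[str, Dict[str, str]]


def plan(config: Resources, state: Resources) -> Dict[str, str]:
    """Return a planned action per resource address (simplified model)."""
    actions: Dict[str, str] = {}
    for address, desired in config.items():
        recorded = state.get(address)
        if recorded is None:
            # No record in state, so Terraform plans to create it.
            actions[address] = "create"
        elif any(recorded.get(key) != value for key, value in desired.items()):
            # Recorded, but an attribute differs from configuration.
            actions[address] = "update"
        else:
            actions[address] = "no-op"
    for address in state:
        if address not in config:
            # Recorded in state but removed from configuration.
            actions[address] = "delete"
    return actions


config = {"aws_instance.web": {"instance_type": "t3.small"}}
state = {"aws_instance.web": {"id": "i-1234567890abcdef0", "instance_type": "t3.micro"}}

print(plan(config, state))  # {'aws_instance.web': 'update'}
print(plan(config, {}))     # {'aws_instance.web': 'create'}
```

With an empty state, the same configuration plans a create, which is exactly how a lost or ignored state file leads to duplicate resources.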
Providers: Resource Creation Logic
Providers are plugins that know how to create, read, update, and delete resources on specific platforms. The AWS provider translates aws_instance resources into AWS EC2 API calls. The Kubernetes provider translates kubernetes_deployment resources into Kubernetes API calls. Providers handle authentication, API versioning, rate limiting, and retry logic.
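# Using multiple providers in one configuration
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    # Providers outside the hashicorp namespace need an explicit source
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "~> 4.0"
    }
  }
}
provider "aws" {
  region = "us-east-1"
}
# Assumes cloudflare_token and cloudflare_zone_id are declared as variables elsewhere
provider "cloudflare" {
  api_token = var.cloudflare_token
}
# Create AWS resources
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
}
# Configure Cloudflare DNS pointing to AWS instance
resource "cloudflare_record" "web" {
  zone_id = var.cloudflare_zone_id
  name    = "www"
  value   = aws_instance.web.public_ip
  type    = "A"
}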
This configuration uses two providers simultaneously. The Cloudflare DNS record references the AWS instance's public IP through aws_instance.web.public_ip, creating a dependency—Terraform must create the AWS instance before the Cloudflare record because the record needs the IP address.
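Terraform turns references like this one into a dependency graph and walks it so that every resource is created after the resources it depends on and destroyed before them. The sketch below reproduces only that ordering idea with Python's standard graphlib module; the dictionary layout and function names are illustrative assumptions. Real Terraform also runs independent resources in parallel (10 concurrent operations by default, adjustable with -parallelism) and, like this sketch, refuses to proceed when the graph contains a cycle.

```python
from graphlib import CycleError, TopologicalSorter

# Each resource maps to the set of resources it references.
dependencies = {
    "aws_instance.web": set(),
    # value = aws_instance.web.public_ip creates this edge
    "cloudflare_record.web": {"aws_instance.web"},
}


def creation_order(deps):
    """Dependencies first: a valid order for creating resources."""
    return list(TopologicalSorter(deps).static_order())


def destruction_order(deps):
    """Dependents first: the reverse of the creation order."""
    return list(reversed(creation_order(deps)))


print(creation_order(dependencies))     # ['aws_instance.web', 'cloudflare_record.web']
print(destruction_order(dependencies))  # ['cloudflare_record.web', 'aws_instance.web']

try:
    creation_order({"a": {"b"}, "b": {"a"}})  # two resources referencing each other
except CycleError:
    print("cycle detected")
```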
The Terraform Workflow
Terraform operations follow a consistent workflow: initialize (download providers), plan (show what will change), apply (make changes), and optionally destroy (delete everything). This workflow is the same whether you're managing one resource or thousands.
Step 1: Initialize
# terraform init downloads providers and prepares the working directory
terraform init
# Output shows provider installation
Initializing the backend...
Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 5.0"...
- Installing hashicorp/aws v5.31.0...
- Installed hashicorp/aws v5.31.0
Terraform has been successfully initialized!
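terraform init downloads the AWS provider plugin (and any other required providers) into a .terraform directory and records the selected versions in a dependency lock file, .terraform.lock.hcl, which you should commit to version control. Run init when setting up a new project or a fresh clone, and again whenever you add providers or modules or change the backend configuration. To move to a newer provider version allowed by your constraints, run terraform init -upgrade. It's safe to run init repeatedly: it only downloads what's missing.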
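The version = "~> 5.0" line in required_providers uses the pessimistic constraint operator: only the rightmost listed component may increase, so "~> 5.0" accepts any 5.x release but not 6.0, while "~> 5.31.0" accepts only 5.31.x patches. The Python sketch below is a simplified model of that rule for purely numeric versions; it ignores pre-release tags and other operators and is not Terraform's actual implementation.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a purely numeric version such as '5.31.0' (no pre-release tags)."""
    return tuple(int(part) for part in text.strip().split("."))


def pessimistic_allows(constraint: str, version: str) -> bool:
    """Simplified model of Terraform's '~>' operator.

    '~> 5.0'    means >= 5.0,    < 6.0
    '~> 5.31.0' means >= 5.31.0, < 5.32.0
    """
    constraint = constraint.strip()
    if not constraint.startswith("~>"):
        raise ValueError("this sketch only handles '~>' constraints")
    lower = parse_version(constraint[2:])
    if len(lower) < 2:
        raise ValueError("this sketch needs at least two version components")
    # Drop the rightmost component and bump the one before it.
    upper = lower[:-2] + (lower[-2] + 1,)
    candidate = parse_version(version)
    width = max(len(lower), len(upper), len(candidate))

    def pad(parts: Tuple[int, ...]) -> Tuple[int, ...]:
        return parts + (0,) * (width - len(parts))

    return pad(lower) <= pad(candidate) < pad(upper)


for v in ["4.67.0", "5.0.0", "5.31.0", "6.0.0"]:
    print(v, pessimistic_allows("~> 5.0", v))
# 4.67.0 False
# 5.0.0 True
# 5.31.0 True
# 6.0.0 False
```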
Step 2: Plan
# terraform plan shows what will change without making changes
terraform plan
# Output shows proposed changes
Terraform will perform the following actions:
  # aws_instance.web will be created
  + resource "aws_instance" "web" {
      + ami           = "ami-0c55b159cbfafe1f0"
      + instance_type = "t3.micro"
      + id            = (known after apply)
      + public_ip     = (known after apply)
      # ... more attributes
    }
Plan: 1 to add, 0 to change, 0 to destroy.
Plan output uses symbols to indicate actions: + means create, - means destroy, ~ means update in place, and -/+ means replace (destroy, then create; shown as +/- when the resource sets create_before_destroy). The summary line "Plan: 1 to add, 0 to change, 0 to destroy" gives you the change count at a glance. Always run plan before apply to verify Terraform will do what you expect.
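One detail of the summary line is easy to miss: a replacement is counted once as an add and once as a destroy. The Python sketch below models how per-resource actions roll up into that line; the action names are this sketch's own labels, not Terraform's internal identifiers.

```python
from typing import Dict

SYMBOLS = {"create": "+", "delete": "-", "update": "~", "replace": "-/+"}


def render_plan(actions: Dict[str, str]) -> str:
    """Render per-resource symbols plus the summary line (simplified model)."""
    lines = []
    add = change = destroy = 0
    for address, action in actions.items():
        lines.append(f"{SYMBOLS[action]} {address}")
        if action == "create":
            add += 1
        elif action == "update":
            change += 1
        elif action == "delete":
            destroy += 1
        elif action == "replace":
            # A replacement counts as one add and one destroy.
            add += 1
            destroy += 1
    lines.append(f"Plan: {add} to add, {change} to change, {destroy} to destroy.")
    return "\n".join(lines)


print(render_plan({"aws_instance.web": "replace", "aws_security_group.web": "update"}))
# -/+ aws_instance.web
# ~ aws_security_group.web
# Plan: 1 to add, 1 to change, 1 to destroy.
```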
Step 3: Apply
# terraform apply makes the planned changes
terraform apply
# Terraform shows the plan again and asks for confirmation
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
aws_instance.web: Creating...
aws_instance.web: Still creating... [10s elapsed]
aws_instance.web: Creation complete after 45s [id=i-1234567890abcdef0]
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
Apply creates the actual infrastructure. Without a saved plan file, terraform apply computes a fresh plan, shows it (giving you one last chance to review), and waits for confirmation. After you type "yes", Terraform makes the API calls to create resources. The process is idempotent: running apply again with unchanged configuration makes no changes, as long as nothing modified the infrastructure outside Terraform in the meantime.
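Idempotency follows from apply acting only on differences. In this toy Python model, a single dictionary of attributes stands in for a real instance (an assumption for illustration): the first run changes the instance type, the second finds nothing to do, and a manual change made outside Terraform (drift) shows up again as a difference, because Terraform refreshes its view of real resources before planning.

```python
from typing import Dict

Attributes = Dict[str, str]


def diff(desired: Attributes, actual: Attributes) -> Attributes:
    """Attributes whose real value differs from configuration."""
    return {key: value for key, value in desired.items() if actual.get(key) != value}


def apply(desired: Attributes, actual: Attributes) -> Attributes:
    """Change only what differs and return the changes made (toy model)."""
    changes = diff(desired, actual)
    actual.update(changes)  # stands in for the provider's API calls
    return changes


desired = {"instance_type": "t3.small"}
real_instance = {"instance_type": "t3.micro"}

print(apply(desired, real_instance))  # {'instance_type': 't3.small'}
print(apply(desired, real_instance))  # {}

real_instance["instance_type"] = "t3.large"  # someone edits it in the console
print(apply(desired, real_instance))  # {'instance_type': 't3.small'}
```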
Step 4: Destroy (When Needed)
# terraform destroy deletes all managed resources
terraform destroy
# Shows what will be deleted
Terraform will perform the following actions:
  # aws_instance.web will be destroyed
  - resource "aws_instance" "web" {
      - ami           = "ami-0c55b159cbfafe1f0"
      - instance_type = "t3.micro"
      # ...
    }
Plan: 0 to add, 0 to change, 1 to destroy.
# Type 'yes' to confirm deletion
Destroy deletes every resource tracked in the current state. This is useful for temporary environments (CI test environments, demo systems) but dangerous for production. Teams guard against accidental destruction by disabling destroy in CI/CD pipelines, or by setting lifecycle { prevent_destroy = true } on critical resources, which makes Terraform reject any plan that would destroy them.
Variables and Outputs
Hardcoding values in configuration files makes them rigid. Variables parameterize configuration, allowing the same code to deploy to multiple environments with different settings. Outputs expose information about created resources for use by other systems or Terraform modules.
Input Variables
# variables.tf
variable "instance_type" {
  type        = string
  description = "EC2 instance type"
  default     = "t3.micro"
}
variable "environment" {
  type        = string
  description = "Environment name (dev, staging, production)"
  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}
# main.tf
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = var.instance_type
  tags = {
    Name        = "${var.environment}-web-server"
    Environment = var.environment
  }
}
# terraform.tfvars (variable values)
instance_type = "t3.small"
environment   = "production"
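Variables have types (such as string, number, bool, list, map, and object), optional defaults, and validation rules. The validation block rejects invalid values: setting environment to "prod" fails with the error message defined above. Values can come from terraform.tfvars (loaded automatically), other .tfvars files passed with -var-file, environment variables (TF_VAR_instance_type), or command-line flags (-var instance_type=t3.small). When a variable is set in several places, later sources override earlier ones in this order: environment variables, terraform.tfvars, *.auto.tfvars files, then -var and -var-file flags. A variable with no default and no value makes Terraform prompt for it interactively.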
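The precedence rule is easiest to see as a stack of layers where the last one to set a value wins. The Python sketch below models that, plus the environment validation; the function names and the flat list of layers are hypothetical simplifications. Terraform's real loader also handles terraform.tfvars.json, reads multiple *.auto.tfvars files in lexical filename order, and converts values to the declared type.

```python
from typing import Dict, List, Mapping, Optional

ALLOWED_ENVIRONMENTS = {"dev", "staging", "production"}


def env_layer(environ: Mapping[str, str]) -> Dict[str, str]:
    """Collect TF_VAR_<name> environment variables into a name -> value map."""
    prefix = "TF_VAR_"
    return {key[len(prefix):]: value for key, value in environ.items() if key.startswith(prefix)}


def resolve(name: str, default: Optional[str], layers: List[Dict[str, str]]) -> Optional[str]:
    """Later layers override earlier ones; the default applies only if nothing sets the name."""
    value = default
    for layer in layers:
        if name in layer:
            value = layer[name]
    return value


def validate_environment(value: str) -> str:
    # Mirrors the validation block: contains(["dev", "staging", "production"], var.environment)
    if value not in ALLOWED_ENVIRONMENTS:
        raise ValueError("Environment must be dev, staging, or production.")
    return value


# Lowest to highest precedence. In real use, pass os.environ to env_layer.
layers = [
    env_layer({"TF_VAR_instance_type": "t3.medium"}),  # environment variables
    {"instance_type": "t3.small"},                      # terraform.tfvars
    {},                                                 # *.auto.tfvars (none here)
    {"instance_type": "t3.large"},                      # -var on the command line
]
print(resolve("instance_type", "t3.micro", layers))  # t3.large
```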
Output Values
# outputs.tf
output "instance_id" {
  value       = aws_instance.web.id
  description = "EC2 instance ID"
}
output "public_ip" {
  value       = aws_instance.web.public_ip
  description = "Public IP address"
}
output "private_ip" {
  value       = aws_instance.web.private_ip
  description = "Private IP address"
  sensitive   = true
}
# After terraform apply, outputs are displayed (sensitive values are redacted)
Outputs:
instance_id = "i-1234567890abcdef0"
public_ip = "203.0.113.42"
private_ip = <sensitive>
# Query outputs programmatically
terraform output instance_id
# "i-1234567890abcdef0"
terraform output -raw instance_id
# i-1234567890abcdef0
terraform output -json
# {"instance_id":{"sensitive":false,"type":"string","value":"i-1234567890abcdef0"},...}
Outputs make information available after apply completes. Use them to display connection information (IP addresses, DNS names), pass data to other Terraform configurations, or integrate with external automation. The sensitive flag redacts a value in plan and apply output, but it is not encryption: the value is still stored in plaintext in the state file and is printed unredacted by terraform output -json.
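For external automation, terraform output -json is the most convenient interface, since each output carries sensitive, type, and value fields. A minimal Python sketch, assuming terraform is on the PATH and the working directory is already initialized; parse_outputs and read_outputs are hypothetical helper names, and the sample JSON uses made-up values. It skips sensitive outputs by default because -json prints them in plaintext.

```python
import json
import subprocess
from typing import Any, Dict


def parse_outputs(raw: str, include_sensitive: bool = False) -> Dict[str, Any]:
    """Turn `terraform output -json` text into a plain name -> value map."""
    outputs = json.loads(raw)
    return {
        name: entry["value"]
        for name, entry in outputs.items()
        if include_sensitive or not entry.get("sensitive", False)
    }


def read_outputs(workdir: str) -> Dict[str, Any]:
    """Run terraform in an already-initialized directory and parse its outputs."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir, capture_output=True, text=True, check=True,
    )
    return parse_outputs(result.stdout)


sample = """{
  "instance_id": {"sensitive": false, "type": "string", "value": "i-1234567890abcdef0"},
  "private_ip": {"sensitive": true, "type": "string", "value": "10.0.1.25"},
  "public_ip": {"sensitive": false, "type": "string", "value": "203.0.113.42"}
}"""
print(parse_outputs(sample))
# {'instance_id': 'i-1234567890abcdef0', 'public_ip': '203.0.113.42'}
```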
Modules: Reusable Infrastructure Components
As configurations grow, duplicating resource definitions becomes tedious and error-prone. Modules package related resources into reusable components. A VPC module might include the VPC, subnets, route tables, and internet gateway—everything needed for a complete network, exposed through a simple interface.
Creating a Module
# modules/web-server/main.tf
variable "instance_type" {
  type = string
}
variable "subnet_id" {
  type = string
}
variable "name" {
  type = string
}
# Look up the subnet so the security group is created in the same VPC as the instance
data "aws_subnet" "selected" {
  id = var.subnet_id
}
resource "aws_security_group" "web" {
  name_prefix = "${var.name}-web-"
  vpc_id      = data.aws_subnet.selected.vpc_id
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
resource "aws_instance" "web" {
  ami                    = data.aws_ami.amazon_linux_2.id
  instance_type          = var.instance_type
  subnet_id              = var.subnet_id
  vpc_security_group_ids = [aws_security_group.web.id]
  tags = {
    Name = var.name
  }
}
data "aws_ami" "amazon_linux_2" {
  most_recent = true
  owners      = ["amazon"]
  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}
# modules/web-server/outputs.tf
output "instance_id" {
  value = aws_instance.web.id
}
output "public_ip" {
  value = aws_instance.web.public_ip
}
Using a Module
# main.tf
# aws_subnet.public_1 and aws_subnet.public_2 are assumed to be defined elsewhere in this configuration
module "web_server_1" {
  source        = "./modules/web-server"
  instance_type = "t3.small"
  subnet_id     = aws_subnet.public_1.id
  name          = "web-server-1"
}
module "web_server_2" {
  source        = "./modules/web-server"
  instance_type = "t3.small"
  subnet_id     = aws_subnet.public_2.id
  name          = "web-server-2"
}
# Reference module outputs
output "web_server_1_ip" {
  value = module.web_server_1.public_ip
}
output "web_server_2_ip" {
  value = module.web_server_2.public_ip
}
Modules encapsulate complexity. The web-server module handles security group creation, AMI lookup, and instance configuration; callers only provide instance_type, subnet_id, and name. This reduces duplication: each module instance gets its own, identically configured security group without anyone copying the resource definition.
| Module Source | When to Use | Example |
|---|---|---|
| Local path | Modules in same repository | ./modules/vpc |
| Git repository | Shared modules across teams | git::https://github.com/org/modules.git//vpc |
| Terraform Registry | Public community modules | terraform-aws-modules/vpc/aws |
State Management and Collaboration
Local state files work for solo experiments but break when teams collaborate. Two people running terraform apply simultaneously corrupt state by overwriting each other's changes. Remote state with locking solves this by storing state in a shared location (S3, Terraform Cloud) and preventing concurrent modifications.
Configuring Remote State
# backend.tf
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "production/infrastructure.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
# After adding backend configuration, run:
terraform init
# Terraform prompts to migrate existing local state to S3
Initializing the backend...
Do you want to copy existing state to the new backend?
  Enter a value: yes
Successfully configured the backend "s3"!
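The S3 backend stores state at the specified bucket and key, and encrypt = true enables server-side encryption of the state object. The dynamodb_table setting enables state locking: when someone runs terraform plan or apply, Terraform writes a lock item to that DynamoDB table (which needs a string partition key named LockID), and any other run against the same state fails to acquire the lock until the first operation completes and releases it. This prevents concurrent writes that would corrupt state. Recent Terraform releases (1.10 and later) can also lock using the S3 bucket itself via use_lockfile = true, and later versions deprecate the DynamoDB option. Create the bucket (with versioning enabled) and the lock table before running init, because Terraform can't store its state in a backend that doesn't exist yet.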
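At its core, the locking protocol is a conditional write: create the lock item only if none exists, and delete it when the operation finishes, even if the operation fails. The Python sketch below models that with an in-memory table. LockTable, state_lock, and the item fields are hypothetical names for illustration, not the S3 backend's code or DynamoDB's API. In real Terraform, a blocked run reports "Error acquiring the state lock", and terraform force-unlock clears a lock left behind by a crashed run.

```python
import threading
import uuid
from contextlib import contextmanager
from typing import Dict, Iterator


class StateLockError(RuntimeError):
    pass


class LockTable:
    """In-memory stand-in for the lock table (hypothetical, not the DynamoDB API)."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, str]] = {}
        self._mutex = threading.Lock()

    def put_if_absent(self, lock_id: str, item: Dict[str, str]) -> bool:
        # Models a conditional write: succeeds only if no lock item exists yet.
        with self._mutex:
            if lock_id in self._items:
                return False
            self._items[lock_id] = item
            return True

    def holder(self, lock_id: str) -> str:
        with self._mutex:
            return self._items.get(lock_id, {}).get("Who", "")

    def release(self, lock_id: str, token: str) -> None:
        # Only the owner of the lock may delete it.
        with self._mutex:
            if self._items.get(lock_id, {}).get("ID") == token:
                del self._items[lock_id]


@contextmanager
def state_lock(table: LockTable, state_path: str, who: str) -> Iterator[str]:
    token = str(uuid.uuid4())
    if not table.put_if_absent(state_path, {"ID": token, "Who": who}):
        raise StateLockError(f"state {state_path} is locked by {table.holder(state_path)}")
    try:
        yield token
    finally:
        table.release(state_path, token)  # released even if the operation raises


table = LockTable()
path = "mycompany-terraform-state/production/infrastructure.tfstate"
with state_lock(table, path, "alice"):
    try:
        with state_lock(table, path, "bob"):
            pass
    except StateLockError as err:
        print(err)  # bob is blocked while alice holds the lock
with state_lock(table, path, "bob"):
    print("bob acquired the lock")
```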
State File Organization
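Large infrastructures benefit from splitting state across multiple files rather than keeping one monolithic state. Separate state files reduce blast radius (a networking change doesn't require loading or locking database state) and enable parallel workflows (the networking and database teams can apply changes simultaneously). Values still cross these boundaries through outputs, which another configuration can read with the terraform_remote_state data source. Each layer gets its own key in the same bucket: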
# Network infrastructure (backend.tf in network directory)
terraform {
  backend "s3" {
    bucket = "mycompany-terraform-state"
    key    = "production/network.tfstate"
    region = "us-east-1"
  }
}
# Compute infrastructure (backend.tf in compute directory)
terraform {
  backend "s3" {
    bucket = "mycompany-terraform-state"
    key    = "production/compute.tfstate"
    region = "us-east-1"
  }
}
# Data layer (backend.tf in data directory)
terraform {
  backend "s3" {
    bucket = "mycompany-terraform-state"
    key    = "production/data.tfstate"
    region = "us-east-1"
  }
}
Managing Existing Infrastructure
Most teams don't start with empty AWS accounts—you have existing infrastructure created manually or through other tools. Terraform can manage existing resources through the import process, which adds resources to state without modifying them.
Importing Existing Resources
# Step 1: Write configuration matching the existing resource
resource "aws_instance" "legacy_server" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.small"
  tags = {
    Name = "legacy-server"
  }
}
# Step 2: Import the existing instance into Terraform state
terraform import aws_instance.legacy_server i-1234567890abcdef0
# Step 3: Run plan to verify configuration matches reality
terraform plan
# If plan shows changes, update configuration until plan shows no changes
The terraform import command adds the resource to state but doesn't write configuration for you: you must write the resource block yourself, matching the existing resource's attributes. After import, terraform plan shows what Terraform thinks needs to change. Adjust your configuration until plan reports "No changes"; at that point your configuration accurately describes the existing resource. Terraform 1.5 and later also support declarative import blocks, and terraform plan -generate-config-out=FILE can draft configuration for them as a starting point.
Bulk Import with Terraformer
For many resources, importing each one by hand is tedious. Tools like Terraformer can scan your AWS account and generate both configuration files and matching state for existing resources:
# Install terraformer
# https://github.com/GoogleCloudPlatform/terraformer
# Run from a directory where terraform init has installed the AWS provider
# Generate Terraform configuration from existing AWS resources
terraformer import aws --resources=vpc,subnet,ec2_instance --regions=us-east-1
# Terraformer creates, for each resource type:
# - generated/aws/vpc/*.tf (configuration files)
# - generated/aws/vpc/terraform.tfstate (state file with imported resources)
Terraformer's generated configuration often needs cleanup—it includes every attribute, including computed values that shouldn't be in configuration. Use it as a starting point, then refactor into properly structured modules.
Version Control and CI/CD Integration
Infrastructure code belongs in version control like application code. Every change goes through git commits, branches, and pull requests. This provides audit trails (who changed what and when), rollback capability (git revert problematic commits), and peer review (catch errors before they reach production).
Git Repository Structure
terraform/
├── .gitignore
├── README.md
├── modules/
│ ├── vpc/
│ ├── compute/
│ └── database/
├── environments/
│ ├── production/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ ├── backend.tf
│ │ └── terraform.tfvars
│ ├── staging/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ ├── backend.tf
│ │ └── terraform.tfvars
│ └── development/
│ ├── main.tf
│ ├── variables.tf
│ ├── outputs.tf
│ ├── backend.tf
│ └── terraform.tfvars
└── .terraform/ (excluded by .gitignore)
CI/CD Pipeline for Terraform
# .github/workflows/terraform.yml
name: Terraform
on:
  pull_request:
    branches: [main]
  push:
    branches: [main]
permissions:
  contents: read
  pull-requests: write # lets the workflow comment the plan on the PR
# Credentials at workflow level: terraform init also needs them to reach the S3 backend
env:
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false # keeps redirected command output clean
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
      - name: Terraform Init
        run: terraform init -input=false
        working-directory: ./environments/production
      - name: Terraform Plan
        run: terraform plan -input=false -out=tfplan
        working-directory: ./environments/production
      - name: Render Plan as Text
        # tfplan is a binary file; terraform show converts it to readable text
        run: terraform show -no-color tfplan > plan.txt
        working-directory: ./environments/production
      - name: Post Plan to PR
        uses: actions/github-script@v7
        if: github.event_name == 'pull_request'
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('environments/production/plan.txt', 'utf8');
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '## Terraform Plan\n```\n' + plan + '\n```'
            });
  apply:
    needs: plan
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    steps:
      - uses: actions/checkout@v4
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
      - name: Terraform Init
        run: terraform init -input=false
        working-directory: ./environments/production
      - name: Terraform Apply
        run: terraform apply -input=false -auto-approve
        working-directory: ./environments/production
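This workflow runs terraform plan on every pull request, converts the binary plan file to text with terraform show, and posts it as a PR comment, giving reviewers visibility into infrastructure changes. On merge to main, it automatically applies changes to production. Note that the apply job computes a fresh plan, which can differ from the one reviewed on the PR if main or the real infrastructure changed in between. For extra safety, add a manual approval step before apply using a GitHub Environment with required reviewers.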
Common Terraform Patterns
Certain patterns appear repeatedly in production Terraform configurations. These solve common problems around resource dependencies, conditional resources, and dynamic configuration.
Conditional Resource Creation
# Create NAT gateway only in production
resource "aws_nat_gateway" "main" {
  count         = var.environment == "production" ? 1 : 0
  allocation_id = aws_eip.nat[0].id
  subnet_id     = aws_subnet.public[0].id
}
# Reference conditionally created resources
resource "aws_route" "private" {
  count                  = var.environment == "production" ? 1 : 0
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.main[0].id
}
The count parameter set to 0 or 1 based on a condition creates or skips resources. This pattern lets you share configuration across environments while adjusting resource presence (NAT gateways in production but not staging) or quantity (3 instances in production, 1 in staging).
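With count, a resource becomes a list of instances addressed by index, which is why the route refers to aws_nat_gateway.main[0]. This Python sketch (function names are hypothetical) models that expansion and shows why aws_route repeats the same condition: with count = 0 the gateway list is empty, and Terraform rejects a reference to index 0 with an "Invalid index" error.

```python
from typing import List


def expand(resource: str, count: int) -> List[str]:
    """Instance addresses Terraform tracks for a resource with the given count."""
    return [f"{resource}[{index}]" for index in range(count)]


def nat_count(environment: str) -> int:
    # Mirrors: count = var.environment == "production" ? 1 : 0
    return 1 if environment == "production" else 0


def reference(instances: List[str], index: int) -> str:
    # Stands in for Terraform's "Invalid index" error on out-of-range references.
    if index >= len(instances):
        raise IndexError(f"Invalid index {index}: resource has {len(instances)} instances")
    return instances[index]


for env in ["production", "staging"]:
    gateways = expand("aws_nat_gateway.main", nat_count(env))
    routes = expand("aws_route.private", nat_count(env))
    # Same condition on both, so a route only exists when gateway [0] does.
    targets = {route: reference(gateways, 0) for route in routes}
    print(env, targets)
# production {'aws_route.private[0]': 'aws_nat_gateway.main[0]'}
# staging {}
```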
Dynamic Blocks for Repetitive Configuration
# Without dynamic blocks (repetitive)
resource "aws_security_group" "web" {
  name = "web"
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
# With dynamic blocks (cleaner)
variable "ingress_ports" {
  type    = list(number)
  default = [80, 443]
}
resource "aws_security_group" "web" {
  name = "web"
  dynamic "ingress" {
    for_each = var.ingress_ports
    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
    }
  }
}
Dynamic blocks generate multiple nested blocks from a list or map. This reduces repetition when you need similar blocks with slight variations—multiple ingress rules, multiple environment variables, multiple volume mounts.
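Conceptually, a dynamic block is a loop evaluated at plan time: for each element of for_each, Terraform renders one copy of the content block and exposes the element through an iterator named after the block label (ingress.value here; the iterator argument can rename it). The Python model below uses a hypothetical function name to show that the expansion yields the same rules as the handwritten version.

```python
from typing import Any, Dict, List

Rule = Dict[str, Any]


def expand_ingress(ports: List[int]) -> List[Rule]:
    """One ingress rule per port, like dynamic "ingress" with for_each = var.ingress_ports."""
    return [
        {
            "from_port": port,  # ingress.value in the HCL content block
            "to_port": port,
            "protocol": "tcp",
            "cidr_blocks": ["0.0.0.0/0"],
        }
        for port in ports
    ]


handwritten = [
    {"from_port": 80, "to_port": 80, "protocol": "tcp", "cidr_blocks": ["0.0.0.0/0"]},
    {"from_port": 443, "to_port": 443, "protocol": "tcp", "cidr_blocks": ["0.0.0.0/0"]},
]
print(expand_ingress([80, 443]) == handwritten)  # True
```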
FAQ
How does Terraform compare to CloudFormation?
CloudFormation is AWS-specific and uses JSON/YAML templates. Terraform supports multiple cloud providers and uses HCL, which is more readable and expressive. CloudFormation integrates deeply with AWS services (StackSets, Change Sets), while Terraform's strength is multi-cloud environments. For AWS-only infrastructure, both work well; for multi-cloud, Terraform is the clear choice.
What happens if I delete the state file?
Losing state means Terraform forgets about all resources it created. Running terraform apply would try to create duplicates. Always use remote state with versioning (S3 with versioning enabled) to prevent state loss. If state is lost, you can import existing resources back into state, but this is time-consuming.
Can I use Terraform with existing infrastructure?
Yes, through the terraform import command. Import adds existing resources to state without modifying them. You write configuration matching the existing resources, import them into state, then Terraform manages them going forward. Tools like terraformer can automate bulk imports.
How do I handle secrets in Terraform?
Never commit secrets to version control. Keep secret .tfvars files out of git (or encrypt them), use environment variables, or pull secrets from a secret manager (AWS Secrets Manager, Vault). Mark variables as sensitive = true so Terraform redacts them in plan and apply output. Sensitive values are still written to state in plaintext, so restrict access to the state backend and enable encryption there. For production, integrate with your organization's secret management system.
What's the difference between terraform plan and terraform apply?
Plan shows what would change without making changes—it's a dry run. Apply makes actual changes. Always run plan first to verify Terraform will do what you expect. You can save plans (terraform plan -out=tfplan) and apply that exact plan (terraform apply tfplan) to ensure what you reviewed is what executes.
How do I manage multiple AWS accounts?
Use separate backend configurations (different S3 keys) for each account. Configure the AWS provider with different credentials or profiles per environment. Terraform Cloud/Enterprise provides workspace-level AWS credential management if you use those services.
Can Terraform change existing resources without downtime?
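Often, but it depends on the change. Terraform distinguishes in-place updates from changes that force replacement (marked -/+ in plan output). By default, a replacement destroys the old resource before creating the new one, which causes downtime. Setting lifecycle { create_before_destroy = true } on the resource reverses the order so the replacement exists before the original is deleted, which only works if both can exist at once (for example, no conflicting unique names). Some in-place changes, such as changing an RDS instance class, can still cause brief downtime.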
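The difference between the default replacement order and create_before_destroy fits in a few lines of Python. This is an illustration under simplifying assumptions (one resource, instantaneous operations, hypothetical function names), not how Terraform schedules work internally: the default order leaves a window with no copy of the resource, and create_before_destroy closes it.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (operation, object)


def replacement_steps(create_before_destroy: bool = False) -> List[Step]:
    """Order of operations when a change forces replacement (simplified)."""
    if create_before_destroy:
        return [("create", "new"), ("destroy", "old")]
    return [("destroy", "old"), ("create", "new")]  # Terraform's default order


def has_gap(steps: List[Step]) -> bool:
    """True if at some point no copy of the resource exists."""
    live = 1  # the old object exists before the change
    for operation, _ in steps:
        live += 1 if operation == "create" else -1
        if live == 0:
            return True
    return False


print(has_gap(replacement_steps()))                            # True
print(has_gap(replacement_steps(create_before_destroy=True)))  # False
```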
How do I roll back a bad Terraform change?
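Revert the git commit that introduced the problematic configuration and run terraform apply with the reverted configuration; Terraform brings infrastructure back in line with the previous configuration. This restores configuration, not data: a database or volume that the bad change destroyed is recreated empty unless you restore it from a snapshot or backup. For state corruption, restore the state file from a backup (S3 versioning). Terraform has no built-in "rollback" command; you roll back by applying the previous configuration.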
Should I use modules for everything?
Use modules when you need reusability (same pattern deployed multiple times) or encapsulation (hide complexity). Don't create modules prematurely—extract them when duplication appears, not before. Over-modularization creates unnecessary indirection and makes configurations harder to understand.
How do I test Terraform configurations?
Use terraform validate for syntax and internal consistency checks, terraform fmt -check for formatting, and terraform plan to see proposed changes. Terraform 1.6 and later also include a built-in terraform test command for tests written in .tftest.hcl files. Tools like terraform-compliance test policy compliance, and Terratest (a Go library) enables automated testing that actually deploys infrastructure. For critical infrastructure, deploy to staging first and verify before production.
Conclusion
Infrastructure as Code with Terraform transforms infrastructure management from manual, error-prone operations to versioned, reviewed, automated deployments. The core workflow—write configuration, run plan, apply changes—stays consistent whether managing one resource or thousands. The patterns that matter most are remote state with locking (prevents state corruption), modules for reuse (reduces duplication), and version control integration (enables review and rollback).
Start small: manage one non-critical component with Terraform before migrating critical infrastructure. Learn the workflow, understand state management, experience the benefits of infrastructure-as-code firsthand. Then expand incrementally—add more resources, introduce modules, implement CI/CD automation. Terraform's value compounds as your infrastructure grows.
The goal isn't perfect infrastructure code on day one. It's infrastructure that's repeatable, reviewable, and recoverable—properties that become more valuable as your systems scale.