Infrastructure as Code
Overview
Infrastructure as Code (IaC) is the process of managing and provisioning computing infrastructure through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools. IaC treats infrastructure with the same rigor as application code, enabling version control, testing, and automated deployment of infrastructure resources.
Understanding Infrastructure as Code
Definition and Principles
Infrastructure as Code involves using code to define, provision, and manage infrastructure resources. Instead of manually configuring servers, networks, and other infrastructure components, IaC allows these resources to be described in configuration files that can be versioned, tested, and deployed automatically.
Core Principles:
- Declarative Approach: Describe the desired end state of infrastructure
- Version Control: Store infrastructure code in version control systems
- Reproducibility: Create identical environments across different stages
- Automation: Automatically provision and configure resources
- Testing: Validate infrastructure changes before deployment
- Idempotency: Operations produce the same result regardless of how many times they're executed
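Idempotency is the easiest of these principles to show in a few lines. The following is a minimal shell sketch, not taken from any IaC tool: `ensure_line` is a hypothetical helper that appends a line to a file only when that exact line is missing, so a second run leaves the file unchanged.

```shell
#!/bin/sh
# Idempotency sketch. ensure_line is a hypothetical helper for illustration only.
# It appends LINE to FILE unless an identical line is already present.
ensure_line() {
  file=$1
  line=$2
  touch "$file"
  if ! grep -qxF -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}

demo_file=$(mktemp)
ensure_line "$demo_file" "net.ipv4.ip_forward=1"
ensure_line "$demo_file" "net.ipv4.ip_forward=1"   # second run changes nothing

line_count=$(wc -l < "$demo_file" | tr -d ' ')
echo "lines after two runs: $line_count"
rm -f "$demo_file"
```

A non-idempotent version would use a bare `>>` append and grow the file on every run, which is the kind of behavior IaC tools are designed to avoid.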
Benefits of IaC:
- Consistency: Eliminate configuration drift between environments
- Efficiency: Reduce manual setup time and human error
- Scalability: Easily replicate and scale infrastructure
- Auditability: Track infrastructure changes over time
- Disaster Recovery: Quickly recreate infrastructure from code
- Cost Management: Better visibility and control over resource costs
IaC vs Traditional Infrastructure Management
Traditional Infrastructure Management:
- Manual Configuration: Servers configured manually through UIs or command-line
- Documentation Drift: Infrastructure documentation often becomes outdated
- Configuration Drift: Environments diverge over time
- Slow Provisioning: Time-consuming manual setup processes
- Limited Reproducibility: Difficult to recreate exact environments
- Human Error: Prone to configuration mistakes and inconsistencies
Infrastructure as Code:
- Code-Defined: Infrastructure defined in version-controlled files
- Automated Provisioning: Resources provisioned automatically
- Environment Parity: Identical environments across development, testing, and production
- Fast Deployment: Rapid infrastructure provisioning and updates
- Version History: Complete audit trail of infrastructure changes
- Reduced Errors: Automated processes minimize human mistakes
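The contrast above comes down to comparing a declared desired state with what actually exists. Below is a minimal sketch of that comparison in shell, using made-up resource names and `comm` on two sorted lists. Real tools such as Terraform perform this comparison against provider APIs and track far more than names, so treat this only as an illustration of the idea.

```shell
#!/bin/sh
# Desired-versus-actual comparison sketch. All resource names are invented.
LC_ALL=C
export LC_ALL   # keep sort and comm on the same collation order

desired=$(mktemp)
actual=$(mktemp)
printf '%s\n' "vpc-main" "subnet-a" "subnet-b" "sg-web" | sort > "$desired"
printf '%s\n' "vpc-main" "subnet-a" "sg-legacy" | sort > "$actual"

# Lines only in the desired list still have to be created.
to_create=$(comm -23 "$desired" "$actual")
# Lines only in the actual list are drift: they exist but are not declared.
to_delete=$(comm -13 "$desired" "$actual")

echo "create:"
echo "$to_create"
echo "delete:"
echo "$to_delete"
rm -f "$desired" "$actual"
```

With manual management nobody maintains the "desired" list at all, which is why documentation and environments drift apart.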
IaC Tools and Technologies
Terraform
Terraform is a widely used IaC tool from HashiCorp that manages infrastructure across multiple cloud providers and services through provider plugins.
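The configuration in this section carves one subnet per availability zone out of the VPC range with `cidrsubnet(var.vpc_cidr, 8, count.index)`. For a /16 base and 8 additional bits, the result is the /24 whose third octet equals the index. The sketch below reproduces only that special case in shell; `subnet_for` is a hypothetical helper name, and it is not a general CIDR calculator.

```shell
#!/bin/sh
# subnet_for BASE NETNUM (hypothetical helper)
# Mirrors cidrsubnet(BASE, 8, NETNUM) for the single case of a /16 BASE
# whose third and fourth octets are zero, such as 10.0.0.0/16.
subnet_for() {
  base=$1
  netnum=$2
  network=${base%/*}    # 10.0.0.0
  prefix=${base#*/}     # 16
  if [ "$prefix" -ne 16 ] || [ "$netnum" -lt 0 ] || [ "$netnum" -gt 255 ]; then
    echo "unsupported input" >&2
    return 1
  fi
  first=${network%%.*}  # first octet
  rest=${network#*.}
  second=${rest%%.*}    # second octet
  echo "$first.$second.$netnum.0/24"
}

for index in 0 1 2; do
  echo "index $index -> $(subnet_for 10.0.0.0/16 "$index")"
done
```

Each availability zone therefore gets its own non-overlapping /24, with room for 256 such subnets inside the /16.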
Terraform Configuration Structure:
# main.tf - Main Terraform configuration
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
# Backend configuration for state management
backend "s3" {
bucket = "my-terraform-state-bucket"
key = "terraform/state"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-state-lock"
}
}
# Provider configuration
provider "aws" {
region = var.aws_region
}
# Data sources
data "aws_availability_zones" "available" {}
# Variables
variable "aws_region" {
description = "AWS region for resources"
type = string
default = "us-east-1"
}
variable "environment" {
description = "Environment name"
type = string
default = "dev"
}
variable "vpc_cidr" {
description = "CIDR block for VPC"
type = string
default = "10.0.0.0/16"
}
# Local values
locals {
common_tags = {
Environment = var.environment
ManagedBy = "Terraform"
Project = "MyApplication"
}
}
# Resources
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = merge(local.common_tags, {
Name = "${var.environment}-vpc"
})
}
resource "aws_subnet" "public" {
count = length(data.aws_availability_zones.available.names)
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index)
availability_zone = data.aws_availability_zones.available.names[count.index]
map_public_ip_on_launch = true
tags = merge(local.common_tags, {
Name = "${var.environment}-public-subnet-${count.index + 1}"
Type = "Public"
})
}
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = merge(local.common_tags, {
Name = "${var.environment}-igw"
})
}
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = merge(local.common_tags, {
Name = "${var.environment}-public-rt"
})
}
resource "aws_route_table_association" "public" {
count = length(aws_subnet.public)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
resource "aws_security_group" "web" {
name_prefix = "${var.environment}-web-sg-"
description = "Security group for web servers"
vpc_id = aws_vpc.main.id
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["10.0.0.0/8"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = local.common_tags
}
# Outputs
output "vpc_id" {
description = "ID of the created VPC"
value = aws_vpc.main.id
}
output "public_subnet_ids" {
description = "IDs of the public subnets"
value = aws_subnet.public[*].id
}
output "security_group_id" {
description = "ID of the web security group"
value = aws_security_group.web.id
}
Terraform Modules:
# modules/ec2-instance/main.tf
variable "instance_count" {
description = "Number of EC2 instances to create"
type = number
default = 1
}
variable "ami_id" {
description = "AMI ID for the instances"
type = string
}
variable "instance_type" {
description = "Instance type for the EC2 instances"
type = string
default = "t3.micro"
}
variable "key_name" {
description = "Name of the SSH key pair"
type = string
}
variable "subnet_id" {
description = "Subnet ID for the instances"
type = string
}
variable "security_group_ids" {
description = "List of security group IDs"
type = list(string)
}
variable "tags" {
description = "Tags to apply to resources"
type = map(string)
default = {}
}
resource "aws_instance" "web" {
count = var.instance_count
ami = var.ami_id
instance_type = var.instance_type
key_name = var.key_name
subnet_id = var.subnet_id
vpc_security_group_ids = var.security_group_ids
tags = merge(
var.tags,
{
# lookup() avoids an error when the caller's tags map has no "Environment" key
Name = "${lookup(var.tags, "Environment", "default")}-web-${count.index + 1}"
}
)
lifecycle {
create_before_destroy = true
}
}
output "instance_ids" {
description = "IDs of the created instances"
value = aws_instance.web[*].id
}
output "public_ips" {
description = "Public IPs of the created instances"
value = aws_instance.web[*].public_ip
}
# modules/ec2-instance/variables.tf
# Variables already defined in main.tf for this simple module
Using Terraform Modules:
# main.tf - Using the EC2 module
module "web_servers" {
source = "./modules/ec2-instance"
instance_count = 3
ami_id = data.aws_ami.ubuntu.id
instance_type = "t3.small"
key_name = aws_key_pair.deployer.key_name
subnet_id = aws_subnet.public[0].id
security_group_ids = [aws_security_group.web.id]
tags = local.common_tags
}
data "aws_ami" "ubuntu" {
most_recent = true
owners = ["099720109477"] # Canonical
filter {
name = "name"
values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
}
resource "aws_key_pair" "deployer" {
key_name = "${var.environment}-key"
public_key = var.ssh_public_key
tags = local.common_tags
}
variable "ssh_public_key" {
description = "SSH public key for EC2 instances"
type = string
}
CloudFormation
AWS CloudFormation provides infrastructure management specifically for AWS resources.
CloudFormation Template:
# cloudformation-template.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'VPC with public and private subnets for application deployment'
Parameters:
Environment:
Type: String
Default: 'dev'
AllowedValues:
- 'dev'
- 'staging'
- 'prod'
Description: 'Environment name'
VPCCIDR:
Type: String
Default: '10.0.0.0/16'
Description: 'CIDR block for VPC'
PublicSubnetCIDR:
Type: String
Default: '10.0.1.0/24'
Description: 'CIDR block for public subnet'
PrivateSubnetCIDR:
Type: String
Default: '10.0.2.0/24'
Description: 'CIDR block for private subnet'
InstanceType:
Type: String
Default: 't3.micro'
AllowedValues:
- 't3.micro'
- 't3.small'
- 't3.medium'
Description: 'EC2 instance type'
Mappings:
RegionMap:
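# AMI IDs are region-specific and change over time; verify that each ID below
# still points to a current Ubuntu 22.04 LTS image before using it
us-east-1:
AMI: 'ami-0c02fb55956c7d316'
us-west-2:
AMI: 'ami-0dc2d3e48b0c7b06d'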
Resources:
VPC:
Type: AWS::EC2::VPC
Properties:
CidrBlock: !Ref VPCCIDR
EnableDnsHostnames: true
EnableDnsSupport: true
Tags:
- Key: Name
Value: !Sub '${Environment}-vpc'
- Key: Environment
Value: !Ref Environment
InternetGateway:
Type: AWS::EC2::InternetGateway
Properties:
Tags:
- Key: Name
Value: !Sub '${Environment}-igw'
- Key: Environment
Value: !Ref Environment
AttachGateway:
Type: AWS::EC2::VPCGatewayAttachment
Properties:
VpcId: !Ref VPC
InternetGatewayId: !Ref InternetGateway
PublicSubnet:
Type: AWS::EC2::Subnet
Properties:
VpcId: !Ref VPC
CidrBlock: !Ref PublicSubnetCIDR
AvailabilityZone: !Select [0, !GetAZs '']
MapPublicIpOnLaunch: true
Tags:
- Key: Name
Value: !Sub '${Environment}-public-subnet'
- Key: Environment
Value: !Ref Environment
RouteTable:
Type: AWS::EC2::RouteTable
Properties:
VpcId: !Ref VPC
Tags:
- Key: Name
Value: !Sub '${Environment}-public-rt'
- Key: Environment
Value: !Ref Environment
DefaultRoute:
Type: AWS::EC2::Route
DependsOn: AttachGateway
Properties:
RouteTableId: !Ref RouteTable
DestinationCidrBlock: '0.0.0.0/0'
GatewayId: !Ref InternetGateway
SubnetRouteTableAssociation:
Type: AWS::EC2::SubnetRouteTableAssociation
Properties:
SubnetId: !Ref PublicSubnet
RouteTableId: !Ref RouteTable
WebSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupName: !Sub '${Environment}-web-sg'
GroupDescription: 'Security group for web servers'
VpcId: !Ref VPC
SecurityGroupIngress:
- IpProtocol: tcp
FromPort: 80
ToPort: 80
CidrIp: '0.0.0.0/0'
Description: 'HTTP'
- IpProtocol: tcp
FromPort: 443
ToPort: 443
CidrIp: '0.0.0.0/0'
Description: 'HTTPS'
- IpProtocol: tcp
FromPort: 22
ToPort: 22
CidrIp: '10.0.0.0/8'
Description: 'SSH'
SecurityGroupEgress:
- IpProtocol: '-1'
CidrIp: '0.0.0.0/0'
Description: 'All traffic out'
Tags:
- Key: Name
Value: !Sub '${Environment}-web-sg'
- Key: Environment
Value: !Ref Environment
WebServer:
Type: AWS::EC2::Instance
Properties:
ImageId: !FindInMap [RegionMap, !Ref 'AWS::Region', AMI]
InstanceType: !Ref InstanceType
SubnetId: !Ref PublicSubnet
SecurityGroupIds:
- !Ref WebSecurityGroup
Tags:
- Key: Name
Value: !Sub '${Environment}-web-server'
- Key: Environment
Value: !Ref Environment
UserData:
Fn::Base64: !Sub |
#!/bin/bash
apt-get update
apt-get install -y nginx
systemctl start nginx
systemctl enable nginx
Outputs:
VPCId:
Description: 'VPC ID'
Value: !Ref VPC
Export:
Name: !Sub '${AWS::StackName}-VPCId'
PublicSubnetId:
Description: 'Public Subnet ID'
Value: !Ref PublicSubnet
Export:
Name: !Sub '${AWS::StackName}-PublicSubnetId'
WebServerPublicIP:
Description: 'Web Server Public IP'
Value: !GetAtt WebServer.PublicIp
Export:
Name: !Sub '${AWS::StackName}-WebServerPublicIP'
Ansible
Ansible is an agentless automation tool, used mainly for configuration management, that can also provision cloud infrastructure through its cloud modules.
Ansible Playbook:
# playbook.yml
---
- name: Deploy and configure web application infrastructure
hosts: localhost
connection: local
gather_facts: false
vars:
environment: "{{ env | default('dev') }}"
region: "{{ aws_region | default('us-east-1') }}"
vpc_cidr: "{{ vpc_cidr_block | default('10.0.0.0/16') }}"
tasks:
- name: Create VPC
amazon.aws.ec2_vpc_net:
name: "{{ environment }}-vpc"
cidr_block: "{{ vpc_cidr }}"
region: "{{ region }}"
tags:
Name: "{{ environment }}-vpc"
Environment: "{{ environment }}"
ManagedBy: "Ansible"
register: vpc
- name: Create internet gateway
amazon.aws.ec2_vpc_igw:
vpc_id: "{{ vpc.vpc.id }}"
region: "{{ region }}"
tags:
Name: "{{ environment }}-igw"
Environment: "{{ environment }}"
register: igw
- name: Create public subnet
amazon.aws.ec2_vpc_subnet:
vpc_id: "{{ vpc.vpc.id }}"
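# ipsubnet takes the target prefix length first: the /24 with index 1 inside the VPC range
cidr: "{{ vpc_cidr | ansible.utils.ipsubnet(24, 1) }}"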
az: "{{ region }}a"
region: "{{ region }}"
tags:
Name: "{{ environment }}-public-subnet"
Environment: "{{ environment }}"
Type: "Public"
register: public_subnet
- name: Create route table
amazon.aws.ec2_vpc_route_table:
vpc_id: "{{ vpc.vpc.id }}"
region: "{{ region }}"
subnets:
- "{{ public_subnet.subnet.id }}"
routes:
- dest: 0.0.0.0/0
gateway_id: "{{ igw.gateway_id }}"
tags:
Name: "{{ environment }}-public-rt"
Environment: "{{ environment }}"
- name: Create security group
amazon.aws.ec2_security_group:
name: "{{ environment }}-web-sg"
description: "Security group for web servers"
vpc_id: "{{ vpc.vpc.id }}"
region: "{{ region }}"
rules:
- proto: tcp
ports:
- 80
- 443
cidr_ip: 0.0.0.0/0
rule_desc: "HTTP and HTTPS"
- proto: tcp
ports: 22
cidr_ip: "{{ vpc_cidr }}"
rule_desc: "SSH from within VPC"
tags:
Name: "{{ environment }}-web-sg"
Environment: "{{ environment }}"
register: security_group
- name: Launch EC2 instance
amazon.aws.ec2_instance:
name: "{{ environment }}-web-server"
image_id: "{{ ami_id }}"
instance_type: t3.micro
key_name: "{{ ssh_key_name }}"
security_groups:
- "{{ security_group.group_id }}"
vpc_subnet_id: "{{ public_subnet.subnet.id }}"
region: "{{ region }}"
wait: true
tags:
Name: "{{ environment }}-web-server"
Environment: "{{ environment }}"
Role: "WebServer"
register: ec2_instance
- name: Wait for SSH to be available
ansible.builtin.wait_for:
host: "{{ ec2_instance.instances[0].public_ip_address }}"
port: 22
delay: 60
timeout: 300
- name: Configure web server
delegate_to: "{{ ec2_instance.instances[0].public_ip_address }}"
become: yes
block:
- name: Update package cache
ansible.builtin.apt:
update_cache: yes
- name: Install nginx
ansible.builtin.apt:
name: nginx
state: present
- name: Start and enable nginx
ansible.builtin.systemd:
name: nginx
state: started
enabled: yes
- name: Configure firewall
community.general.ufw:
rule: allow
port: "{{ item }}"
loop:
- 80
- 443
- 22
- name: Enable firewall
community.general.ufw:
state: enabled
policy: deny
- name: Display instance information
ansible.builtin.debug:
msg: |
Instance launched successfully!
Public IP: {{ ec2_instance.instances[0].public_ip_address }}
Private IP: {{ ec2_instance.instances[0].private_ip_address }}
Instance ID: {{ ec2_instance.instances[0].instance_id }}
Ansible Inventory:
# inventory/production.ini
[webservers]
web1 ansible_host=3.123.45.67
web2 ansible_host=3.123.45.68
web3 ansible_host=3.123.45.69
[databases]
db1 ansible_host=10.0.1.10
db2 ansible_host=10.0.1.11
[webservers:vars]
ansible_user=ubuntu
ansible_ssh_private_key_file=~/.ssh/id_rsa
[databases:vars]
ansible_user=ec2-user
ansible_ssh_private_key_file=~/.ssh/id_rsa
IaC Best Practices
Version Control for Infrastructure
Git Workflow for Infrastructure Code:
# Example Git workflow for infrastructure changes
# 1. Create a feature branch for infrastructure changes
git checkout -b feature/new-infrastructure-components
# 2. Make infrastructure code changes
# Edit terraform files, cloudformation templates, etc.
# 3. Validate the changes locally
terraform validate
terraform fmt
# 4. Create a detailed commit message
git add .
git commit -m "feat: Add new VPC with public/private subnets
- Create VPC with CIDR 10.0.0.0/16
- Add public and private subnets in multiple AZs
- Configure NAT gateways for private subnet internet access
- Add security groups for web and application tiers
Fixes #1234"
# 5. Push the branch
git push origin feature/new-infrastructure-components
# 6. Create a pull request with detailed description
# Include:
# - What infrastructure is being added/changed
# - Why these changes are needed
# - Impact assessment
# - Rollback plan if needed
Infrastructure Code Review Checklist:
# Infrastructure Code Review Checklist
## Security
- [ ] Are security groups properly configured?
- [ ] Are IAM roles and policies least-privilege?
- [ ] Are secrets properly managed (not hardcoded)?
- [ ] Is encryption enabled where needed?
## Performance & Scalability
- [ ] Are resources sized appropriately?
- [ ] Are there auto-scaling configurations?
- [ ] Are load balancers configured correctly?
## Reliability
- [ ] Are there proper health checks?
- [ ] Are backup and recovery procedures defined?
- [ ] Are multi-AZ deployments used where needed?
## Cost Optimization
- [ ] Are resources appropriately sized?
- [ ] Are reserved instances considered?
- [ ] Are unused resources cleaned up?
## Documentation
- [ ] Are variables properly documented?
- [ ] Are outputs clearly defined?
- [ ] Are there usage examples?
## Testing
- [ ] Have changes been tested in staging?
- [ ] Are there automated tests for infrastructure?
- [ ] Has the plan been reviewed for unintended changes?
Testing Infrastructure Code
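Small assertions such as "this resource carries the expected tag" are a common first infrastructure test. The sketch below assumes tag data arrives as tab-separated key/value lines, which is roughly what `aws ec2 describe-vpcs --query 'Vpcs[0].Tags' --output text` prints; treat that format as an assumption and check it against your CLI version. `tag_value` is a hypothetical helper, and the sample data stands in for real CLI output.

```shell
#!/bin/sh
# tag_value TAGS KEY (hypothetical helper)
# TAGS holds one "Key<TAB>Value" pair per line; prints the value for KEY, or nothing.
tag_value() {
  printf '%s\n' "$1" | awk -F '\t' -v key="$2" '$1 == key { print $2; exit }'
}

# Sample data standing in for real CLI output.
sample_tags=$(printf 'Name\ttest-vpc\nEnvironment\ttest')
env_tag=$(tag_value "$sample_tags" "Environment")

if [ "$env_tag" = "test" ]; then
  echo "ok: Environment tag is test"
else
  echo "fail: Environment tag is '$env_tag'"
fi
```

Comparing the whole value, rather than grepping for a substring, avoids false positives such as a tag value of `test-old` matching `test`.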
Infrastructure Testing Strategy:
# terraform/testing/variables.tf
variable "aws_region" {
description = "AWS region for testing"
type = string
default = "us-east-1"
}
variable "environment" {
description = "Test environment"
type = string
default = "test"
}
# terraform/testing/main.tf
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
null = {
source = "hashicorp/null"
version = "~> 3.0"
}
}
}
provider "aws" {
region = var.aws_region
}
# Test resources
resource "aws_vpc" "test" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.environment}-test-vpc"
Environment = var.environment
}
}
resource "aws_security_group" "test" {
name_prefix = "${var.environment}-test-sg-"
description = "Test security group"
vpc_id = aws_vpc.test.id
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.environment}-test-sg"
Environment = var.environment
}
}
# Null resource for testing purposes
resource "null_resource" "test_validation" {
triggers = {
vpc_id = aws_vpc.test.id
}
provisioner "local-exec" {
command = <<EOT
# Test VPC connectivity
echo "Testing VPC: ${aws_vpc.test.id}"
echo "Testing Security Group: ${aws_security_group.test.id}"
# Add actual tests here
# For example: check if VPC has expected tags
# Select only the value of the Environment tag and require an exact match
if aws ec2 describe-vpcs --vpc-ids ${aws_vpc.test.id} --query "Vpcs[0].Tags[?Key=='Environment'].Value" --output text | grep -qx "${var.environment}"; then
echo "✓ VPC has correct tags"
else
echo "✗ VPC missing expected tags"
exit 1
fi
EOT
}
}
Terratest Example:
// test/infrastructure_test.go
package test
import (
"testing"
"github.com/gruntwork-io/terratest/modules/terraform"
"github.com/stretchr/testify/assert"
)
func TestVPC(t *testing.T) {
t.Parallel()
terraformOptions := &terraform.Options{
TerraformDir: "../examples/vpc",
Vars: map[string]interface{}{
"environment": "test",
"vpc_cidr": "10.0.0.0/16",
},
}
defer terraform.Destroy(t, terraformOptions)
terraform.InitAndApply(t, terraformOptions)
vpcID := terraform.Output(t, terraformOptions, "vpc_id")
assert.NotEmpty(t, vpcID)
publicSubnetIDs := terraform.OutputList(t, terraformOptions, "public_subnet_ids")
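// One subnet is created per availability zone, so the count depends on the region; avoid a fixed number
assert.NotEmpty(t, publicSubnetIDs)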
securityGroupID := terraform.Output(t, terraformOptions, "security_group_id")
assert.NotEmpty(t, securityGroupID)
}
func TestEC2Instance(t *testing.T) {
t.Parallel()
terraformOptions := &terraform.Options{
TerraformDir: "../examples/ec2",
Vars: map[string]interface{}{
"environment": "test",
"instance_count": 2,
},
}
defer terraform.Destroy(t, terraformOptions)
terraform.InitAndApply(t, terraformOptions)
instanceIDs := terraform.OutputList(t, terraformOptions, "instance_ids")
assert.Equal(t, 2, len(instanceIDs))
publicIPs := terraform.OutputList(t, terraformOptions, "public_ips")
for _, ip := range publicIPs {
assert.NotEmpty(t, ip)
}
}
State Management
Terraform State Management:
# backend.hcl - Backend configuration
bucket = "my-terraform-state-bucket"
key = "terraform/state"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-state-lock"
#!/bin/bash
# scripts/terraform-workflow.sh
# Terraform workflow script
set -e
ENVIRONMENT=$1
ACTION=$2
if [ -z "$ENVIRONMENT" ] || [ -z "$ACTION" ]; then
echo "Usage: $0 <environment> <action>"
echo "Example: $0 prod apply"
exit 1
fi
case $ENVIRONMENT in
dev|staging|prod)
echo "Operating in $ENVIRONMENT environment"
;;
*)
echo "Invalid environment: $ENVIRONMENT"
exit 1
;;
esac
# Select the appropriate configuration
cp terraform/$ENVIRONMENT/backend.hcl .
# Initialize Terraform with the backend configuration
terraform init -backend-config=backend.hcl
case $ACTION in
plan)
echo "Creating execution plan..."
terraform plan -var-file="terraform/$ENVIRONMENT/variables.tfvars"
;;
apply)
echo "Applying changes..."
terraform apply -auto-approve -var-file="terraform/$ENVIRONMENT/variables.tfvars"
;;
destroy)
echo "Destroying infrastructure..."
terraform destroy -auto-approve -var-file="terraform/$ENVIRONMENT/variables.tfvars"
;;
*)
echo "Invalid action: $ACTION"
exit 1
;;
esac
State Locking and Concurrency:
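The idea behind a state lock can be sketched in plain shell before looking at the backend configuration. The example relies on `mkdir` failing when the directory already exists, so only one caller can hold the lock at a time. This is an analogy only: the lock directory and the `acquire_lock` and `release_lock` helpers are hypothetical, and Terraform's S3 backend implements locking through its lock table rather than the filesystem.

```shell
#!/bin/sh
# Lock sketch based on an atomic mkdir. Names and paths are illustrative only.
lock_dir="${TMPDIR:-/tmp}/tf-state-lock-demo.$$"

acquire_lock() {
  # mkdir fails if the directory already exists, so only one caller wins
  mkdir "$1" 2>/dev/null
}

release_lock() {
  rmdir "$1"
}

if acquire_lock "$lock_dir"; then first="acquired"; else first="blocked"; fi
# A second attempt while the lock is held must be refused
if acquire_lock "$lock_dir"; then second="acquired"; else second="blocked"; fi
release_lock "$lock_dir"
# After release, the lock is available again
if acquire_lock "$lock_dir"; then third="acquired"; else third="blocked"; fi
release_lock "$lock_dir"

echo "first=$first second=$second third=$third"
```

The second attempt is the case that matters: two people running `terraform apply` at once must not both be allowed to write the state.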
# terraform/prod/main.tf
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
backend "s3" {
bucket = "my-prod-terraform-state"
key = "prod/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-state-lock"
# Lock table ensures only one person can modify state at a time
}
}
# Use workspaces for environment separation
# terraform workspace new prod
# terraform workspace select prod
Security in IaC
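A cheap first security control is a static scan for secrets committed in plain text. The sketch below is a deliberately simple heuristic, not a replacement for dedicated scanners: `find_hardcoded_secrets` is a hypothetical helper that flags `.tf` lines assigning a quoted literal to an attribute whose name contains `password`, `secret`, or `token`. The demo files and the sample password are invented.

```shell
#!/bin/sh
# find_hardcoded_secrets DIR (hypothetical helper)
# Prints file:line:text for suspicious assignments in *.tf files under DIR.
find_hardcoded_secrets() {
  # /dev/null makes grep print file names even when find passes a single file.
  # "|| true" keeps the status at 0 when nothing matches; callers inspect the output.
  find "$1" -type f -name '*.tf' -exec grep -nEi \
    '(password|secret|token)[a-z_]*[[:space:]]*=[[:space:]]*"[^"$]+"' /dev/null {} + || true
}

demo_dir=$(mktemp -d)
# Declares a sensitive variable but contains no literal secret
printf '%s\n' 'variable "db_password" {' '  type      = string' '  sensitive = true' '}' > "$demo_dir/good.tf"
# Contains a literal password and should be reported
printf '%s\n' 'resource "aws_db_instance" "main" {' '  password = "changeme123"' '}' > "$demo_dir/bad.tf"

findings=$(find_hardcoded_secrets "$demo_dir")
if [ -n "$findings" ]; then
  echo "Possible hardcoded secrets:"
  echo "$findings"
else
  echo "No hardcoded secrets found"
fi
rm -rf "$demo_dir"
```

In a pre-commit hook or CI job, non-empty findings would normally end the run with a non-zero exit status. Values interpolated from variables (containing `$`) are skipped on purpose, which also means the heuristic can miss secrets assembled in other ways.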
Secure Secret Management:
# terraform/modules/secure-app/main.tf
variable "db_password" {
description = "Database password (should be passed securely)"
type = string
sensitive = true # Mark as sensitive to hide in output
}
variable "ssl_certificate_arn" {
description = "SSL certificate ARN from AWS Certificate Manager"
type = string
}
# Use AWS Secrets Manager for sensitive data
resource "aws_secretsmanager_secret" "database_credentials" {
name = "${var.environment}-database-credentials"
tags = {
Environment = var.environment
}
}
resource "aws_secretsmanager_secret_version" "database_credentials" {
secret_id = aws_secretsmanager_secret.database_credentials.id
secret_string = jsonencode({
username = var.db_username
password = var.db_password
})
}
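# Store the assembled application configuration as an encrypted SSM parameter
# Note: this embeds the password in the parameter value, and the value is also written to Terraform state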
resource "aws_ssm_parameter" "app_config" {
name = "/${var.environment}/app/config"
type = "SecureString"
value = jsonencode({
database_url = "postgresql://${var.db_username}:${var.db_password}@${aws_db_instance.main.address}:5432/${var.db_name}"
ssl_cert_arn = var.ssl_certificate_arn
})
tags = {
Environment = var.environment
}
}
IAM Policy Management:
# terraform/modules/iam-policies/main.tf
# Define IAM policies as code
resource "aws_iam_policy" "s3_read_only" {
name = "${var.environment}-s3-read-only"
description = "S3 read-only access for ${var.environment}"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:ListBucket"
]
Resource = [
"arn:aws:s3:::${var.bucket_name}",
"arn:aws:s3:::${var.bucket_name}/*"
]
}
]
})
}
# Attach policies to roles
resource "aws_iam_role_policy_attachment" "s3_read_only_attach" {
role = aws_iam_role.app_role.name
policy_arn = aws_iam_policy.s3_read_only.arn
}
# Create roles with least-privilege principle
resource "aws_iam_role" "app_role" {
name = "${var.environment}-app-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
Action = "sts:AssumeRole"
}
]
})
tags = {
Environment = var.environment
}
}
Advanced IaC Patterns
Multi-Environment Management
Environment-Specific Configurations:
# terraform/environments/dev/variables.auto.tfvars
environment = "dev"
instance_type = "t3.micro"
instance_count = 1
enable_monitoring = false
# terraform/environments/staging/variables.auto.tfvars
environment = "staging"
instance_type = "t3.small"
instance_count = 2
enable_monitoring = true
# terraform/environments/prod/variables.auto.tfvars
environment = "prod"
instance_type = "t3.medium"
instance_count = 3
enable_monitoring = true
enable_backup = true
multi_az = true
Conditional Resource Creation:
# main.tf - Conditional resources based on environment
variable "environment" {
description = "Environment name"
type = string
}
variable "enable_monitoring" {
description = "Enable CloudWatch monitoring"
type = bool
default = false
}
variable "enable_backup" {
description = "Enable automated backups"
type = bool
default = false
}
# Conditionally create monitoring resources
resource "aws_cloudwatch_log_group" "app_logs" {
count = var.enable_monitoring ? 1 : 0
name = "/${var.environment}/app"
retention_in_days = 14
tags = {
Environment = var.environment
}
}
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
count = var.enable_monitoring ? 1 : 0
alarm_name = "${var.environment}-high-cpu"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "CPUUtilization"
namespace = "AWS/EC2"
period = "120"
statistic = "Average"
threshold = "80"
alarm_description = "This metric monitors high CPU utilization"
alarm_actions = [aws_sns_topic.alerts.arn]
dimensions = {
InstanceId = aws_instance.web.id
}
}
# Conditionally create backup resources
# (an AWS Backup vault is used here as a simple backup-related resource)
resource "aws_backup_vault" "main" {
count = var.enable_backup ? 1 : 0
name = "${var.environment}-backup-vault"
tags = {
Environment = var.environment
}
}
Modular Architecture
Reusable Module Structure:
terraform/
├── modules/
│ ├── vpc/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ └── README.md
│ ├── ec2-instance/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ └── user-data.sh
│ └── rds/
│ ├── main.tf
│ ├── variables.tf
│ ├── outputs.tf
│ └── security.tf
├── environments/
│ ├── dev/
│ │ ├── main.tf
│ │ └── variables.auto.tfvars
│ ├── staging/
│ │ ├── main.tf
│ │ └── variables.auto.tfvars
│ └── prod/
│ ├── main.tf
│ └── variables.auto.tfvars
└── examples/
└── simple-vpc/
├── main.tf
└── README.md
VPC Module Example:
# terraform/modules/vpc/main.tf
variable "name" {
description = "Name to be used on all resources as prefix"
type = string
}
variable "cidr_block" {
description = "CIDR block for the VPC"
type = string
}
variable "azs" {
description = "List of availability zones"
type = list(string)
}
variable "public_subnets" {
description = "List of public subnet CIDR blocks"
type = list(string)
}
variable "private_subnets" {
description = "List of private subnet CIDR blocks"
type = list(string)
}
variable "enable_nat_gateway" {
description = "Enable NAT gateways for private subnets"
type = bool
default = true
}
variable "single_nat_gateway" {
description = "Single NAT gateway for all private subnets"
type = bool
default = false
}
variable "enable_dns_hostnames" {
description = "Enable DNS hostnames for the VPC"
type = bool
default = true
}
variable "enable_dns_support" {
description = "Enable DNS support for the VPC"
type = bool
default = true
}
variable "tags" {
description = "Additional tags to apply to resources"
type = map(string)
default = {}
}
# VPC resource
resource "aws_vpc" "this" {
cidr_block = var.cidr_block
enable_dns_hostnames = var.enable_dns_hostnames
enable_dns_support = var.enable_dns_support
tags = merge(
{
Name = var.name
},
var.tags
)
}
# Internet Gateway
resource "aws_internet_gateway" "this" {
vpc_id = aws_vpc.this.id
tags = merge(
{
Name = "${var.name}-igw"
},
var.tags
)
}
# Public Subnets
resource "aws_subnet" "public" {
count = length(var.public_subnets)
vpc_id = aws_vpc.this.id
cidr_block = var.public_subnets[count.index]
availability_zone = var.azs[count.index % length(var.azs)]
map_public_ip_on_launch = true
tags = merge(
{
Name = "${var.name}-public-subnet-${count.index + 1}"
Type = "Public"
},
var.tags
)
}
# Private Subnets
resource "aws_subnet" "private" {
count = length(var.private_subnets)
vpc_id = aws_vpc.this.id
cidr_block = var.private_subnets[count.index]
availability_zone = var.azs[count.index % length(var.azs)]
tags = merge(
{
Name = "${var.name}-private-subnet-${count.index + 1}"
Type = "Private"
},
var.tags
)
}
# NAT Gateway and EIP
resource "aws_eip" "nat" {
count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.private_subnets)) : 0
domain = "vpc"
tags = merge(
{
Name = "${var.name}-nat-eip-${count.index + 1}"
},
var.tags
)
depends_on = [aws_internet_gateway.this]
}
resource "aws_nat_gateway" "this" {
count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.private_subnets)) : 0
allocation_id = aws_eip.nat[var.single_nat_gateway ? 0 : count.index].id
subnet_id = aws_subnet.public[count.index % length(aws_subnet.public)].id
tags = merge(
{
Name = "${var.name}-nat-gateway-${count.index + 1}"
},
var.tags
)
depends_on = [aws_internet_gateway.this]
}
# Route tables
resource "aws_route_table" "public" {
vpc_id = aws_vpc.this.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.this.id
}
tags = merge(
{
Name = "${var.name}-public-rt"
},
var.tags
)
}
resource "aws_route_table" "private" {
count = var.enable_nat_gateway ? length(var.private_subnets) : 0
vpc_id = aws_vpc.this.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = var.single_nat_gateway ? aws_nat_gateway.this[0].id : aws_nat_gateway.this[count.index].id
}
tags = merge(
{
Name = "${var.name}-private-rt-${count.index + 1}"
},
var.tags
)
}
# Route table associations
resource "aws_route_table_association" "public" {
count = length(aws_subnet.public)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table_association" "private" {
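# Private route tables exist only when NAT gateways are enabled, so the associations must follow the same condition
count = var.enable_nat_gateway ? length(aws_subnet.private) : 0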
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private[count.index].id
}
output "vpc_id" {
description = "ID of the VPC"
value = aws_vpc.this.id
}
output "public_subnet_ids" {
description = "IDs of the public subnets"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "IDs of the private subnets"
value = aws_subnet.private[*].id
}
output "internet_gateway_id" {
description = "ID of the internet gateway"
value = aws_internet_gateway.this.id
}
IaC in CI/CD Pipelines
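A pipeline usually has to decide which environment a given Git ref targets. The sketch below assumes the same rule the workflow in this section uses, where the `main` branch maps to `prod` and every other ref maps to `staging`. `env_for_ref` is a hypothetical helper name.

```shell
#!/bin/sh
# env_for_ref REF (hypothetical helper)
# Maps a Git ref to a target environment: main -> prod, anything else -> staging.
env_for_ref() {
  case "$1" in
    refs/heads/main) echo "prod" ;;
    *) echo "staging" ;;
  esac
}

for ref in refs/heads/main refs/heads/develop refs/pull/12/merge; do
  echo "$ref -> $(env_for_ref "$ref")"
done
```

Keeping this mapping in one place makes it harder for a plan step and an apply step to disagree about which environment they operate on.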
Terraform in CI/CD:
# .github/workflows/terraform.yml
name: Terraform CI/CD
on:
push:
branches: [ main, develop ]
paths:
- 'terraform/**'
pull_request:
branches: [ main ]
paths:
- 'terraform/**'
env:
TF_VAR_environment: ${{ github.ref == 'refs/heads/main' && 'prod' || 'staging' }}
TF_LOG: INFO
jobs:
terraform-validate:
runs-on: ubuntu-latest
defaults:
run:
working-directory: ./terraform/${{ github.ref == 'refs/heads/main' && 'prod' || 'staging' }}
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Setup Terraform
uses: hashicorp/setup-terraform@v2
with:
terraform_version: 1.5.0
terraform_wrapper: false
- name: Terraform Format
run: terraform fmt -check
continue-on-error: true
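# The plan step in this job needs an initialized backend, so init cannot use -backend=false
- name: Terraform Init
run: terraform init
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}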
- name: Terraform Validate
run: terraform validate
- name: Terraform Plan
run: terraform plan -var-file="variables.auto.tfvars"
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
terraform-apply:
needs: terraform-validate
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main'
environment: production
defaults:
run:
working-directory: ./terraform/prod
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Setup Terraform
uses: hashicorp/setup-terraform@v2
with:
terraform_version: 1.5.0
terraform_wrapper: false
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: us-east-1
- name: Terraform Init
run: terraform init
- name: Terraform Plan
id: plan
run: terraform plan -var-file="variables.auto.tfvars" -out=tfplan
- name: Terraform Apply
run: terraform apply tfplan
Infrastructure Testing Pipeline:
# .github/workflows/infrastructure-test.yml
name: Infrastructure Tests
on:
pull_request:
branches: [ main ]
paths:
- 'terraform/**'
jobs:
test-infrastructure:
runs-on: ubuntu-latest
strategy:
matrix:
environment: [dev, staging]
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Setup Terraform
uses: hashicorp/setup-terraform@v2
- name: Setup Go
uses: actions/setup-go@v4
with:
go-version: '1.21'
- name: Install Terratest dependencies
run: |
go mod init test
go get github.com/gruntwork-io/terratest/modules/terraform
go get github.com/stretchr/testify/assert
- name: Run Terraform validation tests
run: |
cd terraform/${{ matrix.environment }}
terraform init -backend=false
terraform validate
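# terraform plan is omitted here: it needs an initialized backend and AWS credentials, and -detailed-exitcode returns 2 (failing the step) whenever changes are pending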
- name: Run Infrastructure Tests
run: |
cd test
go test -v ./...
Common IaC Challenges and Solutions
State Management Challenges
Challenge: State Drift
Problem: Infrastructure changes made outside of IaC cause state drift.
Solution: Implement drift detection and remediation.
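Drift checks are easiest to automate by branching on the exit status of `terraform plan -detailed-exitcode`, which returns 0 when nothing would change, 1 on error, and 2 when the plan contains differences. The sketch below only maps those codes to labels; `describe_plan_exit` is a hypothetical helper, and the codes are fed in directly so the example runs without Terraform installed.

```shell
#!/bin/sh
# describe_plan_exit CODE (hypothetical helper)
# Maps an exit status from "terraform plan -detailed-exitcode" to a short label.
describe_plan_exit() {
  case "$1" in
    0) echo "in-sync" ;;
    2) echo "drift-or-pending-changes" ;;
    *) echo "error" ;;
  esac
}

# In a real job the status would come from Terraform, for example:
#   terraform plan -detailed-exitcode -input=false >/dev/null; code=$?
# Here the three documented outcomes are passed in directly.
for code in 0 1 2; do
  echo "exit $code -> $(describe_plan_exit "$code")"
done
```

Note that exit status 2 cannot distinguish drift from code changes that have not been applied yet, so a scheduled check is most meaningful when run against the commit that was last applied.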
#!/bin/bash
# scripts/drift-detection.sh
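# Compare actual infrastructure with the Terraform configuration and state
echo "Checking for infrastructure drift..."
# Exit status: 0 = no changes, 1 = error, 2 = differences found (possible drift)
terraform plan -detailed-exitcode
# To bring a manually created resource under Terraform management, add a matching
# resource block first and then import it (the instance ID below is a placeholder)
# terraform import aws_instance.manual_created i-1234567890abcdef0
# Complementary static analysis (these tools check the code, not live drift)
# Checkov for security compliance
checkov -d terraform/
# TFLint for best practices
tflint --init
tflint
Complexity Management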
Challenge: Complex Infrastructure
Problem: Large infrastructure configurations become difficult to manage.
Solution: Modularize and use composition patterns.
# terraform/modules/complete-app-stack/main.tf
module "vpc" {
source = "../vpc"
name = var.name
cidr_block = var.vpc_cidr
azs = var.availability_zones
public_subnets = var.public_subnets
private_subnets = var.private_subnets
tags = var.tags
}
module "database" {
source = "../rds"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
security_groups = [module.vpc.database_security_group_id]
db_name = var.db_name
db_user = var.db_user
db_password = var.db_password
tags = var.tags
}
module "application" {
source = "../ecs"
vpc_id = module.vpc.vpc_id
public_subnets = module.vpc.public_subnet_ids
private_subnets = module.vpc.private_subnet_ids
image_uri = var.image_uri
db_endpoint = module.database.endpoint
tags = var.tags
}
output "vpc_id" {
value = module.vpc.vpc_id
}
output "db_endpoint" {
value = module.database.endpoint
}
output "app_url" {
value = module.application.load_balancer_dns
}
Conclusion
Infrastructure as Code is a fundamental practice that enables organizations to manage infrastructure with the same discipline as application code. By treating infrastructure as code, organizations can achieve greater consistency, reliability, and efficiency in their infrastructure management.
The key to successful IaC implementation is to start with foundational practices like version control, testing, and security, then gradually adopt more advanced patterns like modularization and multi-environment management. Organizations that master IaC gain significant advantages in infrastructure management, including faster deployment, better security posture, and improved operational resilience.
In the next article, we'll explore monitoring and observability in DevOps, covering how to implement comprehensive monitoring solutions that provide insights into application and infrastructure performance.