Governing Terraform with OPA

When managing infrastructure with Terraform, enforcing standards across teams and environments can be essential. When you work alone or in a small team, such policies may not be needed, but as soon as it becomes hard to keep track of every engineer and commit, it is useful to have some restrictions in place.
It is very easy for someone to accidentally deploy a large, costly RDS or EC2 instance type in a test environment and forget to delete it after the tests are done.
Here are some examples where Open Policy Agent (OPA) can play a huge role:

  • Enforce tagging standards (e.g., all resources must have Environment, Owner and CostCenter tags; the rule denies any resource missing them)
  • Restrict expensive instance types (e.g., an engineer cannot deploy an instance type larger than the t3 family)
  • Prevent public exposure (e.g., deny cidr_block = ["0.0.0.0/0"] and other security misconfigurations)
  • Enforce regions (e.g., allow only eu-central-1 and deny all others)
  • Other best practices
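To give a taste of what such rules look like, here is a minimal sketch of the first item, a tagging rule, in Rego (OPA's policy language). The tag names and package name are assumptions for illustration:

```rego
package terraform.policies

# Tags every resource must carry (assumed example set)
required_tags := {"Environment", "Owner", "CostCenter"}

deny[msg] {
  resource := input.resource_changes[_]

  # Collect the planned tags; default to an empty object if none are set
  tags := object.get(resource.change.after, "tags", {})

  # Set difference: required tags minus the keys actually present
  missing := required_tags - {key | tags[key]}
  count(missing) > 0

  msg := sprintf("%s is missing required tags: %v", [resource.address, missing])
}
```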

In this post, we’ll explore how to enforce a simple Terraform policy that ensures AWS RDS instances are limited to the db.t3.micro type.

Why use policy as code?

As you know, Terraform is used for defining infrastructure, not for governing it: you can set locals and variables, but Terraform itself won’t stop someone from changing a variable to db.m5.2xlarge.
That’s where OPA steps in: it evaluates Terraform plans before apply and ensures that every change complies with your organization’s rules and restrictions.
Policy as code brings security and governance into your workflow because rules are version-controlled and can be automated via a CI/CD pipeline.

Let’s use an RDS example to see how it works.

resource "aws_db_instance" "postgres" {
  identifier             = var.rds_config.db_identifier
  engine                 = "postgres"
  engine_version         = var.rds_config.engine_version
  instance_class         = var.rds_config.instance_class
  allocated_storage      = var.rds_config.allocated_storage
  max_allocated_storage  = var.rds_config.max_allocated_storage
  storage_type           = var.rds_config.storage_type
  db_name                = var.rds_credentials.dbname
  username               = var.rds_credentials.user
  password               = var.rds_credentials.password
  vpc_security_group_ids = [aws_security_group.rds_sg.id]
  db_subnet_group_name   = aws_db_subnet_group.db_subnets.name
  skip_final_snapshot    = true
  publicly_accessible    = var.rds_config.publicly_accessible
  multi_az               = var.rds_config.multi_az
  storage_encrypted      = true

  tags = merge({
    Name = "test-rds",
  }, local.common_tags)
}

Your rds_config variable could look like this:

variable "rds_config" {
  description = "Non-sensitive RDS configuration"
  type = object({
    db_identifier         = string
    instance_class        = string
    engine_version        = string
    allocated_storage     = number
    max_allocated_storage = number
    storage_type          = string
    publicly_accessible   = bool
    multi_az              = bool
  })
}

When applying the Terraform configuration, an engineer could make a typo and initialize this variable with a larger instance class, so we want to prevent that from happening.

Step 1: Generate a Terraform Plan in JSON

First, generate a plan and export it to JSON format, which OPA will then analyze to produce the test result.

terraform plan -out=tfplan.binary
terraform show -json tfplan.binary > tfplan.json

Terraform saves the plan in binary format because it is faster and preserves full internal details for later use. Since OPA requires JSON, we need to convert it.
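For orientation, here is a heavily abridged, representative sketch of the part of tfplan.json that our policy will read (field values are illustrative, and most fields are omitted):

```json
{
  "resource_changes": [
    {
      "address": "aws_db_instance.postgres",
      "type": "aws_db_instance",
      "change": {
        "after": {
          "instance_class": "db.t3.micro"
        }
      }
    }
  ]
}
```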

Step 2: Create an OPA Policy

The second step is to create the policy itself. If you plan to have multiple files, create a folder (for example, policies) and store the policy files inside. For this showcase I will create the file rds_limit.rego; the .rego suffix is required by OPA.

package terraform.policies

# Allowed instance types
allowed_types = {"db.t3.micro"}

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_db_instance"

  instance_type := resource.change.after.instance_class
  not allowed_types[instance_type]

  msg := sprintf("RDS instance type %s is not allowed for %s", [instance_type, resource.address])
}

How it works:
  • package terraform.policies defines the namespace where the policy lives, like a folder name for rules
  • allowed_types = {"db.t3.micro"} defines a set of allowed RDS instance types (all this example needs)
  • deny[msg] starts a rule that collects messages when something violates the policy
  • resource := input.resource_changes[_] loops through all Terraform resources in the plan
  • resource.type == "aws_db_instance" filters only RDS resources
  • instance_type := resource.change.after.instance_class extracts the planned instance type
  • not allowed_types[instance_type] checks that the instance type is not in the allowed set
  • msg := sprintf(…) builds a readable message describing what failed
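One caveat: the rule above uses the classic Rego syntax. If you run OPA v1.0 or later, the if and contains keywords are mandatory, so the same rule would be written as:

```rego
package terraform.policies

allowed_types := {"db.t3.micro"}

# Same logic as before, in the v1 "contains ... if" style
deny contains msg if {
  resource := input.resource_changes[_]
  resource.type == "aws_db_instance"

  instance_type := resource.change.after.instance_class
  not allowed_types[instance_type]

  msg := sprintf("RDS instance type %s is not allowed for %s", [instance_type, resource.address])
}
```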

Step 3: Test the OPA rule

To test our OPA rule, run this command:

opa eval --format pretty --data policies/rds_limit.rego --input tfplan.json "data.terraform.policies.deny"

This command runs the policy policies/rds_limit.rego against the Terraform plan tfplan.json and prints any “deny” messages. The query “data.terraform.policies.deny” specifies which rule from the policy package to evaluate, which keeps the output clean.
If we define a value for the rds_config variable in a *.tfvars file:

rds_config = {
  db_identifier         = "test-rds" # required by the variable's object type
  instance_class        = "db.t3.micro"
  engine_version        = "15.13"
  allocated_storage     = 20
  max_allocated_storage = 20
  storage_type          = "gp2"
  publicly_accessible   = false
  multi_az              = false
}

Since this is an allowed instance type, our test returns empty output:

> opa eval --format pretty --data policies/rds_limit.rego --input tfplan.json "data.terraform.policies.deny"

{}

But as soon as you try to pass a different value:

rds_config = {
  db_identifier         = "test-rds" # required by the variable's object type
  instance_class        = "db.t3.medium"
  engine_version        = "15.13"
  allocated_storage     = 20
  max_allocated_storage = 20
  storage_type          = "gp2"
  publicly_accessible   = false
  multi_az              = false
}

Of course, after the change, don’t forget to regenerate tfplan.json (rerun the commands from step 1); otherwise OPA won’t know you changed the instance type.
Now the test fails and the output is:

opa eval --format pretty --data policies/rds_limit.rego --input tfplan.json "data.terraform.policies.deny"

{
  "RDS instance type db.t3.medium is not allowed for aws_db_instance.postgres": true
}

As you can see, we got the message “RDS instance type db.t3.medium is not allowed for aws_db_instance.postgres”: true, and so we prevented deploying a bigger size.
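You can also unit-test the policy itself with opa test, feeding it a mock plan instead of a real one. A minimal sketch, in a file such as policies/rds_limit_test.rego (the test names are arbitrary, but the test_ prefix is required by opa test; the mock input mirrors the plan structure OPA sees):

```rego
package terraform.policies

# A disallowed instance class should produce exactly one deny message
test_deny_large_instance {
  count(deny) == 1 with input as {"resource_changes": [{
    "address": "aws_db_instance.postgres",
    "type": "aws_db_instance",
    "change": {"after": {"instance_class": "db.t3.medium"}}
  }]}
}

# An allowed instance class should produce no deny messages
test_allow_micro_instance {
  count(deny) == 0 with input as {"resource_changes": [{
    "address": "aws_db_instance.postgres",
    "type": "aws_db_instance",
    "change": {"after": {"instance_class": "db.t3.micro"}}
  }]}
}
```

Run the tests with opa test policies/; this lets you evolve the policy safely without regenerating a real Terraform plan each time.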

Step 4: Integrate into CI/CD

You can easily add this check into your pipeline.
Here’s an example for GitHub Actions:

- name: Terraform plan
  run: terraform plan -out=tfplan.binary

- name: Generate plan JSON
  run: terraform show -json tfplan.binary > tfplan.json

- name: Run OPA policy checks
  run: |
    opa eval --format pretty --fail-defined \
      --data policies/rds_limit.rego \
      --input tfplan.json \
      "data.terraform.policies.deny"

Note the --fail-defined flag: it makes opa eval exit with a non-zero code whenever the deny rule produces a result, so the step actually fails the pipeline on a violation. Now every pull request automatically verifies your infrastructure policies before applying.

Conclusion

Governing Terraform with OPA adds an extra layer of safety and consistency to infrastructure management. It helps teams avoid costly mistakes and enforce good practices. In large teams, having such rules in place keeps your infra predictable and manageable. Even though the example in this post focuses on preventing overprovisioning and capacity waste, OPA rules can also be written to enforce many other best practices.

If this blog saved you time, support me with a coffee!

Thanks to everyone who’s supported!