Managing RDS snapshot restore with Terraform

Backup and restore strategies are something we usually configure just in case something bad happens, but we rarely test them to see in detail how they actually work.
In an AWS environment using an RDS cluster, for example Aurora PostgreSQL, the challenge is not only having backups but understanding how the different restore mechanisms behave when managed through Terraform.

Without a clear approach, restoring can easily become unpredictable: clusters get recreated unexpectedly, endpoints change, or recovery timelines get messed up.

In this post, we’ll break down two core restore strategies and how they behave when driven by Terraform:

Snapshot-based restore
Point-in-Time Recovery (PITR)

Understanding the Restore Strategy

At a high level, RDS provides two fundamentally different recovery models:

Snapshot Restore

Recreates a cluster from a fixed backup point (could be every day at 2am for example)

Point-in-Time Recovery (PITR)

Replays changes up to a specific timestamp (when we want to restore to seconds before some incident occurred)

They may sound similar, but operationally they behave differently.

Restore from Snapshot

Snapshot restore is controlled via a single parameter, snapshot_identifier. Check the Terraform documentation for more details.

  snapshot_identifier = local.aurora_policy[var.env].snapshot_identifier

Example:

  aurora_policy = {
    dev = {
      deletion_protection                 = false
      skip_final_snapshot                 = true
      final_snapshot_identifier           = null
      snapshot_identifier                 = "rds:aurora-demo-dev-2026-04-06-01-29"
      iam_database_authentication_enabled = true
      performance_insights_enabled        = true
    }
  }

This is just an example of how you can forward the values. Personally, I find it more convenient to define locals in Terraform and forward values from there. With this approach you can keep a different configuration per environment and pass it to the RDS cluster resource block or to an RDS Terraform module, depending on what you use.
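As a rough sketch of the forwarding described above, the locals could be wired into a cluster resource like this. The resource name, the cluster identifier, the extra local named policy and the two credential variables are assumptions for illustration and not part of the original setup; a module call would take the same values as inputs.

```hcl
# Sketch only: all names below are illustrative assumptions.
variable "env" {
  type    = string
  default = "dev"
}

variable "master_username" {
  type = string
}

variable "master_password" {
  type      = string
  sensitive = true
}

locals {
  aurora_policy = {
    dev = {
      deletion_protection       = false
      skip_final_snapshot       = true
      final_snapshot_identifier = null
      snapshot_identifier       = "rds:aurora-demo-dev-2026-04-06-01-29"
    }
  }

  # Pick the policy for the current environment once, then reuse it.
  policy = local.aurora_policy[var.env]
}

resource "aws_rds_cluster" "this" {
  cluster_identifier = "aurora-demo-${var.env}"
  engine             = "aurora-postgresql"
  master_username    = var.master_username
  master_password    = var.master_password

  deletion_protection       = local.policy.deletion_protection
  skip_final_snapshot       = local.policy.skip_final_snapshot
  final_snapshot_identifier = local.policy.final_snapshot_identifier
  snapshot_identifier       = local.policy.snapshot_identifier
}
```

Adding a second environment then only means adding another key next to dev in the map.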

What actually happens if you apply this Terraform?

Terraform does not “update” your cluster.

It performs a full replacement:

Existing cluster is destroyed
New cluster is created from snapshot
Writer and reader instances are recreated
Endpoint remains the same
Total duration: ~30–40 minutes

Basically, this is not a patch operation; it’s a controlled rebuild. You can use the AWS CLI to check your cluster's reader endpoint, for example, save it somewhere, and then run the same command again after the restore has completed to double-check whether the endpoint has changed.

aws rds describe-db-clusters \
  --db-cluster-identifier <cluster-id> \
  --query "DBClusters[0].ReaderEndpoint"
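The same before/after comparison can also be done from Terraform. This is a minimal sketch, assuming a small separate configuration and a hypothetical cluster_identifier variable: the aws_rds_cluster data source reads the existing cluster, and the outputs print both endpoints so they can be compared across a restore.

```hcl
# Hypothetical variable: identifier of the cluster to inspect.
variable "cluster_identifier" {
  type = string
}

# Reads the current state of an existing cluster; it does not manage it.
data "aws_rds_cluster" "current" {
  cluster_identifier = var.cluster_identifier
}

output "writer_endpoint" {
  value = data.aws_rds_cluster.current.endpoint
}

output "reader_endpoint" {
  value = data.aws_rds_cluster.current.reader_endpoint
}
```

Running terraform apply (or terraform refresh) on this configuration before and after the restore shows whether either value changed.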

To properly test a snapshot restore, I recommend using the query editor in the RDS console: connect to your DB in a dev (or sandbox) account and run a simple SQL query that creates a new table. Then run terraform apply, wait for the process to complete, connect to the DB again, and check whether the table you created is still there. If the restore worked, the table should be gone, because it was created after the snapshot was taken.

Trigger Behavior

The restore is triggered only when the value of snapshot_identifier changes:

Same snapshot → no action
New snapshot → full restore
null → normal deployment

snapshot_identifier = "rds:aurora-demo-dev-2026-04-06-01-29" -> no action
snapshot_identifier = "rds:aurora-demo-dev-2026-04-07-01-29" -> full restore
snapshot_identifier = null -> normal deployment

Snapshot Types

RDS supports two types of snapshots:

Automated Backups

Retention: 0–35 days (1–35 for Aurora clusters, where automated backups cannot be disabled)
Managed by AWS
Found under system snapshots in AWS console

Manual Snapshots

No expiration
Persist until deleted

The choice here is operational — not technical. Automated backups are convenient, but manual snapshots give you control over long-term recovery points.
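For illustration, a manual snapshot can itself be taken from Terraform with the aws_db_cluster_snapshot resource, while the retention of automated backups is set with backup_retention_period on the cluster. The sketch below only covers the manual snapshot; the variable and the snapshot name are assumptions.

```hcl
# Hypothetical variable: identifier of the cluster to snapshot.
variable "cluster_identifier" {
  type = string
}

# Manual snapshot: kept until it is deleted explicitly.
# Note that destroying this resource also deletes the snapshot.
resource "aws_db_cluster_snapshot" "before_change" {
  db_cluster_identifier          = var.cluster_identifier
  db_cluster_snapshot_identifier = "aurora-demo-dev-manual-2026-04-08"
}

output "manual_snapshot_arn" {
  value = aws_db_cluster_snapshot.before_change.db_cluster_snapshot_arn
}
```

A manual snapshot name has no rds: prefix, so the value passed to snapshot_identifier would be the plain name.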

Point-in-Time Recovery (PITR)

Concept

PITR allows you to restore a database to an exact moment in time. Instead of restoring a snapshot, RDS replays transaction logs up to a timestamp.

I’ve decided to use a Terraform dynamic block for this one, because restore_to_point_in_time is an optional block that a module call for the RDS cluster may or may not set.

  dynamic "restore_to_point_in_time" {
    for_each = var.restore_to_point_in_time != null ? [var.restore_to_point_in_time] : []

    content {
      source_cluster_identifier  = lookup(restore_to_point_in_time.value, "source_cluster_identifier", null)
      source_cluster_resource_id = lookup(restore_to_point_in_time.value, "source_cluster_resource_id", null)
      restore_type               = lookup(restore_to_point_in_time.value, "restore_type", null)
      use_latest_restorable_time = lookup(restore_to_point_in_time.value, "use_latest_restorable_time", null)
      restore_to_time            = lookup(restore_to_point_in_time.value, "restore_to_time", null)
    }
  }

This configures point-in-time recovery only if the restore_to_point_in_time variable is not null. The single-item list creates exactly one instance of the block, and lookup safely reads the optional attributes.
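The variable behind that dynamic block is not shown, so here is one way it could be declared. This is a sketch under assumptions: it requires Terraform 1.3 or newer for optional() object attributes, and the validation rule is an addition that only checks the timestamp format, not whether it falls inside the restorable window.

```hcl
# Sketch, assuming Terraform >= 1.3 for optional() object attributes.
variable "restore_to_point_in_time" {
  description = "PITR settings; leave null to skip the restore_to_point_in_time block."
  type = object({
    source_cluster_identifier  = optional(string)
    source_cluster_resource_id = optional(string)
    restore_type               = optional(string)
    use_latest_restorable_time = optional(bool)
    restore_to_time            = optional(string)
  })
  default = null

  validation {
    # Passes when no timestamp is given; otherwise the value must parse as RFC 3339.
    condition = (
      try(var.restore_to_point_in_time.restore_to_time, null) == null ||
      can(formatdate("YYYY", var.restore_to_point_in_time.restore_to_time))
    )
    error_message = "The restore_to_time value must be an RFC 3339 timestamp such as 2026-04-08T11:18:31Z."
  }
}
```

With a typed object like this, unset attributes are already null, so the lookup calls keep working and the attributes can also be read directly.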

The input could look like this:

restore_to_point_in_time = {
  source_cluster_identifier = module.aurora.cluster_id
  restore_to_time           = "2026-04-08T11:18:31Z"
}

However, we do not pass this in the same Terraform configuration as the original cluster, but in a separate one.

This creates:

A completely new cluster
Based on the original cluster
Restored to a specific time

If you create the cluster with a resource block or a module, you can simply copy it, rename it to mark it as the restored version, and add this restore_to_point_in_time configuration.
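With plain resource blocks, such a restored copy might look roughly like the sketch below. The resource names, the instance identifier, the instance class and the source_cluster_identifier variable are assumptions; a real setup would most likely also set the subnet group and security groups to match the original cluster.

```hcl
# Sketch only: identifiers, instance class and the variable are assumptions.
variable "source_cluster_identifier" {
  description = "Identifier of the original cluster to restore from."
  type        = string
}

resource "aws_rds_cluster" "restored" {
  cluster_identifier  = "aurora-demo-restore-dev"
  engine              = "aurora-postgresql"
  skip_final_snapshot = true

  # master_username, master_password and database_name are left unset:
  # they come from the source cluster.
  restore_to_point_in_time {
    source_cluster_identifier = var.source_cluster_identifier
    restore_type              = "full-copy"
    restore_to_time           = "2026-04-08T11:18:31Z"
  }
}

# The restored cluster needs at least one instance before it can serve queries.
resource "aws_rds_cluster_instance" "restored_writer" {
  identifier         = "aurora-demo-restore-dev-1"
  cluster_identifier = aws_rds_cluster.restored.id
  engine             = aws_rds_cluster.restored.engine
  instance_class     = "db.t4g.medium"
}

output "restored_endpoint" {
  value = aws_rds_cluster.restored.endpoint
}
```

Keeping the restored cluster as its own resource means the original cluster is not touched by the apply.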

If you are not sure which time you can pick for PITR, there are two options: use the AWS CLI or check the AWS console. This matters because PITR works only within a valid time window.

aws rds describe-db-clusters \
  --db-cluster-identifier <cluster-id> \
  --query "DBClusters[0].{Earliest:EarliestRestorableTime,Latest:LatestRestorableTime}"

This command returns EarliestRestorableTime and LatestRestorableTime. If you pick a time outside that window, the restore fails with an error.

Key Insight

PITR history starts when the cluster is created or restored.

If you restore from a snapshot, you reset the timeline.

That means:

Old history is gone
New PITR window starts from that moment

This is one of the most common misunderstandings in production incidents. Suppose you restore from an automated snapshot and it succeeds, but then you realize you need to restore to a specific time instead. If you now set restore_to_point_in_time and run terraform apply, the earliest restorable time will be the moment the snapshot restore completed, because that cluster counts as a “new” cluster.

New Endpoint for PITR

PITR always creates a new cluster:

aurora-demo-restore-dev.cluster-xyz.eu-central-1.rds.amazonaws.com

This means:

No automatic traffic switch
No implicit failover

Conditional Parameters

Certain parameters must not be defined when restore_to_point_in_time is set.

master_username     = var.restore_to_point_in_time == null ? var.master_username : null
master_password     = var.restore_to_point_in_time == null ? var.master_password : null
database_name       = var.restore_to_point_in_time == null ? var.database_name : null
snapshot_identifier = var.restore_to_point_in_time == null ? var.snapshot_identifier : null

When performing a restore, Aurora does not create a “new” database from scratch.

Instead, it reconstructs the cluster from an existing source, including:

credentials
database name
internal state

If you try to provide these values manually:

Terraform will attempt to override them
AWS will reject the configuration
The restore will fail

One important note: when I tried to perform PITR with master_username and master_password configured, I got an error, so I added support in the RDS cluster module to set them to null whenever restore_to_point_in_time is not null. However, when restoring from an automated snapshot, I did not get an error with the username and password configured. The Terraform documentation says:

master_username – (Required unless a snapshot_identifier or replication_source_identifier is provided or unless a global_cluster_identifier is provided when the cluster is the "secondary" cluster of a global database)

Failure Behavior for Automated & Manual snapshots

If you already have a restored cluster and want to apply further modifications, a failure in the middle of the apply could cost you the restored version entirely. Terraform recreates the whole RDS cluster and its instances by destroying first and creating afterwards, so anything that goes wrong in between can leave you without the restored cluster.
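One possible safety net, offered here as an assumption rather than something from the setup above, is to let the cluster take a final snapshot whenever it is destroyed, so that a failed replacement still leaves a recovery point behind. The sketch reuses the policy map from earlier with two values changed; the final snapshot name is hypothetical.

```hcl
locals {
  aurora_policy = {
    dev = {
      deletion_protection = false

      # Take a snapshot right before the cluster is deleted,
      # which includes the destroy step of a replacement.
      skip_final_snapshot       = false
      final_snapshot_identifier = "aurora-demo-dev-final"

      snapshot_identifier = "rds:aurora-demo-dev-2026-04-06-01-29"
    }
  }
}
```

Two caveats, as far as I know: the setting has to be applied to the existing cluster before the apply that destroys it, and snapshot names must be unique, so the final snapshot name needs to change (or the old snapshot needs to be deleted) before the next replacement.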

Conclusion

Managing RDS snapshot restore with Terraform is less about configuration and more about understanding behavior. On the surface, restore operations look simple: set a snapshot or timestamp and apply. In reality, you should definitely try it out yourself. And since a restore can take a long time, do not forget to make a change to the DB before restoring so that the test makes sense.