Managing RDS snapshot restore with Terraform
Backup and restore strategies are something we usually configure just in case something bad happens, but we rarely test them to see in detail how they work.
In an AWS environment with an RDS cluster, for example Aurora PostgreSQL, the challenge is not only having backups but understanding how the different restore mechanisms behave when managed through Terraform.
Without a clear approach, restoring can easily become unpredictable: clusters get recreated unexpectedly, endpoints change, or recovery timelines get messed up.
In this post, we’ll break down two core restore strategies and how they behave when driven by Terraform:
- Snapshot-based restore
- Point-in-Time Recovery (PITR)
Understanding the Restore Strategy
At a high level, RDS provides two fundamentally different recovery models:
Snapshot Restore
- Recreates a cluster from a fixed backup point (for example, a daily backup taken at 2 am)
Point-in-Time Recovery (PITR)
- Replays changes up to a specific timestamp (for example, to restore to seconds before an incident occurred)
They may sound similar, but operationally they behave differently.
Restore from Snapshot
Snapshot restore is controlled via a single parameter, snapshot_identifier. Check out the Terraform documentation for more details.
snapshot_identifier = local.aurora_policy[var.env].snapshot_identifier
Example:
locals {
  aurora_policy = {
    dev = {
      deletion_protection                 = false
      skip_final_snapshot                 = true
      final_snapshot_identifier           = null
      snapshot_identifier                 = "rds:aurora-demo-dev-2026-04-06-01-29"
      iam_database_authentication_enabled = true
      performance_insights_enabled        = true
    }
  }
}
This is just an example of how you can forward the values. Personally, I find it more convenient to define locals in Terraform and then forward their values. With this approach you can keep a different configuration per environment and pass it to the RDS cluster resource block or to an RDS Terraform module, depending on what you use.
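As a rough sketch of that forwarding, a cluster resource could read its settings from the local.aurora_policy map shown above. The resource name "this", the cluster identifier pattern, and the three variables are assumptions made for illustration, and only a subset of the policy keys is wired in.

```hcl
# Hypothetical inputs; names are illustrative, not taken from a real module.
variable "env" {
  type        = string
  description = "Key into local.aurora_policy, for example \"dev\"."
}

variable "master_username" {
  type = string
}

variable "master_password" {
  type      = string
  sensitive = true
}

resource "aws_rds_cluster" "this" {
  cluster_identifier = "aurora-demo-${var.env}"
  engine             = "aurora-postgresql"
  master_username    = var.master_username
  master_password    = var.master_password

  # Everything below comes from the per-environment policy map.
  snapshot_identifier                 = local.aurora_policy[var.env].snapshot_identifier
  deletion_protection                 = local.aurora_policy[var.env].deletion_protection
  skip_final_snapshot                 = local.aurora_policy[var.env].skip_final_snapshot
  final_snapshot_identifier           = local.aurora_policy[var.env].final_snapshot_identifier
  iam_database_authentication_enabled = local.aurora_policy[var.env].iam_database_authentication_enabled
}
```

Adding another environment then only means adding another key to the map; the resource block itself stays unchanged.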
What actually happens if you apply this Terraform?
Terraform does not “update” your cluster.
It performs a full replacement:
- Existing cluster is destroyed
- New cluster is created from snapshot
- Writer and reader instances are recreated
- Endpoint remains the same (as long as the cluster identifier does not change)
- Total duration: ~30–40 minutes
Basically, this is not a patch operation; it is a controlled rebuild. You can use the AWS CLI to check your cluster reader endpoint, for example, save it somewhere, and run the command again after the restore has completed to double-check whether the endpoint has changed.
aws rds describe-db-clusters \
  --db-cluster-identifier <cluster-id> \
  --query "DBClusters[0].ReaderEndpoint"
To properly test a snapshot restore, I recommend using the query editor feature of the RDS service: connect to your DB in a dev (or sandbox) account and run a simple SQL query that creates a new table. After that change, run terraform apply and wait for the process to complete, then connect to the DB again and check whether the table you created is still there. If the restore worked, the table should be gone, because it was created after the snapshot was taken.
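If you prefer to do the before/after endpoint comparison from Terraform instead of the CLI, one option is to read the cluster through a data source and expose the endpoints as outputs. This is only a sketch: the variable name and the output names are assumptions.

```hcl
# Hypothetical input: the identifier of the cluster to inspect.
variable "cluster_identifier" {
  type = string
}

# Reads the existing cluster without managing it.
data "aws_rds_cluster" "current" {
  cluster_identifier = var.cluster_identifier
}

output "writer_endpoint" {
  value = data.aws_rds_cluster.current.endpoint
}

output "reader_endpoint" {
  value = data.aws_rds_cluster.current.reader_endpoint
}
```

Running terraform output before and after the restore should then show whether either endpoint changed.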
Trigger Behavior
The restore is triggered only when the value of snapshot_identifier changes:
- Same snapshot → no action
- New snapshot → full restore
- null → normal deployment
snapshot_identifier = "rds:aurora-demo-dev-2026-04-06-01-29" # same value: no action
snapshot_identifier = "rds:aurora-demo-dev-2026-04-07-01-29" # new value: full restore
snapshot_identifier = null                                   # normal deployment
Snapshot Types
RDS supports two types of snapshots:
Automated Backups
- Retention: 0–35 days (for Aurora clusters the minimum is 1 day, since automated backups cannot be turned off)
- Managed by AWS
- Found under system snapshots in AWS console
Manual Snapshots
- No expiration
- Persist until deleted
The choice here is operational, not technical. Automated backups are convenient, but manual snapshots give you control over long-term recovery points.
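A manual snapshot can also be taken from Terraform with the aws_db_cluster_snapshot resource. The sketch below assumes a hypothetical variable for the source cluster and an illustrative snapshot name.

```hcl
# Hypothetical input: the cluster to snapshot.
variable "cluster_identifier" {
  type = string
}

# Manual snapshot of an Aurora cluster. It persists until it is deleted,
# and destroying this resource deletes the snapshot as well.
resource "aws_db_cluster_snapshot" "pre_change" {
  db_cluster_identifier          = var.cluster_identifier
  db_cluster_snapshot_identifier = "aurora-demo-dev-pre-change" # illustrative name
}
```

The snapshot name can then be used as the snapshot_identifier value, in the same way as the automated "rds:..." identifiers.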
Point-in-Time Recovery (PITR)
Concept
PITR allows you to restore a database to an exact moment in time. Instead of restoring a snapshot, RDS replays transaction logs up to a timestamp.
I decided to use a Terraform dynamic block for this one, because restore_to_point_in_time is an optional block that a module call for the RDS cluster may or may not set.
dynamic "restore_to_point_in_time" {
  for_each = var.restore_to_point_in_time != null ? [var.restore_to_point_in_time] : []
  content {
    source_cluster_identifier  = lookup(restore_to_point_in_time.value, "source_cluster_identifier", null)
    source_cluster_resource_id = lookup(restore_to_point_in_time.value, "source_cluster_resource_id", null)
    restore_type               = lookup(restore_to_point_in_time.value, "restore_type", null)
    use_latest_restorable_time = lookup(restore_to_point_in_time.value, "use_latest_restorable_time", null)
    restore_to_time            = lookup(restore_to_point_in_time.value, "restore_to_time", null)
  }
}
This configures point-in-time recovery only if the variable restore_to_point_in_time is not null. The single-item list creates exactly one instance of the block, and lookup safely accesses the optional attributes.
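The variable behind that block is not shown here, so the following declaration is an assumption about how it could look. It relies on optional object attributes, which require Terraform 1.3 or later.

```hcl
# Assumed declaration for the module input; null disables PITR entirely.
variable "restore_to_point_in_time" {
  description = "Point-in-time restore settings, or null for a normal deployment."
  type = object({
    source_cluster_identifier  = optional(string)
    source_cluster_resource_id = optional(string)
    restore_type               = optional(string)
    use_latest_restorable_time = optional(bool)
    restore_to_time            = optional(string)
  })
  default = null
}
```

With a typed object like this, attributes that are left out arrive as null, so the lookup calls keep working and could also be replaced by direct attribute access.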
The input could look like this:
restore_to_point_in_time = {
  source_cluster_identifier = module.aurora.cluster_id
  restore_to_time           = "2026-04-08T11:18:31Z"
}
However, we do not pass this in the same Terraform configuration as the original cluster, but in a separate one.
This creates:
- A completely new cluster
- Based on the original cluster
- Restored to a specific time
If you create the RDS cluster with a resource block or a module, you can simply copy it, rename it to mark it as the restored version, and add this restore_to_point_in_time configuration.
If you are not sure which time you can pick for PITR, there are two options: use the AWS CLI command below or check in the AWS console. This matters because PITR only works within a valid time window.
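With a plain resource block, the copied and renamed version might look roughly like the sketch below. The resource names, variables, and the "full-copy" restore type are assumptions chosen for illustration, and networking arguments such as the subnet group and security groups are left out.

```hcl
# Hypothetical inputs for the restored copy.
variable "env" {
  type = string
}

variable "source_cluster_identifier" {
  type        = string
  description = "Identifier of the original cluster to restore from."
}

variable "instance_class" {
  type = string
}

# Restored copy of the original cluster, managed next to it.
resource "aws_rds_cluster" "restored" {
  cluster_identifier  = "aurora-demo-restore-${var.env}"
  engine              = "aurora-postgresql"
  skip_final_snapshot = true

  restore_to_point_in_time {
    source_cluster_identifier = var.source_cluster_identifier
    restore_type              = "full-copy"
    restore_to_time           = "2026-04-08T11:18:31Z"
  }
}

# A restored cluster needs at least one instance before it can serve traffic.
resource "aws_rds_cluster_instance" "restored_writer" {
  identifier         = "aurora-demo-restore-${var.env}-1"
  cluster_identifier = aws_rds_cluster.restored.id
  engine             = aws_rds_cluster.restored.engine
  instance_class     = var.instance_class
}
```

Master credentials and the database name are deliberately not set here, since they come from the source cluster.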
aws rds describe-db-clusters \
  --db-cluster-identifier <cluster-id> \
  --query "DBClusters[0].{Earliest:EarliestRestorableTime,Latest:LatestRestorableTime}"
This command returns EarliestRestorableTime and LatestRestorableTime. If you pick a time outside that window, the restore fails with an error.

Key Insight
PITR history starts when the cluster is created or restored.
If you:
- Restore from snapshot
→ You reset the timeline
That means:
- Old history is gone
- New PITR window starts from that moment
This is one of the most common misunderstandings in production incidents. Suppose you restore from an automated snapshot and the restore succeeds, but afterwards you realize you actually need to restore to a specific time. If you then set restore_to_point_in_time to a non-null value and run terraform apply, the earliest restorable time will be the moment the snapshot was restored, because the restored cluster counts as a “new” cluster.
New Endpoint for PITR
PITR always creates a new cluster with its own endpoint, for example:
aurora-demo-restore-dev.cluster-xyz.eu-central-1.rds.amazonaws.com
This means:
- No automatic traffic switch
- No implicit failover
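Since nothing switches traffic for you, one possible way to make the switch explicit is a DNS record that applications use instead of the raw cluster endpoint. This is only a sketch of that idea, not something the setup above includes: the hosted zone, hostname, and variable names are all assumptions.

```hcl
# Hypothetical inputs for a stable database hostname.
variable "zone_id" {
  type        = string
  description = "Route 53 hosted zone that holds the record."
}

variable "db_hostname" {
  type        = string
  description = "Stable name that applications connect to."
}

variable "active_cluster_endpoint" {
  type        = string
  description = "Endpoint of the cluster that should receive traffic."
}

# Pointing this record at the restored endpoint is the manual traffic switch.
resource "aws_route53_record" "db" {
  zone_id = var.zone_id
  name    = var.db_hostname
  type    = "CNAME"
  ttl     = 60
  records = [var.active_cluster_endpoint]
}
```

Cutting over to the restored cluster then becomes a deliberate change of one variable followed by terraform apply.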
Conditional Parameters
Certain parameters must not be set when restore_to_point_in_time is used.
master_username     = var.restore_to_point_in_time == null ? var.master_username : null
master_password     = var.restore_to_point_in_time == null ? var.master_password : null
database_name       = var.restore_to_point_in_time == null ? var.database_name : null
snapshot_identifier = var.restore_to_point_in_time == null ? var.snapshot_identifier : null
When performing a restore, Aurora does not create a “new” database from scratch.
Instead, it reconstructs the cluster from an existing source, including:
- credentials
- database name
- internal state
If you try to provide these values manually:
- Terraform will attempt to override them
- AWS will reject the configuration
- The restore will fail
One important note: when I tried to perform PITR with master_username and master_password configured, I got an error. I therefore added support to the RDS cluster module so that these are set to null whenever restore_to_point_in_time is not null. However, when restoring from an automated snapshot I did not get an error with the username and password configured. The Terraform documentation says:
master_username - (Required unless a snapshot_identifier or replication_source_identifier is provided or unless a global_cluster_identifier is provided when the cluster is the “secondary” cluster of a global database)
Failure Behavior for Automated & Manual snapshots
If you already have a restored version and want to apply further modifications, a failure in the middle of the apply could cost you the restored version entirely. Because the change recreates the whole RDS cluster and its instances, Terraform destroys the existing cluster and then creates a new one; if something goes wrong in between, the restored version can be lost.
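One way to reduce that risk is to make Terraform refuse any plan that would destroy the restored cluster. The sketch below combines the prevent_destroy lifecycle setting with deletion protection and a final snapshot; the resource name and both variables are assumptions.

```hcl
# Hypothetical inputs.
variable "restored_cluster_identifier" {
  type = string
}

variable "source_cluster_identifier" {
  type = string
}

resource "aws_rds_cluster" "restored" {
  cluster_identifier = var.restored_cluster_identifier
  engine             = "aurora-postgresql"

  # AWS-side guard: the cluster cannot be deleted while this is true.
  deletion_protection = true

  # If the cluster is ever deleted on purpose, keep a final snapshot.
  skip_final_snapshot       = false
  final_snapshot_identifier = "${var.restored_cluster_identifier}-final"

  restore_to_point_in_time {
    source_cluster_identifier  = var.source_cluster_identifier
    use_latest_restorable_time = true
  }

  # Terraform-side guard: a plan that would destroy or replace this
  # resource fails instead of being applied.
  lifecycle {
    prevent_destroy = true
  }
}
```

With these guards in place, a change that forces replacement surfaces as a plan error, which gives you the chance to take a manual snapshot first.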
Conclusion
Managing RDS snapshot restore with Terraform is less about configuration and more about understanding behavior. On the surface, restore operations look simple: set a snapshot or a timestamp and apply. In reality, you should definitely try it out yourself. And since a restore can take a long time, do not forget to make a change to the DB before the restore so that the test is meaningful.







