Flattening data in Terraform
A deep-dive on manipulating data in Terraform, to prepare it for use in resource blocks
Updated April, 2024
This is part of a collection of articles covering an Introduction to Advanced Terraform. It is deliberately written in a certain style; you can find out more in the first post.
Introduction
My most common use case for flattening data in Terraform is when I need the effect of a count on a resource and then need the result consumed by a for_each. In case you're not aware, you cannot have count and for_each on the same resource block. for_each lets you add conditions in your resource blocks, like if value.region == "eu-west-1" then use the provider in this block, or if value.engine == "aurora" then create this RDS cluster (and not an RDS instance, which is a different thing).
A problem I've found is that once you start flattening, it tends to lead to more flattening. You flatten local.ec2 so the data can be used in a resource block, then you need to flatten the attached disks so you can attach them to the correct EC2 instance.
In a particularly convoluted example, when dealing with things like router or firewall appliances in the cloud, you want to point some routes in the route table at the network interface of an EC2 instance, all while keeping the input data in the locals files clean and easy to read.
It's achievable. I wouldn't call it particularly elegant, but it keeps the resource definitions looking nice for less "in the weeds" users to come along and use, which is the whole point of this series of articles.
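As a minimal sketch of the core pattern, the snippet below expands a count into one entry per instance and then re-keys the result so that a for_each could consume it. The names apps, expanded and by_key are made up for illustration and are not used elsewhere in this article.

```hcl
locals {
  # Hypothetical input: one entry per app, each with a count.
  apps = {
    web = { count = 2 }
    db  = { count = 1 }
  }

  # The nested for produces a list of lists (one inner list per app);
  # flatten() collapses it into a single list of objects.
  expanded = flatten([
    for name, app in local.apps : [
      for i in range(app.count) : {
        key  = "${name}_${i}"
        name = name
      }
    ]
  ])

  # Re-key the list so for_each gets stable, readable keys
  # instead of list positions.
  by_key = { for item in local.expanded : item.key => item }
}

output "by_key_keys" {
  # keys() returns the map keys in lexical order:
  # ["db_0", "web_0", "web_1"]
  value = keys(local.by_key)
}
```

Like the larger example that follows, this creates no resources, so it can be explored safely with terraform plan.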
Example: Flattening EC2 Configuration
The following is an example of flattening some data in Terraform. It will not create any resources on its own, so it won't cost you anything to run a plan or apply.
In the flattened_ec2 block we're saying: step into local.ec2 and then var.environment, which is dev in this case, where it finds the example-app key. The inner for says "take note of the count value", in this case 2, and produces one copy of the entry per index, merging the extra keys that follow with the other key/value pairs it finds.
variable "environment" {
  default = "dev"
}

locals {
  # Human-friendly input: one entry per application, with a count.
  ec2 = {
    dev = {
      example-app = {
        region   = "eu-west-1"
        vpc      = "arryw-dev-dublin"
        count    = 2
        az       = ["eu-west-1a", "eu-west-1b"]
        key_pair = "arryw"
        policies = ["rds"]
      }
    }
  }

  # One object per instance: the outer for walks the apps,
  # the inner for walks 0..count-1.
  flattened_ec2 = flatten([
    for k, v in local.ec2[var.environment] : [
      for i in range(v.count) : merge(
        v,
        {
          ec2_key      = k
          index        = i
          indexed_key  = "${k}_${i}"
          indexed_name = "${k}-${i}"
          # Replaces the az list with a single AZ, cycling through the list.
          az = v["az"][i % length(v["az"])]
        }
      )
    ]
  ])

  # Re-key the list by indexed_key so it can be used with for_each.
  ec2_map = {
    for ec2 in local.flattened_ec2 : ec2.indexed_key => ec2
  }
}

output "flattened_ec2" {
  value = local.flattened_ec2
}

output "ec2_map" {
  value = local.ec2_map
}
The output blocks are just there to show you what the manipulated data looks like. Below is the output of a terraform plan command.
You can see that indexed_key and indexed_name are created and merged with the other values in the ec2 block, and that az (availability zone) is set to cycle through the az list. The indexed key is created because it will match the Terraform resource reference, which makes it easier to refer back to from other resource blocks. The indexed name is just a nicer way to display it if you want to use it for naming resources.
Changes to Outputs:
  + flattened_ec2 = [
      + {
          + az = "eu-west-1a"
          + count = 2
          + ec2_key = "example-app"
          + index = 0
          + indexed_key = "example-app_0"
          + indexed_name = "example-app-0"
          + key_pair = "arryw"
          + policies = [
              + "rds",
            ]
          + region = "eu-west-1"
          + vpc = "arryw-dev-dublin"
        },
      + {
          + az = "eu-west-1b"
          + count = 2
          + ec2_key = "example-app"
          + index = 1
          + indexed_key = "example-app_1"
          + indexed_name = "example-app-1"
          + key_pair = "arryw"
          + policies = [
              + "rds",
            ]
          + region = "eu-west-1"
          + vpc = "arryw-dev-dublin"
        },
    ]
  + ec2_map = {
      + example-app_0 = {
          + az = "eu-west-1a"
          + count = 2
          + ec2_key = "example-app"
          + index = 0
          + indexed_key = "example-app_0"
          + indexed_name = "example-app-0"
          + key_pair = "arryw"
          + policies = [
              + "rds",
            ]
          + region = "eu-west-1"
          + vpc = "arryw-dev-dublin"
        }
      + example-app_1 = {
          + az = "eu-west-1b"
          + count = 2
          + ec2_key = "example-app"
          + index = 1
          + indexed_key = "example-app_1"
          + indexed_name = "example-app-1"
          + key_pair = "arryw"
          + policies = [
              + "rds",
            ]
          + region = "eu-west-1"
          + vpc = "arryw-dev-dublin"
        }
    }
- The data in the local.ec2 block is an object, a sort of nested map. In this form it is an efficient and user-friendly way to define the resources you want to create.
- In the flattened_ec2 output, the flattening has created a list of maps. for_each only accepts a map or a set of strings, so to create resources from this list you would have to fall back to count, and Terraform would have nothing to refer to the instances by except their index, i.e. 0, 1, 2, etc. This can lead to problems down the line (removing an item shifts every index after it), and I would avoid it if at all possible.
- In the ec2_map output it has created a map of maps. The key for each defined resource is back, so Terraform refers to them as something readable instead of an index. This is effectively a long and inefficient form of the original local.ec2 block, but it's consumable by resource blocks and it happens in the background, so you don't have to worry about it.
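A minimal sketch of how ec2_map could be consumed by a resource block, assuming the aws.dublin provider alias used later in this article. The ami_id variable and the t3.micro instance type are placeholders that are not part of the original configuration, and mapping key_pair to key_name is my assumption about how that value is meant to be used. The resource name ec2_dub matches the reference in the route example further down.

```hcl
# Hypothetical variable: supply an AMI ID that is valid in your region.
variable "ami_id" {
  type = string
}

resource "aws_instance" "ec2_dub" {
  # Only create the instances that belong in this region.
  for_each = {
    for k, v in local.ec2_map : k => v
    if v.region == "eu-west-1"
  }
  provider = aws.dublin

  ami               = var.ami_id
  instance_type     = "t3.micro" # placeholder, not from the original data
  availability_zone = each.value.az
  key_name          = each.value.key_pair

  tags = {
    Name = each.value.indexed_name
  }
}
```

With the example data, the instances would be addressed as aws_instance.ec2_dub["example-app_0"] and aws_instance.ec2_dub["example-app_1"], which is why indexed_key is worth carrying around.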
Going a step further
Let's look at another example of flattening data and show how you can use it in a resource block, referring back to the convoluted route table example I mentioned earlier.
In this example we define the VPC and subnets as a tuple in the local.vpc[var.environment] block. For this use case it is an elegant way to lay things out, but when it comes to creating subnets, it means we either create separate resource blocks for each subnet (that refer to each.value[2][0] and each.value[2][1] etc.) or we manipulate the data again. When we flatten the subnets, we can use the same flattened output to create the route tables.
locals {
  vpc = {
    dev = {
      # vpc_key = [region, vpc_cidr, [subnet_cidr1, subnet_cidr2, subnet_cidr3]]
      arryw-dev-dublin = ["eu-west-1", "10.0.0.0/23", ["10.0.0.0/26", "10.0.0.64/26", "10.0.0.128/26"]]
    }
  }

  public_routes = {
    dev = {
      # route_key = [region, vpc_key, target, cidr]
      some_public_eni_route = ["eu-west-1", "arryw-dev-dublin", "eni", "10.2.0.0/23"]
      some_public_route     = ["eu-west-1", "arryw-dev-dublin", "igw", "8.8.8.8/32"]
    }
  }

  # Flatten the structure for subnet creation
  flattened_subnets = flatten([
    for k, v in local.vpc[var.environment] : [
      for idx, subnet in v[2] : {
        vpc_key    = k
        region     = v[0]
        cidr_block = subnet
        az         = "${v[0]}${element(["a", "b", "c", "d"], idx)}"
      }
    ]
  ])

  # Make maps of the subnets for easy reference
  subnet_map = {
    for subnet in local.flattened_subnets : "${subnet.vpc_key}-${subnet.az}" => subnet
  }
}

resource "aws_vpc" "vpc_dub" {
  for_each = {
    for k, v in local.vpc[var.environment] : k => v
    if v[0] == "eu-west-1"
  }
  provider             = aws.dublin
  cidr_block           = each.value[1]
  enable_dns_support   = true
  enable_dns_hostnames = true
  tags = merge(local.global_tags, {
    Name = each.key
  })
}

resource "aws_subnet" "public_subnet_dub" {
  for_each = {
    for idx, subnet in local.subnet_map : idx => subnet
    if subnet.region == "eu-west-1"
  }
  provider          = aws.dublin
  vpc_id            = aws_vpc.vpc_dub[each.value.vpc_key].id
  cidr_block        = each.value.cidr_block
  availability_zone = each.value.az
}

resource "aws_route_table" "public_dub" {
  for_each = {
    for idx, rt in local.subnet_map : idx => rt
    if rt.region == "eu-west-1"
  }
  provider = aws.dublin
  vpc_id   = aws_vpc.vpc_dub[each.value.vpc_key].id
}

resource "aws_route_table_association" "public_dub" {
  for_each = {
    for idx, rt in local.subnet_map : idx => rt
    if rt.region == "eu-west-1"
  }
  provider       = aws.dublin
  subnet_id      = aws_subnet.public_subnet_dub[each.key].id
  route_table_id = aws_route_table.public_dub[each.key].id
}
Now, the resource blocks for the VPC, Subnets, Route Tables and Route Table Associations are all created from data we've already supplied, in one form or another. After this we move on to adding routes to the correct route tables, and this is where it starts to get complex.
If we want to send some routes to the network interfaces on the EC2 instances, we have to tie the EC2 instances, the route tables and the routes together. Only then can we create the routes while being sure to reference the correct ENI (Elastic Network Interface).
Up until now, quite conveniently, all data manipulation has happened on a global scale, i.e. we didn't care about the region and could filter on it at the last possible stage. Now that we need to reference route tables and ENIs directly, we have to keep the data manipulation regional.
locals {
  # Assign an index to route tables
  rt_dub_index = {
    for k in sort(keys(aws_route_table.public_dub)) : k => {
      id     = aws_route_table.public_dub[k].id
      index  = index(sort(keys(aws_route_table.public_dub)), k)
      az     = try(regex(".*(eu-west-1[abcd])", k)[0], "default-value-or-empty-string")
      vpc_id = aws_route_table.public_dub[k].vpc_id
    }
  }

  # Flatten route tables and routes
  flattened_rt_routes_dub = local.rt_dub_index != null ? flatten([
    for rt_key, rt in local.rt_dub_index : [
      for route_key, route in local.public_routes[var.environment] : merge(
        rt,
        {
          vpc_id       = rt.vpc_id
          vpc_key      = route[1]
          rt_key       = rt_key
          route_key    = route_key
          route_region = route[0]
          route_target = route[2]
          route_cidr   = route[3]
          az           = rt.az
        }
      )
      if route[0] == "eu-west-1"
    ]
  ]) : []

  # Convert to maps with readable, immutable keys
  rt_routes_map_dub = {
    for idx, route in local.flattened_rt_routes_dub : "${route.rt_key}-${route.route_key}" => route
  }

  # Flatten route tables, routes & ec2 instances so we can send a route to the ENI
  flattened_rt_routes_ec2_dub = length(local.rt_dub_index) > 0 ? flatten([
    for rt_key, rt in local.flattened_rt_routes_dub : [
      for ec2_key, ec2 in local.ec2_map : {
        rt_key        = rt.rt_key
        rt_index      = rt.index
        route_key     = rt.route_key
        route_name    = "${rt.route_key}-${rt.az}-${ec2_key}"
        route_target  = rt.route_target
        route_cidr    = rt.route_cidr
        az            = rt.az
        ec2_index     = ec2.index
        ec2_key       = ec2_key
        ec2_az        = ec2.az
        ec2_eni_route = try(ec2.eni_route, null)
      }
      # Pair each route table with the instance that has the same index
      if ec2.index == rt.index
    ]
  ]) : []

  # Convert to maps with readable, immutable keys
  rt_routes_ec2_map_dub = {
    for idx, route in local.flattened_rt_routes_ec2_dub : route.route_name => route
  }
}

output "rt_routes_ec2_map_dub" {
  value = local.rt_routes_ec2_map_dub
}
- The flattened_rt_routes_dub block brings route tables and routes together, filtered on the region. This leaves us with 3 (subnets, one route table each) x 2 (routes) = 6 routes. This is important to know because further flattening steps can cause duplicates, which will cause a plan to fail.
- The flattened_rt_routes_ec2_dub block brings the routes, the route tables and the EC2 instances together. We use the index that was created earlier to match the route table to the EC2 instance, and this is how we prevent duplicates: we only want to create a route for each EC2 instance, and there are only 2 EC2 instances in this example, so only the 4 combinations with a matching index survive.
The output of rt_routes_ec2_map_dub now looks like this:
Changes to Outputs:
  + rt_routes_ec2_map_dub = {
      + some_public_eni_route-eu-west-1a-example-app_0 = {
          + az = "eu-west-1a"
          + ec2_az = "eu-west-1a"
          + ec2_eni_route = null
          + ec2_index = 0
          + ec2_key = "example-app_0"
          + route_cidr = "10.2.0.0/23"
          + route_key = "some_public_eni_route"
          + route_name = "some_public_eni_route-eu-west-1a-example-app_0"
          + route_target = "eni"
          + rt_index = 0
          + rt_key = "arryw-dev-dublin-eu-west-1a"
        }
      + some_public_eni_route-eu-west-1b-example-app_1 = {
          + az = "eu-west-1b"
          + ec2_az = "eu-west-1b"
          + ec2_eni_route = null
          + ec2_index = 1
          + ec2_key = "example-app_1"
          + route_cidr = "10.2.0.0/23"
          + route_key = "some_public_eni_route"
          + route_name = "some_public_eni_route-eu-west-1b-example-app_1"
          + route_target = "eni"
          + rt_index = 1
          + rt_key = "arryw-dev-dublin-eu-west-1b"
        }
      + some_public_route-eu-west-1a-example-app_0 = {
          + az = "eu-west-1a"
          + ec2_az = "eu-west-1a"
          + ec2_eni_route = null
          + ec2_index = 0
          + ec2_key = "example-app_0"
          + route_cidr = "8.8.8.8/32"
          + route_key = "some_public_route"
          + route_name = "some_public_route-eu-west-1a-example-app_0"
          + route_target = "igw"
          + rt_index = 0
          + rt_key = "arryw-dev-dublin-eu-west-1a"
        }
      + some_public_route-eu-west-1b-example-app_1 = {
          + az = "eu-west-1b"
          + ec2_az = "eu-west-1b"
          + ec2_eni_route = null
          + ec2_index = 1
          + ec2_key = "example-app_1"
          + route_cidr = "8.8.8.8/32"
          + route_key = "some_public_route"
          + route_name = "some_public_route-eu-west-1b-example-app_1"
          + route_target = "igw"
          + rt_index = 1
          + rt_key = "arryw-dev-dublin-eu-west-1b"
        }
    }
The key of each map entry is now a combination of the route key, the az and the EC2 instance key. We do this because each key fed into a resource block needs to be unique. If a plan fails because of a duplicate key, it indicates a problem with the data.
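If you would rather see which keys collide than have the plan fail, one option is the grouping mode of for expressions. This is a sketch under the assumption that the locals above are in scope; routes_grouped_by_name and duplicate_route_names are names I've introduced for illustration.

```hcl
locals {
  # The `...` suffix switches the for expression into grouping mode, so
  # entries that share a key are collected into a list instead of
  # causing a duplicate-key error.
  routes_grouped_by_name = {
    for route in local.flattened_rt_routes_ec2_dub : route.route_name => route...
  }

  # Any name that maps to more than one entry is a duplicate.
  duplicate_route_names = [
    for name, routes in local.routes_grouped_by_name : name
    if length(routes) > 1
  ]
}

output "duplicate_route_names" {
  value = local.duplicate_route_names
}
```

With the example data in this article every route_name is unique, so this list should come back empty; a non-empty list points at the entries in the input data that need attention.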
We can now create routes with the data, like this:
resource "aws_route" "public_igw_route_dub" {
  for_each = {
    for idx, route in local.rt_routes_map_dub : idx => route
    if route.route_target == "igw" && route.route_region == "eu-west-1"
  }
  provider               = aws.dublin
  route_table_id         = aws_route_table.public_dub[each.value.rt_key].id
  destination_cidr_block = each.value.route_cidr
  gateway_id             = aws_internet_gateway.igw_dub[each.value.vpc_key].id
}

resource "aws_route" "public_eni_route_dub" {
  for_each = {
    for idx, route in local.flattened_rt_routes_ec2_dub : route.route_name => route
    if route.route_target == "eni" && route.ec2_eni_route == true
  }
  provider               = aws.dublin
  route_table_id         = aws_route_table.public_dub[each.value.rt_key].id
  destination_cidr_block = each.value.route_cidr
  network_interface_id   = aws_instance.ec2_dub[each.value.ec2_key].primary_network_interface_id
}
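Note that with the example data, ec2_eni_route is null for every entry (as the output above shows), so the filter in public_eni_route_dub matches nothing and no ENI routes are planned. A minimal sketch, assuming eni_route is meant to be an opt-in flag on the instance definition: setting it in local.ec2 carries it through merge() into ec2_map, where try(ec2.eni_route, null) picks it up. This would replace the earlier definition of local.ec2 rather than sit alongside it.

```hcl
locals {
  ec2 = {
    dev = {
      example-app = {
        region   = "eu-west-1"
        vpc      = "arryw-dev-dublin"
        count    = 2
        az       = ["eu-west-1a", "eu-west-1b"]
        key_pair = "arryw"
        policies = ["rds"]
        # Assumed opt-in flag: instances with this set receive ENI routes.
        eni_route = true
      }
    }
  }
}
```

Entries that leave eni_route out keep evaluating to null, so they are still skipped by the == true condition.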
Conclusion
This got quite complex, and I'm sorry about that, but it's a good example of how you can manipulate data in Terraform to be consumed by resources. It's not always necessary to go this far, but when you do, it's good to know how to do it. Please let me know if you have any questions or if you'd like me to cover a specific topic in more detail.