Flattening data in Terraform

A deep-dive on manipulating data in Terraform, to prepare it for use in resource blocks

Updated April, 2024

This is part of a collection of articles covering an Introduction to Advanced Terraform. It is deliberately written in a certain style; you can find out more in the first post.

Introduction

My most common use case for flattening data in Terraform is when I need to add a count value to a resource definition in my input data and then have that consumed by a for_each. In case you're not aware, you cannot have a count and for_each on the same resource block. for_each lets you add conditions in your resource blocks, like if value.region == "eu-west-1" then use the provider in this block, or if value.engine == "aurora" then create this RDS cluster (and not an RDS instance, which is a different thing).
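To make the "conditions in for_each" idea concrete, here is a minimal sketch. The demo_databases local and its contents are hypothetical and exist only for illustration. It uses locals and an output only, so a plan creates nothing.

```hcl
locals {
  # Hypothetical input data, not part of the article's configuration.
  demo_databases = {
    orders  = { engine = "aurora", region = "eu-west-1" }
    reports = { engine = "postgres", region = "eu-west-2" }
  }

  # The same "for ... if" expression can be placed directly in a
  # resource block's for_each argument to decide which items it creates.
  demo_aurora_dublin = {
    for k, v in local.demo_databases : k => v
    if v.engine == "aurora" && v.region == "eu-west-1"
  }
}

output "demo_aurora_dublin" {
  # Only the "orders" entry passes both conditions.
  value = local.demo_aurora_dublin
}
```

The filter keeps the original keys, so anything created from it would still be addressed by a readable name.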

A problem I've found is that once you start flattening, it tends to lead to more flattening. You flatten local.ec2 so the data can be used in a resource block, then you need to flatten the attached disks so you can attach them to the correct EC2 instance.

In a particularly convoluted example, when dealing with things like router or firewall appliances in the cloud, you want routes in the route table that point at the network interface of an EC2 instance, all while keeping the input data in the locals files clean and easy to read.

It's achievable. I wouldn't call it particularly elegant, but it keeps the resource definitions looking nice for less "in the weeds" users to come along and use, which is the whole point of this series of articles.

Example: Flattening EC2 Configuration

The following is an example of flattening some data in Terraform. It will not create any resources on its own, so it won't cost you anything to run a plan or apply.

In the flattened_ec2 block we're saying: step into local.ec2 and then var.environment, which is dev in this case, where it finds the example-app key. The next line says "take note of the count value", in this case 2, loop that many times, and merge the new keys that follow with the key-value pairs already there. Because the new map is the last argument to merge, its az (a single zone) replaces the original az list.

variable "environment" {
  default = "dev"
}

locals {
  ec2 = {
    dev = {
      example-app = {
        region   = "eu-west-1"
        vpc      = "arryw-dev-dublin"
        count    = 2
        az       = ["eu-west-1a", "eu-west-1b"]
        key_pair = "arryw"
        policies = ["rds"]
      }
    }
  }

  flattened_ec2 = flatten([
    for k, v in local.ec2[var.environment] : [
      for i in range(v.count) : merge(
        v,
        {
          ec2_key      = k
          index        = i
          indexed_key  = "${k}_${i}"
          indexed_name = "${k}-${i}"
          az           = v["az"][i % length(v["az"])]
        }
      )
    ]
  ])

  ec2_map = {
    for ec2 in local.flattened_ec2 : ec2.indexed_key => ec2
  }
}

output "flattened_ec2" {
  value = local.flattened_ec2
}
output "ec2_map" {
  value = local.ec2_map
}

The output blocks are just there to show you what the manipulated data looks like. Below is the output of a terraform plan command.

You can see that indexed_key and indexed_name are created and merged with the other values in the ec2 block, and that az (availability zone) is set to cycle through the az list. The indexed key exists because it will match the Terraform resource reference, which makes it easier to refer back to from other resource blocks. The indexed name is just a nicer way to display it if you want to use it for naming resources.

Changes to Outputs:
  + flattened_ec2 = [
      + {
          + az           = "eu-west-1a"
          + count        = 2
          + ec2_key      = "example-app"
          + index        = 0
          + indexed_key  = "example-app_0"
          + indexed_name = "example-app-0"
          + key_pair     = "arryw"
          + policies     = [
              + "rds",
            ]
          + region       = "eu-west-1"
          + vpc          = "arryw-dev-dublin"
        },
      + {
          + az           = "eu-west-1b"
          + count        = 2
          + ec2_key      = "example-app"
          + index        = 1
          + indexed_key  = "example-app_1"
          + indexed_name = "example-app-1"
          + key_pair     = "arryw"
          + policies     = [
              + "rds",
            ]
          + region       = "eu-west-1"
          + vpc          = "arryw-dev-dublin"
        },
    ]
  + ec2_map = {
      + example-app_0 = {
          + az           = "eu-west-1a"
          + count        = 2
          + ec2_key      = "example-app"
          + index        = 0
          + indexed_key  = "example-app_0"
          + indexed_name = "example-app-0"
          + key_pair     = "arryw"
          + policies     = [
              + "rds",
            ]
          + region       = "eu-west-1"
          + vpc          = "arryw-dev-dublin"
        }
      + example-app_1 = {
          + az           = "eu-west-1b"
          + count        = 2
          + ec2_key      = "example-app"
          + index        = 1
          + indexed_key  = "example-app_1"
          + indexed_name = "example-app-1"
          + key_pair     = "arryw"
          + policies     = [
              + "rds",
            ]
          + region       = "eu-west-1"
          + vpc          = "arryw-dev-dublin"
        }
    }
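
The az cycling is easier to see when the count is larger than the number of zones, which the example above never exercises. A minimal sketch, with hypothetical demo_ names:

```hcl
locals {
  # Hypothetical values: three instances spread over two zones.
  demo_azs   = ["eu-west-1a", "eu-west-1b"]
  demo_count = 3

  # i % length(list) wraps the index back to 0 once it runs off the end,
  # the same expression used for az in flattened_ec2.
  demo_placement = [
    for i in range(local.demo_count) : local.demo_azs[i % length(local.demo_azs)]
  ]
}

output "demo_placement" {
  # Index 2 wraps around, so the third instance lands in eu-west-1a again.
  value = local.demo_placement
}
```

The built-in element() function wraps in the same way, so element(local.demo_azs, i) would be an equivalent way to write it.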

The data in the local.ec2 block is an object, a sort of nested map. In this form it is an efficient and user-friendly way to define the resources you want to create.
In the flattened_ec2 output, the flattening has created a list of maps. for_each won't accept a list of maps, so if we created resources from this data we'd have to use count, and Terraform wouldn't have anything to refer to them as other than their index, i.e. 0, 1, 2, etc. This can lead to problems down the line, because removing an item from the middle of the list shifts every index after it, and I would avoid it if at all possible.
In the ec2_map output it has created a map of maps. The key for each defined resource is back, so Terraform refers to them as something readable instead of an index. This is effectively a long and inefficient form of the original local.ec2 block, but it's consumable by resource blocks and it happens in the background, so you don't have to worry about it.
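
To see the readable addressing without touching a cloud account, here is a sketch using the built-in terraform_data resource as a stand-in for a real resource. It assumes Terraform 1.4 or later, and the demo_keyed local is a hypothetical, cut-down version of ec2_map. Unlike the locals-only example above, an apply of this would record two placeholder resources in state, though they cost nothing.

```hcl
terraform {
  # terraform_data is built in from Terraform 1.4 onwards (assumed here).
  required_version = ">= 1.4.0"
}

locals {
  # Hypothetical, trimmed-down version of the article's ec2_map.
  demo_keyed = {
    "example-app_0" = { indexed_name = "example-app-0" }
    "example-app_1" = { indexed_name = "example-app-1" }
  }
}

# Placeholder resource: needs no provider configuration or credentials.
resource "terraform_data" "demo_keyed" {
  for_each = local.demo_keyed
  input    = each.value.indexed_name
}

output "demo_keyed_inputs" {
  # Instances are addressed by key, e.g. terraform_data.demo_keyed["example-app_0"].
  value = { for k, v in terraform_data.demo_keyed : k => v.input }
}
```

Removing example-app_0 from the map would leave terraform_data.demo_keyed["example-app_1"] untouched, which is the behaviour you lose when resources are addressed by list index.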

Going a step further

Let's look at another example of flattening data and show how you can use it in a resource block, referring back to the convoluted route table example I mentioned earlier.

In this example we define the VPC and subnets as a tuple in the local.vpc[var.environment] block. For this use case it is an elegant way to lay things out, but when it comes to creating subnets, it means we either create separate resource blocks for each subnet (that refer to each.value[2][0], each.value[2][1], etc.) or we manipulate the data again. When we flatten the subnets, we can use the same flattened output to create the route tables.

locals {
  vpc = {
    dev = {
      # vpc_key = [region, vpc_cidr, [subnet_cidr1, subnet_cidr2, subnet_cidr3]]
      arryw-dev-dublin = ["eu-west-1", "10.0.0.0/23", ["10.0.0.0/26", "10.0.0.64/26", "10.0.0.128/26"]]
    }
  }

  public_routes = {
    dev = {
      some_public_eni_route = ["eu-west-1", "arryw-dev-dublin", "eni", "10.2.0.0/23"]
      some_public_route     = ["eu-west-1", "arryw-dev-dublin", "igw", "8.8.8.8/32"]
    }
  }

  # Flatten the structure for subnet creation
  flattened_subnets = flatten([
    for k, v in local.vpc[var.environment] : [
      for idx, subnet in v[2] : {
        vpc_key    = k
        region     = v[0]
        cidr_block = subnet
        az         = "${v[0]}${element(["a", "b", "c", "d"], idx)}"
      }
    ]
  ])
  # Make maps of the subnets for easy reference
  subnet_map = {
    for subnet in local.flattened_subnets : "${subnet.vpc_key}-${subnet.az}" => subnet
  }
}

resource "aws_vpc" "vpc_dub" {
  for_each = {
    for k, v in local.vpc[var.environment] : k => v
    if v[0] == "eu-west-1"
  }
  provider             = aws.dublin
  cidr_block           = each.value[1]
  enable_dns_support   = true
  enable_dns_hostnames = true
  tags                 = merge(local.global_tags, {
    Name = each.key
  })
}
resource "aws_subnet" "public_subnet_dub" {
  for_each = {
    for idx, subnet in local.subnet_map : idx => subnet
    if subnet.region == "eu-west-1"
  }
  provider          = aws.dublin
  vpc_id            = aws_vpc.vpc_dub[each.value.vpc_key].id
  cidr_block        = each.value.cidr_block
  availability_zone = each.value.az
}
resource "aws_route_table" "public_dub" {
  for_each = {
    for idx, rt in local.subnet_map : idx => rt
    if rt.region == "eu-west-1"
  }
  provider = aws.dublin
  vpc_id   = aws_vpc.vpc_dub[each.value.vpc_key].id
}
resource "aws_route_table_association" "public_dub" {
  for_each = {
    for idx, rt in local.subnet_map : idx => rt
    if rt.region == "eu-west-1"
  }
  provider       = aws.dublin
  subnet_id      = aws_subnet.public_subnet_dub[each.key].id
  route_table_id = aws_route_table.public_dub[each.key].id
}
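
The subnet, route table and association blocks all share the keys of subnet_map, which is why each.key can be used to look one up from another. Here is a cut-down sketch of how those keys are built. The demo_ names are hypothetical copies so it can run on its own.

```hcl
locals {
  # Hypothetical copy of the tuple layout: [region, vpc_cidr, [subnet_cidrs]].
  demo_vpc = {
    arryw-dev-dublin = ["eu-west-1", "10.0.0.0/23", ["10.0.0.0/26", "10.0.0.64/26", "10.0.0.128/26"]]
  }

  # One object per subnet; the az suffix is picked by position in the subnet list.
  demo_subnets = flatten([
    for k, v in local.demo_vpc : [
      for idx, cidr in v[2] : {
        vpc_key    = k
        cidr_block = cidr
        az         = "${v[0]}${element(["a", "b", "c", "d"], idx)}"
      }
    ]
  ])

  # Key each subnet by VPC key plus az so the key is readable and unique.
  demo_subnet_map = {
    for s in local.demo_subnets : "${s.vpc_key}-${s.az}" => s
  }
}

output "demo_subnet_keys" {
  # Three keys, ending in eu-west-1a, eu-west-1b and eu-west-1c.
  value = keys(local.demo_subnet_map)
}
```

Because the az is the last part of the key, it can be recovered from the key later on if it is needed.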

Now, the resource blocks for the VPC, Subnets, Route Tables and Route Table Associations are all created from data we've already supplied, in one form or another. After this we move on to adding routes to the correct route tables, and this is where it starts to get complex.

If we want to throw some routes down to the network interfaces on the EC2 instances, we have to tie the EC2 instances, the route tables and the routes together. Only then can we create the routes while being sure to reference the correct ENI (Elastic Network Interface).

Up until now, quite conveniently, all data manipulation has happened on a global scale, i.e., we didn't care about the region because we could filter on that at the last possible stage. Now that we need to reference route tables and ENIs directly, we have to keep the data manipulation regional.

locals {

  # Assign an index to route tables
  rt_dub_index = {
    for k in sort(keys(aws_route_table.public_dub)) : k => {
      id     = aws_route_table.public_dub[k].id
      index  = index(sort(keys(aws_route_table.public_dub)), k)
      az     = try(regex(".*(eu-west-1[abcd])", k)[0], "default-value-or-empty-string")
      vpc_id = aws_route_table.public_dub[k].vpc_id
    }
  }

  # Flatten route tables and routes
  flattened_rt_routes_dub = local.rt_dub_index != null ? flatten([
    for rt_key, rt in local.rt_dub_index : [
      for route_key, route in local.public_routes[var.environment] : merge(
        rt,
        {
          vpc_id       = rt.vpc_id
          vpc_key      = route[1]
          rt_key       = rt_key
          route_key    = route_key
          route_region = route[0]
          route_target = route[2]
          route_cidr   = route[3]
          az           = rt.az
        }
      )
      if route[0] == "eu-west-1"
    ]
  ]) : []

  # Convert to maps with readable, immutable keys
  rt_routes_map_dub = {
    for idx, route in local.flattened_rt_routes_dub : "${route.rt_key}-${route.route_key}" => route
  }

  # Flatten route tables, routes & ec2 instances so we can send a route to the ENI
  flattened_rt_routes_ec2_dub = length(local.rt_dub_index) > 0 ? flatten([
    for rt_key, rt in local.flattened_rt_routes_dub : [
      for ec2_key, ec2 in local.ec2_map : {
        rt_key        = rt.rt_key
        rt_index      = rt.index
        route_key     = rt.route_key
        route_name    = "${rt.route_key}-${rt.az}-${ec2_key}"
        route_target  = rt.route_target
        route_cidr    = rt.route_cidr
        az            = rt.az
        ec2_index     = ec2.index
        ec2_key       = ec2_key
        ec2_az        = ec2.az
        ec2_eni_route = try(ec2.eni_route, null)
      }
      if ec2.index == rt.index
    ]
  ]) : []

  # Convert to maps with readable, immutable keys
  rt_routes_ec2_map_dub = {
    for idx, route in local.flattened_rt_routes_ec2_dub : route.route_name => route
  }
}

output "rt_routes_ec2_map_dub" {
  value = local.rt_routes_ec2_map_dub
}

The flattened_rt_routes_dub block brings route tables and routes together, filtered on the region. This leaves us with 3 (subnets) x 2 (routes) = 6 routes. This is important to know because further flattening steps can cause duplicates, which will cause a plan to fail.
The flattened_rt_routes_ec2_dub block brings the routes, the route tables and the EC2 instances together. We use the index that was created earlier to match the route table to the EC2 instance, which is how we prevent duplicates. We only want to create a route for each EC2 instance, and there are only 2 EC2 instances in this example, so the route table at index 2 matches nothing and we end up with 2 (instances) x 2 (routes) = 4 entries.
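
The arithmetic is easier to trust when you can run it. The sketch below swaps the real route tables and instances for static, hypothetical demo_ data, so it needs no provider. It only counts how many items each step produces.

```hcl
locals {
  # Hypothetical stand-ins for three route tables, two routes and two instances.
  demo_route_tables = [
    { rt_key = "rt-a", index = 0 },
    { rt_key = "rt-b", index = 1 },
    { rt_key = "rt-c", index = 2 },
  ]
  demo_routes = ["eni_route", "igw_route"]
  demo_ec2 = [
    { ec2_key = "app_0", index = 0 },
    { ec2_key = "app_1", index = 1 },
  ]

  # Route tables x routes: 3 x 2 = 6 items.
  demo_rt_routes = flatten([
    for rt in local.demo_route_tables : [
      for route in local.demo_routes : merge(rt, { route_key = route })
    ]
  ])

  # No filter: every instance pairs with every item, 6 x 2 = 12.
  demo_unfiltered = flatten([
    for item in local.demo_rt_routes : [
      for ec2 in local.demo_ec2 : merge(item, { ec2_key = ec2.ec2_key })
    ]
  ])

  # Matching on index keeps one instance per route table. No instance
  # has index 2, so rt-c drops out and 4 items remain.
  demo_filtered = flatten([
    for item in local.demo_rt_routes : [
      for ec2 in local.demo_ec2 : merge(item, { ec2_key = ec2.ec2_key })
      if ec2.index == item.index
    ]
  ])
}

output "demo_counts" {
  # rt_routes = 6, unfiltered = 12, filtered = 4
  value = {
    rt_routes  = length(local.demo_rt_routes)
    unfiltered = length(local.demo_unfiltered)
    filtered   = length(local.demo_filtered)
  }
}
```

This is only a model of the shape of the data; the article's real locals carry more attributes but follow the same pattern.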

The output of rt_routes_ec2_map_dub now looks like this:

Changes to Outputs:
  + rt_routes_ec2_map_dub = {
      + some_public_eni_route-eu-west-1a-example-app_0 = {
          + az            = "eu-west-1a"
          + ec2_az        = "eu-west-1a"
          + ec2_eni_route = null
          + ec2_index     = 0
          + ec2_key       = "example-app_0"
          + route_cidr    = "10.2.0.0/23"
          + route_key     = "some_public_eni_route"
          + route_name    = "some_public_eni_route-eu-west-1a-example-app_0"
          + route_target  = "eni"
          + rt_index      = 0
          + rt_key        = "arryw-dev-dublin-eu-west-1a"
        }
      + some_public_eni_route-eu-west-1b-example-app_1 = {
          + az            = "eu-west-1b"
          + ec2_az        = "eu-west-1b"
          + ec2_eni_route = null
          + ec2_index     = 1
          + ec2_key       = "example-app_1"
          + route_cidr    = "10.2.0.0/23"
          + route_key     = "some_public_eni_route"
          + route_name    = "some_public_eni_route-eu-west-1b-example-app_1"
          + route_target  = "eni"
          + rt_index      = 1
          + rt_key        = "arryw-dev-dublin-eu-west-1b"
        }
      + some_public_route-eu-west-1a-example-app_0     = {
          + az            = "eu-west-1a"
          + ec2_az        = "eu-west-1a"
          + ec2_eni_route = null
          + ec2_index     = 0
          + ec2_key       = "example-app_0"
          + route_cidr    = "8.8.8.8/32"
          + route_key     = "some_public_route"
          + route_name    = "some_public_route-eu-west-1a-example-app_0"
          + route_target  = "igw"
          + rt_index      = 0
          + rt_key        = "arryw-dev-dublin-eu-west-1a"
        }
      + some_public_route-eu-west-1b-example-app_1     = {
          + az            = "eu-west-1b"
          + ec2_az        = "eu-west-1b"
          + ec2_eni_route = null
          + ec2_index     = 1
          + ec2_key       = "example-app_1"
          + route_cidr    = "8.8.8.8/32"
          + route_key     = "some_public_route"
          + route_name    = "some_public_route-eu-west-1b-example-app_1"
          + route_target  = "igw"
          + rt_index      = 1
          + rt_key        = "arryw-dev-dublin-eu-west-1b"
        }
    }

The key of each map entry is now a combination of the route key, the az and the EC2 instance key. We do this because each key fed into a resource block needs to be unique. If a plan fails because of a duplicate key, it indicates a problem with the data.

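We can now create routes with the data, like this. The aws_internet_gateway.igw_dub and aws_instance.ec2_dub resources that these blocks reference are not shown here.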

resource "aws_route" "public_igw_route_dub" {
  for_each = {
    for idx, route in local.rt_routes_map_dub : idx => route
    if route.route_target == "igw" && route.route_region == "eu-west-1"
  }
  provider               = aws.dublin
  route_table_id         = aws_route_table.public_dub[each.value.rt_key].id
  destination_cidr_block = each.value.route_cidr
  gateway_id             = aws_internet_gateway.igw_dub[each.value.vpc_key].id
}

resource "aws_route" "public_eni_route_dub" {
  for_each = {
    for idx, route in local.flattened_rt_routes_ec2_dub : route.route_name => route
    if route.route_target == "eni" && route.ec2_eni_route == true
  }
  provider               = aws.dublin
  route_table_id         = aws_route_table.public_dub[each.value.rt_key].id
  destination_cidr_block = each.value.route_cidr
  network_interface_id   = aws_instance.ec2_dub[each.value.ec2_key].primary_network_interface_id
}
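
One thing to watch: in the example data, ec2_eni_route is null for every entry, so the route.ec2_eni_route == true filter on the ENI route block would match nothing. The flag is read with try(ec2.eni_route, null), which suggests eni_route is meant to be an optional attribute on the EC2 definition. The sketch below works on that assumption, with hypothetical demo_ names and a second, made-up other-app entry to show the contrast.

```hcl
locals {
  # Hypothetical input: eni_route is treated as an optional opt-in flag.
  demo_ec2_input = {
    example-app = { count = 2, eni_route = true }
    other-app   = { count = 1 }
  }

  # Same flatten-then-key pattern as ec2_map; merge() carries eni_route through.
  demo_ec2_map = {
    for item in flatten([
      for k, v in local.demo_ec2_input : [
        for i in range(v.count) : merge(v, { indexed_key = "${k}_${i}" })
      ]
    ]) : item.indexed_key => item
  }

  # try() yields null where the attribute is missing, and null == true
  # is false, so entries without the flag are filtered out.
  demo_eni_enabled = [
    for k, v in local.demo_ec2_map : k
    if try(v.eni_route, null) == true
  ]
}

output "demo_eni_enabled" {
  # Only the two example-app instances opt in.
  value = local.demo_eni_enabled
}
```

Under that assumption, adding eni_route = true to example-app in local.ec2 would be what lets the ENI route block create anything.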

Conclusion