The core logic sits in a pair of EventBridge Schedules, created with the aws_scheduler_schedule
resource. One schedule turns a service off in the evening; another turns it back on the next morning.
```hcl
resource "aws_scheduler_schedule" "turn_off_in_the_evening" {
  name = "${var.service_name}-turn_off_in_the_evening"

  # This cron expression will run at 7pm UTC on weekdays.
  schedule_expression = "cron(0 19 ? * MON-FRI *)"

  target {
    # Universal target: call the ECS UpdateService API directly.
    arn      = "arn:aws:scheduler:::aws-sdk:ecs:updateService"
    role_arn = aws_iam_role.scheduler.arn

    input = jsonencode({
      Cluster      = var.cluster
      Service      = var.service_name
      DesiredCount = 0
    })
  }

  flexible_time_window {
    mode = "OFF"
  }
}

resource "aws_scheduler_schedule" "turn_on_in_the_morning" {
  name = "${var.service_name}-turn_on_in_the_morning"

  # This cron expression will run at 7am UTC on weekdays.
  schedule_expression = "cron(0 7 ? * MON-FRI *)"

  target {
    arn      = "arn:aws:scheduler:::aws-sdk:ecs:updateService"
    role_arn = aws_iam_role.scheduler.arn

    input = jsonencode({
      Cluster      = var.cluster
      Service      = var.service_name
      DesiredCount = var.desired_task_count
    })
  }

  flexible_time_window {
    mode = "OFF"
  }
}

variable "cluster" { type = string }
variable "service_name" { type = string }
variable "desired_task_count" { type = number }
```
Each schedule fires according to its cron expression. UK office hours are roughly 9 to 5, and the schedules are picked to include these hours plus a bit of “slop”. This is to account for people who work slightly earlier or slightly later, and for the half of the year when the UK isn’t on UTC: 7am to 7pm UTC is 7am to 7pm in winter (GMT), but 8am to 8pm during British Summer Time (UTC+1). Either way, 9 to 5 is covered.
I do a lot of this sort of “slop” in scheduling code. I’ll accept a bit of inefficiency or redundancy if it means I can get simpler code. I could tighten these schedules so they follow UK office hours more closely, but it would add a lot of complexity for marginal gains. It’s not worth it.
The most interesting bit to me is how the schedule updates the ECS service – it uses a “universal target” that calls the ECS UpdateService API directly, with a payload that I provide. In this case I’m just changing the DesiredCount value, but the same arn:aws:scheduler:::aws-sdk:<service>:<action> pattern works for many other AWS APIs. That feels like it has a lot of potential elsewhere.
We’ve already got a variant of these schedules that turns an EC2 instance off/on outside our working hours, and I imagine this won’t be the last time I play with EventBridge Schedules.
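For illustration, here is a minimal sketch of what an EC2 variant could look like, using the same universal-target pattern to call the EC2 StopInstances and StartInstances APIs. This is not our actual configuration: var.instance_id, var.instance_arn and every resource name below are hypothetical, and depending on the instance (for example, EBS volumes encrypted with a customer-managed KMS key) the role may need extra permissions.

```hcl
# Hypothetical sketch: stop an EC2 instance in the evening and start it again
# in the morning. All names here are illustrative assumptions.

variable "instance_id" { type = string }
variable "instance_arn" { type = string }

resource "aws_scheduler_schedule" "stop_instance_in_the_evening" {
  name                = "${var.instance_id}-stop_in_the_evening"
  schedule_expression = "cron(0 19 ? * MON-FRI *)"

  target {
    # Calls the EC2 StopInstances API; the input is its request payload.
    arn      = "arn:aws:scheduler:::aws-sdk:ec2:stopInstances"
    role_arn = aws_iam_role.ec2_scheduler.arn
    input    = jsonencode({ InstanceIds = [var.instance_id] })
  }

  flexible_time_window {
    mode = "OFF"
  }
}

resource "aws_scheduler_schedule" "start_instance_in_the_morning" {
  name                = "${var.instance_id}-start_in_the_morning"
  schedule_expression = "cron(0 7 ? * MON-FRI *)"

  target {
    # Calls the EC2 StartInstances API with the same payload shape.
    arn      = "arn:aws:scheduler:::aws-sdk:ec2:startInstances"
    role_arn = aws_iam_role.ec2_scheduler.arn
    input    = jsonencode({ InstanceIds = [var.instance_id] })
  }

  flexible_time_window {
    mode = "OFF"
  }
}

# Trust policy: let EventBridge Scheduler assume the role.
data "aws_iam_policy_document" "ec2_scheduler_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["scheduler.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "ec2_scheduler" {
  name               = "${var.instance_id}-office-hours-scaling"
  assume_role_policy = data.aws_iam_policy_document.ec2_scheduler_assume_role.json
}

# Permission to stop and start this one instance only.
data "aws_iam_policy_document" "allow_stop_start" {
  statement {
    actions   = ["ec2:StopInstances", "ec2:StartInstances"]
    resources = [var.instance_arn]
  }
}

resource "aws_iam_role_policy" "allow_stop_start" {
  role   = aws_iam_role.ec2_scheduler.name
  policy = data.aws_iam_policy_document.allow_stop_start.json
}
```

The structure mirrors the ECS version: only the target ARN, the input payload and the IAM actions change.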
Alongside the two schedules, you need an IAM role that allows EventBridge to modify your ECS services when it runs. This is how our IAM role is defined:
```hcl
resource "aws_iam_role" "scheduler" {
  name               = "${var.service_name}-office-hours-scaling"
  assume_role_policy = data.aws_iam_policy_document.assume_role.json
}

data "aws_iam_policy_document" "assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["scheduler.amazonaws.com"]
    }
  }
}

data "aws_iam_policy_document" "allow_update_service" {
  statement {
    actions   = ["ecs:UpdateService"]
    resources = [var.service_arn]
  }
}

resource "aws_iam_role_policy" "allow_update_service" {
  role   = aws_iam_role.scheduler.name
  policy = data.aws_iam_policy_document.allow_update_service.json
}

variable "service_arn" { type = string }
variable "service_name" { type = string }
```
This is pretty standard IAM – create the role, and allow the EventBridge Scheduler service to assume it. Then we create an IAM policy document that allows calling the UpdateService API for the service we’re turning off/on, and we attach that policy document to the role.
This isn’t a lot of Terraform, but it would be annoying to copy/paste this for every service we have. To save ourselves the hassle, we’ve included it in our standard ECS service module, and services can opt in to this behaviour with a single flag:
```hcl
module "service" {
  source = "git::github.com/wellcomecollection/terraform-aws-ecs-service.git//modules/service?ref=v3.15.3"

  name = "staging-site"
  …
  turn_off_outside_office_hours = true
}
```
Partly this is for readability, but mostly it’s to make this behaviour quick and easy to enable – which means we’re more likely to actually do it.
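One plausible way to wire up a flag like turn_off_outside_office_hours inside a module is Terraform’s count meta-argument, so the schedules and role only exist when the flag is true. This is a sketch, not the module’s actual code: it assumes the var.cluster, var.service_name and data.aws_iam_policy_document.assume_role definitions from the blocks above, and local.office_hours_count is a hypothetical name.

```hcl
variable "turn_off_outside_office_hours" {
  type        = bool
  default     = false
  description = "If true, scale this service to zero outside office hours."
}

locals {
  # Hypothetical helper: create one copy of each resource when the flag is on.
  office_hours_count = var.turn_off_outside_office_hours ? 1 : 0
}

resource "aws_iam_role" "scheduler" {
  count              = local.office_hours_count
  name               = "${var.service_name}-office-hours-scaling"
  assume_role_policy = data.aws_iam_policy_document.assume_role.json
}

resource "aws_scheduler_schedule" "turn_off_in_the_evening" {
  count               = local.office_hours_count
  name                = "${var.service_name}-turn_off_in_the_evening"
  schedule_expression = "cron(0 19 ? * MON-FRI *)"

  target {
    arn = "arn:aws:scheduler:::aws-sdk:ecs:updateService"

    # With count set, the role is a list of zero or one items, so index into it.
    role_arn = aws_iam_role.scheduler[0].arn

    input = jsonencode({
      Cluster      = var.cluster
      Service      = var.service_name
      DesiredCount = 0
    })
  }

  flexible_time_window {
    mode = "OFF"
  }
}
```

In this sketch the morning schedule and the aws_iam_role_policy would get the same count, and the role policy would refer to aws_iam_role.scheduler[0].name. The default of false keeps the behaviour opt-in for existing services.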
We’ve already rolled this out to a dozen existing services, and there’s a nice dent in last month’s EC2 bill. As we build out new services, I expect this behaviour to spread ever further.