#aws
2021-12-30
orestis18:12:28

Once again I'm looking into using Docker containers in production, in order to simplify and future-proof our Elastic Beanstalk situation. It seems like there's some effort into making both EB and ECS understand docker-compose.yaml files but I'm not sure if it's going to be really supported or if it's half-baked.

orestis18:12:23

So far we haven't used any cloud provisioning tool (e.g. Cloudformation/Terraform/Pulumi), because spinning up an EB environment is simple enough to do via the console and we don't really do it often. But I'd like to rectify this going forward.

orestis18:12:33

So my question is, would you use something like e.g. Terraform to both provision e.g. an ECS cluster with services and task definitions and keep using to do day-to-day deployments? Or would you provision the cluster once then use the AWS API to update Task definitions and do deploys?

lukasz18:12:22

We use Fargate+ECS+ECR quite a lot; all configuration, services, tasks and related AWS resources (ALB, RDS and more) are managed exclusively via Terraform. The only things we do outside of TF are: • triggering deployments (we use mutable tags, so :release points at the latest stable version of the container image built in CI) • scaling up and down manually - this alters the number of instances of a given service/task that runs right now; applying TF would roll it back to the "standard" configuration

lukasz18:12:45

These two tasks are done via a couple of scripts that interact with the AWS API via the AWS CLI.
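Those two out-of-band tasks could be sketched as AWS CLI calls like the following (cluster/service names are placeholders, and the commands are echoed rather than executed so the sketch dry-runs without AWS credentials):

```shell
#!/bin/sh
# Sketch of the two tasks done outside Terraform, as AWS CLI commands.
# Names below are hypothetical; echo keeps this a safe dry run.
CLUSTER=${CLUSTER:-prod}
SERVICE=${SERVICE:-web}
# 1. Trigger a deployment: force ECS to re-pull the mutable :release tag.
DEPLOY="aws ecs update-service --cluster $CLUSTER --service $SERVICE --force-new-deployment"
# 2. Scale manually: change the desired count (a later terraform apply would revert this).
SCALE="aws ecs update-service --cluster $CLUSTER --service $SERVICE --desired-count 4"
echo "$DEPLOY"
echo "$SCALE"
```

Dropping the `echo` indirection turns these into the real scripts.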

orestis18:12:28

Thanks! So e.g. in your CI you would build new container images, push them to ECR via the docker CLI, then perhaps also change the task definition, or just trigger a deploy to pick up the current :release image via the AWS CLI.

lukasz18:12:11

Pretty much: CI runs tests, builds the image and pushes it to ECR. Then anybody can trigger a deploy either via the GH Actions UI or a script. All other changes to tasks, services etc. have to be done in Terraform via the normal pull request flow. We have built a couple of scripts that help with things like 'is the latest version of a given service built and deployed to ECS', but on a day-to-day basis everyone just runs 2 make tasks. Next year we will most likely move to triggering deployments via CI too. One less command to run

orestis18:12:50

How complicated was the initial terraform configuration? Would you do it again if you had to start from scratch?

orestis18:12:07

Also, do you keep TF state in e.g. S3 or elsewhere?

lukasz18:12:06

We use TF Cloud for state storage, but moving to S3 wouldn't be an issue (just bootstrapping requires a couple of manual steps). The thing is, I evolved our infra over the last 5 years, so there was always some prior setup I could lean on, e.g. we used to deploy our services in Docker via Ansible onto a bunch of GCP hosts, but all images were already hosted in ECR, so bootstrapping the ECS cluster was that much easier. But yes, I built a library of internal modules that I would use again if I were to start from scratch and had to use AWS

lukasz18:12:56

It looks daunting in the beginning, but in my experience the most complicated bits were around observability and getting all of the metrics and logs working outside of AWS (we use Grafana Cloud, not CloudWatch). Other than that, once I understood how things relate to each other it was pretty straightforward

lukasz18:12:23

One thing, I wouldn't use any of the public modules for any of this - every infra is a unique snowflake, and it's better to have 100% control over it

orestis18:12:46

Can you explain the bit about public modules? You still need to bottom out to something TF-provided to build e.g. an ECS cluster or an RDS database, right?

lukasz18:12:03

Oh, I mean stuff like this: https://registry.terraform.io/modules/trussworks/ecs-service/aws/latest Of course you will use the AWS provider and built-in AWS resources

orestis18:12:55

Ah yes, I would avoid that out of instinct too. The point is to do exactly what you want, otherwise I'd stick with Elastic Beanstalk and the AWS opinions, at least I have a support contract with them 🙂

orestis18:12:58

Thanks for this info. We're not even in Docker yet, but finally I can see some upside in bringing that complexity onto us.

lukasz18:12:41

Yeah, in the end AWS doesn't care how you manage your infra - but given that I went through all of the SOC2 stuff, investing in IaC pays off, because it helps a lot with maintaining compliance and having things organized. I'd use Terraform for smaller projects too, e.g. orchestrating Digital Ocean resources and their app platform. It's just better than relying on manually clicking in consoles and such.

lukasz18:12:06

In that case, I'd Dockerize first (I think EB supports it), then think about ECS (either with Fargate or EC2)

valtteri18:12:33

Beanstalk supports Docker via ECS with some limitations which may or may not be showstoppers for you. • unit of scaling is your application • application can consist of max 10 containers

valtteri18:12:10

I wouldn’t recommend it as the first choice. 🙂

valtteri18:12:15

(been there, done that)

lukasz18:12:37

ah there you go, I thought it worked somewhat similarly to App Engine, where Docker is just a different runtime

lukasz18:12:17

so yeah, investing in Fargate+ECS is imho easiest, potentially this might be worth looking into: https://aws.amazon.com/blogs/containers/introducing-aws-copilot/

lukasz18:12:11

oh, that's the Cloud Run equivalent, right?

valtteri18:12:24

I’m not too familiar with the GCP offering so it’s hard to say. But it looks like an easy way to set up a load-balanced Docker app

lukasz18:12:06

Yeah, looks pretty much the same - give it an image and, magic, you have a running service. For simple workloads it's def the way to go

orestis18:12:44

I've seen references to EB using ECS under the hood but I can't find any docs about it. But yes, Dockerizing is the first step. App Runner unfortunately isn't available in my region (eu-central-1), and I suspect SOC2 and our B2B platform would be better suited to something a bit more explicit.

valtteri18:12:32

We’re currently using Fargate heavily. I’m quite happy with it. We set up the infra with CloudFormation and deploy by pushing images to ECR and reloading the containers with the AWS SDK

orestis18:12:53

Our app is a very basic one, but we are slowly rewriting it in Clojure, so having everything in containers that communicate with each other makes some sense. So far we've been hard-coding URLs for this communication.

orestis18:12:26

Fargate seems tempting too, even at a bit higher prices than plain EC2.

valtteri18:12:53

@U7PBP4UVA
> The other generic platform, Multicontainer Docker, uses the Amazon Elastic Container Service (Amazon ECS) to coordinate a deployment of multiple Docker containers to an Amazon ECS cluster in an Elastic Beanstalk environment.
From https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_docker.html

lukasz19:12:56

Yeah, it requires a cost calculation (ops time vs Fargate cost). Plus, there are some things you won't be able to do with Fargate, but the low operational overhead is imho worth it

👍 1
orestis19:12:42

@U6N4HSMFW ah thanks - that's using an older version of Amazon Linux, so I never clicked on that, thinking it was deprecated.

orestis19:12:32

Thanks to both of you for the guidance. I guess Terraform in Action goes to my reading list 🙂

valtteri19:12:34

Ahh true. It seems like they have a newer platform version nowadays! Wonder what it does under the hood..

orestis19:12:02

Seems like all glue and bash scripts, judging from the logs. They do seem to run plain docker compose though.

valtteri19:12:47

I’m not sure what to think about that. I mean, using docker-compose on EC2... well, why not. At least parity with dev would be 1:1, minus load balancing

orestis19:12:56

Oh, a final Clojure question: are you able to have an nREPL session to a Fargate container? Sometimes we need to do that for support - not for our main deployments, but for a bastion server that runs some back-office support tasks. It's a unique snowflake with an Elastic IP; not sure if it's worth bringing over to ECS or even containerising.

lukasz19:12:36

Yes. It comes with one limitation though: in order to expose a port, your service has to be part of a load balancer target group. I have not been able to get nREPL working for non-HTTP services (I need to revisit that). So in your task config you'd expose the nREPL port, and once it's live you can get the IP of the Fargate instance and connect to it from within the VPC

valtteri19:12:28

I think you could achieve the same for non-http services with a Network Load Balancer

valtteri19:12:46

Might require some 🍻 and 🚬 to setup though

orestis19:12:47

(We use ssm to start a port forwarding session, it's nice because it generates a CloudTrail log).
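The SSM port-forwarding session mentioned above might look like this as a script (instance ID and ports are placeholders; the command is built and echoed rather than executed, so the sketch dry-runs without AWS credentials):

```shell
#!/bin/sh
# SSM port forwarding to a host's nREPL port; each session start is
# recorded in CloudTrail. Instance ID and port numbers are hypothetical.
INSTANCE=${INSTANCE:-i-0123456789abcdef0}
CMD="aws ssm start-session --target $INSTANCE --document-name AWS-StartPortForwardingSession --parameters '{\"portNumber\":[\"7888\"],\"localPortNumber\":[\"7888\"]}'"
echo "$CMD"
```

After the session is up, connecting an nREPL client to localhost:7888 goes through the forwarded tunnel.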

orestis19:12:14

We have an HTTP service, and even a load balancer to set up SSL via ACM, so thankfully that's a moot point for us.

lukasz19:12:47

We have a G-Suite -> AWS SSO -> IAM roles setup, so we get that too :-) You have to go through SSM to access the one and only EC2 host that we have, to connect to the REPL

viesti20:01:03

ECS also has this Exec functionality to run commands and even a shell inside a container, even on Fargate: https://aws.amazon.com/blogs/containers/new-using-amazon-ecs-exec-access-your-containers-fargate-ec2/

viesti20:01:44

so you can, say, have a socket REPL listening locally inside the container, bundle say netcat in the Docker image, and connect to the socket REPL
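That netcat-to-socket-REPL idea could be sketched like this (cluster, task and container names are placeholders, and the command is echoed as a dry run rather than executed):

```shell
#!/bin/sh
# Sketch: reach an in-container socket REPL through ECS Exec (works on
# Fargate too). Requires enableExecuteCommand on the task plus SSM
# permissions in the task role; all names below are hypothetical.
TASK=${TASK:-0123456789abcdef0123456789abcdef}
CMD="aws ecs execute-command --cluster prod --task $TASK --container app --interactive --command 'nc localhost 5555'"
echo "$CMD"
```

Here `nc localhost 5555` assumes the image bundles netcat and the JVM was started with a socket REPL on port 5555.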

viesti20:01:13

I haven't yet tested if this could be used for forwarding also

viesti20:01:37

Here's a small Terraform setup to play with ECS Exec: https://github.com/viesti/ecs-exec

lukasz16:01:45

Nice! I'm testing this today in our env, if all goes well we will go EC2-less in a couple of weeks

viesti06:01:03

The tech behind ECS Exec is interesting (from the introduction post https://aws.amazon.com/blogs/containers/new-using-amazon-ecs-exec-access-your-containers-fargate-ec2/):
> The long story short is that we bind-mount the necessary SSM agent binaries into the container(s).
Looking at the description, if the whole SSM agent is mounted into the container, that makes me wonder whether the port forwarding capability of the agent could also be used... https://github.com/aws/containers-roadmap/issues/1050

orestis18:12:46

Currently we do the latter: with the AWS CLI, or even via Clojure, you can easily deploy something to Elastic Beanstalk, and I don't really see the point of using a cloud provisioning tool for that. Of course both EB and ECS have their own provisioning for deploys behind the scenes - perhaps that's the point?

Max18:12:39

I'd recommend against using terraform to trigger deployments. Usually you end up defining a line between “provisioned infra” and “ephemeral infra”, where the former is controlled by IaC and the latter is controlled by CI/CD pipelines. Terraform (and I think other IaC tools?) doesn't tend to handle that line terribly well; I usually just use ignore_changes on things I expect the pipelines to change. https://github.com/hashicorp/terraform-provider-aws/issues/632

☝️ 1
orestis18:12:16

I guess the issue is: if you do this, then a terraform apply would probably roll back your deployed code to a previous state. So I guess terraform apply isn't something that's done automatically and without thought; it's something you do once when setting up and then manually on big infra changes.

Max18:12:16

It shouldn't - that's why you use ignore_changes on the infra managed by the pipelines. The whole value prop of Terraform is that it's not just fire-and-forget: as you make further infra changes over time, it will keep your infra in line with the code. That said, you should always review your Terraform plans before applying them, just like you would with code changes. You never want to do it automatically.

Max18:12:21

In general, if you're doing everything right and you run terraform plan on any random day on your main branch, terraform shouldn't find any changes. Getting that to happen just sometimes requires playing with lifecycle stuff a little
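The ignore_changes approach described above might look something like this in an ECS service definition (resource and attribute values are hypothetical):

```hcl
# Hypothetical service where CI/CD owns the running task definition revision
# and the replica count: the lifecycle block tells Terraform not to
# reconcile those two attributes back to what's in code.
resource "aws_ecs_service" "app" {
  name            = "app"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2

  lifecycle {
    ignore_changes = [task_definition, desired_count]
  }
}
```

With this in place, a `terraform plan` on main should stay clean even after the pipelines have rolled out newer revisions or scaled the service.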

orestis18:12:06

Right, so there's a way to make TF ignore changes in some "leaf" nodes that change with every deploy. Good to know. I'm still evaluating on whether I should dedicate a couple of days on reading the available books 🙂

kardan19:12:31

Last time we ended up using a “latest” tag and then triggering deployment with “aws ecs update-service --cluster <cluster> --service <service> --force-new-deployment”

kardan19:12:03

That way terraform apply will always apply whatever is tagged as latest

kardan19:12:33

Maybe there are downsides to this, but that's where we ended up 🙂

Max20:12:40

That's also a viable strategy. The only downside is that you don't really get blue/green deployments that way because you can't roll back. If you're not worried about having to trigger a deploy to roll back though it works great

viesti20:01:55

Hmm, I've done Terraform + ECS setups a couple of times and haven't run into problems. The task_definition attribute also takes an ARN of the task definition, so you can simply do

```hcl
task_definition = aws_ecs_task_definition.backend.arn
```

to refer to the newly created task definition
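Spelled out a bit more, the wiring might look like this (names, sizes and the image URL are hypothetical): each apply that changes the container definition creates a new task definition revision, and the service follows it because it references the resource's ARN.

```hcl
# Hypothetical Fargate task + service pair; a changed container_definitions
# produces a new task definition revision on apply, which the service
# picks up via the ARN reference below.
resource "aws_ecs_task_definition" "backend" {
  family                   = "backend"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512
  container_definitions = jsonencode([{
    name         = "backend"
    image        = "123456789012.dkr.ecr.eu-central-1.amazonaws.com/backend:release"
    portMappings = [{ containerPort = 8080 }]
  }])
}

resource "aws_ecs_service" "backend" {
  name            = "backend"
  cluster         = aws_ecs_cluster.main.id
  launch_type     = "FARGATE"
  desired_count   = 1
  task_definition = aws_ecs_task_definition.backend.arn
}
```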

viesti20:01:56

perhaps not the simplest example, but here's one with infra split into a couple of modules: https://github.com/metosin/cloud-busting/blob/main/aws/ecs-demo/modules/ecs/main.tf#L33

viesti20:01:39

I'd suggest a split where you have an "infra" root module and an "app" root module, the "app" module containing only aws_ecs_service and aws_ecs_task_definition resources, picking the attributes the app module needs via remote state from the infra module. The infra module itself can then be composed of stateless utility modules (like network, database etc.)
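That infra/app split could be sketched like this on the "app" side, assuming an S3 state backend (bucket name, key and output names are hypothetical):

```hcl
# "app" root module sketch: read shared outputs (cluster ID etc.) from the
# "infra" root module's state instead of redefining those resources here.
data "terraform_remote_state" "infra" {
  backend = "s3"
  config = {
    bucket = "example-tf-state"          # hypothetical state bucket
    key    = "infra/terraform.tfstate"   # hypothetical state key
    region = "eu-central-1"
  }
}

resource "aws_ecs_service" "app" {
  name            = "app"
  cluster         = data.terraform_remote_state.infra.outputs.cluster_id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 1
}
```

The infra module would need to declare a matching `output "cluster_id"` for this lookup to resolve.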