#aws
2022-06-11
Martynas Maciulevičius05:06:09

If I use terraform and I want to do a rolling update of my container (no downtime), should I use this monstrosity (the tutorial shows CloudFormation but I also found one for Kubernetes some time ago): https://github.com/souzaxx/Rolling-Update-Terraform/blob/master/main.tf#L18 This is the source but it contains too many stupid memes: https://dev.to/souzaxx/rolling-update-ec2-with-terraform-13bf Yes, I understand that I can't restart my instances one by one directly from terraform. But then is there a better way than inlining a bunch of configs? For instance this tutorial says that I can use create_before_destroy: https://medium.com/inspiredbrilliance/rolling-deployment-with-terraform-on-aws-6a0d1587c82c But then what if my container doesn't start immediately? What if it starts after... 1 minute...? This second tutorial also has a load balancer health check that's based on ELB. Will that mean there won't be downtime if I configure it correctly? Or does it mean that I should use CF or ECS/EKS to handle my rolling upgrade? I also want to use Docker and not learn Ansible.

Max13:06:27

I've never managed instances directly from terraform. I always manage an auto scaling group and let it manage instances instead. Then you can use all the baked in update strategies, and if all else fails, kill them and let them respawn one by one yourself

Martynas Maciulevičius13:06:17

I found that I can specify the scaling group from terraform and do a rolling deployment. The guide said that it will recreate the scaling group, but I'm not sure what that means, and the guide didn't say whether it will destroy running instances. It probably matters when a deploy fails, because then the currently running instances shouldn't drain. Source: https://medium.com/inspiredbrilliance/rolling-deployment-with-terraform-on-aws-6a0d1587c82c

Max13:06:57

It depends whether the symbol in the plan is ~ or +/-. +/- means it'll destroy and recreate the ASG and all its instances; if this happens, make sure you haven't messed up your tf change. ~ means you're changing the group configuration in place; instances will get the updated config when they are replaced, which happens either when you kick them or they die for any other reason
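For illustration, a minimal ASG sketch with create_before_destroy, which is what produces the +/- symbol when a replacement is forced (resource and variable names here are invented, not from this thread):

    # Hypothetical ASG: with create_before_destroy Terraform builds the
    # replacement group first and only then destroys the old one (+/- in the plan).
    resource "aws_autoscaling_group" "app" {
      name_prefix         = "app-"                 # name_prefix lets old and new groups coexist
      min_size            = 2
      max_size            = 4
      vpc_zone_identifier = var.private_subnet_ids # assumed variable

      launch_template {
        id      = aws_launch_template.app.id
        version = aws_launch_template.app.latest_version
      }

      lifecycle {
        create_before_destroy = true
      }
    }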

Martynas Maciulevičius13:06:26

create_before_destroy = true: is this what your symbol means?

Max13:06:37

Another way to do it if your ASG is behind a load balancer is to create a whole new ASG, let it fill up, then tear down the old one. I think that's Netflix's model

Max13:06:31

In general I prefer not to manage deployments in terraform. I only manage the stuff that doesn't change and manage deployments from somewhere else

Martynas Maciulevičius13:06:41

So this example is probably what you meant about Netflix's model: https://robmorgan.id.au/posts/rolling-deploys-on-aws-using-terraform/ > We set the wait_for_elb_capacity attribute of the auto scaling group, so Terraform does not prematurely terminate the current auto scaling group. Is this what you meant?
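In HCL that pattern looks roughly like this (a sketch assuming a classic ELB and a launch configuration defined elsewhere; names are invented):

    # Sketch: the ASG name embeds the launch configuration name, so changing the
    # LC forces a replacement ASG; create_before_destroy plus wait_for_elb_capacity
    # keeps the old group around until the new instances pass the ELB health check.
    resource "aws_autoscaling_group" "web" {
      name                  = "web-${aws_launch_configuration.web.name}"
      launch_configuration  = aws_launch_configuration.web.name
      min_size              = 2
      max_size              = 4
      load_balancers        = [aws_elb.web.name]  # assumed classic ELB
      health_check_type     = "ELB"
      wait_for_elb_capacity = 2                   # don't call the new ASG done until 2 instances are InService

      lifecycle {
        create_before_destroy = true
      }
    }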

Max13:06:37

Right but that uses terraform to orchestrate the deploy which I would not do

Max13:06:06

I'd manage it from a script or codedeploy or something else, and only terraform the stuff that doesn't change

Max13:06:04

I know some people use terraform to do that but I feel like it goes against the grain with terraform’s declarative approach

Martynas Maciulevičius13:06:22

I don't yet know what I want. For me fewer tools and less config is currently better. I'm currently trying to dig through a 1K-LOC terraform config that I found online and I want to understand how to do a basic thing. So for now fewer tools will be better for me. I don't want to configure CodeDeploy and CD yet. I'll be very happy when I can push a Docker image to deploy without getting into Packer.

Max13:06:46

Stepping back a little, what are you trying to accomplish?

Martynas Maciulevičius13:06:46

I'm trying to understand what level of abstraction I should use for the deployment of my stateful backend(s). I can run a single back-end on Heroku right now, but I'll want to run multiple instances of the same back-end for redundancy (well, it's stateful and I'll have a consensus protocol). This is why I want to have rolling deployments. So I'll go with ECS and Fargate for now (nothing is more permanent than a temporary solution, yes). Currently I want to run one docker container instance in one scaling group that I can reach from the internet. The most basic thing but with the "scalability" knob. And the difference between this and Heroku is that the instance can see its peers. This is important.

Max13:06:34

I would strongly recommend using ECS/Fargate unless you have a good reason to be getting into EC2. Even for stateful services, the overhead is sooo much lower

Max13:06:02

If you need a persistent file system you can just stick efs on it

Martynas Maciulevičius13:06:10

I was considering EC2 but you misunderstood. I'll take Fargate. No. It's all in memory.

Max13:06:33

Aaahhh ok got it.

Martynas Maciulevičius13:06:40

It's actually more complicated than that. But this is the deployment bit. The app bit is more complicated because I'll have to do a consensus algorithm. That's going to be fun. But I want to not limit myself with this deployment. Maybe I'm stupid enough to try it 😄 I hope that's probably a good thing...

Max13:06:46

As for deployments I tend to tell terraform to ignore all or part of the task definition, and deploy by changing the TD. You could even do that manually initially
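A sketch of that ignore pattern (names invented; the lifecycle block is the point here):

    # Sketch: Terraform creates the service once, then stops tracking which task
    # definition revision it points at, so deploys done outside Terraform
    # (aws ecs update-service, CodeDeploy, a script) don't get reverted on the next apply.
    resource "aws_ecs_service" "app" {
      name            = "app"
      cluster         = aws_ecs_cluster.main.id
      task_definition = aws_ecs_task_definition.app.arn  # only used on the first create
      desired_count   = 2

      lifecycle {
        ignore_changes = [task_definition]
      }
    }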

Max13:06:14

So same thing as above with an asg

Martynas Maciulevičius13:06:59

Yes, I could do the ASG manually when I have it. 😄 But I was advised not to bother with the UI and to go with terraform. So I'm trying to do it this way. Maybe it will be alright.

Martynas Maciulevičius13:06:33

What's actually more interesting is how I will assign different IPs to my backends when I redeploy the same infra :thinking_face: And how will the already deployed backends know about these new backends :thinking_face: Maybe that means I can only deploy without recreating the ASG.

viesti16:06:42

didn't read the whole thread, but with, say, a webapp that has an ALB (Application Load Balancer) in front of the backends run as ECS containers: with ECS you have a thing called a Service, in which you define how many copies of containers you run with a TaskDefinition, and then you can tell ECS to use a launch type. If you use the Fargate launch type, you will get a Fargate VM per task (a task is a group of containers that run on the same docker daemon). You tell the ECS Service what load balancer the containers register to with a Target Group. When ECS starts Fargate VMs, the VMs get a fresh IP address from the VPC, and the ECS Service then tells the load balancer the IP of the new VM and the port of the container, in order to route traffic to that container on the Fargate VM
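In Terraform that wiring looks roughly like this (a trimmed sketch; the cluster, task definition, VPC bits, and the ALB listener are assumed to exist elsewhere, and all names are invented):

    # Target group the ALB routes to; ECS registers each task's private IP in it.
    resource "aws_lb_target_group" "app" {
      name        = "app"
      port        = 8080
      protocol    = "HTTP"
      vpc_id      = var.vpc_id
      target_type = "ip"              # Fargate tasks register by IP, not instance ID

      health_check {
        path = "/health"              # assumed health endpoint
      }
    }

    # The ECS Service: how many task copies to run and how they attach to the ALB.
    resource "aws_ecs_service" "app" {
      name            = "app"
      cluster         = aws_ecs_cluster.main.id
      task_definition = aws_ecs_task_definition.app.arn
      desired_count   = 2
      launch_type     = "FARGATE"

      network_configuration {
        subnets          = var.private_subnet_ids
        security_groups  = [aws_security_group.app.id]
        assign_public_ip = false
      }

      load_balancer {
        target_group_arn = aws_lb_target_group.app.arn
        container_name   = "app"      # must match the container name in the task definition
        container_port   = 8080
      }
    }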

viesti16:06:16

you can use EC2 with an Auto Scaling group, and then ECS will fit the new containers onto the EC2 machines based on free memory etc. requirements defined in the Task Definition. The ECS Service will then inform the load balancer where the new containers are and deregister the old ones

viesti16:06:17

ECS will do this in a rolling manner; you can specify how many containers have to be online at all times (I forget what the config was, healthy percentage or something like that)
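The knobs in question are probably these two on the ECS service (argument names are real, values illustrative; a trimmed sketch):

    # With desired_count = 4 this keeps all 4 tasks serving and lets ECS start
    # up to 4 extra new ones while it rolls the deployment.
    resource "aws_ecs_service" "app" {
      name                               = "app"
      cluster                            = aws_ecs_cluster.main.id
      task_definition                    = aws_ecs_task_definition.app.arn
      desired_count                      = 4
      deployment_minimum_healthy_percent = 100
      deployment_maximum_percent         = 200
    }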

viesti16:06:03

there's this service discovery thing, which I think is DNS based. I haven't used it, but I'm thinking service discovery is something you might want if you're building something like the load balancer yourself, which needs to know where new backends are and which old backends to retire
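If that route turns out to be useful, the Terraform shape is roughly this (a sketch of ECS service discovery via Cloud Map; untested here, names invented):

    # Private DNS namespace plus a discovery service: tasks get A records like
    # backend.internal.local that peers inside the VPC can resolve.
    resource "aws_service_discovery_private_dns_namespace" "internal" {
      name = "internal.local"
      vpc  = var.vpc_id
    }

    resource "aws_service_discovery_service" "backend" {
      name = "backend"

      dns_config {
        namespace_id = aws_service_discovery_private_dns_namespace.internal.id
        dns_records {
          type = "A"
          ttl  = 10
        }
      }
    }

    # The ECS service registers/deregisters its tasks in that DNS record automatically.
    resource "aws_ecs_service" "backend" {
      name            = "backend"
      cluster         = aws_ecs_cluster.main.id
      task_definition = aws_ecs_task_definition.backend.arn
      desired_count   = 3

      service_registries {
        registry_arn = aws_service_discovery_service.backend.arn
      }
    }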

viesti16:06:13

(sorry braindump :D)

viesti16:06:32

oh and when you update the container image in a TaskDefinition, that means that the ECS Service will make a new Fargate VM (if you use Fargate) and go through the register/deregister cycle for the LB, telling the new IPs to the LB. So when updating the app and running it via Fargate, you'll get fresh VMs on every update. With EC2 as the launch type, the existing EC2 VMs get reused, so the IPs stay the same but the container ports differ, since new containers get bound to ephemeral ports so that ECS can decide to run the old and new container on the same host. You can define exposed ports on the host side on your own in the TaskDefinition, but you then need to take care that ports don't collide. You could do this in Terraform with, say, the random provider, or you could do the whole bookkeeping yourself with say https://registry.terraform.io/providers/hashicorp/external/latest/docs/data-sources/data_source
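The random-provider approach mentioned above can look like this (a sketch; the port range and keepers are invented, and note it only makes collisions unlikely rather than guaranteeing uniqueness across services):

    # Pick a host port once and keep it stable across applies; the keepers map
    # forces a new number only when its values change.
    resource "random_integer" "host_port" {
      min = 32768
      max = 60999

      keepers = {
        service = "backend"
      }
    }

    # ...then reference random_integer.host_port.result as the hostPort in the
    # TaskDefinition's portMappings.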

viesti16:06:22

(I once used ECS with an ASG, and used an NLB (Network Load Balancer) to do UDP traffic load balancing, and that led to speccing the ports in the TaskDefinition beforehand; https://registry.terraform.io/providers/hashicorp/random/latest/docs worked for avoiding stepping over ports already in use, and at least it still works today :D)

Martynas Maciulevičius17:06:51

Thanks. I already deployed a hello world and it seems to be working. But now I'll have to think about more stuff. Thanks.

Martynas Maciulevičius07:06:50

@U01EB0V3H39 Hey. I was thinking about how to make deploys work and I found this post that says that I shouldn't use the latest tag on my containers. Yes, my container could then probably deploy automatically, but then I can't roll back if I want to. For instance a comment on this answer mentions that there is a "force new deployment" parameter that could pick up the latest container: https://stackoverflow.com/questions/53510783/ecs-auto-deploy-with-ecr But the answer itself says that I shouldn't use the latest tag for deployment, and I agree with it. So I'm not sure what you really meant by "goes against the grain with terraform’s declarative approach", because if I use a specific version of the container then I would probably be fine. I would be more concerned about the use of Terraform if I reran the cluster and used the latest tag instead :thinking_face:

viesti09:06:21

I guess the thing is to split the infra and the app deployment into modules (app deploy outside of Terraform can be like a "module", if thought of that way :)). If everything is in the same Terraform state, then even if you only want to deploy a new version of the app, all the other resources are refreshed (= checked for changes) by Terraform too. Also, with container image tagging, this creates a funny chicken-and-egg dilemma where you need an ECR registry and an image inside the registry on the first deploy, if you refer to the ECR registry from a TaskDefinition. While I've heard advice to keep app deployment out of Terraform, I've done app deployment with Terraform too, by having two modules: infra and app. The app module contains the ECS Service and TaskDefinition, so things like environment variables etc., which change along with the app more than the infra does. The app module then takes the container image name as a variable, so you can use tags in the image name. The app module just references stuff like the VPC ID via remote state, for telling into which subnet to run the ECS services (with Fargate, you need to specify the network mode used inside the TaskDefinition). So I think you can do app deployment with Terraform too. You then just have two runs of terraform apply: infra and app. If the "datacenter" happens to grow more, you would then probably have more Terraform root modules (= module with state), one for each of the "parts" of the datacenter. The whole "datacenter" doesn't change all the time; changes probably happen more often inside modules, and the infra for a module probably changes less often than the code for the app. So this rate of change kind of forms the module boundaries.
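A sketch of that split, with the app module reading the infra outputs via remote state (the backend config, output names, and task definition details here are all invented):

    # app/main.tf -- the "app" root module, applied separately from the infra module.
    variable "image" {
      description = "Full image reference, e.g. <account>.dkr.ecr.<region>.amazonaws.com/backend:v1.2.3"
      type        = string
    }

    # Read the infra module's outputs (cluster, subnets) from its remote state.
    data "terraform_remote_state" "infra" {
      backend = "s3"
      config = {
        bucket = "my-tf-state"
        key    = "infra/terraform.tfstate"
        region = "eu-west-1"
      }
    }

    resource "aws_ecs_task_definition" "app" {
      family                   = "app"
      requires_compatibilities = ["FARGATE"]
      network_mode             = "awsvpc"          # required for Fargate
      cpu                      = 256
      memory                   = 512
      container_definitions = jsonencode([{
        name         = "app"
        image        = var.image                   # deploy = apply with a new tag
        portMappings = [{ containerPort = 8080 }]
      }])
      # execution role, log configuration etc. omitted for brevity
    }

    resource "aws_ecs_service" "app" {
      name            = "app"
      cluster         = data.terraform_remote_state.infra.outputs.cluster_id
      task_definition = aws_ecs_task_definition.app.arn
      desired_count   = 2
      launch_type     = "FARGATE"

      network_configuration {
        subnets = data.terraform_remote_state.infra.outputs.private_subnet_ids
      }
    }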

Martynas Maciulevičius14:06:05

One more catch is that you also create ECR in a specific region. And one more catch, which is somewhat strange, is that they want you to create an ECR repository per image type: https://stackoverflow.com/questions/48567787/ecs-ecr-is-common-practice-to-have-one-repository-per-image-and-associated-ver So the more image types there are, the more ECR repos there will be. This is puzzling because in Azure I could upload containers under any name, and here I have to create the repository explicitly or else it won't work :thinking_face:

viesti17:06:23

personally I haven't found it a problem to have repository per app

Martynas Maciulevičius19:06:49

I've managed to make rolling deployment work by specifying a docker container version. But what puzzles me is that even if there is nothing to deploy it still "finds" something to apply, as if something changed. I tried ignore_changes = [task_definition] but then it doesn't deploy. This is without the "force-redeploy" flag. Probably there is no way around this strange thing. But yes, I'll do a repo per container type 😄 And I also tried to do deployments via Terraform. It works, but if I run two deployments at the same time then it doesn't deploy the newer one. But it's probably fine if I'm deploying by hand.
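For reference, the pinned-tag workflow boils down to bumping one variable and applying (a sketch; the variable and repository names are invented, the -var flag is standard Terraform CLI):

    # Rolling forward and rolling back are the same operation with a different tag:
    #   terraform apply -var 'image_tag=v1.2.4'   # deploy
    #   terraform apply -var 'image_tag=v1.2.3'   # roll back
    variable "image_tag" {
      type    = string
      default = "v1.2.3"
    }

    # Used in the task definition's container image, e.g.
    # image = "${aws_ecr_repository.app.repository_url}:${var.image_tag}"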