docker

orestis 2023-03-16T15:22:23.890459Z

More questions about optimising Docker builds; our code changes often, our dependencies not so much... yet, we produce uberjars that bundle everything together. How difficult is it to make a single jar with all the dependencies, and another jar with just our code?

lispyclouds 2023-03-16T15:25:49.425049Z

whats the rationale? to speed up the builds? if yes, are you presumably doing the uberjar builds in docker?

orestis 2023-03-16T15:26:02.851519Z

speed up the pushing, and save some storage space

lispyclouds 2023-03-16T15:28:39.660609Z

the storage space cannot be saved right? both the code and deps need to be colocated right?

orestis 2023-03-16T15:29:42.429849Z

the goal is for docker to cache the dependencies.jar layer so that it gets reused: during pushing & pulling, detect that it's the same so don't go for the network. Also, deduplicate based on hash.
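
A minimal sketch of the layering described above, assuming the build already produces two separate jars. The paths `target/deps.jar` and `target/app.jar`, the base image, the tag `my-app:dev` and the namespace `my.app.core` are placeholders, not anything from this thread. The dependency layer is only reused when `deps.jar` is byte-for-byte identical between builds.

```shell
#!/bin/sh
# Sketch only: all file names, the image tag and my.app.core are hypothetical.
set -eu

cat > Dockerfile <<'EOF'
FROM eclipse-temurin:17-jre
WORKDIR /app
# Rarely changes: gets its own layer, so it can be cached and deduplicated.
COPY target/deps.jar /app/deps.jar
# Changes on every commit: only this small layer should need rebuilding and pushing.
COPY target/app.jar /app/app.jar
# java ignores -cp when -jar is given, so both jars go on the classpath
# and the entry point is named explicitly.
CMD ["java", "-cp", "/app/deps.jar:/app/app.jar", "clojure.main", "-m", "my.app.core"]
EOF

# Build only when docker is installed and both jars exist.
if command -v docker >/dev/null 2>&1 && [ -f target/deps.jar ] && [ -f target/app.jar ]; then
  docker build -t my-app:dev .
fi
```

The order of the two `COPY` lines matters: a change to `app.jar` invalidates only the layers after it, so the dependency layer above it stays cached.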

lispyclouds 2023-03-16T15:33:09.342219Z

right, so for the fragmented jars thing docker really cant do much, and you'd need to come up with a different way to build things in clojure. not really sure if there is an established way to do that. i think its more of a question for #tools-build or something

lispyclouds 2023-03-16T15:33:47.952349Z

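once thats done, both the jars could be loaded with something like java -cp deps.jar:app.jar <main class> (java ignores -cp when -jar is used, so both jars have to go on the classpath)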

lispyclouds 2023-03-16T15:34:46.966669Z

essentially build the jar just from your code and the rest of it should be in your classpath
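
One way this could look with the Clojure CLI, as a rough sketch rather than an established recipe. It swaps the single dependencies jar for a directory of dependency jars (`target/lib`), which can be copied into an image as its own layer in the same way. The helper `copy_dep_jars`, the `src` directory and the namespace `my.app.core` are assumptions, and two dependencies that share a jar file name would overwrite each other here.

```shell
#!/bin/sh
set -eu

# copy_dep_jars CLASSPATH DEST_DIR  (hypothetical helper)
# Copies every .jar entry of a colon-separated classpath into DEST_DIR,
# skipping source directories such as src/ or resources/.
copy_dep_jars() {
  mkdir -p "$2"
  printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
    case "$entry" in
      *.jar) cp "$entry" "$2/" ;;
    esac
  done
}

# Only run the real build when the tools and a project are present.
if command -v clojure >/dev/null 2>&1 && command -v jar >/dev/null 2>&1 \
   && [ -f deps.edn ] && [ -d src ]; then
  copy_dep_jars "$(clojure -Spath)" target/lib   # dependencies: change rarely
  jar cf target/app.jar -C src .                 # our code only: changes often
  # Run with both on the classpath (my.app.core is a placeholder):
  #   java -cp "target/lib/*:target/app.jar" clojure.main -m my.app.core
fi
```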

orestis 2023-03-16T15:35:45.548269Z

I will look into that

orestis 2023-03-16T15:37:13.873519Z

My home wifi has a terrible upload speed, so anything to avoid pushing things to ECR again and again helps 😄

lispyclouds 2023-03-16T15:37:34.324209Z

well in that case, ill do the push on CI 😛

lispyclouds 2023-03-16T15:38:26.696769Z

id always setup the push on CI, hardly ever do it from my machine. avoid tokens and stuff

orestis 2023-03-16T15:38:27.157499Z

yeah it's mostly the time spent fine tuning the whole thing that helps

lispyclouds 2023-03-16T15:39:37.021329Z

these types of optimisations, although fun, dont really pay off given the value they bring in, IMO

lispyclouds 2023-03-16T15:40:04.291179Z

too much work for small gains in the larger picture

orestis 2023-03-16T15:50:08.553059Z

yes, but, fun

orestis 2023-03-16T15:50:32.985059Z

to be honest, shortening the feedback loop for developing an end-to-end docker experience is valuable

orestis 2023-03-16T15:50:49.700959Z

plus I gain a (proven) understanding of how docker caching works

lispyclouds 2023-03-16T15:51:40.378689Z

yeah but pushing the image out is something thats not done very often. the feedback loop for me is build quick and test locally and push once

lispyclouds 2023-03-16T15:52:00.925249Z

push generally = im happy, ship it 😄

lispyclouds 2023-03-16T15:52:36.196059Z

all the other steps can be optimised quickly.

lispyclouds 2023-03-16T15:53:36.078669Z

i see it akin to git, commit all the time with pushes when happy

lukasz 2023-03-16T15:56:22.938149Z

Depending on your CI (I think at this point all of them support some way of doing this) you can/should use Docker's caches - https://docs.docker.com/build/cache/backends/ where your build tries to re-use as many layers as possible. With this some of our builds go from 7m to 1m, if dependencies do not change
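
A hedged sketch of the registry cache backend from the linked docs. The image name `registry.example.com/my-app`, the `buildcache` tag and the `RUN_BUILD` switch are placeholders. Depending on the builder driver, a `docker buildx create --use` may be needed first, since the default driver does not always support exporting a registry cache.

```shell
#!/bin/sh
set -eu

# Hypothetical image name; substitute your own registry and repository.
IMAGE="${IMAGE:-registry.example.com/my-app}"

# mode=max also exports intermediate layers, not only those in the final image.
CACHE_FROM="type=registry,ref=${IMAGE}:buildcache"
CACHE_TO="type=registry,ref=${IMAGE}:buildcache,mode=max"

# Opt in explicitly, since this pushes to a registry.
if [ "${RUN_BUILD:-0}" = "1" ]; then
  docker buildx build \
    --cache-from "$CACHE_FROM" \
    --cache-to "$CACHE_TO" \
    --tag "${IMAGE}:latest" \
    --push .
fi
```

On a fresh CI runner the first build populates the cache; later builds pull matching layers from it instead of rebuilding them.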

orestis 2023-03-16T15:59:56.694919Z

doing 3-4 builds daily, each resulting in a 200mb jar file containing mostly the same content, does look a bit off

orestis 2023-03-16T16:00:10.380099Z

we do the builds in our CI on every master merge

lispyclouds 2023-03-16T16:00:23.982589Z

@orestis also id like to know your perspective on why faster pushes could shorten the feedback loop? the local docker image is as good as the pushed one right?

orestis 2023-03-16T16:01:44.849589Z

Because at this point I have to test in production, which involves pushing to AWS ECR.

lispyclouds 2023-03-16T16:02:31.384109Z

does it need a connection to some other services? or to receive ALB traffic?

orestis 2023-03-16T16:04:24.083689Z

Receive ALB traffic, connect to two different databases, fetch stuff from S3, sit behind the nginx proxy next to another docker container 😉

orestis 2023-03-16T16:04:49.921329Z

This is me testing the actual production setup, not working on the app itself.

orestis 2023-03-16T16:05:14.022169Z

So the test is how things run in production, with all the intricacies and edge cases involved.

lispyclouds 2023-03-16T16:06:05.821519Z

id setup a local docker-compose thing with the two containers and connect to the real things like RDS/S3 from my machine and test

lispyclouds 2023-03-16T16:06:42.432589Z

that for me is a complex/faster way to test

lispyclouds 2023-03-16T16:07:06.117659Z

test with some canned data that the ALB would send
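
A rough sketch of that local setup, under several assumptions: the service names, the image `my-app:dev`, an `nginx.conf` next to the compose file that proxies to `app`, the environment variable names and the `/health` path are all hypothetical. The request headers are the ones an ALB normally adds when forwarding.

```shell
#!/bin/sh
set -eu

# Two containers, as in production: the app and an nginx proxy in front of it.
cat > compose.yaml <<'EOF'
services:
  app:
    image: my-app:dev
    environment:
      # Bare names are passed through from the host shell,
      # so the container talks to the real RDS/S3 resources.
      - DATABASE_URL
      - AWS_ACCESS_KEY_ID
      - AWS_SECRET_ACCESS_KEY
      - AWS_REGION
  proxy:
    image: nginx:stable
    ports:
      - "8080:80"
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
    depends_on:
      - app
EOF

# alb_request URL  (hypothetical helper)
# Sends a request shaped like one forwarded by an ALB.
alb_request() {
  curl -sS \
    -H 'X-Forwarded-For: 203.0.113.7' \
    -H 'X-Forwarded-Proto: https' \
    -H 'X-Forwarded-Port: 443' \
    -H 'X-Amzn-Trace-Id: Root=1-67891233-abcdef012345678912345678' \
    "$1"
}

# Usage, once `docker compose up -d` is running:
#   alb_request http://localhost:8080/health
```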

orestis 2023-03-16T16:12:57.303219Z

We're using elastic beanstalk, which uses its own way to set things up (using docker compose).

orestis 2023-03-16T16:14:05.053179Z

It's just not the same. I try to do principled testing (come up with scenarios, ensure/understand how things work) but at some point you hit the edge cases that only appear in the real thing.

lukasz 2023-03-16T16:16:33.026159Z

I've built something like that - build an image, push to ECR and run a suite of tests via headless chrome. It was quite brittle though, because of random timeouts, CI failures etc

cap10morgan 2023-03-16T16:02:20.304159Z

https://github.com/Quantisan/docker-clojure/issues/189