More questions about optimising Docker builds; our code changes often, our dependencies not so much... yet, we produce uberjars that bundle everything together. How difficult is it to make a single jar with all the dependencies, and another jar with just our code?
what's the rationale? to speed up the builds? if yes, are you doing the uberjar builds in docker?
speed up the pushing, and save some storage space
the storage space cannot be saved right? both the code and deps need to be colocated right?
the goal is for docker to cache the dependencies.jar layer so that it gets reused: during pushing & pulling, it detects that the layer is the same and skips the network transfer. Also, it deduplicates based on hash.
right, so for the fragmented jars thing docker really can't do much on its own; you need to come up with a different way to build things in clojure. not really sure if there is an established way to do that. i think it's more of a question for #tools-build or something
once that's done, both jars could be loaded with java -cp deps.jar:app.jar <main-class> (java ignores -cp when -jar is used, so both jars have to go on the classpath)
essentially build the jar just from your code and the rest of it should be in your classpath
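A minimal sketch of how the two jars could map onto separate image layers. The file names `target/deps.jar` and `target/app.jar`, the namespace `my.app.main`, and the base image are assumptions for illustration, not something from this thread; it also assumes Clojure itself is inside deps.jar and that the namespace defines a `-main` function.

```dockerfile
# Assumed base image; any JRE image would do.
FROM eclipse-temurin:21-jre
WORKDIR /app

# Rarely changes: this layer keeps the same digest between builds as long as
# deps.jar is byte-identical, so a push reports it as already existing.
COPY target/deps.jar /app/deps.jar

# Changes often: only this small layer is rebuilt and uploaded.
COPY target/app.jar /app/app.jar

# Both jars go on the classpath; -jar is not used because it would ignore -cp.
# my.app.main is a hypothetical namespace with a -main function.
ENTRYPOINT ["java", "-cp", "/app/deps.jar:/app/app.jar", "clojure.main", "-m", "my.app.main"]
```

The order matters: a changed layer invalidates every layer after it, so the dependencies jar has to be copied before the application jar.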
I will look into that
My home wifi has a terrible upload speed, so anything to avoid pushing things to ECR again and again helps 😄
well in that case, i'll do the push on CI 😛
i'd always set up the push on CI, hardly ever do it from my machine. avoids tokens and stuff
yeah it's mostly the time spent fine tuning the whole thing that helps
these types of optimisations, although fun, don't really pay off for the effort they take. IMO
too much work for small gains in the larger picture
yes, but, fun
to be honest, shortening the feedback loop for developing an end-to-end docker experience is valuable
plus I gain a (proven) understanding of how docker caching works
yeah but pushing the image out is something that's not done very often. the feedback loop for me is build quick and test locally and push once
push generally = i'm happy, ship it 😄
all the other steps can be optimised quickly.
i see it akin to git, commit all the time with pushes when happy
Depending on your CI (I think at this point all of them support some way of doing this) you can/should use Docker's caches - https://docs.docker.com/build/cache/backends/ where your build tries to re-use as many layers as possible. With this some of our builds go from 7m to 1m, if dependencies do not change
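For the layer re-use part, a rough sketch of a build stage ordered so that the dependency download is cached until deps.edn changes. It assumes a deps.edn project with a hypothetical `:build` alias that writes an uberjar to `target/app.jar`; neither of those comes from this thread.

```dockerfile
# Build stage, using the official clojure image with the CLI tools.
FROM clojure:tools-deps AS build
WORKDIR /build

# Copy only the dependency manifest first. This layer and the next one are
# reused from cache as long as deps.edn is unchanged.
COPY deps.edn .
# -P downloads the dependencies without running anything.
RUN clojure -P

# Source changes only invalidate the layers from here down.
COPY . .
# Hypothetical tools.build entry point; replace with the project's own build command.
RUN clojure -T:build uber

# Runtime stage, assumed base image.
FROM eclipse-temurin:21-jre
COPY --from=build /build/target/app.jar /app/app.jar
ENTRYPOINT ["java", "-jar", "/app/app.jar"]
```

On CI runners that start with an empty local cache, this ordering only pays off when combined with one of the cache backends from the link above, so the cached layers can be imported from a previous build.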
doing 3-4 builds daily, each resulting in a 200mb jar file containing mostly the same content, does look a bit off
we do the builds in our CI on every master merge
@orestis also i'd like to know your perspective on why faster pushes could shorten the feedback loop. the local docker image is as good as the pushed one, right?
Because at this point I have to test in production, which involves pushing to AWS ECR.
does it need a connection to some other services? or to receive ALB traffic?
Receive ALB traffic, connect to two different databases, fetch stuff from S3, sit behind the nginx proxy next to another docker container 😉
This is me testing the actual production setup, not working on the app itself.
So the test is how things run in production, with all the intricacies and edge cases involved.
i'd set up a local docker-compose thing with the two containers and connect to the real things like RDS/S3 from my machine and test
that for me is a complex/faster way to test
test with some canned data that the ALB would send
We're using elastic beanstalk, which uses its own way to set things up (using docker compose).
It's just not the same. I try to do principled testing (come up with scenarios, ensure/understand how things work) but at some point you hit the edge cases that only appear in the real thing.
I've built something like that: build an image, push to ECR and run a suite of tests via headless chrome. It was quite brittle though, because of random timeouts, CI failures etc