
Good Morning Everyone!


A question for those running Clojure in a container: do you uberjar your application and then use the jar in your container, or do you explode (or copy) the entire source into the container and run it like that?


Presently, I uberjar


Have you tried pack?


It will generate a docker image directly, and layer your dependencies separately from your application in order to make best use of caching.
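The layering pack automates can be approximated by hand in a Dockerfile: put rarely-changing dependencies in their own layer below the frequently-changing application code, so rebuilds and pushes only touch the top layer. A rough sketch (all paths, the base image, and the `myapp.core` namespace are hypothetical, not pack's actual output):

```dockerfile
# Hand-rolled sketch of the layering pack automates; paths are illustrative.
FROM openjdk:11-jre-slim
# Dependency jars change rarely -> their own layer, cached across rebuilds
COPY target/docker/dependencies/ /app/lib/
# Application code changes often -> top layer, the only one rebuilt on each change
COPY target/docker/app/ /app/classes/
ENTRYPOINT ["java", "-cp", "/app/lib/*:/app/classes", "clojure.main", "-m", "myapp.core"]
```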


I’ve always uberjar’d and copied into the container along with the start script etc


@dharrigan we've always uberjar'd... every successful CI run pushes an image to dockerhub these days... pack looks interesting tho


I wrote it, so biased of course


Although the docker code is contributed and mostly maintained by others.


morning all


nice plug @dominicm - I'll give this a try over the weekend


It also doesn't attempt to merge random files on your classpath together, like uberjarring does. Not worth the headache.


agreed, lein uberjar really gets that bit wrong


not that there aren’t a tonne of trade-offs in trying to do it “properly” and preserve classpath roots. I really wish the JVM would support this natively.


i've never hit a problem because of uberjar merging... i can see that there might be issues - are they common ?


It’s not that they’re necessarily common, it’s that they limit what you can do, or at least constrain how you do certain things


things such as having a resource at the same path in multiple jars ?


jar signature stuff too


Exactly that. Sometimes it’s precisely what you want… e.g. if you want to put a particular manifest in a bunch of jars you then need to merge them somehow by hand with a lein plugin / function. It’s pretty trivial to do properly with java’s resource loaders (they support iterating over classpath roots); but uberjarring destroys it. Also it means the classpath is structurally very different between the REPL environment and the production environment.
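The classpath iteration mentioned above is `ClassLoader.getResources`: it returns one URL per classpath root that carries the given path, which is exactly what uberjarring destroys by collapsing all roots into one. A minimal sketch (the point being that with separate jars you can get several URLs back, whereas post-uberjar at most one copy survives; `java/lang/Object.class` is just a stand-in path that always resolves):

```java
import java.net.URL;
import java.util.Collections;
import java.util.Enumeration;

public class ResourceScan {
    public static void main(String[] args) throws Exception {
        // Ask the classloader for *every* copy of a resource path,
        // one URL per classpath root that contains it.
        Enumeration<URL> urls =
            ClassLoader.getSystemClassLoader().getResources("java/lang/Object.class");
        int roots = Collections.list(urls).size();
        System.out.println("copies found: " + roots);
    }
}
```

In a real app you'd pass something like `"data/manifest.edn"` (a hypothetical path) and read each URL, getting one manifest per jar.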


yeah, i have bad memories of iterating over resources on different classpath roots a long time ago in a galaxy far far away


For example we’ve started packaging various pieces of reference/fixture data in jars; with each one including an edn manifest to the respective resource paths… and having a special loader that finds them all and inserts them to the database. That particular thing is doable with lein uberjar; but you have to supply a merge function to merge them into a new format… so if your format happened to not be something like EDN it might be a bit more fiddly. Really you’re changing when you do that operation; but uberjarring forces it to happen at build time.
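The lein-side merge function mentioned here is configured via `:uberjar-merge-with` in `project.clj`, which maps a filename (or regex) to a `[read merge write]` triple. A hedged sketch, where the `manifest.edn` pattern and the naive concatenating merge are made up for illustration:

```clojure
;; Hypothetical project.clj fragment: tell uberjar how to merge
;; same-named files instead of letting one copy clobber the rest.
:uberjar-merge-with
{"data_readers.clj" leiningen.uberjar/clj-map-merger
 ;; concatenate every copy of a hypothetical edn manifest
 #"^manifest\.edn$" [slurp #(str %1 "\n" %2) spit]}
```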


ah, is that in a µservices context, mixing a bunch of lib jars into a service ?


Well that is, I guess, another thing we’re edging towards using the same technique for. We use integrant/duct, so some of our internal libraries which are shared across apps/services expose duct/integrant components. Right now we don’t automatically merge the default configurations for those into the app’s system automagically, but we certainly could… not sure if we’d want to yet though. One main use case is that our applications tend to depend on core vocabularies and reference data, which often want to be shared across apps. e.g. we have several different applications which use the same administrative geographies. In production they might get that from a centralised place; but in development/testing we want to ensure that dependency is tracked and loaded locally. There are other use cases around loading sample data in different contexts, e.g. a representative but small subset of data in dev. A large part of the point of this was to standardise the mechanism and approach we use to bootstrap this stuff in all environments, and to allow it to easily be configured differently in each environment (dev/test/prod) and for each customer.


There are subtle bugs: lein changes the behavior of data readers subtly.


They're merged differently by lein than clojure itself


Changing the subject, a question: does anyone know of any “modern” CIs, or patterns of configuring them, that let you avoid conflating a repo with an essentially linear build process? My observation is that most modern CI systems assume a single repo represents the whole build pipeline, and prioritise repeatability via containerised environments over anything else. So they encourage people into mono-repo architectures, because they have poor support for tracking “pipeline state”, and have limited capabilities for triggering downstream builds and failing upstream builds if downstream ones failed (Jenkins used to be pretty good at this bit). One ongoing battle is how to tackle long build/test times, which are only going to get longer. The obvious way to fix it, aside from pursuing localised optimisations (e.g. restructuring to load/tear down fixtures less often), is to split the pipeline into more independent steps that depend on a prior state. One example of this might be if your app has both a js/cljs browser-based front end and a backend: you can build and test the backend and the frontend separately, and if you only change the frontend you can run the build/tests for just that without testing the backend again. So the natural way most CIs seem to want you to do this is to have multiple repos; but then you have to do more version dances etc., which essentially trades shorter localised build cycles for longer cycle times on integration 😞 I’m aware many CIs provide build caches, but there seem to be very few features or tooling around how to avoid repeating work that needn’t be repeated within a mono-repo in a CI environment. Essentially I want to decrease the cycle time in CI and fail in meaningful ways faster, whilst also doing more verification/testing in the work pipeline. Yes, I want my cake and eat it too! Has anyone else encountered or solved this?


I think in the mono-repo case what needs to happen is for the app to be split and tested as components with test suites mirroring those components; and for test suite success/failure to be cached under a cache key which is effectively the hash of the component trees. Hashing the world, again.


i.e. test results are really just another build artifact.

Wes Hall10:01:45

@rickmoynihan I haven't really dealt with your problem above specifically, but when it comes to CI's I always like to recommend circleci. It does the basics out of the box, and you can more or less get it to do anything else you like because you can just give it a docker container and commands / scripts of your choice. It does "watch" specific github repos etc for the build trigger (there might even be some more flexibility here that I have never used), but in the general sense, these guys I feel have gotten it as close to "right" as I have seen so far.

Wes Hall10:01:55

Also, I am pretty sure it's clojure.


@wesley.hall re circle: yeah, I believe it’s a clojure app. We’re currently on travis. Open to using other things, but at a cursory glance they all seem pretty similar… and none seem to properly tackle this problem. As @lady3janepl says, AFAIK Jenkins is the best, or at least most flexible, in terms of doing this stuff; but normally only in supported languages (e.g. pure java/maven). Jenkins also sucks in lots of ways, e.g. maintenance. I’ve also played with jetbrains teamcity, which seemed to be more Jenkins-like than the others, particularly in doing pipelines… seemed ok, but for clojure I couldn’t find a compelling reason to use it over jenkins. I’ve seen people mention gitlab’s CI as having a pipeline feature. I could be wrong, and should probably look again, but the hosted CIs’ feature offerings seem much the same.


(oh yeah; github have also introduced pipelines recently)


I’ve seen they recently introduced github actions… do they do pipelines though? 👀

Wes Hall10:01:41

@rickmoynihan Yeah, fair enough, I haven't used travis. I've never been a huge fan of Jenkins myself. It seemed to develop "wordpressitis" after the "Hudson" split, and have most of its functionality in all these plugins and UI-configured stuff. Maybe that's changed. I generally try to keep to a policy of: anything my CI does should be just as easy to run manually from the command line. Something I would like to do, but haven't found time for yet, is to create a library of "build-like stuff" that can be run just using tools.deps and profiles, and then use a tools.deps-enabled docker image on a tool like circle to run them all for me, but I am probably getting into beard-stroky philosophy at this point.


we're using circle - you've got lots of freedom to co-ordinate parallel build steps (which we do) , but we have a monorepo, so i've no idea how it is with multiple repos


@wesley.hall Travis is ok, but it’s hard to be a fan. It was recently acquired by some faceless/unknown-to-me big-corp. A large part of their team seemed to leave afterwards, and since then things anecdotally seem to have become a bit worse / flaky, though it’s hard to pinpoint any real blame/issue… I don’t doubt other CIs are better in various ways, but they seem to fundamentally be more or less the same, with similar designs and feature sets. > I generally try to keep to a policy of, anything my CI does should be just as easy to run manually from the command line. 💯 and to be honest this is a big problem I have with a lot of CIs. Their yaml formats rarely have a command-line tool to execute/test/use locally. Yes, you can work around that to some degree by triggering a shell script for each build step, but it’s not quite the same.


Circle does have the option to run non linear builds. And you can do any fetching you want. There's also an API if you wanted to trigger downstream builds.


It has a fan out feature which has a very cool ui
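For reference, CircleCI expresses fan-out/fan-in declaratively in `.circleci/config.yml` via `requires` edges between jobs. A minimal sketch (the job names are hypothetical):

```yaml
# Sketch of a fan-out/fan-in CircleCI workflow; job names are illustrative.
version: 2.1
workflows:
  build-and-test:
    jobs:
      - build
      - test-frontend:          # fans out: runs in parallel with test-backend
          requires: [build]
      - test-backend:
          requires: [build]
      - deploy:                 # fans in: waits for both parallel branches
          requires: [test-frontend, test-backend]
```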


circle definitely looks miles better than travis in that regard, though I never doubted it was. How isolated are the parallel jobs from each other? What is shared between them etc.? Are they separate containers, or effectively just threads in the same container?


ok docs imply each parallel job is run on a different machine


i didn't do the work for the parallelisation, so my knowledge is hazy - but i know we get to allocate compute resources to each of the parallel jobs separately (we had a problem with one of the jobs being under-resourced), so they are running in separate containers. there are some caching facilities too, so there is at least some filesystem level sharing going on


I should point out that clones are explicit in circle, so if you wanted to clone a different repo, you could do that.


how do you mean?


When you run a build, you run a Git clone yourself, you're not in the context of the repo automatically


never used it, but I seem to remember GoCD was created to deal with pipelines as a first-class citizen of the build


1. I've not seen anything better than Jenkins (firing off hand-written bash scripts) for complicated stuff yet 2. Would Docker multi-stage builds help?


@lady3janepl: Thanks for the pointer to multi-stage docker builds. I’ll take a look. I think my issue with these things is that there are lots of components that might help or be part of the solution; but no out of the box, or tried and tested solution to this issue.

3Jane10:01:09

this is also a thing, although i've seen it used in the context of ETL pipelines rather than builds


@rickmoynihan I've been thinking about this kind of thing lately. I'm thinking it would be cool to build a more "event driven" ci/cd system (codename "edcd"). At the edge, would be a bunch of "connectors" that deliver the data from various web-hooks, into an event log. Then a bunch of independent bots would consume the event log, each doing a single thing (e.g. run a unit-test, run an e2e test, build some assets, deploy to some env if the required approval has been observed). All the events would be added to a database (crux maybe?) and the bots would get a read-only handle to the database so they can query to find any events they are interested in (e.g. to check that the tests have passed for some code change event).


At a glance, airflow seems like it might be similar actually. Hadn't seen that before. Thanks for linking it @lady3janepl.

Wes Hall10:01:33

@cddr I like that concept 🙂. The log would be useful in and of itself.


Sweet! @wesley.hall. I've added you to my "potential customer" list 🙂

👍 4
Wes Hall10:01:30

@cddr I will let you know when I have any kind of budget at all 😉


@cddr: Yeah I’ve had similar thoughts… Though it may be worth mentioning there’s strictly no need for such a thing to run only in a hosted mode… i.e. it could potentially also run locally in some environments as part of your dev tooling. At one level there’s also not much difference between a CI and a tool like make, except perhaps the history/service aspect. @lady3janepl Also I saw airflow just before christmas, and also thought it might be usable in this regard — but yeah it mostly seems to be around orchestrating ETL flows.


Ah cool thanks! Hadn't seen that before. The one I'd been reading was this one. And just generally applying the principle that if "event driven" is a good idea for modelling complex business flows, then it might be a good idea for modelling complex devops flows.

👍 4

I think Concourse beat you to it @cddr

👀 4

Morning all!




Morn’ as is traditional ;)…

😄 4
Wes Hall16:01:30

Todo: Write a 'morning' bot.

😄 8