#tools-deps
2019-08-09
kenny00:08:19

The piece that is more interesting to me is shortening the time it takes to get a classpath. We have some services where it takes nearly a minute to get a classpath.

alexmiller02:08:25

are you i/o bound (and if so, why?) or is the tree huge? or is it something in resolution (cpu bound)?

alexmiller02:08:13

a minute seems super long so would like to learn more about why

alexmiller02:08:46

also, curious what version of clj you're on? Just reading the code, it seems like the exclusions should get canonicalized in this scenario, at least in current code

alexmiller02:08:46

oh, I see why

andy.fingerhut02:08:17

Are there transitive dependency reachability checks done using exponential time algorithms? 🙂

andy.fingerhut02:08:06

Probably a completely irrelevant remark -- I just recall finding and fixing such a thing in tools.namespace years back.

alexmiller03:08:15

it's designed to be iterative and should be a single pass over the tree

alexmiller03:08:02

but problems like missing a cycle could cause issues

alexmiller03:08:18

@kenny I found and fixed the bug with exclusion canonicalization, really a problem with exclusions in any deps.edn, not just transitive. very good catch!

alexmiller03:08:33

not going to release it right now but will be in whatever the next release is

seancorfield03:08:03

I looked at our deps and we certainly have a mix of bare and canonicalized exclusions so it'll be interesting to see what effect that has on us 🙂

alexmiller03:08:20

my read of current release is that all bare exclusions are being ignored

seancorfield03:08:21

Thanks. I'll try to qualify all of ours and see if anything falls out...

alexmiller03:08:01

might want to do an -Stree before/after
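
A rough sketch of that before/after comparison in shell. `snapshot_tree` and the `CLJ` override are hypothetical names used only for illustration; `-Stree` is the flag mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: save the dependency tree to a file so two runs can be diffed.
# CLJ is a hypothetical override; it defaults to the real `clojure` command.
snapshot_tree() {
  "${CLJ:-clojure}" -Stree > "$1"
}

# Typical use, run from the project directory:
#   snapshot_tree before.txt      # with the bare exclusions
#   ...qualify the exclusions in deps.edn...
#   snapshot_tree after.txt
#   diff -u before.txt after.txt  # any output is a dependency that changed
```

If qualifying the exclusions changes what is excluded, the diff should show the affected libraries appearing or disappearing.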

kenny14:08:59

@alexmiller Thanks for the fix!
> are you i/o bound (and if so, why?) or is the tree huge? or it something in resolution (cpu bound)?
Just on a regular computer w/ ssd so not i/o bound. The tree is pretty big but I'd expect a tree this size with any enterprise product. Could be something in resolution that is slow, don't think it's cpu.

alexmiller14:08:29

I mean i/o bound in talking to the network to download jars

alexmiller14:08:42

not filesystem

kenny14:08:58

Oh, no. I think everything is already on disk

alexmiller14:08:24

like during that minute, could you grab periodic stack traces, either with ctrl-\ or with jstack, and look at the top of the stack?
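
One possible way to collect those periodic stack traces, as a sketch. `sample_stacks`, its defaults, and the `JSTACK` override are assumptions made for this example; `jstack <pid>` itself is the standard JDK tool.

```shell
#!/usr/bin/env bash
# Sketch: take COUNT thread dumps of a JVM, INTERVAL seconds apart.
# JSTACK is a hypothetical override; it defaults to the JDK's jstack.
sample_stacks() {
  local pid="$1" count="${2:-5}" interval="${3:-5}" outdir="${4:-.}"
  local i
  for i in $(seq 1 "$count"); do
    # Each dump goes to its own file so they can be compared afterwards.
    "${JSTACK:-jstack}" "$pid" > "$outdir/stack-$i.txt" || return 1
    sleep "$interval"
  done
}

# Example: find the JVM's pid with `jps -l`, then
#   sample_stacks <pid> 10 3 /tmp/dumps
```

If the same frames sit at the top of the main thread in most of the dumps, that is a hint about where the time is going.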

alexmiller14:08:22

so parallel downloads would not help you at all if you're not downloading anything

alexmiller14:08:16

another debug thing to do is -Sdeps '{:aliases {:v {:verbose true}}}' -A:v

kenny14:08:46

BTW, the slowness is a "full" refresh, not using the cache.

kenny14:08:59

e.g. adding/changing/removing a dep

alexmiller14:08:59

if you see pauses in there, that would be suspicious, but otherwise, if you just have a big trace, I'd be interested in seeing that, could dm it to me

alexmiller14:08:30

not using which cache? classpath cache? m2 local repo? gitlibs cache?

kenny14:08:07

Not sure. Changing certain deps.edn takes a long time.

alexmiller14:08:44

so you're not actively clearing the m2 repo or anything

kenny14:08:53

Same for gitlibs and the rest

alexmiller14:08:53

so really classpath cache

alexmiller14:08:04

that's the only thing stale when you change deps.edn

alexmiller14:08:03

and you're on latest clj?

alexmiller14:08:21

there were some cycle detection issues that were fixed months ago

alexmiller14:08:50

clj -Sverbose for version

kenny14:08:54

This is from -Sdescribe :version "1.10.1.466"

alexmiller14:08:00

yeah, that's latest

alexmiller14:08:12

well, I'd love to take a look

kenny14:08:47

For starters, these messages have reappeared and they take a decent chunk of time:

Downloading: io/grpc/grpc-api/maven-metadata.xml from 
Downloading: io/grpc/grpc-core/maven-metadata.xml from 
Downloading: io/grpc/grpc-netty-shaded/maven-metadata.xml from 
@seancorfield had noticed that a pom from one of my deps (com.google.cloud/google-cloud-monitoring "1.78.0") used a RELEASE version. He suggested explicitly specifying the deps mentioned there. I have done that and they still appear.

alexmiller14:08:47

yeah, that's actually an s3 wagon issue, upstream from tools.deps

kenny14:08:01

He also mentioned it may have something to do with not correctly resolving deps from a parent pom.

kenny14:08:14

Oh. Well that could easily shave ~15s off the time.

kenny14:08:52

Those deps are coming from google-cloud-monitoring not Datomic

alexmiller14:08:09

I don't have enough info to debug this, would be useful to see deps.edn and the verbose trace above

kenny14:08:15

Sure. I can send that over. I'll see if I can create a smaller deps.edn first.

kenny14:08:54

Just this will do it:

{:deps      {com.google.cloud/google-cloud-monitoring {:mvn/version "1.78.0"}}
 :mvn/repos {"datomic-cloud" {:url ""}}}

alexmiller14:08:34

this is starting to ring a bell

alexmiller14:08:16

there's a loop in these maven deps iirc

kenny14:08:11

Adding time to the clj calls shows I drastically overestimated the time it takes:

real	0m23.962s
user	0m46.126s
sys	0m1.158s
This certainly feels like an eternity when needing to do that many times a day. A big portion of that time is the "Downloading: ..." thing. It would be a huge productivity boost to get that under 5s.

ghadi14:08:03

why does repeated downloading happen in your environment?

alexmiller14:08:16

that's an issue with the s3 wagon I think

ghadi14:08:24

(jumping into this conversation without reading the backscroll)

kenny14:08:35

Does it not do that for you?

alexmiller14:08:49

I can repro it

alexmiller14:08:56

without the datomic repo in the mix, it's about 5 seconds to build a classpath for that

kenny14:08:22

That s3-wagon thing has always been a nightmare for me. I remember always hitting issues with it back when we used an s3 maven repo. Perhaps a good use case for aws-api? 🙂

alexmiller14:08:41

with it, I see about 8-9 seconds

kenny14:08:09

Yeah, I'd be curious what one of our large apps takes without the downloading thing.

alexmiller14:08:16

time clj -Spath -Sforce

kenny14:08:37

That will still have the downloading issue.

alexmiller14:08:02

yeah, that's the idea :)

kenny14:08:17

Oh. I already sent that above haha

ikitommi18:08:57

any success stories of multi-module/mono-repo library setups with deps? have some working lein projects for that, but have now a deps project that needs to be split into parts.

kenny18:08:53

@ikitommi We switched to a monorepo a month or so ago. All internal libraries are :local/root. Makes working in the REPL great. There's a few kinks with our setup though:
- We use CircleCI for CI/CD. There's no support for monorepos with CircleCI. This leads to longer build times - every project runs through its test steps with every push. We recently switched to their new unlimited parallelism plan which has been quite helpful in getting CI time down.
- CI configuration was moved into a set of clojure files because it became far too tedious messing around with YAML with the number of projects we have. This does, unfortunately, mean that CI config needs to be manually generated with a command every time you change the CI clojure files.
- We have a small service diff library that detects when a particular service's code, deps.edn, or :local/root deps have changed. That ensures a service won't get deployed on every push.
- We don't have a great way to have a common deps.edn across all projects. This would be quite useful for things like: global exclusions (when supported), overriding certain library versions, common aliases, etc. Ideally there'd be some way to just pass in N number of deps.edn files to clj and have it merge those in. I use Cursive so Cursive would also need to have some way to select which deps.edn files to use.
- We don't have a good way to run commands across all projects or only in a certain project. For example, I'd like to be able to do something like: monorepo my-service uberjar.
- Different libraries & services run tests with a different set of aliases. Every time we want to run the tests for a project, you need to go to the project's README (or check the CI config) and determine which aliases to use to run the tests. Either the aforementioned "command runner" or a way to combine aliases somehow would make this much better.
Overall, this workflow is far better than having individual repos and constantly needing to restart the REPL when working across projects.

ikitommi19:08:24

Thanks @kenny! was hoping for the monorepo kinda script, too lazy to start cooking up own tools right now. In my case, it's a library, going to be split into set of libraries, so would be easy to have same aliases for all. A sample repo would be super awesome.

dominicm19:08:53

@ikitommi you might want to look at edge which does this

dominicm19:08:20

Also Sean has talked a lot about their setup at world singles

seancorfield19:08:11

We have a monorepo with maybe two dozen subprojects, and 90k lines of Clojure.

seancorfield19:08:33

The key thing we did was to have a primary deps.edn in a folder and point to that via CLJ_CONFIG (so it "replaces" the user-level deps.edn) and then each subproject has a deps.edn.

seancorfield19:08:08

We use :override-deps in the primary file to "pin" versions of libs across the whole repo as needed, as well as provide all the common tooling via aliases.

seancorfield19:08:55

The only "tooling" we've built on top of this is a small shell script that can execute multiple clojure commands and knows how to navigate to subprojects when running series of commands.

seancorfield19:08:05

Like @kenny we use :local/root deps for cross-module deps -- and we have an everything subproject that we can build the deps.edn into from across the monorepo and that's where we usually start our REPL/REBL from.

kenny19:08:54

Hmm that's a good idea! Cursive doesn't support CLJ_CONFIG unfortunately. Creating an everything project and starting an nrepl from the command line could solve that problem though! Generally it makes sense to have everything on the classpath while dev'ing.

kenny19:08:57

How does your script know which aliases to use for each project's tests?

seancorfield19:08:08

build set:of:aliases subproject is our shell script. But it can take multiple pairs of aliases/subprojects.
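
A minimal sketch of what a wrapper with that calling convention could look like. This is not the actual script described here: `run_pairs`, `REPO_ROOT`, and `DRY_RUN` are hypothetical, and it assumes each subproject sits next to a `versions` folder holding the primary deps.edn with a `:defaults` alias.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run `clojure` once per "aliases subproject" pair.
# REPO_ROOT, DRY_RUN and the ../versions layout are illustrative assumptions.
run_pairs() {
  local root="${REPO_ROOT:-$PWD}"
  local aliases project
  if [ $# -eq 0 ] || [ $(( $# % 2 )) -ne 0 ]; then
    echo "usage: build aliases subproject [aliases subproject ...]" >&2
    return 1
  fi
  while [ $# -gt 0 ]; do
    aliases="$1"
    project="$2"
    shift 2
    if [ -n "${DRY_RUN:-}" ]; then
      # Only print what would be run.
      echo "cd $root/$project && CLJ_CONFIG=../versions clojure -A:defaults:$aliases"
    else
      # Subshell so the cd does not leak into the next pair.
      ( cd "$root/$project" && CLJ_CONFIG=../versions clojure -A:defaults:"$aliases" ) || return 1
    fi
  done
}

# Example: DRY_RUN=1 run_pairs test api uberjar worker
```

Stopping at the first failing pair keeps a broken test run from being followed by a build of the same code.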

seancorfield19:08:57

and if we need arguments, we can use [ ] to wrap them, so build uberjar api [ run ci-ftp api [email protected] ]

kenny19:08:01

How do you expose build? Is it a script at the root of the repo? Do you have devs add to PATH?

seancorfield19:08:59

We have a <repo>/build/bin folder containing scripts. Devs can either add it to their path or just run the scripts directly.

seancorfield19:08:44

I mostly work in the build folder but I have build/bin/build symlinked into my ~/bin folder for convenience. Other stuff I run with ./bin/<script>

seancorfield19:08:47

Between docker compose and two git clone commands, a dev can be set up "immediately" (assuming they have an OpenJDK8 installed).

seancorfield19:08:09

If they need to work on our legacy apps, there's one more git clone to run.

seancorfield19:08:04

(one of those repos is for semi-static tooling, which is where we run docker compose -- for Redis, Elastic Search, Percona/MySQL, and a custom search engine we use)

nwjsmith20:08:00

I’ve been using this trick to “bless” <repo>/bin path additions: https://thoughtbot.com/blog/git-safe

seancorfield20:08:07

Interesting little trick!

kenny22:08:18

Does CLJ_CONFIG have to be an absolute path?

kenny22:08:31

I think it doesn't. I thought it wasn't working for a sec.

seancorfield22:08:32

Nope. We use "../versions" in our script.

kenny22:08:55

How do you deal with not knowing where the build script is run from?

seancorfield22:08:23

CLJ_CONFIG=../versions clojure -A:defaults:<task> <args>
:defaults pulls in all the overrides etc from versions/deps.edn

kenny22:08:58

For example: if I have a build/bin/build and I run it from build/bin, I need to know to set CLJ_CONFIG to ../../versions. If I run it from build, I need to set CLJ_CONFIG to ../versions.

seancorfield22:08:05

Then you can work relative to that.

kenny22:08:52

Oooo, I didn't know about $0

seancorfield22:08:24

(although we assume certain filesystem paths are the same on all dev/test/prod images so some of our scripts take advantage of that -- and devs just add a symlink to wherever they decided to put stuff)

kenny22:08:48

Given you symlink build, dirname $0 will return a path to wherever the symlink is. How do you deal with that?

kenny22:08:48

Perhaps readlink?

seancorfield23:08:40

Yes, you can use readlink $0 to get the actual file location (it exits with a non-zero status if the argument is not a symlink).

kenny23:08:55

You must then also deal with Linux/Mac platform differences with readlink 😵
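
One way around the platform difference, as a sketch: loop over plain `readlink` (no `-f`, which older macOS versions lack) until the path is no longer a symlink. The function name `script_dir` and the `versions` location in the usage line are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the directory that really contains a script, following
# symlinks with plain `readlink` so it works on both Linux and macOS.
script_dir() {
  local src="$1" dir
  while [ -L "$src" ]; do
    dir="$(cd -P "$(dirname "$src")" && pwd -P)"
    src="$(readlink "$src")"
    # A relative link target is relative to the directory holding the link.
    case "$src" in
      /*) ;;
      *) src="$dir/$src" ;;
    esac
  done
  # Subshell so the caller's working directory is left untouched.
  ( cd -P "$(dirname "$src")" && pwd -P )
}

# Example inside build/bin/build, assuming versions/ sits at the repo root:
#   CLJ_CONFIG="$(script_dir "$0")/../../versions" clojure ...
```

With this, the script can be invoked through a symlink in `~/bin` and still locate files relative to its real location.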

seancorfield23:08:41

Like I said above, we also assume certain filesystem paths to make our lives easier 🙂

seancorfield23:08:41

(partly because we have a shell script in the main repo that a new dev can download and it does most of the env setup for them, including git cloning repos to specific places and setting up symlinks and handling the initial Mac/Linux differences)

seancorfield23:08:30

But there are plenty of ways to skin that particular kitty.

hkupty22:08:19

What is the difference between:
• (resolve-deps {:paths [p1 p2] ...}),
• (make-classpath ... [p1 p2]) and
• (make-classpath .. nil {:extra-paths [p1 p2]})
All three seem to produce the same result, which is to add the local project's paths p1 and p2 to the classpath.
To further exemplify my question from yesterday, I'm looking for something on the lines of:
{:deps {some/dep {:git/url ... :paths [p1 p2]}}}
Is the example above feasible?

alexmiller23:08:39

Those 3 produce the same result but are semantically different

alexmiller23:08:29

What’s your goal?

hkupty23:08:33

I want to add a folder (for example 'test') from a dependency I loaded w/ tools.deps to my currently running project's classpath.

hkupty23:08:39

That folder is not declared on the main {:paths [...]} clause on dependency's deps.edn for obvious reasons, it is not the mainline for that project.

hkupty23:08:41

However, as I'm trying to build something like a buildscript CLI interface for a group of projects, I want to dynamically load them using tools.deps and run their tests, or whatever else I might want to do with them, given that not only I can load their dependencies but also add folders (or other aliases from that deps.edn) into this buildscript CLI classpath.

hkupty23:08:45

There's a hacky way for me to work around that, which is to hijack the result of (resolve-deps) through some (update-in (resolve-deps ...) [dep :paths] conj '/my/hand-crafted/path/p2') before (make-classpath ...). That seems too hacky for me, but I can do that if tools.deps doesn't want to explore further the project structure of a dependency.

alexmiller23:08:17

If you’re calling tools.deps programmatically then you’re already in the machine - do whatever you want with the intermediate results

kenny23:08:30

@ikitommi I added a build script to our repo that is similar to the one @seancorfield described. Here's what I ended up with. Our repo is structured with all projects under projects and this script located at bin/build.