tools-deps

borkdude 2023-06-12T15:47:31.895799Z

deps.clj downloads the tools jar if it's not installed yet on a system, but 1 in 100 (or fewer) times I get complaints that deps.clj returns:

Error: Could not find or load main class clojure.main
Caused by: java.lang.ClassNotFoundException: clojure.main
This is caused by an invalid downloaded jar, and deleting ~/.deps.clj solves it by forcing a re-download. I can repro this by messing with the tools jar:
echo '' >  /Users/borkdude/.deps.clj/1.11.1.1347/ClojureTools/clojure-tools-1.11.1.1347.jar
One solution could be to mirror the tools.zip in the deps.clj GitHub releases along with a .sha256 file, so I can verify that the download was successful. Or perhaps http://clojure.org could provide a checksum file (that I can verify in Clojure/Java).
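
The .sha256 idea above could be sketched in Java roughly as follows. This is a minimal sketch, not deps.clj's implementation: the class name `Sha256Check` and both method names are hypothetical, and it assumes the .sha256 file starts with the hex digest, optionally followed by whitespace and a file name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of deps.clj.
final class Sha256Check {
    /** Returns the lowercase hex SHA-256 digest of the given file. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                digest.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /**
     * Compares the file's digest with the contents of a .sha256 file.
     * Assumption: the digest is the first whitespace-separated token.
     */
    static boolean matches(Path file, String sha256FileContents)
            throws IOException, NoSuchAlgorithmException {
        String expected = sha256FileContents.trim().split("\\s+")[0];
        return sha256Hex(file).equalsIgnoreCase(expected);
    }
}
```

A caller would download both the zip and its .sha256 file, then refuse to unzip when `matches` returns false.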

borkdude 2023-06-12T15:49:02.454939Z

Is there anything against me mirroring the tools.zip on the GitHub releases of deps.clj? Or would you be up for solution 2? Then I could just avoid doing so.

dominicm 2023-06-12T15:51:16.225469Z

Why is the jar invalid?

dominicm 2023-06-12T15:51:20.594939Z

How do you download the jar?

Alex Miller (Clojure team) 2023-06-12T15:52:13.585069Z

there is a checksum file in maven already

dominicm 2023-06-12T15:52:38.370379Z

Are the clojure tools uploaded to maven?

Alex Miller (Clojure team) 2023-06-12T15:52:48.098049Z

oh nvm, you're talking uber jar here

borkdude 2023-06-12T15:53:50.136679Z

yeah the zip file

borkdude 2023-06-12T15:54:22.045169Z

for example

borkdude 2023-06-12T15:56:30.886479Z

for zip and tar.gz files for other projects I now upload a .sha256 file for validation, e.g. see here: https://github.com/clj-kondo/clj-kondo/releases/tag/v2023.05.26 I directly copied this idea from graalvm: https://github.com/graalvm/graalvm-ce-builds/releases/tag/vm-22.3.2

dominicm 2023-06-12T15:57:35.134419Z

If you checked the jar for clojure.main (or something) after downloading, that would serve the same purpose, right?
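
A post-download check along these lines might look like the sketch below. It only confirms that the jar opens as a zip archive and lists `clojure/main.class`; it says nothing about the other files in the zip. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of deps.clj.
final class MainClassCheck {
    /**
     * Returns true if the jar opens as a zip archive and lists an entry for
     * the given class, e.g. "clojure.main" maps to "clojure/main.class".
     */
    static boolean containsClass(Path jar, String className) {
        String entryName = className.replace('.', '/') + ".class";
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            return zip.getEntry(entryName) != null;
        } catch (IOException e) {
            // Covers ZipException for files that are not valid archives,
            // such as the one-byte jar produced by the `echo '' > ...` repro.
            return false;
        }
    }
}
```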

dominicm 2023-06-12T15:57:48.783939Z

I’m still wondering why the jar is invalid without you knowing, though.

borkdude 2023-06-12T15:58:03.512649Z

yeah I think so, although there are more files in that .zip file

Alex Miller (Clojure team) 2023-06-12T15:58:24.343179Z

the .tar.gz sha is in https://download.clojure.org/install/stable.properties but we don't currently make or publish a zip sha

dominicm 2023-06-12T15:58:38.196149Z

Wait, the zip unzips but the files that come out aren’t valid?

borkdude 2023-06-12T15:59:00.565229Z

yeah that's weird. so it could also be an unzip problem perhaps

dominicm 2023-06-12T15:59:25.269659Z

Depending on what you’re using, I would have thought the checksumming in the zip would have prevented this.

dominicm 2023-06-12T15:59:29.854409Z

Something smells fishy here.

Alex Miller (Clojure team) 2023-06-12T15:59:31.430439Z

maybe it's unzipping a partially downloaded zip?

borkdude 2023-06-12T16:00:08.628639Z

or the user aborts while unzipping? dunno, but it sometimes happens

dominicm 2023-06-12T16:00:21.368189Z

Do you keep the zip file around?

Alex Miller (Clojure team) 2023-06-12T16:00:22.957099Z

like the first half of the zip. I don't know anything about zip but maybe it supports this so you can zip while still downloading

borkdude 2023-06-12T16:00:37.921369Z

deps.clj first downloads, then unzips

dominicm 2023-06-12T16:00:40.604949Z

I vaguely recall that zip stores the file listing at the end just to be annoying.

dominicm 2023-06-12T16:01:20.797049Z

https://unix.stackexchange.com/a/125102

borkdude 2023-06-12T16:01:46.216689Z

the .jar file does exist on disk, next time this happens for someone I should ask them to send this .jar file to me, but it's hard to know what exactly goes wrong. Getting rid of ~/.deps.clj solves the problem though. Perhaps I could also try/catch and check for clojure.main in the error message and offer a better suggestion
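
The "check for clojure.main in the error message" idea could be sketched like this. It assumes the launcher has already captured the JVM's stderr as a string; `LaunchHint`, `hintFor`, and the `toolsDir` parameter are hypothetical names used only for illustration.

```java
import java.util.Optional;

// Hypothetical helper, not part of deps.clj.
final class LaunchHint {
    // The message quoted at the top of this thread.
    static final String MARKER = "Could not find or load main class clojure.main";

    /**
     * Returns a suggestion when the captured stderr contains the marker,
     * or an empty Optional for any other output.
     */
    static Optional<String> hintFor(String stderr, String toolsDir) {
        if (stderr != null && stderr.contains(MARKER)) {
            return Optional.of(
                "The Clojure tools jar under " + toolsDir
                + " looks incomplete or corrupted. Delete that directory"
                + " and run the command again to force a re-download.");
        }
        return Optional.empty();
    }
}
```

This only improves the message; it does not re-download anything, which sidesteps the retry-loop concern.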

Alex Miller (Clojure team) 2023-06-12T16:01:59.776079Z

seems like it still needs a bit more examination (is there somewhere you can detect the problem and save off the badness when it happens) but happy to publish additional sha files if that's helpful

borkdude 2023-06-12T16:03:01.608759Z

I guess I could add to the error message: please go to the #babashka channel and post your .jar for examination ;)

borkdude 2023-06-12T16:03:43.872209Z

Re-downloading automatically seems risky as if I make a mistake somewhere you could get into a loop

dominicm 2023-06-12T16:03:52.099269Z

If you kept the zip around, you should be able to re-validate the jars against the zip.

borkdude 2023-06-12T16:04:50.165869Z

yeah, good idea, I'll keep the .zip around. I currently delete it

dominicm 2023-06-12T16:08:27.260049Z

If you wanted to check, you could .getCrc on your ZipEntry and calculate the Crc for the file on disk & compare them.

dominicm 2023-06-12T16:08:50.476319Z

Not sure how fast that would be, but maybe worth doing on failure to launch or something. Crc32 is pretty fast.
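
If the zip is kept around, the revalidation described here might be sketched as below. It uses `ZipFile`, which reads the archive's central directory, so `getCrc` is available without reading the entry first. The names `DiskCrcCheck`, `crc32Of`, and `matchesZipEntry` are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of deps.clj.
final class DiskCrcCheck {
    /** Computes the CRC-32 of a file on disk by streaming it. */
    static long crc32Of(Path file) throws IOException {
        CRC32 crc = new CRC32();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    /**
     * Returns true if the extracted file's CRC-32 equals the CRC-32 that the
     * zip records for the named entry. A missing entry or an unknown CRC (-1)
     * counts as a mismatch.
     */
    static boolean matchesZipEntry(Path zipPath, String entryName, Path extracted)
            throws IOException {
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null || entry.getCrc() == -1) {
                return false;
            }
            return entry.getCrc() == crc32Of(extracted);
        }
    }
}
```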

borkdude 2023-06-12T16:14:44.173999Z

oh that might be a good check, I'll try it out

borkdude 2023-06-12T19:01:34.769489Z

I'm getting -1 on all entries from the tools.jar zip

borkdude 2023-06-12T19:01:47.424689Z

which means "unknown"

borkdude 2023-06-12T19:04:18.313659Z

ah, the crc is known after you read it
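
One plausible explanation for the -1: `ZipInputStream` reads the archive front to back, and entries written in streaming mode (which is what `ZipOutputStream` does by default for deflated entries) carry their CRC in a data descriptor after the entry's data. The sketch below reproduces that behavior with an in-memory zip. Whether the tools zip is written this way is an assumption, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of deps.clj.
final class LazyCrcDemo {
    /** Builds an in-memory zip with one deflated entry (ZipOutputStream's default). */
    static byte[] zipOf(String name, byte[] content) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(out)) {
            zout.putNextEntry(new ZipEntry(name));
            zout.write(content);
            zout.closeEntry();
        }
        return out.toByteArray();
    }

    /** Returns {CRC before reading, CRC after reading} for the first entry. */
    static long[] crcBeforeAndAfter(byte[] zipBytes) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry = zin.getNextEntry();
            long before = entry.getCrc();
            byte[] buf = new byte[8192];
            while (zin.read(buf) != -1) {
                // Drain the entry; the data descriptor follows the data.
            }
            long after = entry.getCrc();
            return new long[] {before, after};
        }
    }
}
```

As far as I can tell, OpenJDK's `ZipInputStream` also compares the recorded CRC with the CRC of the bytes it just read once an entry is drained, and throws a `ZipException` on a mismatch, so a separate comparison against the same stream adds little.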

borkdude 2023-06-12T20:14:53.365769Z

Something like this ought to do it. https://github.com/borkdude/deps.clj/pull/103/commits/8627f15d2278e78bf949861c589fa40c8781fdc1

borkdude 2023-06-14T09:22:04.238729Z

I guess if I could manually change the crc code of the tools jar in an existing .zip file, I could test if the current deps.clj would detect this (and give the better error message)

dominicm 2023-06-14T09:22:38.689429Z

You could also create a new zip with a different tools jar in ☺️

borkdude 2023-06-14T09:23:07.218489Z

and then?

dominicm 2023-06-14T12:10:01.963859Z

The CRC of the tools jar in the zip would be different than the one on disk.

borkdude 2023-06-14T12:11:00.882959Z

but not if you unzip that zip with the different tools jar

borkdude 2023-06-14T12:11:19.338979Z

I just want a zip file that I can throw at this function and then the function complains

dominicm 2023-06-14T12:33:56.468319Z

Ah, then you’re probably into some kind of malicious crc fiddling of the zip file, yeah.

dominicm 2023-06-14T12:34:00.757899Z

Time to pull out your hex editor 😄
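
A hex editor is not strictly needed to get a zip whose data disagrees with its recorded CRC. The sketch below writes a single STORED (uncompressed) entry so the payload appears verbatim, then flips one bit of the payload while leaving the recorded CRC untouched. It relies on the zip local file header layout (30 fixed bytes, with the name and extra-field lengths at offsets 26 and 28). All names are hypothetical, and whether deps.clj's check would catch this exact corruption is untested here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical test fixture, not part of deps.clj.
final class CorruptZipFixture {
    /** Builds a zip with one STORED entry; size and CRC must be set up front. */
    static byte[] storedZip(String name, byte[] content) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(content);
        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(content.length);
        entry.setCompressedSize(content.length);
        entry.setCrc(crc.getValue());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(out)) {
            zout.putNextEntry(entry);
            zout.write(content);
            zout.closeEntry();
        }
        return out.toByteArray();
    }

    /** Returns a copy with one bit flipped in the first entry's first payload byte. */
    static byte[] flipFirstPayloadByte(byte[] zip) {
        int nameLen = (zip[26] & 0xff) | ((zip[27] & 0xff) << 8);
        int extraLen = (zip[28] & 0xff) | ((zip[29] & 0xff) << 8);
        int payloadStart = 30 + nameLen + extraLen;
        byte[] copy = zip.clone();
        copy[payloadStart] ^= 0x01;
        return copy;
    }

    /** Returns true if every entry can be read to the end without an exception. */
    static boolean readsCleanly(byte[] zip) {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zip))) {
            byte[] buf = new byte[8192];
            while (zin.getNextEntry() != null) {
                while (zin.read(buf) != -1) {
                    // Drain the entry so the CRC check can run.
                }
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```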

borkdude 2023-06-13T08:55:09.895909Z

I wonder how I can purposely damage a .zip file to test this, because I'm still not 100% sure that what I did makes any sense. I'm getting the crc32 of the entry, then reading it through a CheckedInputStream and comparing the two crc32 values, but who says those aren't always the same, even in the case of a weirdly downloaded zip file?

borkdude 2023-06-13T09:10:50.696979Z

When I truncate some of the last bytes, I can't even unzip it:

dd if=clojure-tools-1.11.1.1347.zip of=tools-corrupted.zip bs=1 count=17999800
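
The same truncation experiment can be sketched in Java. A zip's central directory and end-of-archive record sit at the end of the file, so a copy that is missing its tail should fail to open with `ZipFile`. The names `TruncatedZipDemo`, `opens`, and `truncatedCopy` are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.ZipFile;

// Hypothetical demo class, not part of deps.clj.
final class TruncatedZipDemo {
    /** Returns true if ZipFile can open the archive (its central directory is readable). */
    static boolean opens(Path zipPath) {
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            return true;
        } catch (IOException e) {
            // ZipException (an IOException) is what a missing end record produces.
            return false;
        }
    }

    /** Writes the first keepBytes bytes of source to a new temp file, like the dd command. */
    static Path truncatedCopy(Path source, long keepBytes) throws IOException {
        byte[] all = Files.readAllBytes(source);
        int keep = (int) Math.min(keepBytes, all.length);
        Path target = Files.createTempFile("truncated", ".zip");
        Files.write(target, Arrays.copyOf(all, keep));
        return target;
    }
}
```

A check like `opens` before unzipping would catch a partially downloaded zip, though not a jar that was damaged after extraction.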

dominicm 2023-06-13T09:11:42.867759Z

I would expect that the file either unzips falsely or is "damaged" after the fact.

dominicm 2023-06-13T09:11:59.577129Z

If you keep the zip around you can revalidate the files against it.

borkdude 2023-06-13T09:12:16.085409Z

how would that work?

borkdude 2023-06-13T09:12:36.576749Z

the crc32 codes are -1 when I read them, which means they weren't in the zip file when the file got zipped, right?

dominicm 2023-06-13T09:12:59.481669Z

I'm not sure I follow

borkdude 2023-06-13T09:13:02.110169Z

they only become some number after I've actually processed the entry, which tells me it's lazily computed based on the data that was already there, which is kind of pointless

dominicm 2023-06-13T09:13:21.702049Z

You need to consume the stream for the CRC to be calculated, yeah

dominicm 2023-06-13T09:13:38.698289Z

So you could just read the whole thing and then check it against the CRC of what's on disk

borkdude 2023-06-13T09:15:01.162919Z

I'm checking this:

(= (.getCrc entry) (-> cis (.getChecksum) (.getValue)))
but you are suggesting calculating the crc32 of the file on disk, rather than from the checked input stream which was used to copy to disk?

dominicm 2023-06-13T09:18:21.005209Z

File on disk compared with the one in the entry

dominicm 2023-06-13T09:18:37.352019Z

Sorry, not both *

borkdude 2023-06-13T09:18:37.801459Z

so not based on the checked input stream?

dominicm 2023-06-13T09:18:48.332899Z

I think they're both the same value tbh.

borkdude 2023-06-13T09:19:05.129389Z

yeah, I think so too, the above comparison is always true, I think, no matter how corrupted

dominicm 2023-06-13T09:19:20.949949Z

You can use the file Vs the zip as a checksum

dominicm 2023-06-13T09:19:45.583899Z

Although the CRC should be written in the zip somewhere, too

borkdude 2023-06-13T09:19:48.090149Z

so comparing to on disk would detect some kind of data write failure?

borkdude 2023-06-13T09:20:02.296049Z

yeah, I think you need to produce a zip file with explicit crc32 on

dominicm 2023-06-13T09:20:07.331409Z

Or if the file had been modified (eg by antivirus)

dominicm 2023-06-13T09:20:17.782919Z

Oh I thought zips had them by default.

dominicm 2023-06-13T09:20:30.338139Z

It could be a limitation of the Java zip library, too

borkdude 2023-06-13T09:21:24.949169Z

ah right it has crc32 by default

borkdude 2023-06-13T09:26:23.345499Z

ok, I added the extra check: https://github.com/borkdude/deps.clj/commit/82adf0cb15249a753851579bc9fac0db52921778

borkdude 2023-06-13T09:26:46.415489Z

will keep an eye out until the next failure and will ask for the zip file so I can double-check if this really helps