#clojure
2022-04-10
hiredman00:04:44

Arrays are by reference, primitive or not

hiredman00:04:21

So passing an array to invoke which takes Object doesn't involve any boxing or whatever

didibus00:04:22

No boxing, but it seems to compile to (long[])object, and I was wondering whether that downcast is still slower

didibus00:04:01

A quick Google seems to say downcasting at runtime does incur a small cost. That's why I was wondering

Alex Miller (Clojure team)00:04:03

It's probably too small to care, esp on modern jvms

👍 1
Ben Sless03:04:03

One place that bit me with this was a comparison between a long and an int in a loop. The cost of casting the int to long every iteration ended up being measurable

hiredman03:04:09

The case of primitives is slightly different

Ben Sless03:04:42

True, I had to learn to read byte code to make sense of that

didibus07:04:26

The cast for primitive is even slower? Or what is the difference we are talking about here?

Ben Sless08:04:55

Not in that sense, but that the compiler will insert casts for you from int to long in stuff like loops even if you initialize a binding to be an int. So if you compare against it every iteration (such as if (< i n)), you'll end up promoting int to long every iteration. It will be slower than working with long directly, while you might think ints will always be faster
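
A minimal sketch of the situation described above, with hypothetical function names. It measures nothing; it only shows the two shapes of loop being compared, so the bytecode or timings can be inspected separately.

```clojure
;; Hypothetical names; a sketch of the int-vs-long loop comparison discussed above.

(defn count-to-int-bound
  "The bound n is held as a primitive int while the loop counter i is a long,
  so each (< i n) has to widen n from int to long before comparing."
  [limit]
  (let [n (int limit)]
    (loop [i 0]
      (if (< i n)
        (recur (inc i))
        i))))

(defn count-to-long-bound
  "Same loop, but n is a primitive long, so no widening is needed per iteration."
  [limit]
  (let [n (long limit)]
    (loop [i 0]
      (if (< i n)
        (recur (inc i))
        i))))
```

Both functions return the same count. Whether the difference is measurable depends on the JVM and the loop body, so it is worth benchmarking and decompiling before drawing conclusions.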

didibus08:04:48

Ah ya, I just go for long and double when I can because of that. Though I don't really know how slow the cast actually is.

didibus08:04:27

I think that's because none of the primitive supporting functions support int, basically you can't do anything really with an int, except shove it in an array, or use it with interop Java methods that take an int.

hiredman00:04:13

And it is the same for every reference type in clojure

didibus00:04:37

Thanks, turned out it was mod not having primitive support that was slowing the whole thing down, when looking at the decompiled java code I saw the downcast on the array and wondered at first if that was it.

didibus19:04:16

@alexmiller Since 1.11 introduced clojure.math which has primitive type support for all the functions, would it not make sense to give the same treatment for the math functions that were in core? Such as mod: https://github.com/clojure/clojure/blob/master/src/clj/clojure/core.clj#L3567-L3575 It seems an easy loophole for people to get bitten by to assume they are using clojure.math and getting primitive arithmetic, and suddenly throw in a mod, quot, rem, max, zero?, pos?, etc. and suddenly they are back in boxed land

Alex Miller (Clojure team)19:04:29

As always, feel free to propose and vote at https://ask.clojure.org

👍 2
Alex Miller (Clojure team)19:04:27

Some of those you mention do have inline / polymorphic support

didibus19:04:24

I'll see if an existing issue exists for it or add one otherwise. Ya, I think I almost find it more annoying that some of them but not all do, I always end up having to decompile to be sure. Maybe a simpler improvement could be a clearer way to document which functions do and don't? Is there even such a way right now? Like I saw zero? actually will, but looking at the doc/code for zero? I'm not sure how I can tell it will?

didibus19:04:53

Oh... I guess it's the inlining that does it for zero?
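
One way to check for inlining without decompiling is to look for `:inline` in a var's metadata. The helper names below are hypothetical, and `:inline` metadata alone does not guarantee primitive arithmetic (that still depends on the argument types and arities at the call site), so treat this as a first filter only.

```clojure
;; Hypothetical helpers; they only report whether a var carries :inline metadata.

(defn inlined?
  "True when the var has an :inline function in its metadata."
  [v]
  (some? (:inline (meta v))))

(defn inline-report
  "Map of symbol -> whether the clojure.core var of that name has :inline metadata."
  [syms]
  (into {}
        (map (fn [s] [s (inlined? (ns-resolve 'clojure.core s))]))
        syms))

(comment
  ;; Evaluate at a REPL to see which of the functions mentioned above are inlined.
  (inline-report '[mod quot rem max zero? pos?]))
```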

Yehonathan Sharvit13:04:20

Is there an easy way to compare two pieces of data, ignoring the order of elements in collections, recursively? cc: @ory.band

andy.fingerhut14:04:53

Perhaps do a first pass where you convert ordered collections into sets or multisets/bags first, then compare?

saidone13:04:16

All sequential things are treated as associative collections
  by their indexes

saidone13:04:14

So I suppose that lists should be sorted before
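
A minimal sketch of the "convert to multisets first, then compare" idea suggested above. The names `unordered`, `unordered=` and the `::bag` tag are hypothetical; the tag only keeps a converted sequence from being confused with a real map of counts.

```clojure
;; Hypothetical helpers: normalise nested data so that element order in
;; sequential collections no longer affects equality.

(defn unordered
  "Recursively replaces every sequential collection with a tagged multiset
  (a map of element -> count). Maps and sets are walked but keep their type."
  [x]
  (cond
    (map? x)        (into {} (map (fn [[k v]] [(unordered k) (unordered v)])) x)
    (sequential? x) {::bag (frequencies (map unordered x))}
    (set? x)        (set (map unordered x))
    :else           x))

(defn unordered=
  "True when a and b are equal after ignoring order in all nested sequences."
  [a b]
  (= (unordered a) (unordered b)))

(comment
  (unordered= [1 2 {:a [3 4]}] [{:a [4 3]} 2 1])
  (unordered= [1 1 2] [1 2 2]))
```

Using counts rather than sets keeps duplicates significant. Note that this treats lists and vectors with the same elements as equal.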

Nom Nom Mousse13:04:58

Is there a way to get a process and then tell it what to do? Like first just start a process and get its ID and then tell it what to do? (For example, run a shell job.)

Nom Nom Mousse14:04:09

Perhaps we get an ID just from using a processbuilder. Let me check.

Nom Nom Mousse14:04:49

No, we need to start it to get a process id

Cora (she/her)14:04:39

can you explain the use case a bit?

Nom Nom Mousse14:04:34

I want to run a process in a future. The process might not terminate so I need its handle to be able to kill it.

Nom Nom Mousse14:04:22

(future (->process "echo hi")) I do not know the process id from where I call future.

Nom Nom Mousse14:04:52

But perhaps future-cancel will lead to the subprocess running in the future being killed?

flowthing14:04:02

You could call .destroy on the java.lang.Process, I think.

Nom Nom Mousse14:04:59

But not from where I requested the future.

(future (->process "echo hi"))
;; I have no idea what pid the process has; everything is hidden in the future

Cora (she/her)14:04:04

you could use a promise to pass the process back

Cora (she/her)14:04:01

but I imagine if we had more context we could help you pick a better design
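
A minimal sketch of the promise idea, assuming a plain `java.lang.ProcessBuilder` instead of the `->process` helper from the question (whose definition is not shown here). `start-in-future` is a hypothetical name, and the `sleep` command in the usage assumes a Unix-like system.

```clojure
;; Hypothetical helper: start a process inside a future, but hand the
;; java.lang.Process back to the caller through a promise.

(defn start-in-future
  "Starts cmd (a vector of strings) in a future.
  Returns {:process <promise of the Process>, :exit <future of the exit code>}."
  [cmd]
  (let [proc (promise)
        exit (future
               (let [p (.start (ProcessBuilder. ^java.util.List cmd))]
                 (deliver proc p)   ; the caller can now reach the process
                 (.waitFor p)))]    ; blocks this future until the process exits
    {:process proc :exit exit}))

(comment
  ;; Usage: kill a long-running process from the calling thread.
  (let [{:keys [process exit]} (start-in-future ["sleep" "60"])]
    ;; deref with a timeout, since the promise is never delivered if start fails
    (when-let [p (deref process 5000 nil)]
      (.destroy ^Process p))
    (deref exit 5000 :timed-out)))
```

As far as I know, `future-cancel` only interrupts the thread waiting in the future and does not by itself destroy the child process, which is why the handle is passed back explicitly.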

Nom Nom Mousse07:04:56

Thanks for the replies. I'll need to spend more time in the hammock it seems

Carlo20:04:02

general question, has anyone used the materialize db from clojure? https://materialize.com/ - it's essentially differential dataflow, with an SQL interface

👀 2
Drew Verlee02:04:10

There are a number of older conf talks of trying to use dataflow using datalog, but i don't think the idea ever got packaged into a production ready application...

🙌 1
ericdallo20:04:19

Not sure if this is a good channel for this question, but since it would be implemented in Clojure, it sounds right to me: how fast is it to SHA a file's content? What's the fastest way?

ericdallo20:04:13

I'm looking into implementing a per-file cache mechanism in clojure-lsp, where if the file (especially jar content, so multiple files) didn't change we don't analyze it

ericdallo20:04:42

So, there is this second corner case, is it possible to sha a whole jar instead of iterating on each entry and sha each one?

p-himik20:04:59

I never measured it, but given that you'll have to read a file in full and that IO is always the limitation, the speed of computing the checksum of a file will be limited by the speed of your IO.

ericdallo20:04:03

Yeah, I thought the IO would be the expensive thing too, I wonder if there is any thing on java/OS side that could have this information, but it makes sense, you probably need to read the file content and then sha it

ericdallo20:04:17

I mean, how does the OS know when a file changed, as shown when you run ls -l?

p-himik20:04:12

Due to the modification time stored in the file system. And that's exactly what other tools use for such things. E.g. make, if memory serves me well, won't rebuild an object file if its modification timestamp is newer than that of its sources. shadow-cljs won't compile a cljs file if the resulting js file has been created after the cljs file was changed. And so on.
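
A minimal sketch of that kind of timestamp comparison. The name `stale?` is hypothetical and this is not the code used by any of the tools mentioned; it only illustrates the check.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: decide whether an output needs rebuilding by comparing
;; file-system modification times instead of hashing file contents.

(defn stale?
  "True when output is missing or was last modified before source."
  [source output]
  (let [src (io/file source)
        out (io/file output)]
    (or (not (.exists out))
        (> (.lastModified src) (.lastModified out)))))
```

The explicit `.exists` check covers the case where the output was deleted, since `.lastModified` returns 0 for a file that does not exist.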

ericdallo20:04:06

hum, interesting, good to know, so maybe I can find some way to use that instead of a sha

ericdallo20:04:29

like, the first time I analyze it I could save that timestamp and then check it next time? not sure how reliable it would be tho

p-himik20:04:36

In fact, that's definitely what you should use. :) java.io.File's .lastModified should be the right method.

ericdallo20:04:02

Thank you! I'll do some experiments with that

p-himik20:04:13

Of course, a user can modify the modification timestamp for it to be in the past. But at that point, the user is deliberately shooting themselves in the foot, so it shouldn't be your concern.

ericdallo20:04:46

yea, I wonder if there is some other case where checking the filename + lastModified of a jar would not be reliable

ghadi21:04:14

@ericdallo if you want fast, use blake2 or blake3

☝️ 1
p-himik21:04:18

If you'd like a reference, this seems to be the right section of shadow-cljs code: https://github.com/thheller/shadow-cljs/blob/master/src/main/shadow/build/output.clj#L237-L241

ericdallo21:04:18

for a jar for example, it may be a little tricky as user could delete a .m2 for example, but I think that would be expected to reanalyze indeed 😅

p-himik21:04:45

Yeah, the deletion case should also be handled - that's in the code above with .exists.

ghadi21:04:12

or SHA1, which is a totally, completely broken algorithm, but secure for this threat model

ericdallo21:04:27

Awesome @p-himik, that looks promising

ericdallo21:04:50

@ghadi thanks, didn't know about blake, is there some Clojure lib or built-in function for that? (maybe a Java one)

ghadi21:04:53

agreed with @p-himik about the larger application concern

ghadi21:04:06

there's a blake2 impl for java somewhere

ghadi21:04:21

it's a well-respected algorithm, but not built into the JVM

ericdallo21:04:38

I found https://github.com/alphazero/Blake2b, but I thought it would be possible without an additional dep

ericdallo21:04:17

yeah, maybe we don't need something that powerful, a simple sha1 would do the trick if the lastModified approach doesn't cover our needs
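
For reference, a SHA-1 of a file can be computed without an extra dependency using `java.security.MessageDigest`, which ships with the JVM. This is a minimal sketch with a hypothetical function name, not what clojure-lsp ended up using; it hashes the raw bytes of the file, so for a jar it covers the whole archive rather than each entry.

```clojure
(require '[clojure.java.io :as io])
(import 'java.security.MessageDigest)

;; Hypothetical helper: stream a file through SHA-1 and return the hex digest.

(defn file-sha1
  "Returns the SHA-1 of the file at path as a lowercase hex string.
  Reads in 8 KB chunks so large files are not loaded into memory at once."
  [path]
  (let [md         (MessageDigest/getInstance "SHA-1")
        ^bytes buf (byte-array 8192)]
    (with-open [^java.io.InputStream in (io/input-stream path)]
      (loop []
        (let [n (.read in buf)]
          (when (pos? n)            ; .read returns -1 at end of stream
            (.update md buf 0 n)
            (recur)))))
    (apply str (map #(format "%02x" %) (.digest md)))))
```

As noted earlier in the thread, the cost is likely dominated by reading the file, so this is only worth it when modification times are not trustworthy enough.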

hiredman21:04:00

The other option is to subscribe to filesystem notification events

ericdallo21:04:33

thanks, but since this is a one time thing after project started, probably listening would not be necessary

ericdallo22:04:55

It seems the filename + lastModified worked like a charm 🪄 I used a function like this to create a map of lastModified by filename:

(defn ^:private paths->checksums
  "Return a map with file's last modified timestamp by filename."
  [paths]
  (reduce
    (fn [cks path]
      (let [file (io/file path)]
        (if-let [checksum (and (shared/file-exists? file)
                               (.lastModified ^java.io.File file))]
          (assoc cks path checksum)
          cks)))
    {}
    paths))
Thank you folks for the help!

p-himik22:04:39

Great! Although, I'd rename that checksum. ;)

ericdallo22:04:33

yeah, indeed, it's not a checksum really 😅