#clojure
2020-07-06
datran03:07:28

I've got a problem where I need to construct a directed graph, and I'm curious what the clojure approach to this is - everything I learned in school was very mutable. Is there any good reading on the topic?

dpsutton03:07:27

a map representation of an adjacency list is straightforward and clojure-y

dpsutton03:07:42

{1 #{2 3} 2 #{3}}

datran03:07:33

That's similar to what I tried first, but I'm going to have different nodes with the same value - I started going about generating uuids for each value before I thought, I'll just ask first

dpsutton03:07:00

what does "different nodes with the same value" mean?

datran03:07:16

Like, I'll have nodes [1 2 1] and edges [[1 2] [1 1] [2 1]]

dpsutton03:07:22

there are two nodes with value 1?

datran03:07:48

correct! I'm building a graph representation of a sequence, where the nodes are the members of the sequence and the edges are the "observed-before" relations. So, the sequence (1 2 3) is broken into {:mem [1 2 3], :ord [[1 2] [1 3] [2 3]]}

datran03:07:06

I need to construct a directed graph with that map, and then topologically sort it to get the original sequence back

datran03:07:38

In the pathological case I'll have a sequence like (1 1 1) that maps to {:mem [1 1 1], :ord [[1 1] [1 1] [1 1]]}

dpsutton03:07:17

ah neat. haha. not sure i would call (1 1 1) pathological but i get your point 🙂

dpsutton03:07:14

maybe use a tuple of ordinal and value then?

datran03:07:37

like with map-indexed?

dpsutton03:07:42

but that would be a pain if you need an invariant like the graph of (next coll) to be a subset of the graph of coll

datran03:07:17

I just found https://github.com/weavejester/dependency, so I'm going to read some source code and see what I can purloin

datran04:07:32

ok, that one has the same problem I mentioned above, but I think I can get around it by creating a map of uuid->element, and then passing the collection of uuids into the graph, topo-sorting it, and then once it comes back mapping the uuids back to the original values

datran04:07:04

Is that crazy? It adds overhead, but it seems much simpler than tagging each value

datran04:07:43

What the heck, I'll try it out and see how it feels

andy.fingerhut04:07:26

Whether you use uuid, or something else unique and convenient, e.g. the index of the node in the original order, perhaps paired with the value, e.g. [0 1] [1 1] [2 1] for your three nodes with value 1 example, I think you will be much saner if you pick some representation of nodes where they are all guaranteed to have unique values.

datran04:07:56

I'm hesitant to pair it with the value and complicate the structure, since I hope to be able to apply this work to nested datastructures after I get the simple case down. But you're right, my repl explorations have already proved that a unique id is useful, though now I have to use a special print fn to make them nice.

datran04:07:25

How expensive are uuids? I'm considering going with integers because uuids everywhere is going to make testing a nightmare

datran04:07:04

Oh, this is so nice! I think I can throw away my implementations of seq-intersection seq-difference and seq-union and just use clojure.set now

noisesmith04:07:17

the correct way to do an adjacency list is for every value to be unique (and most often meaningless to the domain)

noisesmith04:07:29

you can use a second hash map to assign a value to each node

datran04:07:12

yeah, that's what I've got going now and it simplifies so much! Trying to handle nil and false inside of those seqs was the hackiest thing I've done this month

datran04:07:46

Now I have an initial step of (zipmap (range) s) and I'm off to the races
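The approach discussed above (unique ids for nodes, a separate map from id to value, then a topological sort) can be sketched as follows. This is only a minimal sketch and not the weavejester/dependency API: `topo-sort` is a hypothetical helper implementing Kahn's algorithm, and the edges are assumed to already be expressed as pairs of ids rather than pairs of values.

```clojure
(defn topo-sort
  "Kahn's algorithm over node ids (hypothetical helper).
  `nodes` is a collection of unique, comparable ids and `edges` a
  collection of [from to] id pairs. Returns the ids in a valid
  topological order, or nil if the graph contains a cycle."
  [nodes edges]
  (let [;; adjacency map: id -> set of successor ids
        succs (reduce (fn [m [a b]] (update m a (fnil conj #{}) b)) {} edges)
        ;; in-degree per id, counting each distinct edge once
        indeg (reduce (fn [m [_ b]] (update m b inc))
                      (zipmap nodes (repeat 0))
                      (set edges))]
    (loop [ready (into (sorted-set) (filter #(zero? (indeg %)) nodes))
           indeg indeg
           out   []]
      (if-let [n (first ready)]
        (let [indeg' (reduce #(update %1 %2 dec) indeg (succs n))
              newly  (filter #(zero? (indeg' %)) (succs n))]
          (recur (into (disj ready n) newly) indeg' (conj out n)))
        ;; nodes left over with a non-zero in-degree indicate a cycle
        (when (= (count out) (count nodes)) out)))))

;; The sequence (1 2 1): ids carry the identity, id->val carries the value.
(def id->val (zipmap (range) [1 2 1]))   ; {0 1, 1 2, 2 1}
(def edges [[0 1] [0 2] [1 2]])          ; "observed-before" on ids

(def restored (mapv id->val (topo-sort (keys id->val) edges)))
;; restored => [1 2 1]
```

Because the graph only ever sees ids, duplicate values such as the two 1s never collide; the values come back in the last step via `id->val`.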

noisesmith04:07:54

if you don't need the strong guarantees of uuids, a third option is gensym

user=> (zipmap (repeatedly #(gensym "node")) [:a :b :a :c :a :b :a])
{node9 :a, node10 :b, node11 :a, node12 :c, node13 :a, node14 :b, node15 :a}

datran04:07:53

What's cheaper, the (repeatedly #(gensym "foo")) or (range). I think what I'm going to be doing will be slow for large collections, so I'd like to save where I can

datran04:07:05

And the ids are pretty transient, they'll never leave the namespace

noisesmith04:07:07

range is definitely cheaper

noisesmith04:07:21

but the gensym is useful if you aren't producing the result all in one go

datran04:07:38

oh, that's a good counterpoint

noisesmith04:07:09

also, in my experience clojure hits some hard limits with graphs, and to do interesting things you eventually need to write java code, or even worse, very hard to read and write clojure with very weird bugs

noisesmith04:07:19

clojure's better at defining reliable high level code, but java is better at defining reliable performant code, and luckily we are allowed to use both together

coetry04:07:15

I have a file.jar in my project root and in my deps.edn I included it as the following:

{:deps {foo {:local/root "file.jar"}}}
Yet when I try to require it in my repl, I run into a FileNotFoundException

coetry04:07:28

am I doing something wrong?

coetry04:07:30

user=> (require 'foo)
Execution error (FileNotFoundException)

datran04:07:33

:local/root requires you to specify the path from the filesystem root, I think. For me, I have /home/me/path/to/file.jar in my deps.edn

coetry04:07:43

cool, let me try that

seancorfield04:07:56

:local/root can be a relative path.

seancorfield04:07:36

Have you confirmed that the code you're trying to require really is in the .jar file?

coetry04:07:11

I assumed if i give any arbitrary symbol name, it would load up the jar

coetry04:07:24

i know the .jar is definitely in the path and yeah, it's not empty if that's what you're asking

seancorfield04:07:43

I'm asking that you've confirmed the contents of the JAR are what you expect.

seancorfield04:07:09

Also, have you confirmed exactly what file is not found?

coetry04:07:54

Yeah I just extracted the contents, and it's all there, maybe the problem is with my naming

coetry04:07:02

I'm trying to require jquran, but the name of the jar is jqurantree-1.0.0.jar and has a structure of org.jqurantree.* when uncompressed

coetry04:07:33

would i have to require it as jqurantree? Or can I refer to it with any random symbol, such as foo

seancorfield04:07:49

So the Clojure namespace is org.jqurantree.something? That's what you have to require.

seancorfield04:07:02

(this is Clojure code right, not compiled classes?)

coetry04:07:06

It's not a clojure jar, it's a java package

coetry04:07:20

was trying to do some interop stuff

seancorfield04:07:29

You import Java classes. Not require.
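The require/import distinction can be illustrated with classes that ship with the JDK. This is a small sketch using `clojure.string` and `java.util.UUID` as stand-ins; neither appears in the conversation above.

```clojure
;; `require` loads a Clojure namespace and can give it an alias.
(require '[clojure.string :as str])

;; `import` makes a Java class usable by its short name.
(import '(java.util UUID))

;; Clojure function from the required namespace, static method from
;; the imported class.
(def id (str/upper-case (str (UUID/randomUUID))))
```

A jar on the classpath is not loaded by name at all: Clojure namespaces inside it are required, and Java classes inside it are imported by their package-qualified class names.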

coetry04:07:11

ahh, ok that makes sense. Would I still be able to import it under foo? Or do i need to follow the namespace structure that I find when I extract the .jar

seancorfield04:07:48

The group/artifact name in deps.edn is purely for tracking dependencies -- it has no relationship to code at all.

coetry04:07:14

ok cool, I think i got it now:

user=> (import org.jqurantree.core.io.FileWriter)
org.jqurantree.core.io.FileWriter
user=> FileWriter

coetry04:07:19

thanks Sean 🙏

seancorfield04:07:12

Most Java libraries are up on Maven and you just specify the coordinates in deps.edn. You rarely have to download the JAR.
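For comparison, a deps.edn can mix both kinds of dependency. In this sketch the library name `my.local/jqurantree` is an arbitrary, hypothetical label (as noted above, the name is only used for tracking), and `commons-lang3` is just an example of a library that is published on Maven Central.

```clojure
{:deps {;; local jar, path relative to the project directory
        my.local/jqurantree {:local/root "jqurantree-1.0.0.jar"}
        ;; Maven dependency, resolved and downloaded automatically
        org.apache.commons/commons-lang3 {:mvn/version "3.10"}}}
```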

coetry04:07:50

haha yeah, this particular project isn't on Maven: http://corpus.quran.com/java/overview.jsp

coetry04:07:17

I had to download it

seancorfield04:07:19

Ah, OK. Just wanted to check since you didn't know about import vs require so I wasn't sure where you were on your journey.

coetry04:07:54

Yeah, I'm still in the early stages for sure! But am able to get really productive surprisingly. Clojure is a gem.

datran04:07:26

Yeah, it's always nice to have that ace in the hole.

datran04:07:30

ok, I definitely need different collections to have different values, so now I'm thinking I'm back to uuids. How safe is gensym?

datran04:07:09

ok, I don't think this approach will work at all - I need to compare values in different collections, and when they are uuids of course that doesn't work -they are never going to be equal. Back to the drawing board on this one

datran04:07:20

well, I guess I could still use it for the graph creation, I just can't use it for all the other stuff I'm doing

noisesmith04:07:25

you'd need some kind of indexing - it ends up being very similar to using sql - you have an index that doesn't carry any value on its own, then you use it to cross-correlate

noisesmith04:07:17

the whole point of gensym is that it's safe within one execution (though might not be safe if you load symbols from another execution of your program, eg. from a file - that's when you really want UUIDs)

datran04:07:43

I think it's even worse than that. I'm trying to implement Mergeable Replicated Datatypes according to this paper: https://www.cs.purdue.edu/homes/suresh/papers/oopsla19-mrdt.pdf

datran04:07:15

I calculate the :mem and :ord relations for a sequence, like I showed above.

noisesmith04:07:01

if this is really about a datatype, you can do what clojure did for its hash-map and vector impls and just use mutable definitions with an immutable interface

datran04:07:01

Then I do a three-way merge of those relations with replica1 (r1), replica2 (r2), and the lowest common ancestor between them (l), according to this formula:
:ord: Rob(v) ⊇ (Rob(l) ∩ Rob(v1) ∩ Rob(v2) ∪ Rob(v1) − Rob(l) ∪ Rob(v2) − Rob(l)) ∩ (Rmem(v) × Rmem(v))
:mem: Rmem(v) ⊇ (Rmem(l) ∩ Rmem(v1) ∩ Rmem(v2) ∪ Rmem(v1) − Rmem(l) ∪ Rmem(v2) − Rmem(l))
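With plain sets, the merge formula above maps almost directly onto clojure.set. The sketch below assumes the formula groups as (common to all three) ∪ (added in v1) ∪ (added in v2), and it computes exactly that lower bound; `merge-rel` and `merge-ord` are hypothetical names, not taken from the paper or from any library.

```clojure
(require '[clojure.set :as set])

(defn merge-rel
  "Three-way merge of a relation held as a set: keep what all three
  versions share, plus whatever either replica added since `l`."
  [l v1 v2]
  (set/union (set/intersection l v1 v2)
             (set/difference v1 l)
             (set/difference v2 l)))

(defn merge-ord
  "Merge the ordering relation, then keep only pairs whose two
  members both survive in the merged membership set `mem`."
  [l v1 v2 mem]
  (set (filter (fn [[a b]] (and (contains? mem a) (contains? mem b)))
               (merge-rel l v1 v2))))

;; Ancestor (1 2 3); replica 1 appends 4; replica 2 removes 2.
(def mem (merge-rel #{1 2 3} #{1 2 3 4} #{1 3}))
;; mem => #{1 3 4}

(def ord (merge-ord #{[1 2] [1 3] [2 3]}
                    #{[1 2] [1 3] [1 4] [2 3] [2 4] [3 4]}
                    #{[1 3]}
                    mem))
;; ord => #{[1 3] [1 4] [3 4]}
```

Sets collapse duplicates, so a sequence such as (1 1 1) cannot be represented this way without first giving each member a distinct identity, which is the difficulty raised earlier in the conversation.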

datran04:07:01

Once I'm on the other side I use :mem and :ord to produce a new sequence. So I churn them all together and get something completely new, and all the mappings are destroyed in the process

datran05:07:26

I may go that way, but right now I'm stuck on the last step, reproducing a sequence given a list of its members and the "observed before" relation for each member.

datran05:07:15

In this: {:mem [1 3 2], :ord [[1 3] [1 2]]}, how do I relate the ord stuff to the mem stuff? I mean, in this case it is obvious but what about when my collection is all 1s?

noisesmith05:07:19

you need a representation that separates a reliable index from the value held, I think

tianshu05:07:33

is it possible to update the defmulti dispatching function, without removing the current namespace or restarting the REPL?

flowthing05:07:10

I believe the common trick is to do something like this:

(defn my-dispatch-fn [& args] ,,,)

(defmulti foo #'my-dispatch-fn)

tianshu05:07:05

interesting

tianshu05:07:47

This should work, I just never thought about this before

Ed09:07:26

You could also define the multimethod var as something other than a multimethod (for example (def foo nil)), then re-evaluate the defmulti and all the relevant defmethods (since redefining it to nil will remove all of the methods) ... that way you can change it to use the var-quote form without restarting the repl
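The var-quote trick can be seen end to end in a small sketch. The names `my-dispatch-fn` and `describe` are illustrative only. Because the multimethod holds the var rather than the function value, redefining the dispatch function takes effect on the next call.

```clojure
;; Dispatch on :type at first.
(defn my-dispatch-fn [x] (:type x))

;; Pass the var, not the function value, to defmulti.
(defmulti describe #'my-dispatch-fn)

(defmethod describe :a [_] "type a")
(defmethod describe :b [_] "kind b")

(def before (describe {:type :a :kind :b}))   ; dispatches on :type

;; Redefine the dispatch function; defmulti is NOT re-evaluated.
(defn my-dispatch-fn [x] (:kind x))

(def after (describe {:type :a :kind :b}))    ; now dispatches on :kind
```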

piumidp9607:07:35

Hi everyone 😄 I'm looking for code analyzers to detect concurrency issues (Like Race and deadlock) in clojure. Does anyone have any suggestions?

Ed09:07:59

I've mostly done this sort of thing through runtime detection and automated testing rather than static analysis but ymmv ... maybe have a look at test.check (https://clojure.org/guides/test_check_beginner) ??

vemv11:07:43

Clojure's approach is making high-level thread-safe primitives which are thread-safe even when composed. Since it works, I'd imagine people haven't felt very compelled to author such analyzers. Still it'd be possible to write linters ensuring that refs, atoms, etc are used properly... not many "rules of thumb" come to mind right now, but there certainly are. My hunch is that it's a better investment to thoroughly understand all primitives, and ensure their proper usage through code review.

Alex Miller (Clojure team)12:07:39

data races occur when you have uncoordinated threads racing to write the same value. None of the concurrency primitives in Clojure allow this. (there are some corners you can use to exploit this via interop w/ mutable arrays, or mutable deftypes etc but usually people doing that stuff know what they're doing)

Alex Miller (Clojure team)12:07:59

deadlocks occur when you have a resource/lock cycle. most of the concurrency primitives have guards against this - atoms use spin/lock retries so you don't "hold" the resource, agents are async, refs can conceptually deadlock but have timeout/retry built in.

piumidp9612:07:07

Ah I see. Thanks a lot for the information @U45T93RA6, @alexmiller That really helps. 🙂 I will also check out test.check as @U0P0TMEFJ has suggested.

Ed13:07:43

It is easy to misuse things like atoms by calling deref, keeping the value in a let, and then using swap! to write a new value in, which will result in data races. Don't do things like

(def thing (atom 1))

(defn make-not-odd! []
  (let [t @thing]
    (when (odd? t)
      (swap! thing inc))))
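A race-free variant of the example above moves the check inside the function passed to swap!, so that the test and the update see the same value. This is a sketch of one common fix, reusing the names from the example:

```clojure
(def thing (atom 1))

(defn make-not-odd! []
  ;; The function given to swap! receives the current value and is
  ;; retried if another thread changes the atom in the meantime, so
  ;; the odd? check can never act on a stale read.
  (swap! thing (fn [t] (if (odd? t) (inc t) t))))

(def first-call  (make-not-odd!))   ; 1 is odd, becomes 2
(def second-call (make-not-odd!))   ; 2 is even, stays 2
```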

Yehonathan Sharvit10:07:39

I’d like to write a function mywalk that is similar to `clojure.walk/prewalk`, with the possibility to leave part of the structure untouched (for instance when the metadata contains `:skip true`). For instance,

(mywalk #(if (number? %)
            (inc %)
            %)
         {:a {:b 1}
          :c ^{:skip true} {:d 1}})
should return
{:a {:b 2}
     :c ^{:skip true} {:d 1}}
Any idea?

p-himik11:07:30

You can either just copy the code of walk and prewalk and make all the necessary changes there (there's not a lot of code), or you can just wrap and unwrap the marked values into something that walk doesn't recognize. E.g. in this case I used atom. But it would be better to create a custom type, of course.

(defn filtering-prewalk [f form]
  (let [f (fn [v]
            (if (:skip (meta v))
              (atom v)
              (f v)))
        outer (fn [v]
                (cond-> v (instance? clojure.lang.IDeref v) deref))]
    (walk (partial filtering-prewalk f) outer (f form))))

p-himik11:07:33

Actually, the implementation of walk is hardly longer, so in my code I would definitely just copy and modify its code rather than using the above example.

Yehonathan Sharvit13:07:09

Eventually I solved it this way

(require '[clojure.walk :refer [walk]])

(defn prewalk-skip
  "Like `clojure.walk/prewalk`, but leaves untouched any part of the form
  (and its children) that satisfies the `skip` predicate."
  [f skip form]
  (if (skip form)
    form
    ;; `form` is known not to satisfy `skip` here, so apply `f` directly.
    (walk (partial prewalk-skip f skip)
          identity
          (f form))))
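For the metadata case from the original question, the same idea can be sketched as a self-contained `mywalk` that hard-codes the `:skip` check. `with-meta` is used here in place of the `^{:skip true}` reader syntax so that the metadata is attached explicitly.

```clojure
(require '[clojure.walk :as walk])

(defn mywalk
  "Prewalk-style traversal that leaves any form whose metadata
  contains a truthy :skip (and everything below it) untouched."
  [f form]
  (if (:skip (meta form))
    form
    (walk/walk (partial mywalk f) identity (f form))))

(def result
  (mywalk #(if (number? %) (inc %) %)
          {:a {:b 1}
           :c (with-meta {:d 1} {:skip true})}))
;; result => {:a {:b 2}, :c {:d 1}}
```

The skipped map is returned as is, so its metadata is still present on `(:c result)`.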

gekkostate14:07:55

Hi all, we’re currently using https://github.com/lettuce-io/lettuce-core to interface with Redis. We’re facing a slightly weird issue where the push to Redis is “uneven.” In other words, lpush only works sometimes. I’m wondering if someone has faced a similar issue before? We’re using 6.0.5. Would be great if someone could provide some pointers on where to look. We have debugged the surrounding code and found that the issue lies specifically with the push itself.

Adrian Smith15:07:54

Has any one ever used xhprof here? Is there anything like that for Clojure? I'm not great with reading flame graphs

noisesmith15:07:35

The good profiling is going to come from the VM (I'm assuming Java for Clojure), the java tool hprof dumps data, there's various tools for loading and exploring that data. I've used visualvm and yourkit, yourkit is definitely nicer, but visualvm is free.

noisesmith15:07:50

The biggest gotcha with analyzing performance with clojure is that jvm tools expect the relevant context info about execution to be the class whose method is invoked. For clojure that means you get a lot of info about methods in PersistentHashMap and PersistentVector and LazySeq, but not necessarily info about the code that made those things execute.

noisesmith15:07:24

I've even gone so far as replace hash-maps / functions with records implementing protocols, just so I could get better profiling info

noisesmith15:07:29

luckily that conversion is trivial

benny16:07:07

is it possible to rename keyed parameters in an anonymous function?

(fn [{bar :foo}] (prn bar))
and expect it to be called like so
(wrapper {:foo "baz"})

noisesmith16:07:36

of course: (fn [{foo :bar baz :quux}] ...) - perhaps I misunderstand you though

benny16:07:14

that’s what i tried but maybe i’m doing it wrong, i’ll go back and dig a little more. mainly wanted to make sure i was taking the right approach

noisesmith16:07:05

your example works on a minimal test case

(ins)user=> ((fn [{bar :foo}] (prn bar)) {:foo 42})
42
nil

noisesmith16:07:34

you can also do the same destructuring inside a let block, which can improve clarity if the argument list starts to get noisy
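Moving the destructuring into a let might look like the sketch below. The `wrapper` name comes from the question; the `:timeout-ms` key and its default are hypothetical additions to show renaming together with `:or`.

```clojure
(defn wrapper [opts]
  ;; Bind the value under :foo to the local name `bar`, and the value
  ;; under :timeout-ms to `timeout`, defaulting to 1000 when absent.
  (let [{bar :foo, timeout :timeout-ms, :or {timeout 1000}} opts]
    [bar timeout]))

(def with-default (wrapper {:foo "baz"}))
(def with-both    (wrapper {:foo "baz" :timeout-ms 50}))
```

Note that the keys of the `:or` map are the local names being bound (`timeout`), not the keywords being looked up.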

Tom Cooper19:07:17

Is anyone using Ogre or other gremlin-based API in Clojure for a graph database in a large scale production environment? (You get to define “large” 🙂 ) There is little in the way of examples or clojure-specific documentation that I could find.

souenzzo20:07:16

How do I know how much memory each namespace uses? When I start clj, it uses 13 MB. After (require 'app.main), it goes to 116 MB. I need to debug why this footprint is so high. Both memory figures are from Used metaspace in visualvm

noisesmith20:07:44

the problem here is that objects don't belong to namespaces, if two namespaces hold references to one object, there's no definitive way to say who is responsible for the space usage

noisesmith20:07:41

you could compare how much space is used recursively by the objects under the fields in the ns, but, for example, all of clojure.core would count as references held by every ns

noisesmith20:07:58

so many things would be "counted" multiple times

souenzzo20:07:27

Is there some methodology to find which namespaces are impacting my "initial memory" usage?

souenzzo20:07:04

Context: It's a tiny application that use almost no resources "on request", but do not fit in tiny ec2 machines

Alex Miller (Clojure team)20:07:56

also keep in mind that that size probably includes garbage that will be garbage collected if you are approaching the max heap size

Alex Miller (Clojure team)20:07:28

so you could probably set -Xmx64m and it would use less than that (because it would hit the limit and gc)

deactivateduser15:07:55

@souenzzo I can’t recall if EC2 instances are considered containers or not, but you might get some benefit from using -XX:+UseContainerSupport as a JVM parameter (note: requires JVM v9+). Amongst other things, that tells the JVM to check the container’s memory configuration and adjust things like memory limits appropriately.

souenzzo16:07:26

It's running on EC2/OpenJDK 8. EC2 isn't a container. I found some deps that require cljs.analyzer and other cljs namespaces on the classpath, and removed them. The "metaspace usage" footprint went from 120 MB to 80 MB and now it works way better

ghadi20:07:45

metaspace correlates with classloading. if your app.main loads a metric ton of dependencies, you'll see metaspace inflate
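Since metaspace tracks class loading, one rough way to see what a given require costs is to read the JVM's metaspace pool before and after loading it. The sketch below uses the standard `java.lang.management` API; the helper names are hypothetical, the pool name "Metaspace" is an assumption that holds on HotSpot JVMs (Java 8 and later), and the figures are approximate because a namespace also loads everything it depends on.

```clojure
(import '(java.lang.management ManagementFactory MemoryPoolMXBean))

(defn metaspace-used-bytes
  "Bytes currently used by the \"Metaspace\" memory pool, or nil if the
  running JVM does not expose a pool with that name."
  []
  (some->> (ManagementFactory/getMemoryPoolMXBeans)
           (filter (fn [^MemoryPoolMXBean p] (= "Metaspace" (.getName p))))
           first
           .getUsage
           .getUsed))

(defn metaspace-delta
  "Approximate metaspace growth, in bytes, caused by requiring `ns-sym`
  (including its transitive dependencies). Returns nil when the pool
  is not available."
  [ns-sym]
  (when-let [before (metaspace-used-bytes)]
    (require ns-sym)
    (- (metaspace-used-bytes) before)))
```

Calling `(metaspace-delta 'app.main)` in a fresh REPL, and then repeating it for the namespaces that app.main itself requires, gives a coarse picture of where the classes come from.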

Alex Miller (Clojure team)20:07:27

oh yeah, sorry I didn't catch that above

Alex Miller (Clojure team)20:07:43

but even so, metaspace can be collected if needed too now right?

ghadi20:07:43

I think so, but likely won't have much collectable if it's the initial load of the application

noisesmith20:07:25

is it really the case that metaspace is that much larger than regular heap usage? (the root question is how to use a small enough amount of RAM to fit in a specific container, right?)

ghadi20:07:29

> do not fit in tiny ec2 machines
@souenzzo can you elaborate?

borkdude21:07:26

@souenzzo If you're looking for something with a low memory footprint, take a look at GraalVM native-image. Babashka, a scripting environment compiled with GraalVM, can run with as low as 16Mb or something

seancorfield21:07:03

A question about accretive vs breaking change: if I have a protocol in a library and some implementations of it, and I add a new method to the protocol and add the definition to each of the library's implementations and that new method is only used in a new feature that is added to the library at the same time, should that be considered purely accretive? Concern: existing users of the library may have their own implementations of the protocol, without this new method... but as long as they don't use the new feature (a single new function), they would be unaffected by the change in the protocol. Only users who have their own implementations that want them to participate in the new feature would need to make changes -- to add an implementation of the new protocol method.

seancorfield21:07:15

I'm interested in anyone's / everyone's opinion but especially interested in @ghadi and @alexmiller I guess 🙂

Alex Miller (Clojure team)21:07:39

seems like a gray area but maybe ok. alternate would be to make a new protocol

Alex Miller (Clojure team)21:07:24

in general, I have often regretted making protocols bigger, but never regretted making them smaller :)

dpsutton21:07:30

protocols emit an interface right? any chance someone has used the underlying interface from java for speed? that would break there right?

Alex Miller (Clojure team)21:07:01

adding a method to an interface is not a breaking binary change in java

Alex Miller (Clojure team)21:07:17

sorry, I guess it would be for implementors

dpsutton21:07:22

oh i thought you had to fully implement interfaces in java

ghadi21:07:38

was the original protocol public? @seancorfield

ghadi21:07:46

you do not have to @dpsutton , will get AbstractMethodError if calling something not implemented

seancorfield22:07:50

@ghadi Yes, I'm talking about next.jdbc.result-set/RowBuilder which is both public and documented -- although relatively unlikely to be used (except perhaps by @ikitommi for high-performance extensions to next.jdbc?). And the new method would only be used by the new feature I'm planning to add -- so the only "breaking" impact would be in terms of people starting to use the new feature on an existing 3rd party extension to the protocol that hasn't been updated with the new method.

seancorfield22:07:57

@alexmiller Even if I add a new protocol for this, the new feature is going to require it to be implemented so the new feature wouldn't be usable on 3rd party things that extend the current protocol but not the new one -- so the end result would be "the same" although the error message would be different (no implementation of <new protocol> vs AbstractMethodError calling the new method that was added to the existing protocol).
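The behavior under discussion can be reproduced in a few lines. The protocol and record below are hypothetical and are not the real next.jdbc `RowBuilder`; they only show that an inline implementation missing a protocol method still compiles and works until the missing method is actually called.

```clojure
;; Imagine `with-column-value` is the method added in a later release.
(defprotocol Builder
  (with-column [this col])
  (with-column-value [this col v]))

;; A third-party implementation written against the older protocol:
;; it only knows about `with-column`.
(defrecord OldBuilder []
  Builder
  (with-column [this col] [:old col]))

;; Existing code paths keep working.
(def old-call (with-column (->OldBuilder) :a))

;; Only the new code path fails, and only when it is exercised.
(def new-call
  (try
    (with-column-value (->OldBuilder) :a 1)
    (catch AbstractMethodError _ :abstract-method-error)))
```

With a separate new protocol instead, calling its method on a type that does not implement it would fail with an IllegalArgumentException ("No implementation of method ...") rather than an AbstractMethodError.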

seancorfield22:07:07

So the change won't break anyone's existing code -- only new code, leveraging the new feature, and only for 3rd party implementations of the current protocol (which won't have the new method). Hence my asking here for input because it feels like it isn't technically a breaking change but might cause surprise for someone who tries the new feature on someone else's next.jdbc extension. The new method would allow for some refactoring of existing implementations to reduce boilerplate/repetition, if those authors wished.

seancorfield22:07:49

According to http://grep.app the only use I can find of that protocol (outside of next.jdbc itself) is metosin/porsas so I'm leaning to going ahead with the change and then opening an issue against that single project that has an implementation of RowBuilder...

ghadi22:07:22

What is the protocol before/after proposed change?

seancorfield22:07:44

The change would add a variant of with-column that allows for the column reading function to be passed in. Name TBD. The intention is to make it easier for people to write their own builders that can adapt/wrap other builders while controlling how columns are read/converted. And a new, more generic, builder-adapter would be added to next.jdbc itself that leveraged this machinery.

seancorfield22:07:56

Currently you have to have an adapter for each type of underlying builder because implementing with-column requires knowledge of how new columns are added to the row being built -- and that's the only place you can intercept/control how columns are read.

seancorfield22:07:25

(so it's ultimately a design bug in terms of how I allowed for future extension that I want to fix by adding a new method)

seancorfield22:07:07

Current adapter (for map row builders): https://github.com/seancorfield/next-jdbc/blob/develop/src/next/jdbc/result_set.clj#L237-L242 -- and there has to be another adapter for array row builders too, since that requires conj!. And it exposes the use of transients which is unfortunate.

seancorfield22:07:11

(a new type of column reader will be supported that is passed the ResultSet, the column index, and the entire builder object -- rather than just the ResultSetMetaData -- so that column readers can access (:cols mrsb) to get the actual Clojure keys associated with columns, rather than having to rely on JDBC interop to get SQL-level labels etc from the metadata object; and it will be up to the new column reader whether it calls read-column-by-index which is based on the ReadableColumn protocol, which I'll make extensible via metadata)

seancorfield22:07:05

(it isn't extensible via metadata right now because its use was essentially "closed" inside the existing builders -- and I should have made that change when I added the adapters, which is lines 237-242 above)

seancorfield22:07:43

@ghadi Does that change any aspect of your initial feedback on the change (being accretive vs breaking)?