#clojure
2019-12-09
lvh03:12:00

I've finally had a chance to play with RDF/SPARQL again (dbpedia and wikidata) and I imagine there's all sorts of cool Clojure projects that could make this more pleasant than this in-browser editor. Anything I should look at?

lvh03:12:23

FWIW I can't tell if that means I'm doing it wrong but I'm mostly interested in geography (specifically bodies of water) and it feels like everything is ontology specific

lvh03:12:42

I'm using "length" but it's not like, foaf-equivalent, it's wikidata's weird length property

lvh03:12:28

(at some point I will remember how owl:equivalentClass works)

Alex Miller (Clojure team)03:12:45

well, we spent 3 years at Revelytix building a ton of cool RDF graph libs, SPARQL query engines, etc all in Clojure, but company sold and all its closed source code is no longer... :(

lvh04:12:28

Ah :( The best technology no-one’s ever heard of huh :( oh well, I’m sure I’ll be able to do something neat with this crummy browser :)

borkdude10:12:28

we're using a fork of https://github.com/arachne-framework/aristotle for some of our RDF work

borkdude10:12:28

I need a util to find a file on a classpath (separate from the running JVM's classpath): something like URLClassLoader or URLClassPath, but one that doesn't rely on those classes directly or on any of the internal sun.misc or jdk.internal packages. So a kind of simplified re-implementation. Does this exist already? It has to work with both JDK8 and JDK11. I can roll my own, but chances are there's already a Clojure lib for this.

Alex Miller (Clojure team)12:12:31

Did you look at java.classpath?

borkdude13:12:12

I have looked at it, but it doesn't seem to be able to find one thing in a classpath

borkdude13:12:26

you can enumerate all the things of course and then pick the first thing

borkdude13:12:07

but some form of caching would be nice for performance

borkdude13:12:31

I think I'll just have to build my own thing and use that code as inspiration
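A minimal sketch of what such a roll-your-own lookup could look like, using only java.io.File and java.util.jar.JarFile (both available on JDK8 and JDK11). The names entry-lookup and classpath-finder are hypothetical, not part of java.classpath or any existing library, and the caching is simply memoize over the lookup function:

```clojure
(require '[clojure.string :as str])
(import '(java.io File)
        '(java.util.jar JarFile)
        '(java.util.regex Pattern))

(defn- entry-lookup
  "Returns a fn that maps a relative path to a URL if this classpath
  entry (a directory or a jar file) contains it, else nil."
  [entry]
  (let [f (File. ^String entry)]
    (cond
      ;; Directory entry: the resource is a plain file below it.
      (.isDirectory f)
      (fn [path]
        (let [candidate (File. f ^String path)]
          (when (.isFile candidate)
            (.toURL (.toURI candidate)))))

      ;; Anything else that exists is assumed to be a jar. It is opened
      ;; lazily and kept open for later lookups (never closed here).
      (.isFile f)
      (let [jar (delay (JarFile. f))]
        (fn [path]
          (when (.getEntry ^JarFile @jar ^String path)
            (java.net.URL. (str "jar:" (.toURI f) "!/" path)))))

      ;; Missing entries never match.
      :else
      (constantly nil))))

(defn classpath-finder
  "Takes a classpath string and returns a memoized fn from relative
  path to the first matching URL, or nil when nothing matches."
  [classpath]
  (let [separator (re-pattern (Pattern/quote File/pathSeparator))
        lookups   (mapv entry-lookup (str/split classpath separator))]
    (memoize (fn [path] (some #(% path) lookups)))))
```

Usage would be along the lines of (def find-resource (classpath-finder "src:lib/foo.jar")) followed by (find-resource "foo/bar.clj"). Because memoize also caches misses, this only suits classpaths whose contents do not change while the finder is in use.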

p-himik11:12:32

reify documentation states:

Methods should be supplied for all methods of the desired
protocol(s) and interface(s).
I'm a bit confused by that "should". I have a bunch of protocols that I need to reify in different ways that all have one thing in common: all but one method are not supposed to be implemented and should throw something. It seems that not providing such methods to reify works well: calling the only implemented method works fine, and calling the unimplemented ones results in AbstractMethodError. But is this something I can rely on? As in "make sure it fails loudly", not as in "I'll try to catch the exception and deal with it".

p-himik12:12:34

I can find some bits and pieces around but nothing authoritative enough.

Alex Miller (Clojure team)12:12:51

You can rely on it
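A small sketch of the behaviour being discussed. The Store protocol and its method names are made up for illustration; only one of the two methods is supplied to reify:

```clojure
;; Hypothetical protocol with two methods.
(defprotocol Store
  (fetch [this k])
  (store! [this k v]))

;; Only fetch is implemented; store! is deliberately left out.
(def read-only
  (reify Store
    (fetch [_ k] (str "value-for-" k))))

(fetch read-only :a)
;; => "value-for-:a"

;; Calling the omitted method fails loudly with AbstractMethodError.
(try
  (store! read-only :a 1)
  (catch AbstractMethodError _
    :unimplemented))
;; => :unimplemented
```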

👍 8
metame17:12:14

Based on some simple benchmarking using criterium it seems that clojure.data.int-map is performing about the same as a hash-map. Does anyone have any insight on whether there are still performance benefits to using int-map? The test:

(def mm (map #(assoc {} :a % :b (- 0 %)) (range 0 10000)))

(quick-bench (into {} (map (juxt :a :b) mm)))
;; Execution time mean : 2.863400 ms
;; Execution time std-deviation : 408.476661 µs
;; Execution time lower quantile : 2.610338 ms ( 2.5%)
;; Execution time upper quantile : 3.520416 ms (97.5%)

(quick-bench (into (i/int-map) (map (juxt :a :b) mm)))
;; Execution time mean : 2.796002 ms
;; Execution time std-deviation : 180.611581 µs
;; Execution time lower quantile : 2.678685 ms ( 2.5%)
;; Execution time upper quantile : 3.102775 ms (97.5%)

(quick-bench (r/fold i/merge conj (map (juxt :a :b) mm)))
;; Execution time mean : 3.031864 ms
;; Execution time std-deviation : 150.951817 µs
;; Execution time lower quantile : 2.894153 ms ( 2.5%)
;; Execution time upper quantile : 3.249662 ms (97.5%)

andy.fingerhut17:12:11

I believe one of the motivations for creating it was reducing memory usage.

andy.fingerhut17:12:44

That is mentioned in the README, with measurements.

metame18:12:18

Yes, it's just that the README also mentions performance benefits, especially when using r/fold for map insertion (which I added to my test above). The memory benefits are probably still worth it though.

andy.fingerhut18:12:13

I may have missed corresponding measurements of r/fold on your environment for the built-in Clojure maps? Comparing your measurements to those in the README isn't a useful exercise.

metame19:12:41

I'm not comparing to those in the README, just to each other: the default {} is performing better for map insertion than the r/fold version.

andy.fingerhut19:12:39

Your def of mm is not legal Clojure syntax, and it's not clear to me what you intended.

andy.fingerhut19:12:56

data.int-map maps are intended to be restricted to keys that are ints. It appears that your example map mm is intended to have keywords :a and :b as keys? I don't know why data.int-map would even allow you to create such maps.

andy.fingerhut19:12:39

Oh, never mind my previous comment. Thinking...

metame19:12:09

Yep probably some quick examples of what you're getting at each step would be helpful

metame19:12:30

basically it's going from a coll of

[{:a 1 :b -1} {:a 2 :b -2}]
then the output of (map ...)
[[1 -1] [2 -2]]
then final output of the map insertion
{1 -1 2 -2}

andy.fingerhut19:12:52

This line may have been mangled at some step: (def mm (map #(assoc {} :a % :(- 0 %)) (range 0 10000))) since it is not accepted by the reader.

metame19:12:23

ah yes mangled by slack I think

metame19:12:49

updated... should be:

(def mm (map #(assoc {} :a % :b (- 0 %)) (range 0 10000)))

andy.fingerhut19:12:50

The original data.int-map results you are comparing against used 1 million entry maps, run on a 4-core machine. Not sure how many cores your machine might have, or be taking advantage of. Often parallel approaches need pretty large data structure sizes before they give advantages.

andy.fingerhut19:12:02

(if they give advantages at all)
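One detail that may matter for the benchmark above: clojure.core.reducers/fold only splits work across threads for foldable collections (vectors and hash maps in core); for anything else, including the lazy seq returned by map, it falls back to a single sequential reduce. A hedged sketch of the baseline asked about earlier, folding into built-in maps from a vector so that the fold can actually partition the input. The names pairs, sequential and folded are illustrative, and no timing claims are made:

```clojure
(require '[clojure.core.reducers :as r])

(def mm (map #(assoc {} :a % :b (- 0 %)) (range 0 10000)))

;; mapv yields a vector, which r/fold can partition; a lazy seq cannot
;; be partitioned and would be reduced sequentially.
(def pairs (mapv (juxt :a :b) mm))

;; Sequential baseline with a built-in map.
(def sequential (into {} pairs))

;; Fold with built-in maps: (r/monoid merge hash-map) supplies {} as the
;; identity for each partition and merge as the combining step.
(def folded
  (r/fold (r/monoid merge hash-map) conj pairs))

(= sequential folded)
;; => true
```

Wrapping each of the two expressions in quick-bench, and doing the same with a vector input for the int-map version, would give a like-for-like comparison on the same machine.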

metame19:12:44

yep makes sense, appreciate you taking a look. In general the maps we're using are <1000 entries so probably not going to see strong parallel advantages but going to analyze memory footprint next to see what we could get there.

andy.fingerhut19:12:51

Built-in Clojure maps, sets, vectors (except for the ones created using vector-of) all use boxed integers/floats/etc., which for numbers are about 3 to 4 times larger in memory than their unboxed counterparts. data.int-map uses some techniques for dense integer key sets where I believe not even the unboxed versions are stored explicitly.

andy.fingerhut19:12:35

Here is a gallery of some images created by a library cljol I wrote that can show sizes and in-memory layouts of Java objects, and graphs of such objects implied by the references between them: https://github.com/jafingerhut/cljol/blob/master/doc/README-gallery.md#different-ways-of-holding-vectors-of-numbers-in-clojure

metame19:12:12

cool thanks will check it out

andy.fingerhut18:12:33

Unless you happen to have matched the author's hardware, OS, JDK version, etc.

worlds-endless19:12:10

I'm looking for the fastest way to get the current year in CLJ without including any libraries. Anyone?

Alex Miller (Clojure team)19:12:18

(+ 1900 (.getYear (java.util.Date.)))

worlds-endless19:12:06

Thanks! I wasn't sure if java.util.Date was the new or old Java date thing

ghadi19:12:26

friends don't let friends use java.util.Date

ghadi19:12:46

(.getYear (java.time.LocalDate/now))

✔️ 4
ghadi19:12:56

that's the new thing

ghadi20:12:32

(they're obviously equivalent in this use-case, but we can't migrate to the better java.time package unless we have ubiquitous examples)
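In the spirit of more java.time examples, a short sketch of two other ways to get the year with no extra libraries. The function names current-year and current-year-utc are illustrative only:

```clojure
(import '(java.time LocalDate Year ZoneOffset))

;; Year in the JVM's default time zone.
(defn current-year []
  (.getValue (Year/now)))

;; Year in an explicit zone, for when the default zone should not
;; influence the answer around New Year.
(defn current-year-utc []
  (.getYear (LocalDate/now ZoneOffset/UTC)))
```

Both return a plain integer, so they can be used wherever the java.util.Date version was used.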

sogaiu20:12:01

@ghadi i was watching: https://www.youtube.com/watch?v=-zszF8bbXM0&t=20m23s the other day and wondered if you had anything in text form somewhere

ghadi20:12:28

for java time @sogaiu?

sogaiu20:12:39

yes, sorry

ghadi20:12:12

the highest quality reference I know of for java.time is https://docs.oracle.com/javase/tutorial/datetime/TOC.html

orestis20:12:19

When writing a new piece of functionality, I write many small functions, trying to isolate the ones doing transformation from the ones doing side effects, and it all feels nice. At some point though, it all has to be tied together in a long-ish let binding that orchestrates all the small functions and has all the “defensive” code inside. Example:

(let [msg (decode arg)
      _   (when-not (valid? msg) (throw (ex-info ...)))
      record (lookup db msg)
      _      (when-not record (throw (ex-info "Can't find record"...)))
      ;; and so on
]
  (do-work! record ,, ,,))

orestis20:12:25

I wonder if this is considered “idiomatic” or there are different ways of doing this?

orestis20:12:27

I can think of putting the exception throwing inside the decode, lookup etc — but I’d prefer them returning nil or some explicit invalid value instead.

noisesmith20:12:49

one idiom is (let [msg (doto (decode arg) (assert "some message"))] ....) - as long as the error state is nil or false this works

noisesmith20:12:11

since doto will provide the value to assert, and return it
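A sketch of that idiom in action. The decode function here is a hypothetical stand-in that returns nil for input it cannot handle, and the sketch assumes assertions are enabled (the default, *assert* is true):

```clojure
;; Hypothetical decoder: returns nil when the input is not a string.
(defn decode [arg]
  (when (string? arg)
    (keyword arg)))

;; doto passes the decoded value to assert as its first argument,
;; then returns that same value.
(defn decode-checked [arg]
  (doto (decode arg) (assert "decode returned nil")))

(decode-checked "ok")
;; => :ok

;; A nil result trips the assertion, which throws AssertionError.
(try
  (decode-checked 42)
  (catch AssertionError _
    :assertion-failed))
;; => :assertion-failed
```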

orestis20:12:07

I guess I could have my own little assert-like validation functions (I like my errors to be specific and have data, since then I can see what’s going on in a post-mortem)
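One possible shape for such a helper, as a sketch only. The name ensure! and the stand-in decode, valid? and lookup functions are invented for illustration; the helper returns the value when the predicate passes and otherwise throws an ex-info carrying data:

```clojure
(defn ensure!
  "Returns value if (pred value) is truthy, otherwise throws an ex-info
  with the given message and data (plus the offending value)."
  [value pred message data]
  (if (pred value)
    value
    (throw (ex-info message (assoc data :value value)))))

;; Hypothetical stand-ins for the functions in the example above.
(defn decode [arg] {:id arg})
(defn valid? [msg] (integer? (:id msg)))
(defn lookup [db msg] (get db (:id msg)))

(defn handle [db arg]
  (let [msg    (ensure! (decode arg) valid? "Invalid message" {:arg arg})
        record (ensure! (lookup db msg) some? "Can't find record" {:msg msg})]
    record))
```

This keeps decode and lookup returning nil or an invalid value, as preferred above, while removing the `_` bindings from the let.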

Ike Mawira20:12:24

@orestis i think using the -> macro is more idiomatic for cases where you want this logical order. Certainly makes the code look neater too.

orestis20:12:46

(-> {:foo 1 :bar 2}
      :bar
      (doto (when-not (throw (ex-info "not found" {})))))

Ike Mawira20:12:48

Using this form, I'd suggest taking the exception throwing to the independent functions, and return nil from them. You can then check for nil in the next form.

Ike Mawira20:12:36

That would turn this function into

(-> {:foo 1 :bar 2}
    (decode)
    ((fn [xl]
      (if (not= (nil? xl)) (record xl) nil))))

seancorfield21:12:29

(not= (nil? xl)) isn't what you want here

user=> (not= (nil? :x))
false
user=> (not= (nil? nil))
false
user=> 
You could use (some? xl) -- or use plain (nil? xl) and swap the if clauses around, or use (if-not (nil? xl) ,,,) (which would let you keep the original clause order)
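For context, not= called with a single argument always returns false, which is why both REPL results above are false. A sketch of the corrected check, plus the core some-> macro, which stops threading as soon as a step returns nil. The decode and record functions are hypothetical stand-ins:

```clojure
;; Hypothetical stand-ins for the functions in the snippet above.
(defn decode [m] (:bar m))
(defn record [x] {:record x})

;; Explicit nil check using some?.
(defn process [m]
  (let [xl (decode m)]
    (if (some? xl)
      (record xl)
      nil)))

;; Same behaviour with some->, which short-circuits on nil.
(defn process-threaded [m]
  (some-> m decode record))

(process {:foo 1 :bar 2})
;; => {:record 2}
(process-threaded {:foo 1})
;; => nil
```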

jumpnbrownweasel00:12:19

For assertions with data and informative messages I like using https://github.com/ptaoussanis/truss It has assertions that return the value being checked.

jumpnbrownweasel00:12:30

For example, the truss assertion have is passed a predicate and a value, and it returns the value if the predicate is truthy, or throws an exception if it is falsey.

(->> arg 
  (decode)
  (have valid?)
  (lookup db))

jumpnbrownweasel00:12:39

It can work out nicely in many situations.

jumpnbrownweasel00:12:29

The error messages are really nice. For example, if I make the valid? function above fail with value "1" I get:

Execution error (ExceptionInfo) at taoensso.truss.impl/default-error-fn (impl.clj:61).
Invariant violation in `test:2`. Test form `(valid? (decode arg))` failed against input val `"1"`.

Ike Mawira13:12:11

Thanks @seancorfield for that correction.

orestis20:12:54

Combining the two seems… off 🙂

orestis20:12:36

Too tired to figure out a better way…

csm20:12:09

(defmacro guard->
  [init pred error & forms]
  `(if (~pred ~init)
     ~error
     ~(condp = (count forms)
        0 init
        1 `(-> ~init ~(first forms))
        2 (let [[pred error] forms]
            `(guard-> ~init ~pred ~error))
        (let [[f pred error & forms] forms]
          `(guard-> (-> ~init ~f) ~pred ~error ~@forms)))))

csm20:12:29

(only because I’m otherwise bored, I don’t know if this works well or not…)

Michael J Dorian20:12:26

I see a lot of threading macros around here, are they considered substantially better than using line breaks and indentation to format code?

csm20:12:23

“it depends”

dpsutton20:12:54

the alignment will keep indenting if you just use linebreaks

lilactown21:12:37

you might have intended this for #adventofcode

pastafari21:12:55

Yes thanks 🙂

Michael J Dorian20:12:41

Ok, so I would reach for -> to keep runaway indentation under control.

ghadi21:12:02

what indentation problem are we talking about here?

andy.fingerhut21:12:32

Some people like -> and ->> because it puts the function calls in the order that they occur in time, from beginning of text to end of text, instead of the opposite order.

👍 12
Michael J Dorian21:12:41

Lol, I'm used to the nested style (came here from common lisp) but I can see that would be clearer in a lot of cases. I'll try using it. Thanks!

sogaiu21:12:47

was reminded of this section of stuart sierra's talk "thinking in data": https://www.youtube.com/watch?v=kuNxVXnmjHA&t=28m40s

👍 4
dpsutton21:12:02

i was mentioning indentation since it was suggested you could use newlines to replace threading

(->> x
     foo
     bar
     baz)
(foo
  (bar
    (baz x)))

ghadi21:12:39

so that isn't equivalent

ghadi21:12:49

the forms are in the same order

ghadi21:12:46

which makes the function applications in the reverse order
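The difference can be checked at the REPL. With ->> the first form listed is applied first and ends up innermost, so the nested equivalent lists the functions in the opposite order. The symbols foo, bar and baz are placeholders, and the arithmetic example is only an illustration:

```clojure
;; Expanding the threaded form shows the nesting it stands for.
(macroexpand-1 '(->> x foo bar baz))
;; => (baz (bar (foo x)))

;; A concrete pair that really is equivalent:
(def threaded (->> 5 inc (* 2) str))
(def nested   (str (* 2 (inc 5))))

(= threaded nested)
;; => true
```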

ghadi21:12:39

my heuristics for using threading macros are generally more conservative than what I see "in the wild"

👍 4
💯 4
ghadi21:12:06

I don't like using them when the threaded calls or threaded datastructures change dramatically

👍 4
ghadi21:12:57

starting with ->> then having a bare keyword used as a function gives me pause

ghadi21:12:47

it's certainly valid usage

ghadi21:12:10

which is another way of saying I have a lower threshold for migrating parts into a let binding

ghadi21:12:21

which of course gives you the burden of naming

dpsutton21:12:35

yes. sorry. just wanted a quick example while working

frozenlock22:12:23

When using specs with generators, is it possible to provide it with some kind of seed value? Given my-spec, generate an example where :some-key = "Some given string". The rest of the keys might be dependent upon :some-key, for example if there's a multi-spec dispatching on it.

frozenlock22:12:04

Ah! It might be the overrides argument in the clojure.spec.alpha/gen function.

Alex Miller (Clojure team)23:12:44

Overrides are for alternate generators

Alex Miller (Clojure team)23:12:17

You can of course provide an alternate generator that creates values from an expected form or subset
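A sketch of what that could look like for a plain s/keys spec, assuming org.clojure/test.check is on the classpath (spec's generators need it). The spec names here are made up, and the multi-spec case from the question is not covered. The overrides map points the spec name at a no-arg function returning a generator that always produces the given string:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; Hypothetical specs for illustration.
(s/def ::some-key string?)
(s/def ::count pos-int?)
(s/def ::my-spec (s/keys :req-un [::some-key ::count]))

;; Override the generator for ::some-key with one that always returns
;; the same value; the other keys are still generated as usual.
(def pinned-gen
  (s/gen ::my-spec
         {::some-key #(gen/return "Some given string")}))

(gen/generate pinned-gen)
;; e.g. a map with :some-key "Some given string" and a random :count
```

An alternative with the same effect would be to generate from the unmodified spec and then assoc the fixed value with gen/fmap, at the cost of the result no longer being checked against the spec automatically.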