It's a bad habit I have of sometimes profiling cool projects and finding some performance lying in the floor. Especially in light of this https://tonsky.me/blog/datascript-2/ I was wondering if these findings could be useful and if there's interest in applying the results
Looks great! Could you make a PR?
Re: precompile queries, I had a similar idea. I have a query parser that I wrote for query engine rewrite, but I didn’t finished it. Old query engine doesn’t use it, and new one I didn’t finish. The parser is the only part of the rewrite that is finished and used for validation right now
I could make a PR, but I want better test cases first to have a better picture
I'd also want to bench both clj and cljs
to make sure I don't cause damage
There are some basic benchmark in scripts folder, take a look at them. I think they should be easy to run, but if not, let me know
Are they representative of real use cases?
Not really, but it's something
I mean if they become stricly better than probably real use-cases will become better too
Alright, I'll start and split the changes up
Right, after a few rounds of golf serious work the first round of improvements looks like: (with profiling on, java 8, linux)
add-1 372.2 ms/op 256.9
add-5 616.8 ms/op 503.9
add-all 602.6 ms/op 518.0
freeze 629.8 ms/op 582.6
init 23.9 ms/op 19.4
pull-many 1.2 ms/op 1.1
pull-many-entities 3.2 ms/op 2.9
pull-one 0.633 ms/op 0.580
pull-one-entities 1.1 ms/op 0.994
pull-wildcard 1.5 ms/op 1.4
q1 1.4 ms/op 1.1
q2 3.5 ms/op 2.7
q3 5.2 ms/op 4.2
q4 8.2 ms/op 5.9
qpred1 4.9 ms/op 4.7
qpred2 14.8 ms/op 10.2
retract-5 486.8 ms/op 383.6
Wow, here's something cool I just pulled off, q1 0.685
q4 down to 5ms
I think that without changing some of the essential algorithms there, it's the best an initial round of optimizations can get
If you feel like these improvements are worth it I can push the branch and we'll review the changes, I did a few things to help the JIT along which are weird
👀 I managed even better
q1 0.667 ms/op
q2 2.1 ms/op
q3 3.0 ms/op
q4 4.4 ms/op
@tonsky this is the one thing I'm really not sure about, besides the fact it causes a test to fail, because it's perverse code
(defmacro defcomp
[sym [arg1 arg2] & body]
`(def ~sym
(reify
java.util.Comparator
(compare [~'_ ~(with-meta arg1 {}) ~(with-meta arg2 {})]
(let [~arg1 ~arg1 ~arg2 ~arg2] ~@body))
clojure.lang.IFn
(invoke [~'this ~(with-meta arg1 {}) ~(with-meta arg2 {})]
(.compare ~'this ~(with-meta arg1 {}) ~(with-meta arg2 {})))
IFn$OOL
(invokePrim [~'this ~(with-meta arg1 {}) ~(with-meta arg2 {})]
(.compare ~'this ~(with-meta arg1 {}) ~(with-meta arg2 {}))))))
But it gives a direct code path with primitives all the way out to compareAll comparison combinations happen on the stack, which is nice
I don’t see why you would call it perverse. I think it’s pretty clear what it does. Maybe move (with-meta arg1/2 {}) outside quote and give it another name, so that macro body reads easier, otherwise looks nice. Which test fails?
Also I don’t get what’s the idea with (let [~arg1 ~arg1 ~arg2 ~arg2]?
Otherwise happy to review PR
The test which fails:
ERROR in (test-uncomparable-356) (RT.java:1249)
Uncaught exception, not in assertion.
expected: nil
actual: java.lang.IllegalArgumentException: Value out of range for int: 2829092613
at clojure.lang.RT.intCast (RT.java:1249)
datascript.db$reify__583.compare (db.cljc:419)
me.tonsky.persistent_sorted_set.Leaf.searchFirst (Leaf.java:54)
me.tonsky.persistent_sorted_set.PersistentSortedSet.slice (PersistentSortedSet.java:70)
me.tonsky.persistent_sorted_set.PersistentSortedSet.slice (PersistentSortedSet.java:49)
me.tonsky.persistent_sorted_set$slice.invokeStatic (persistent_sorted_set.clj:30)
me.tonsky.persistent_sorted_set$slice.invoke (persistent_sorted_set.clj:24)
datascript.db.DB._search (db.cljc:558)
datascript.db$fsearch.invokeStatic (db.cljc:515)
datascript.db$fsearch.invoke (db.cljc:514)
datascript.db$transact_add.invokeStatic (db.cljc:1345)
datascript.db$transact_add.invoke (db.cljc:1335)
datascript.db$transact_tx_data.invokeStatic (db.cljc:1601)
datascript.db$transact_tx_data.invoke (db.cljc:1426)
datascript.core$with.invokeStatic (core.cljc:263)
datascript.core$with.invoke (core.cljc:256)
datascript.core$with.invokeStatic (core.cljc:258)
datascript.core$with.invoke (core.cljc:256)
datascript.core$db_with.invokeStatic (core.cljc:270)
datascript.core$db_with.invoke (core.cljc:266)
datascript.test.transact$fn__9660.invokeStatic (transact.cljc:421)
datascript.test.transact/fn (transact.cljc:402)
clojure.test$test_var$fn__9763.invoke (test.clj:717)
clojure.test$test_var.invokeStatic (test.clj:717)
clojure.test$test_var.invoke (test.clj:708)
clojure.test$test_vars$fn__9789$fn__9794.invoke (test.clj:735)
clojure.test$default_fixture.invokeStatic (test.clj:687)
clojure.test$default_fixture.invoke (test.clj:683)
clojure.test$test_vars$fn__9789.invoke (test.clj:735)
clojure.test$default_fixture.invokeStatic (test.clj:687)
clojure.test$default_fixture.invoke (test.clj:683)
clojure.test$test_vars.invokeStatic (test.clj:731)
clojure.test$test_all_vars.invokeStatic (test.clj:737)
clojure.test$test_ns.invokeStatic (test.clj:758)
clojure.test$test_ns.invoke (test.clj:743)
clojure.core$map$fn__5885.invoke (core.clj:2757)
clojure.lang.LazySeq.sval (LazySeq.java:42)
clojure.lang.LazySeq.seq (LazySeq.java:51)
clojure.lang.ChunkedCons.chunkedNext (ChunkedCons.java:59)
clojure.core$chunk_next.invokeStatic (core.clj:710)
clojure.core$reduce1.invokeStatic (core.clj:944)
clojure.core$reduce1.invokeStatic (core.clj:936)
clojure.core$merge_with.invokeStatic (core.clj:3063)
clojure.core$merge_with.doInvoke (core.clj:3055)
clojure.lang.RestFn.applyTo (RestFn.java:139)
clojure.core$apply.invokeStatic (core.clj:669)
clojure.test$run_tests.invokeStatic (test.clj:768)
clojure.test$run_tests.doInvoke (test.clj:768)
clojure.lang.RestFn.applyTo (RestFn.java:137)
clojure.core$apply.invokeStatic (core.clj:667)
clojure.test$run_all_tests.invokeStatic (test.clj:780)
clojure.test$run_all_tests.invoke (test.clj:780)
datascript.test$test_clj$fn__10311.invoke (test.cljc:46)
datascript.test.core$wrap_res.invokeStatic (core.cljc:41)
datascript.test.core$wrap_res.invoke (core.cljc:39)
datascript.test$test_clj.invokeStatic (test.cljc:46)
datascript.test$test_clj.invoke (test.cljc:45)
clojure.lang.Var.invoke (Var.java:380)
user$eval144.invokeStatic (form-init836534765378270893.clj:1)
user$eval144.invoke (form-init836534765378270893.clj:1)
clojure.lang.Compiler.eval (Compiler.java:7181)
clojure.lang.Compiler.eval (Compiler.java:7171)
clojure.lang.Compiler.load (Compiler.java:7640)
clojure.lang.Compiler.loadFile (Compiler.java:7578)
clojure.main$load_script.invokeStatic (main.clj:475)
clojure.main$init_opt.invokeStatic (main.clj:477)
clojure.main$init_opt.invoke (main.clj:477)
clojure.main$initialize.invokeStatic (main.clj:508)
clojure.main$null_opt.invokeStatic (main.clj:542)
clojure.main$null_opt.invoke (main.clj:539)
clojure.main$main.invokeStatic (main.clj:664)
clojure.main$main.doInvoke (main.clj:616)
clojure.lang.RestFn.applyTo (RestFn.java:137)
clojure.lang.Var.applyTo (Var.java:705)
clojure.main.main (main.java:40)
let [~arg1 ~arg1 ~arg2 ~arg2]
is for metadatato propagate the Datom type hint
I can probably refactor the macro to look more readable, if it's okay to move it past the wild experiment phase
I guess one of the comparisons doesn't return 0,1-1
> is for metadata Oh I think I get it! Clever
I’m ok with the macro, as long as you figure out what’s going on with the test and if it needs fixing
Found what was failing the test, the last case of value-compare
I'll update the benchmark results soon with java 8, 11, 17
Awesome, will take a look soon!
I think the coolest part for me was realizing I can build up a chain of eductions during -collect which finally let me do a everything in a single lazy pass. It reduces straight over the sorted set with everything else being a transducer on top of it, which is cache friendly 🙂
now just to figure out a way to further optimize join
This is a bit of trivia, but adding vpred had an incredible effect on query performance
@ben.sless Sorry to butt in here. I was a bit confused by this code and had a couple of questions. While working through it I think I may have answered them but I'll post it anyway with what seems to me to be a better variation of it below:
(defmacro defcomp
[sym [arg1 arg2] & body]
`(def ~sym
(reify
java.util.Comparator
(compare [~'_ ~(with-meta arg1 {}) ~(with-meta arg2 {})]
(let [~arg1 ~arg1 ~arg2 ~arg2] ~@body))
clojure.lang.IFn
(invoke [~'this ~(with-meta arg1 {}) ~(with-meta arg2 {})]
(.compare ~'this ~(with-meta arg1 {}) ~(with-meta arg2 {})))
IFn$OOL
(invokePrim [~'this ~(with-meta arg1 {}) ~(with-meta arg2 {})]
(.compare ~'this ~(with-meta arg1 {}) ~(with-meta arg2 {}))))))
1. Why are you removing the metadata from the arguments using ~(with-meta arg1 {})?
2. Is the idea of (let [~arg1 ~arg1 ~arg2 ~arg2] ~@body) to restore the metadata back to the arguments?
3. Given that you aren't exposing the arguments, why not make the macro both more standard and safer with this form instead?
(defmacro defcomp
[sym [arg1 arg2] & body]
`(def ~sym
(reify
java.util.Comparator
(compare [_# arg1# arg2#]
(let [~arg1 arg1# ~arg2 arg2#] ~@body))
clojure.lang.IFn
(invoke [this# arg1# arg2#]
(.compare this# arg1# arg2#))
IFn$OOL
(invokePrim [this# arg1# arg2#]
(.compare this# arg1# arg2#)))))
I didn't know that you can implement primitive variants of IFn like IFn$OOL 🤔
@darrickw I'm happy for anyone with good feedback to butt in, thank you.
You're right the macro can be written like that, I ended up with something in between which made it into the PR.
Regarding invoke prim, there are many of illegal fun things you can get the compiler and runtime to do
This is the current version in HEAD
#?(:clj
(defmacro defcomp
[sym [arg1 arg2] & body]
(let [a1 (with-meta arg1 {})
a2 (with-meta arg2 {})]
`(def ~sym
(reify
java.util.Comparator
(compare [~'_ ~a1 ~a2]
(let [~arg1 ~arg1 ~arg2 ~arg2]
~@body))
clojure.lang.IFn
(invoke [~'this ~a1 ~a2]
(.compare ~'this ~a1 ~a2))
IFn$OOL
(invokePrim [~'this ~a1 ~a2]
(.compare ~'this ~a1 ~a2)))))))
I think this block is technically wrong, though it will probably work ok for now. It should really refer to the a1 and a2 args.
(let [~arg1 ~arg1 ~arg2 ~arg2]
~@body)
Of course!
With a couple of caveats: I profiled relatively small data sets, didn't run all the tests, and looked at a particular scenario
----
For example, I found that often there's a lot of duplicated work and redundant collections which are created, take hash-join and collapse-rels for example
That whole code path would benefit greatly from an efficient implementation which gives both the keys' intersection and difference between the rels. hand rolling those gives quite a performance boost (vs. using (set (keys ,,)))
(defn vintersect-keys [attrs1 attrs2]
(persistent! (reduce-kv (fn [vec k _] (if (attrs1 k) (conj! vec k) vec)) (transient []) attrs2))
#_(set/intersection (set (keys attrs1))
(set (keys attrs2))))
;; (vintersect-keys {:a 1 :b 2} {:b 2})
(defn keys-intersect?
[m1 m2]
(if (> (count m2) (count m1))
(recur m2 m1)
(reduce-kv (fn [_ k _] (if (contains? m1 k) (reduced true) false)) false m2)))
(defn -group-by
[f init coll]
(persistent!
(reduce
(fn [ret x]
(let [k (f x)]
(assoc! ret k (conj (get ret k init) x))))
(transient {}) coll)))
(defn hash-attrs [key-fn tuples]
(-group-by key-fn '() tuples))
(defn hash-join [rel1 rel2]
(let [tuples1 (:tuples rel1)
tuples2 (:tuples rel2)
attrs1 (:attrs rel1)
attrs2 (:attrs rel2)
common-attrs (vintersect-keys (:attrs rel1) (:attrs rel2))
common-gtrs1 (mapv #(getter-fn attrs1 %) common-attrs)
common-gtrs2 (mapv #(getter-fn attrs2 %) common-attrs)
keep-attrs1 (keys attrs1)
;; keep-attrs2 (vec (set/difference (set (keys attrs2)) (set keep-attrs1))) ; keys in attrs2-attrs1
keep-attrs2 (persistent! (reduce-kv (fn [vec k _] (if (attrs1 k) vec (conj! vec k))) (transient []) attrs2)) ; keys in attrs2-attrs1
keep-idxs1 (to-array (vals attrs1))
keep-idxs2 (to-array (->Eduction (map attrs2) keep-attrs2)) ; vals in attrs2-attrs1 by keys
key-fn1 (tuple-key-fn common-gtrs1)
hash (hash-attrs key-fn1 tuples1)
key-fn2 (tuple-key-fn common-gtrs2)
new-tuples (->>
(reduce (fn [acc tuple2]
(let [key (key-fn2 tuple2)]
(if-some [tuples1 (get hash key)]
(reduce (fn [acc tuple1]
(conj! acc (join-tuples tuple1 keep-idxs1 tuple2 keep-idxs2)))
acc tuples1)
acc)))
(transient []) tuples2)
(persistent!))]
(Relation.
(into
{}
(map-indexed (fn [x y] [y x]))
(->Eduction cat [keep-attrs1 keep-attrs2]))
#_(zipmap (concat keep-attrs1 keep-attrs2) (range))
new-tuples)))
Another one is silly, since join-tuples suffers when the tuple is an array, just add a predicate if it's an array and run another loop instead:
(defn join-tuples [t1 #?(:cljs idxs1
:clj ^{:tag "[[Ljava.lang.Object;"} idxs1)
t2 #?(:cljs idxs2
:clj ^{:tag "[[Ljava.lang.Object;"} idxs2)]
(let [l1 (alength idxs1)
l2 (alength idxs2)
res (da/make-array (+ l1 l2))]
(if (.isArray (.getClass ^Object t1))
(dotimes [i l1]
(aset res i (aget ^objects t1 (aget idxs1 i))))
(dotimes [i l1]
(aset res i (#?(:cljs da/aget :clj get) t1 (aget idxs1 i)))))
(if (.isArray (.getClass ^Object t2))
(dotimes [i l2]
(aset res (+ l1 i) (get ^objects t2 (aget idxs2 i))))
(dotimes [i l2]
(aset res (+ l1 i) (#?(:cljs da/aget :clj get) t2 (aget idxs2 i)))))
res))
Another place which kept popping up in profiling was looks-like?. big optimization would be to precompile patterns such that they won't have to be interpreted at query time, but even cutting out last/butlast where vectors are involved and avoiding sequence comparisons gave quite a boost:
(declare looks-like?)
(defn seq-looks-like? [[x & xs] [y & ys]]
(if (looks-like? x y)
(if (and xs ys)
(recur xs ys)
true)
false))
(defn symbol-looks-like?
[pattern form]
(if (= '_ pattern)
true
(= form pattern)))
(defn sequence-looks-like?
[pattern form]
(if (and (== (count pattern) 1) (= '* (first pattern)))
(sequential? form)
(let [vec? (vector? pattern)
_last (if vec? peek last)
_butlast (if vec? pop butlast)]
(if (= (_last pattern) '*)
(and (sequential? form)
(seq-looks-like? (_butlast pattern) form))
(and (sequential? form)
(= (count form) (count pattern))
(seq-looks-like? pattern form))))))
(defn- looks-like? [pattern form]
(cond
(symbol? pattern) (symbol-looks-like? pattern form)
(sequential? pattern) (sequence-looks-like? pattern form)
:else (pattern form)))
These were all quick and dirty experiments, not a methodological study, but they look promising enough to pursue things further.
Next items would be: proper methodology, good test cases, ClojureScript (which I pretty much neglected here), balancing readability and performance, and what counts as good enough improvements to justify ugly/hand-rolled codeTechnically you're right, but this macro is very anaphoric, they all represent the same symbols behind the scenes
I'm not averse to changing it 🤷
I'm a mere nosy onlooker here! My personal style is to only inject non-gensym symbols in a macro in places they are absolutely required, but this isn't my project. I'm looking forward to these speedups, though! Great work! I use this project in many contexts...
It's welcome feedback, I can't claim to be sure regarding the proper style here
Using gensym or sym# protects you from odd bugs. For instance, what if I call one of my arguments this? You could argue that it's not a good idea, but it would break this macro.
Oh you are right, I missed it. Luckily this macros is only used internally, but I’ll change ~'this to this# in the next version
Thanks @ben.sless this is a fantastic contribution! Well-presented, too
🙏
partyparrot
Thank you, I hope we can find more optimization opportunities there
100%, let me know if you spot anything else
BTW I never used Eduction before, nice to see an example
Eductions are awesome if you can guarantee you'll consume them once, maximal laziness on one hand and operators fusion on the other
It's interesting how different jvm versions have such widely varying results