2020-08-03
Asami is also still a work in progress; Paula is still planning to add a storage layer, for instance.
Am I doing something wrong here or are Asami queries really 100x faster than with Datomic/DataScript?
The first query is literally just a `get-in` operation followed by a projection.
The second query does the planning (at a guess, it will likely pick the provided ordering of the query) and it will then turn into a set of `get` operations in a loop for however many people there are named “Ivan”. I would expect it to be trivially fast as well.
I know one has to be careful with microbenchmarks, I'd absolutely love for people to challenge these results. PRs to that repo are also welcome, would be nice if it could grow into a comprehensive comparison of the different implementations.
I don't know much about either of these implementations but you could check where the time is spent e.g. via clj-async-profiler
It would be unfair to compare Asami to Datomic, because that system is designed for its durable provisioning. There’s a lot going on. Datascript is more comparable. Asami indexing is simpler, and it doesn’t really keep things like deletion datoms. That may help Asami to be faster here? Also, a micro benchmark will be hitting the cache on Asami’s query planner. I don’t believe that the others have a query optimizer, while Asami does. There is a small cost to using it, but if you keep running the same query without updating the database then it will just return the same cached query plan without running the planner again.
I just realized… the query planner doesn’t execute on the first query, because there’s nothing to plan. I’m honestly confused why Datascript would be so slow for that query
I just looked at the code that would be executed for the first query. It’s several `if`/`cond` statements that figure out the structure of the query, followed by a `get-in` from the index.
Then comes the expensive bit…
For every result, it does a `reduce` over the 1 column defined in the `:find` clause, doing a map `get` to find the offset of the bound variable `?e` (which is 0) and doing: `(assoc '[?e] 0 result)` to create each result row.
Literally, that’s it. I can’t see why it would take multiple milliseconds?
because I think the performance difference mostly comes from the differences in index data structures
The indexes are: {entity {attribute #{value}}} {attribute {value #{entity}}} {value {entity #{attribute}}}
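A minimal sketch of that three-way nested-map layout, and of a lookup that is a single `get-in` followed by a projection. The helper name `index-triple`, the index keys `:eav`/`:ave`/`:vea`, and the sample data are assumptions made for illustration; this is not Asami's actual code.

```clojure
;; Hypothetical helper: adds one [entity attribute value] triple
;; to all three nested-map indexes.
(defn index-triple
  [idx [e a v]]
  (-> idx
      (update-in [:eav e a] (fnil conj #{}) v)   ; {entity {attribute #{value}}}
      (update-in [:ave a v] (fnil conj #{}) e)   ; {attribute {value #{entity}}}
      (update-in [:vea v e] (fnil conj #{}) a))) ; {value {entity #{attribute}}}

;; Made-up sample data.
(def idx
  (reduce index-triple {}
          [[:p1 :name "Ivan"]
           [:p2 :name "Petr"]
           [:p3 :name "Ivan"]
           [:p1 :age 15]]))

;; Resolving a pattern like [?e :name "Ivan"] is one get-in
;; on the attribute-first index.
(def ivans (get-in idx [:ave :name "Ivan"]))

;; Projection: one result row per match, with a single column for ?e.
(def rows (set (map vector ivans)))
```

With maps at every level, each lookup step is a hash lookup rather than a tree search.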
The indexes are pluggable. I’m building a new one at the moment, which is quite different (very similar to the Mulgara indices)
It’s a tree backed by a buffer. In ClojureScript I want that buffer to be stored in the browser’s persistent storage. In Clojure it’s backed by a MappedByteBuffer on a file
As I said, it’s based on Mulgara. I have not maintained it for some time, but the last I heard it was still being used commercially: http://mulgara.org/
I stopped working on Mulgara because I wanted to redo a lot of it in Clojure (it’s a Java project)
I want to revisit the architecture. We made decisions about it in 2001 that involved tradeoffs that we don’t need to make anymore.
and the main overhead for the trees is I/O, so the speed improvement gained from Java isn’t a big deal
It’s a hybrid plan…
• We use it at Cisco. I’ve been expanding on its capabilities as needed.
• It has supplanted Mulgara as my personal project. I’m trying to get back to what Mulgara can do
But, to a certain extent, the fact that Datomic was not open source has been frustrating. So I thought it would be nice to have an option. It will fill a different niche though, since it isn’t peer-based, and it’s based on an OWA like RDF/SPARQL
There are new architectures that I want to try with triple indices for different use-cases. Mulgara’s complexity has made this change too hard, and I’m hoping that Asami gives me the flexibility to do what I’ve wanted to there
it’s good to have more options, so far i counted 7 options that are doing datomic-like queries in the clojure world
Well one thing I haven’t done is to keep the full history of assertions and retractions. There’s a lot of complexity if I want to fully index that
That protocol may get tweaked (around transactions), but that’s basically the level of abstraction I need
It takes a graph and 3 elements (that come from a constraint pattern). Each element can either be a scalar value (e.g. a keyword, string, long, etc), or else a symbol that starts with `?`. These symbols are treated as variables.
It then returns bindings for all of those variables, based on the triples that match the provided scalar values.
The simple index (there’s another index as well) does this using asami.index/get-from-index
https://github.com/threatgrid/asami/blob/main/src/asami/index.cljc#L33
There’s a `simplify` function that converts values into `:v` and variables into `?`. That’s used to determine the dispatch
You can see there are 8 variations, each with its own code to return the appropriate result
The lettering of SPO is based on RDF: Subject-Predicate-Object. This is equivalent to EAV
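The dispatch described above can be sketched with a multimethod. This is an illustration under assumptions, not the code at the link: the names `variable?` and `resolve-pattern`, the graph keys `:spo`/`:pos`/`:osp`, and the sample graph are all hypothetical.

```clojure
(defn variable?
  "True for symbols whose name starts with ?, e.g. ?e."
  [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn simplify
  "Ignores the graph and maps each pattern element to :v (value) or ? (variable)."
  [_graph & pattern]
  (mapv #(if (variable? %) '? :v) pattern))

;; One method per shape of pattern: 2^3 = 8 variations.
;; Each method returns rows holding only the values bound to variables.
(defmulti resolve-pattern simplify)

(defmethod resolve-pattern [:v :v :v] [{:keys [spo]} s p o]
  (if (contains? (get-in spo [s p] #{}) o) [[]] []))

(defmethod resolve-pattern [:v :v '?] [{:keys [spo]} s p _]
  (map vector (get-in spo [s p])))

(defmethod resolve-pattern [:v '? :v] [{:keys [osp]} s _ o]
  (map vector (get-in osp [o s])))

(defmethod resolve-pattern [:v '? '?] [{:keys [spo]} s _ _]
  (for [[p os] (get spo s), o os] [p o]))

(defmethod resolve-pattern ['? :v :v] [{:keys [pos]} _ p o]
  (map vector (get-in pos [p o])))

(defmethod resolve-pattern ['? :v '?] [{:keys [pos]} _ p _]
  (for [[o ss] (get pos p), s ss] [s o]))

(defmethod resolve-pattern ['? '? :v] [{:keys [osp]} _ _ o]
  (for [[s ps] (get osp o), p ps] [s p]))

(defmethod resolve-pattern ['? '? '?] [{:keys [spo]} _ _ _]
  (for [[s p->os] spo, [p os] p->os, o os] [s p o]))

;; Hand-written sample graph holding the triples [:a :m 1] [:a :m 2] [:b :m 2].
(def graph
  {:spo {:a {:m #{1 2}} :b {:m #{2}}}
   :pos {:m {1 #{:a} 2 #{:a :b}}}
   :osp {1 {:a #{:m}} 2 {:a #{:m} :b #{:m}}}})
```

For example, `(resolve-pattern graph :a :m '?o)` yields one row per object of `:a :m`, and a fully bound pattern such as `(resolve-pattern graph :a :m 1)` yields a single empty row when the triple exists.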
so instead of returning all the matched full datoms like in datascript, this returns the bound values only?
Well, I thought about returning all 3 columns, and at the time I thought that I would just need to project them away anyway, so why would I?
so the indexing has to be more complex than a simple set of datoms, like in datomic or datascript
there’s an `index-add` function, an `index-delete` function and then just a bit of code in the graph to call them
B-trees are nice too, but I’ve had great success with AVL for both upload speed and read performance
So even though it’s a deeper tree, it’s not as slow to search as it seems at first glance. Also, each node handles a range, which means it’s not purely AVL
i see, so it sounds to me the storage could be exactly the same as datascript, only need to add a step to filter to return the bound values only
consider these triples:
a m 1
a m 2
a n 1
a o 3
b m 2
b m 3
c m 2
This is equivalent to:
a m 1
    2
  n 1
  o 3
b m 2
    3
c m 2
The second is the nested-map view.
On disk you find the boundaries of what you’re looking for by tree searches, which are O(log(n)). When you have maps, then it’s O(1), which is obviously nicer.
or are you still storing the latter, but using the bounds to avoid scanning the whole range? instead, scan a small range, then move to the next?
There are ways to reduce the amount being stored, but given the tradeoffs, I’m going with the simplest approach first
If I’m looking for `[a ?x ?y]` then I need to search twice. I’ll find the start, and then find the point between [a o 3] and [b m 2]
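The two searches can be sketched over a sorted vector of flat triples. This is a simplified in-memory stand-in for the tree search, not the on-disk index itself; the names `lower-bound` and `by-subject` are hypothetical, and keywords stand in for the symbols `a`, `m`, and so on.

```clojure
;; The example triples, kept sorted.
(def triples
  (vec (sort [[:a :m 1] [:a :m 2] [:a :n 1] [:a :o 3]
              [:b :m 2] [:b :m 3] [:c :m 2]])))

(defn lower-bound
  "Binary search for the index of the first element where (pred element) is true.
  Assumes pred is false for a prefix of v and true for the remainder."
  [v pred]
  (loop [lo 0, hi (count v)]
    (if (< lo hi)
      (let [mid (quot (+ lo hi) 2)]
        (if (pred (nth v mid))
          (recur lo mid)
          (recur (inc mid) hi)))
      lo)))

(defn by-subject
  "Resolves [s ?x ?y] with two O(log n) searches: one for the start of the
  range and one for the boundary just past it. Returns [?x ?y] rows."
  [v s]
  (let [start (lower-bound v #(>= (compare (first %) s) 0))
        end   (lower-bound v #(pos? (compare (first %) s)))]
    (mapv #(subvec % 1) (subvec v start end))))
```

Here `(by-subject triples :a)` scans only the four triples between the two boundaries, the second of which falls between `[:a :o 3]` and `[:b :m 2]`.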
so my understanding is the exact same storage that datascript uses can be used in your query engine
You can even just store flat triples and do everything via `filter`. The performance would be terrible, but it would work fine
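A rough sketch of that flat-triples approach, including the step that keeps only the bound values. The names `matches?`, `project`, and `resolve-by-filter` are hypothetical, and keywords again stand in for the example's symbols.

```clojure
(defn variable?
  "True for symbols whose name starts with ?, e.g. ?x."
  [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn matches?
  "A triple matches when every non-variable pattern element equals
  the triple's element in the same position."
  [pattern triple]
  (every? (fn [[p t]] (or (variable? p) (= p t)))
          (map vector pattern triple)))

(defn project
  "Keeps only the triple elements that sit in variable positions."
  [pattern triple]
  (vec (for [[p t] (map vector pattern triple)
             :when (variable? p)]
         t)))

(defn resolve-by-filter
  "Full scan over flat triples: O(n) per pattern, so slow but correct."
  [triples pattern]
  (->> triples
       (filter #(matches? pattern %))
       (map #(project pattern %))))

(def flat-triples
  [[:a :m 1] [:a :m 2] [:a :n 1] [:a :o 3]
   [:b :m 2] [:b :m 3] [:c :m 2]])
```

For instance, `(resolve-by-filter flat-triples '[:a ?x ?y])` walks all seven triples and returns the `[?x ?y]` rows for subject `:a`.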
what I did in Datalevin is to port datascript to LMDB, which becomes faster than Datascript, even though datalevin is on disk and datascript is in memory
the same thing can be done with Asami, the only thing missing is the filtering step we discussed above
once resolve-triple is implemented as your api requires, your query engine can be used, i think this combination will yield the best possible implementation
sorry, I’ve been distracted. It seems that ClojureScript does not support `find` on transient maps
i don’t care about clojurescript, i only care about clojure, so it’s good to have choices
@U07FP7QJ0 created the channel #asami so feel free to ask things in there