This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-12-03
Channels
- # adventofcode (151)
- # asami (34)
- # babashka (43)
- # beginners (111)
- # cider (2)
- # clj-kondo (6)
- # cljdoc (12)
- # clojure (140)
- # clojure-australia (10)
- # clojure-europe (14)
- # clojure-france (5)
- # clojure-gamedev (5)
- # clojure-nl (4)
- # clojure-uk (10)
- # clojurescript (20)
- # community-development (9)
- # conjure (1)
- # core-async (4)
- # cryogen (3)
- # cursive (2)
- # datomic (17)
- # emacs (9)
- # events (1)
- # fulcro (27)
- # juxt (8)
- # kaocha (2)
- # lambdaisland (14)
- # off-topic (23)
- # pathom (37)
- # pedestal (2)
- # re-frame (8)
- # reagent (8)
- # reclojure (9)
- # reitit (5)
- # reveal (34)
- # shadow-cljs (27)
- # spacemacs (10)
- # tools-deps (123)
- # vim (28)
- # xtdb (17)
@quoll I have done some quick experiments, but I probably do something wrong. My results are here: https://github.com/whilo/asami/blob/benchmark-comparison-hitchhiker-tree/test/asami/durable/pool_test.cljc#L230
Maybe you can have a look before our meeting later. I realized that the book contains a lot of double words and decided to only insert a set of those, but performance seems similar otherwise as well.
Even if performance seems similar, only inserting a set of the words is not going to be realistic usage. The purpose of this is to get a single, consistent ID value for every string that is passed in. Which means that it needs to search for them as well as store the ones it doesn’t find. Of course, caching will matter a lot for that, and 13k words will easily fit in memory, which will be why the performance looks similar at this size.
I haven’t looked at the loading yet, though reading is being done completely differently. In the HH case you’re iterating over the data, while for Asami you’re doing an individual lookup (i.e. search the tree) for each word. A more direct comparison there will be to use a tree iterator node-seq
Strings of 7 characters or fewer are turned into a number. It’s only strings of 8 characters or more that are saved in the tree
(I say “characters”. It’s really “bytes”. But I use UTF-8, and that text is all in the ASCII range, so it’s the same)
One difference I see in loading is in what is being done for each insert. Without knowing the hh code, it appears that you’re loading everything in a batch, then flushing. I presume that the load operation is ordering the data, right?
The Asami code is doing a 2 stage operation for each string:
• Does the string exist in the pool? If so, return its ID.
• Insert the string, returning the new ID.
The result is a seq of IDs that represent the original text. (Called coded
in my example).
The second step of my test was to use the pool to map the sequence of numbers in coded
back into their original strings, using the index.
That said, I fully expect the HH code to be faster at loading, though not by as much as your earlier test suggests
This machine (apparently fast than yours) encodes the strings into numbers in about 7 seconds (7436ms on my last run. I’m seeing faster and slower) However, that’s only the strings that are longer than 7 bytes, which 7051. You’ll likely be faster again when you restrict yourself to just those. Iterating over that takes 220ms
The data is not presorted before it is inserted into the hitchhiker-tree. Every string is inserted in sequence and the sorting happens internally, partially lazily, because the hitchhiker-tree is fractal.
Ok, with (time (doall (node-seq (:index bpool) (first-node bpool))))
bulk reading takes half the time for me than before. I have not benchmarked random accesses yet. For this it might be sensible to run a more careful benchmark. The hitchhiker-tree has a fairly nice benchmarking suite with reporting to an Excel sheet, albeit only for random insertions and deletes so far.
I am not sure abou the pool internals with the string ids that you are mentioning. Are the ids important for the user of the pool or are they just internal?
You had originally stated 11am last week. When I said I couldn’t do that time you suggested today and that you were going to ask around for who wants to join… but then nothing more was said!
Let's meet here https://meet.jit.si/WeakParticipationsRotateSharply.
I have to apologize. I was in a rush this morning, and didn’t think it through. I have a department meeting right now!
As being discussed offline, this is the program that Naga was running:
sibling(fred, barney).
parent(fred, mary).
sibling(mary, george).
gender(george, male).
parent(B, C) :- sibling(A, B), parent(A, C).
brother(A, B) :- sibling(A, B), gender(B, male).
uncle(A, C) :- parent(A, B), brother(B, C).
sibling(A, B) :- parent(A, P), parent(B, P), A != B.
gender(F, male) :- father(A, F).
parent(A, F) :- father(A, F).