#datascript
2017-05-31
misha14:05:11

@tonsky is there any particular design decision behind the tx-id int range? I just noticed that ~5-10 tx ids make up 70% of my transit-string db. This is insane!

danielstockton14:05:46

Transactions are also entities. I think it's to keep an eid range reserved for attribute ids?

danielstockton15:05:34

It mirrors Datomic, which uses a different eid range for different partitions.
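A minimal REPL sketch of where that reserved range shows up, assuming current DataScript defaults (tx0 = 0x20000000 = 536870912), which is why the tx ids discussed here are 9-digit numbers like 536870915:

```clojure
(ns example.tx-range
  (:require [datascript.core :as d]))

;; Assumed defaults: DataScript reserves the range starting at
;; tx0 = 0x20000000 = 536870912 for transaction ids.
(def conn (d/create-conn {}))

(:tx-data (d/transact! conn [{:name "a"}]))
;; => datoms like #datascript/Datom [1 :name "a" 536870913 true]
;;    (regular entity ids start at 1; the tx id is tx0 + 1)
```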

misha15:05:56

thought of that too, @danielstockton

danielstockton15:05:40

This is my recollection anyway, don't have a good understanding stored in my head atm.

misha15:05:56

asking around in #clojure how to cache those in transit then

misha15:05:52

this means DS has a hard limit of 50B entities. oh noes!

misha15:05:40

it seems I can do (str tx) on transit write, and (int tx-str) on transit read

danielstockton15:05:58

Yes, I don't think transit caches integers, but I'm trying to find the reasoning.

misha15:05:47

Specifically, all ~#tag, keyword and symbol values are cached when they are more than 3 characters long (including the tag). Strings more than 3 characters long are also cached when they are used as keys in maps whose keys are all "stringable".
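A small sketch of what that rule means in practice, assuming cognitect.transit in ClojureScript: repeated keywords collapse to short "^…" cache codes, while the same values as plain strings inside a vector are written out in full every time.

```clojure
(ns example.transit-cache
  (:require [cognitect.transit :as t]))

(def w (t/writer :json))

;; Keywords longer than 3 characters are cached: repeats become short "^…" codes.
(t/write w [(keyword "536870915") (keyword "536870915") (keyword "536870915")])
;; => roughly "[\"~:536870915\",\"^0\",\"^0\"]"

;; Plain strings in a vector are not cached (they are only cached as map keys),
;; so every repeat is written in full.
(t/write w ["536870915" "536870915" "536870915"])
;; => "[\"536870915\",\"536870915\",\"536870915\"]"
```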

danielstockton15:05:44

Yep, was looking in the same place.

misha15:05:10

stringifying txs is questionable: most of the transactions are just <10 datoms long, so it might not yield much benefit, but it might be quite a bit slower

danielstockton15:05:09

Yes, I'd assume so. Depends on whether you're optimizing for really tight bandwidth or for speed.

misha15:05:26

is doing anything but his main project...

pedroteixeira15:05:46

is anyone already working on full-text search support? (thought about leveraging the lunrjs js impl)

misha16:05:33

[:datoms 497]
[:txs 4]
1 time: "Elapsed time: 4.710000 msecs"  ;; i7 macbook pro 2012, google chrome tab
1000: "Elapsed time: 4127.350000 msecs"
[:fast-count 16332]
1 time: "Elapsed time: 6.100000 msecs"
1000: "Elapsed time: 4978.610000 msecs"
[:short-count 13883]

misha16:05:37

short 1, 1000 writes, 1, 1000 reads
"Elapsed time: 12.200000 msecs"
"Elapsed time: 4892.865000 msecs"
"Elapsed time: 18.630000 msecs"
"Elapsed time: 5594.200000 msecs"

fast 1, 1000 writes, 1, 1000 reads
"Elapsed time: 14.220000 msecs"
"Elapsed time: 3826.140000 msecs"
"Elapsed time: 20.645000 msecs"
"Elapsed time: 5157.845000 msecs"

misha16:05:57

15% shorter transit string. I don't think it's worth it, given that there are only 4 transactions in this data set

misha16:05:54

N copies of 536870915 (9 characters) become 1 "~:536870915" and N-1 "^K" (13 and 4 characters). But as the number of cached values grows, the cache codes get wider, and since you cannot control which value gets encoded with which code, you might end up encoding a tx that appears in only 2 datoms with "^K", and an attribute that appears in 50% of the datoms with "^ZZZ", and suddenly it is slower and takes longer to read/write

misha17:05:44

for the record, I did (-> tx keyword str) and (-> tx name js/parseInt).
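A hedged sketch of that round trip (db->transit-str and transit-str->db are hypothetical helper names, not DataScript API): each datom is written as [e a v tx] with tx turned into a keyword so transit's keyword caching applies, and parsed back with js/parseInt on read.

```clojure
(ns example.db-serialize
  (:require [datascript.core :as d]
            [cognitect.transit :as t]))

;; Hypothetical helpers sketching the approach above.
(defn db->transit-str [db]
  (let [w (t/writer :json)]
    (t/write w {:schema (:schema db)
                ;; tx as a keyword, so repeats collapse to transit cache codes
                :datoms (mapv (fn [dtm] [(:e dtm) (:a dtm) (:v dtm)
                                         (keyword (str (:tx dtm)))])
                              (d/datoms db :eavt))})))

(defn transit-str->db [s]
  (let [r (t/reader :json)
        {:keys [schema datoms]} (t/read r s)]
    (d/init-db (map (fn [[e a v tx]] (d/datom e a v (js/parseInt (name tx))))
                    datoms)
               schema)))
```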

misha17:05:26

actually, there are use cases where preserving txs is not important at all (see the sketch after these numbers):

[:no-tx 11362]
"Elapsed time: 3.825000 msecs"
"Elapsed time: 3596.825000 msecs"
"Elapsed time: 9.495000 msecs"
"Elapsed time: 4808.090000 msecs"