#datomic
2019-09-16
frankitox16:09:32

Hello! When accessing the EAVT index, is it possible to get retracted entities? In which case? I'm thinking of creating an ElasticSearch index by traversing the EAVT, but I want to be sure I don't include retracted things.

✔️ 4
Joe Lane16:09:05

Will you be recreating the ES index as a batch? I've done a lot of work on this problem but with Lucene and it may be easier to add all assertions and retractions, then filter via the es query itself.

frankitox16:09:06

Yes, I'll be recreating the ES index as a batch. I thought about that too, but it will be a problem if I ever want to use the ES suggestions (because they lack good filtering capabilities). Another possibility is filtering by :added while traversing? But this goes back to my original question, as I'm not sure what EAVT returns.

favila16:09:23

only the history database includes retractions

favila16:09:32

(d/history db)

favila16:09:47

if you never called d/history on a db it will only ever have assertions

frankitox16:09:46

Ok thanks! So if I transact and then retract an entity e, it won't be included in the EAVT index?

favila17:09:40

(and easily verified by trying it)

frankitox17:09:47

Thank you. Yes, that was a lazy question, sorry.

thumbnail16:09:34

👋 Hello, I’m noticing this warning in our datomic projects: WARNING: requiring-resolve already refers to: #'clojure.core/requiring-resolve in namespace: datomic.common, being replaced by: #'datomic.common/requiring-resolve. Any way to circumvent or suppress it?

jaret15:09:07

Hi @thumbnail, are you using datomic pro? What version? I believe we resolved this issue in the latest Datomic pro, after the clojure 1.10.1 release.

thumbnail16:09:47

Currently I’m on datomic pro 0.9.5786. Will check out the latest version.

thumbnail16:09:02

Thanks! Bumping to 0.9.5951 fixed it 👍

wilkerlucio20:09:50

hello, we are discussing some performance characteristics of datomic here; we are using on-prem. the current implementation uses datomic queries and the entity API, and our query currently deals with 10,000 entities. we are wondering if moving from q + entities to q + pull would be faster. our assumption is that it may be faster because pull may be able to fetch all the required datoms inside our instance more efficiently, but we don't know enough about the internals to validate whether this is a good assumption. does this refactoring approach make sense?

favila20:09:59

if you know what you need ahead of time pull is likely to be faster, or at least can be made faster (whereas entity will always have a “should I prefetch this? will you need it?” problem)

favila20:09:36

there’s another advantage that pull gives you Real Maps and can do some key renaming for you, and you know for sure that IO is done

favila20:09:55

so you can isolate potentially blocking/latency-sensitive code to its own threads

favila20:09:19

(entity has unpredictable latency because there’s always a chance it has to perform blocking io)

wilkerlucio20:09:08

thanks, we did some benchmarks and got results that match your description:

wilkerlucio20:09:11

(crit/with-progress-reporting
    (crit/report-result
      (crit/quick-bench
        (->> (d/q '{:find  [[?e ...]]
                    :where [[?e :artist/name _]]}
               db)
             (mapv (comp #(select-keys % [:artist/name])
                         #(d/entity db %)))))))
                         
Evaluation count : 48 in 6 samples of 8 calls.
             Execution time mean : 12.594899 ms
    Execution time std-deviation : 1.845767 ms
   Execution time lower quantile : 10.651151 ms ( 2.5%)
   Execution time upper quantile : 14.538328 ms (97.5%)
                   Overhead used : 1.820422 ns


==============================================================


(crit/with-progress-reporting
    (crit/report-result
      (crit/quick-bench
        (->> (d/q '{:find  [[(pull ?e [:artist/name]) ...]]
                    :where [[?e :artist/name _]]}
               db)))))

Evaluation count : 36 in 6 samples of 6 calls.
             Execution time mean : 18.813521 ms
    Execution time std-deviation : 298.939459 µs
   Execution time lower quantile : 18.551782 ms ( 2.5%)
   Execution time upper quantile : 19.209279 ms (97.5%)
                   Overhead used : 1.820422 ns


==============================================================


(crit/with-progress-reporting
    (crit/report-result
      (crit/quick-bench
        (->> (d/q '{:find  [?e ?name]
                    :where [[?e :artist/name ?name]]}
               db)))))
Evaluation count : 300 in 6 samples of 50 calls.
             Execution time mean : 2.156460 ms
    Execution time std-deviation : 143.362504 µs
   Execution time lower quantile : 1.990183 ms ( 2.5%)
   Execution time upper quantile : 2.312776 ms (97.5%)
                   Overhead used : 1.820422 ns

wilkerlucio20:09:27

also, it seems like using the datalog matching is much faster than both of the other options (roughly 6x faster than entity and 9x faster than pull in these runs)

wilkerlucio20:09:39

this was run against the demo mbrainz database

wilkerlucio20:09:17

just to be sure, new benchmarks with full tests:

wilkerlucio20:09:19

(crit/with-progress-reporting
    (crit/report-result
      (crit/bench
        (->> (d/q '{:find  [[?e ...]]
                    :where [[?e :artist/name _]]}
               db)
             (mapv (comp #(select-keys % [:artist/name])
                         #(d/entity db %))))
        :verbose)))
        
Warming up for JIT optimisations 10000000000 ...
  compilation occurred before 1 iterations
  compilation occurred before 347 iterations
Estimating execution count ...
Sampling ...
Final GC...
Checking GC...
Finding outliers ...
Bootstrapping ...
Checking outlier significance
x86_64 Mac OS X 10.14.1 12 cpu(s)
Java HotSpot(TM) 64-Bit Server VM 25.181-b13
Evaluation count : 6420 in 60 samples of 107 calls.

      Execution time sample mean : 9.898978 ms
             Execution time mean : 9.895288 ms
Execution time sample std-deviation : 536.776513 µs
    Execution time std-deviation : 544.280358 µs
   Execution time lower quantile : 9.142990 ms ( 2.5%)
   Execution time upper quantile : 10.901935 ms (97.5%)
                   Overhead used : 1.820422 ns


==============================================================


(crit/with-progress-reporting
    (crit/report-result
      (crit/bench
        (->> (d/q '{:find  [[(pull ?e [:artist/name]) ...]]
                    :where [[?e :artist/name _]]}
               db))
        :verbose)))
        
Warming up for JIT optimisations 10000000000 ...
  compilation occurred before 103 iterations
  compilation occurred before 307 iterations
Estimating execution count ...
Sampling ...
Final GC...
Checking GC...
Finding outliers ...
Bootstrapping ...
Checking outlier significance
x86_64 Mac OS X 10.14.1 12 cpu(s)
Java HotSpot(TM) 64-Bit Server VM 25.181-b13
Evaluation count : 3300 in 60 samples of 55 calls.

      Execution time sample mean : 18.169708 ms
             Execution time mean : 18.174275 ms
Execution time sample std-deviation : 516.347064 µs
    Execution time std-deviation : 522.849633 µs
   Execution time lower quantile : 17.626452 ms ( 2.5%)
   Execution time upper quantile : 19.723233 ms (97.5%)
                   Overhead used : 1.820422 ns

Found 6 outliers in 60 samples (10.0000 %)
	low-severe	 2 (3.3333 %)
	low-mild	 4 (6.6667 %)
 Variance from outliers : 15.7926 % Variance is moderately inflated by outliers


==============================================================


(crit/with-progress-reporting
    (crit/report-result
      (crit/bench
        (->> (d/q '{:find  [?e ?name]
                    :where [[?e :artist/name ?name]]}
               db))
        :verbose)))
        
Warming up for JIT optimisations 10000000000 ...
  compilation occurred before 5846 iterations
Estimating execution count ...
Sampling ...
Final GC...
Checking GC...
Finding outliers ...
Bootstrapping ...
Checking outlier significance
x86_64 Mac OS X 10.14.1 12 cpu(s)
Java HotSpot(TM) 64-Bit Server VM 25.181-b13
Evaluation count : 29940 in 60 samples of 499 calls.

      Execution time sample mean : 1.945301 ms
             Execution time mean : 1.945398 ms
Execution time sample std-deviation : 32.284694 µs
    Execution time std-deviation : 32.687682 µs
   Execution time lower quantile : 1.908870 ms ( 2.5%)
   Execution time upper quantile : 2.033389 ms (97.5%)
                   Overhead used : 1.820422 ns

Found 3 outliers in 60 samples (5.0000 %)
	low-severe	 3 (5.0000 %)
 Variance from outliers : 6.2567 % Variance is slightly inflated by outliers

favila20:09:45

these tests seem to show that I am wrong? d/entity appears 2x faster to me

favila20:09:55

than pull in a query expression

favila20:09:17

did you try d/pull or d/pull-many after the query?

favila20:09:55

maybe there’s a large fixed cost to compiling the pull expression

favila20:09:51

and yes, pulling directly from the query itself (no entity or pull) is always going to be much faster, but the shape is often wrong and it can’t represent nils

dazld19:09:24

wilker, did you try extracting the pull part from your query, into its own function?

dazld19:09:39

quite interested in your findings

wilkerlucio19:09:48

I haven't run the tests with the pull extracted yet, I'll have to test that one some other time

👍 4
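
The variant favila and dazld asked about (running the query first, then pulling outside the query) was not benchmarked in this thread. A minimal sketch of what that test might look like, assuming `db` is a database value for the same mbrainz sample database and that the `d` and `crit` aliases match the ones used above; the function name `artist-names-via-pull-many` is hypothetical, and no timings are claimed here:

```clojure
(require '[datomic.api :as d]
         '[criterium.core :as crit])

;; Assumption: `db` is a Datomic database value, e.g. (d/db conn),
;; for the same mbrainz sample database used in the benchmarks above.
(defn artist-names-via-pull-many
  "Finds every entity that has an :artist/name, then pulls the names
  for all of them in a single d/pull-many call outside the query."
  [db]
  (let [eids (d/q '{:find  [[?e ...]]
                    :where [[?e :artist/name _]]}
                  db)]
    (d/pull-many db [:artist/name] eids)))

(comment
  ;; Same harness as the earlier runs, so the numbers would be comparable.
  (crit/with-progress-reporting
    (crit/report-result
      (crit/quick-bench
        (artist-names-via-pull-many db)))))
```

Comparing this against the pull-in-query result would indicate whether the extra cost comes from the pull expression inside `:find` or from pull itself.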