xtdb 2020-07-11 | Slack Archive

Frosku00:07:31

(let [node (crux/start-node
              {:crux.node/topology '[crux.standalone/topology
                                     crux.kv.lmdb/kv-store]
               :crux.kv/db-dir (str (io/file "db"))})]
  (crux/submit-tx node [[:crux.tx/put
                         {:crux.db/id :img,
                          :description "",
                          :tags
                          ["female"
                           "grass"],
                          :source "",
                          :format "PNG",
                          :mime-type "image/png"}]])
  (crux/q (crux/db node)
          '{:find [id]
            :where [[e :crux.db/id id]]
            :full-results? true}))

OK, I've simplified it down to pretty much the simplest thing that doesn't work, and I'm baffled. Why, when I run this query, is the only result #{[nil]}?

jonpither08:07:15

Hi @frosku. It could be that you need to use await-tx to know that the tx has completed on the node before you attempt a query.

Frosku00:07:13

It seems to work with rocksdb, but rocksdb has other issues for me -- it never seems to let go of its lock.

Frosku01:07:52

It works with RocksDB but not with LMDB

quadron01:07:51

it works for me

Frosku01:07:56

How do I get RocksDB to let go of a lock?

jonpither08:07:34

Is this causing issues for the running of Crux itself, i.e. exceptions?

jonpither08:07:49

I'm aware that running a Crux node will result in a lock against the rocks db file, though we haven't encountered it being an issue for users.

Frosku11:07:54

It's making me unable to mount the db in REPL to see if anything saved.

quadron01:07:20

no idea about rocksdb

Frosku02:07:57

LMDB is giving me issues now too; it's defaulting to MemKv for no discernible reason

Frosku02:07:07

20-07-11 02:57:48 canterlot WARN [crux.kv.memdb:1] - Using sync? on MemKv has no effect. Persistence is disabled.
20-07-11 02:57:48 canterlot INFO [crux.tx:1] - Started tx-consumer
20-07-11 02:57:48 canterlot INFO [crux.hash.jnr:1] - unknown
20-07-11 02:57:48 canterlot INFO [crux.hash:1] - Using libgcrypt for ID hashing.

Frosku02:07:21

(defn get-crux-node
  [database]
  (crux/start-node {:crux.node/topology '[crux.standalone/topology
                                          crux.kv.lmdb/kv-store]
                    :crux.kv/db-dir (str (io/file database))}))

Frosku02:07:35

Why is it trying to use MemKv?

Frosku02:07:43

But then my db/data.mdb file has grown...

Frosku02:07:44

So confusing

jonpither08:07:46

The standalone topology uses two KV stores, one for the tx log and one for indexing. We are working on making this less confusing. I think you've been caught out by https://github.com/juxt/crux/issues/818

refset09:07:31

Yeah this is a very recent change - the memdb used for indexing doesn't need to be logging a sync warning - now noted that we need to hide it, thanks!

mac11:07:12

Can someone explain what is going on here? Is this a case of the index not being updated before the query is run? How can this be prevented?

pengine.core> (defn start-standalone-node ^crux.api.ICruxAPI [storage-dir]
    (crux/start-node {:crux.node/topology '[crux.standalone/topology]
                      :crux.kv/db-dir (str (io/file storage-dir "db"))}))

  (def node (start-standalone-node "crux-store"))
=> #'pengine.core/start-standalone-node
=> #'pengine.core/node
pengine.core> (defn q
    [query]
    (crux/q (crux/db node) query))
=> #'pengine.core/q
pengine.core> (defn add-test-doc []
                (crux/submit-tx
                 node     
                 [[:crux.tx/put
                   {:crux.db/id (uuid)
                    :type :test}]]))
=> #'pengine.core/add-test-doc
pengine.core> (defn find-test-doc-id []
                 (let [query {:find '[?e]
                              :where [['?e :type :test]]}] (q query)))
=> #'pengine.core/find-test-doc-id
pengine.core> (let [doc (add-test-doc)]
                (find-test-doc-id))
=> #{}
pengine.core> (let [doc (add-test-doc)]
                (find-test-doc-id))
=> #{[#uuid "72f065f8-522b-45fc-825e-360e6799cccc"]}

malcolmsparks11:07:30

Hi @U09UV3WP6. Have you tried calling 'sync' (see the Crux API)?

malcolmsparks11:07:23

Submit is async, so normally you'd have to wait for it to index before you can read.

mac11:07:46

(let [doc (add-test-doc)
      s (sync node)]
  (find-test-doc-id))

gives same result

malcolmsparks11:07:37

Which version of Crux are you on?

mac11:07:00

[juxt/crux-core "20.06-1.9.0-beta"]

refset11:07:51

@U09UV3WP6 1.9.0 has known bugs that might be related; see yesterday's announcement in this channel: https://clojurians.slack.com/archives/CG3AM2F7V/p1594394543220800. Please try again with 1.9.2 and see if you get the same result.

malcolmsparks11:07:55

Could you verify the same behaviour on 1.9.2-beta? If it's still the same I'll try to replicate. I'm still learning the ropes myself so bear with me.

mac11:07:43

I can confirm same behaviour with [juxt/crux-core "20.07-1.9.2-beta"]

👍 3

refset12:07:23

Sync isn't necessarily fool-proof in a read-your-writes scenario, try using await-tx and pass in the result of submit-tx

malcolmsparks13:07:36

Ah yes, I should have mentioned await-tx. Thanks @refset

mac14:07:38

await-tx gives the desired result. Thanks.

👍 6
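
For reference, the read-your-writes pattern refset describes might look roughly like the sketch below. It assumes `crux.api` is aliased as `crux` and that `node` was started as in mac's snippet. `add-test-doc-and-wait` and `find-test-doc-ids` are hypothetical names, and `java.util.UUID/randomUUID` stands in for mac's `uuid` helper, whose definition isn't shown in the thread.

```clojure
(require '[crux.api :as crux])

(defn add-test-doc-and-wait
  "Submits a test document and blocks until this node has indexed it."
  [node]
  (let [tx (crux/submit-tx node [[:crux.tx/put
                                  {:crux.db/id (java.util.UUID/randomUUID)
                                   :type :test}]])]
    ;; submit-tx returns straight away with a tx receipt;
    ;; await-tx blocks until the node has indexed that specific tx.
    (crux/await-tx node tx)))

(defn find-test-doc-ids
  "Queries a fresh db snapshot for all documents of :type :test."
  [node]
  (crux/q (crux/db node)
          '{:find [?e]
            :where [[?e :type :test]]}))
```

Calling `(add-test-doc-and-wait node)` before `(find-test-doc-ids node)` should mean the query runs against a snapshot that already includes the new document.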

Frosku11:07:18

If I want to re-mount the db, say in a REPL, to run some queries, what do I do?

malcolmsparks11:07:24

@frosku you can certainly do that. I usually have my Crux node defined as a component in an Integrant or Component setup, and access it via the 'system'. Or you can just def it somewhere.

Frosku11:07:53

I'm choosing the directory via a variable to a CLI script

Frosku11:07:09

https://github.com/Frosku/mirrorpool/blob/master/src/mirrorpool/core.clj ^ The variable is parsed here

Frosku11:07:27

https://github.com/Frosku/mirrorpool/blob/master/src/mirrorpool/derpi.clj ^ Then fed into get-crux-node here.

Frosku11:07:32

What I'd like to do is ideally while the script is running, query the db to make sure stuff is going into it.

malcolmsparks11:07:02

Ok. So the question is whether 2 jvms can share a node's index dir structure. I'm not sure. What's your index engine, RocksDB or LMDB?

Frosku11:07:19

LMDB

Frosku12:07:04

Maybe I need to set it up with Kafka and create an observer app?

Frosku12:07:48

I don't need fast ingress, just fast queries, so I think LMDB is right?

malcolmsparks12:07:03

You just need a shared tx log, Kafka is one option.

malcolmsparks12:07:21

LMDB should be good for that yes. I think you do need to scale out and create another node that is hooked up against the same tx log.

Frosku12:07:22

Just check my sanity here, once this script is finished, I should be able to re-mount the db from REPL and it should be queryable?

malcolmsparks12:07:25

I believe so, yes.

Frosku12:07:33

Also, are writes synced by default?

Frosku12:07:12

Because I do a (System/exit 0) at the end and I don't want to lose data lol, so not sure if I should be doing something explicit in cleanup step to wait for it to finish.

malcolmsparks12:07:17

No. The configuration docs say the sync default is false.

Frosku12:07:57

OK, so what should I be doing to ensure I write everything before exiting?

malcolmsparks12:07:58

But you can configure that

Frosku12:07:17

Async is probably better for the long running process.

Frosku12:07:41

But is there a fn to await all outstanding tx?
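
One possible approach, sketched on the assumption that a node indexes transactions in submission order, so waiting on the last one also covers the earlier ones: keep the receipt returned by the final `submit-tx`, await it, then close the node before exiting. `flush-and-close!` and `last-tx` are hypothetical names, not part of the Crux API.

```clojure
(require '[crux.api :as crux])

(defn flush-and-close!
  "Waits for last-tx (the value returned by the final submit-tx call)
   to be indexed, then closes the node so its KV stores are released."
  [node last-tx]
  (when last-tx
    ;; Blocks until the node has indexed last-tx.
    (crux/await-tx node last-tx))
  ;; A Crux node is Closeable; closing it shuts down its stores.
  (.close ^java.io.Closeable node))
```

A script would call `(flush-and-close! node last-tx)` just before `(System/exit 0)`.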

Frosku11:07:04

-rw-r--r-- 1 frosku users 26820608 Jul 11 12:56 data.mdb

Frosku11:07:10

It definitely seems to be

Frosku12:07:58

But then when I try to re-open and query node I get nothing.

Frosku12:07:27

(defn h
  []
  (let [node (get-crux-node "/derp/db")]
    (crux/q (crux/db node) '{:find [id] :where [[e :crux.db/id id]] :full-results? true})))

(h)
=> #{[nil]}

malcolmsparks12:07:52

Which Crux version @frosku

Frosku12:07:15

[juxt/crux-core "20.06-1.9.1-beta"]
[juxt/crux-lmdb "20.06-1.9.1-alpha"]

malcolmsparks12:07:11

Try with 20.07-1.9.2-beta

malcolmsparks12:07:38

There's a particular issue you're hitting with 1.9.1 that's fixed in 1.9.2

Frosku12:07:41

Same version bump for lmdb?

malcolmsparks12:07:00

Yes, but I think that's irrelevant.

malcolmsparks12:07:11

But better to align

Frosku12:07:12

Still -alpha?

malcolmsparks12:07:15

Yes

malcolmsparks12:07:50

I believe LMDB is still alpha; you'll get a Maven download error if you get it wrong

Frosku12:07:41

What's the issue btw?

Frosku12:07:02

Still #{[nil]}

Frosku12:07:51

When the script closes, the query returns data, but when I try to fire up a repl with the same node, it returns #{[nil]}

Frosku12:07:00

Is there something I'm not understanding?

Frosku12:07:33

And then if I run it for a different set of data it returns nil at the end of script execution

Frosku12:07:52

OK, so turns out I needed to do a bit more to persist the event log and then I could query back again

Frosku12:07:02

(defn get-crux-node
  [database]
  (crux/start-node {:crux.node/topology '[crux.standalone/topology
                                          crux.kv.lmdb/kv-store]
                    :crux.kv/db-dir (str (io/file database "db"))
                    :crux.standalone/event-log-kv-store 'crux.kv.lmdb/kv
                    :crux.standalone/event-log-dir (str (io/file database "evt"))
                    :crux.standalone/event-log-sync? true
                    :crux.kv/sync? true}))

dvingo13:07:21

I think this is a common "gotcha" when starting with crux - it seems like a good thing to make more prominent in the documentation under standalone node configuration

👍 9

☝️ 3

➕ 3

Frosku13:07:49

100%, I definitely found it very confusing
