#xtdb
2020-05-12
euccastro01:05:29

Newbie question: Would it be reasonable to store blobs (images, even videos?) in crux with a S3 document store? Or would that cause problems with e.g., indexing, LMDB/RocksDB local caches, or something else?

jarohen07:05:22

Excellent timing - I raised an issue for this just last night 🙂 https://github.com/juxt/crux/issues/864. In theory there's nothing that should prevent us from doing this, although I'd just want to check that we haven't made any assumptions regarding byte arrays as values within the engine. I'm aware of a few people base-64 encoding binary values, although that may not be suitable for larger blobs. We use hashes in the query indices, so the size of values won't impact query speed (until it needs to return you the blob, obviously). Even so, I'd recommend that you split the blob into a separate Crux entity and then refer to it by key from other entities - this should minimise the disk space usage.
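A minimal sketch of the "separate blob entity" layout suggested above, written as plain Clojure data so it runs without a Crux node. The ids (`:photo/holiday`, `:blob/holiday-jpg`), the attribute names (`:blob/base64`, `:photo/blob`), and the helper functions are all hypothetical; base-64 encoding is used only because it is the approach mentioned in the message, and it may not suit large blobs.

```clojure
(import 'java.util.Base64)

(defn encode-blob
  "Base-64 encodes a byte array so it can be stored as a string value."
  [^bytes bs]
  (.encodeToString (Base64/getEncoder) bs))

(defn blob-docs
  "Returns two documents: the blob itself, and an entity that refers to
  the blob by its id. Attribute names are made up for illustration."
  [entity-id blob-id ^bytes bs]
  [{:crux.db/id  blob-id
    :blob/base64 (encode-blob bs)}
   {:crux.db/id  entity-id
    :photo/blob  blob-id}])

;; Stand-in bytes; a real caller would read these from a file or stream.
(def docs
  (blob-docs :photo/holiday
             :blob/holiday-jpg
             (.getBytes "not really a JPEG" "UTF-8")))

;; One put operation per document, in the shape crux/submit-tx accepts.
(def tx-ops
  (mapv (fn [doc] [:crux.tx/put doc]) docs))
```

With this layout, later changes to the referring entity (for example adding a caption) would put a new version of the small document only, leaving the blob document untouched.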

euccastro17:05:02

thank you very much!

euccastro01:05:15

I guess that would be a newish use case with S3, since I don't imagine it would be trivial to configure a kafka backend with arbitrary size storage, or split the blobs in user space? Or are people storing blobs in crux already?

jlmr14:05:38

I’ve been tinkering with crux lately. I don’t really understand yet how references work in crux. In Datomic you explicitly define references in your schema. How do they work in crux? How do you create a reference from one document to another? How do you write a query to get some attribute from a referenced document?

malcolmsparks14:05:35

Suppose you put an entity E with :crux.db/id of :abc. You then use :abc as the value of an attribute in another entity E2.

malcolmsparks14:05:04

You can use java.net.URI, java.util.UUID too, but not strings.

malcolmsparks14:05:24

{:crux.db/id :foo} {:crux.db/id :bar :ref :foo}

malcolmsparks14:05:59

Of course, :foo might be just a keyword and not exist as an id in the database at all. This is intentional.

jlmr15:05:20

@malcolmsparks thanks. Let me see if I understand it correctly. Suppose entity E also has an attribute :info with "some info" as the value, and E2 has :some-ref with value :abc. Would the following query work?

'[:find info
  :where [E2 :crux.db/id <the id of E2>]
         [E2 :some-ref the-ref]
         [E  :crux.db/id the-ref]
         [E  :info info]]

refset15:05:30

@U56R03VNW hey, pretty much, though you need to put :find vars in a vector, also it looks like you probably want to use :args, i.e.

'[:find [info]
  :where [E2 :some-ref the-ref]
         [the-ref :info info]
  :args [{:E2 <id of E2>}]]
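For comparison, here is a sketch of the same query written in the quoted-map form that appears elsewhere in this log, with each :where clause as its own vector. The documents and the `:e2` id are hypothetical stand-ins for E and E2 from the question; the query is only defined as data here, so nothing in the block needs a running node.

```clojure
;; Hypothetical documents matching the question: E has id :abc and an
;; :info attribute, E2 refers to E through :some-ref.
(def docs
  [{:crux.db/id :abc :info "some info"}
   {:crux.db/id :e2  :some-ref :abc}])

;; The query as a map. E2 is bound through :args, the-ref joins the
;; two clauses, and info is the value returned.
(def info-query
  '{:find  [info]
    :where [[E2 :some-ref the-ref]
            [the-ref :info info]]
    :args  [{:E2 :e2}]})

;; Assuming `node` is a started Crux node and `crux` aliases crux.api,
;; it would be run as:
;; (crux/q (crux/db node) info-query)
```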

jlmr16:05:08

Thanks! And I assume cardinality many refs would work similar

refset16:05:11

yep that's right, these all work exactly the same from a query perspective:
{:crux.db/id <id of E2> :some-ref <ref-a>}
{:crux.db/id <id of E2> :some-ref [<ref-a> <ref-b>]}
{:crux.db/id <id of E2> :some-ref #{<ref-a> <ref-b>}}

Eric Ihli21:05:53

I'm submitting a put transaction and it's not showing up in a subsequent query. I just finished the tutorial where I was running these expressions and everything was working fine.

app.pathom> (crux/submit-tx db/conn [[:crux.tx/put {:crux.db/id :person/test :email ""}]])
;; => #:crux.tx{:tx-id 12, :tx-time #inst "2020-05-12T21:00:02.538-00:00"}

;; Looks like that submitted. 
;; I expect to see a `:person/test` entity on this next query.

app.pathom> (crux/q (crux/db db/conn) '{:find [e] :where [[e :crux.db/id v]]})
;; => #{[:commodity/Pu] [:person/johanna] [:manifest] [:person/ilex]
  [:commodity/borax] [:commodity/CH4] [:commodity/Au] [:commodity/C]
  [:tombaugh-resources] [:gold-harmony] [:person/thadd]
  [:blue-energy] [:encompass-trade] [:commodity/N]}
Anyone have an idea about why this would happen? I'm using the following config.
(def config
  {:crux.node/topology '[crux.standalone/topology
                         crux.kv.rocksdb/kv-store]
   :crux.standalone/event-log-dir "crux-store/eventlog-1"
   :crux.kv/db-dir (str (io/file "crux-store" "db"))})
The only thing that I can think of that changed between when I was going through the tutorial and what I'm running now is that I'm starting the node from mount/defstate. So db/conn in that code above is from (defstate conn {:start (crux/start-node config)}) .

refset22:05:59

Hi @UJP37GW2K - are you running the above in a repl with a short pause in between the commands? I.e. checking this is not a case of needing crux/await-tx

Eric Ihli22:05:06

A pretty long pause. Minutes+

dvingo22:05:36

I was seeing something similar and changed two things: 1. setting the event-log kv-store (I believe it defaults to in-memory), and 2. using await in all my writes:

(defn rocks-config [data-dir]
  {:crux.node/topology                    '[crux.standalone/topology
                                            crux.kv.rocksdb/kv-store-with-metrics]
   :crux.kv/db-dir                        (str (io/file data-dir "db"))
   :crux.standalone/event-log-dir         (str (io/file data-dir "eventlog"))
   :crux.standalone/event-log-kv-store    'crux.kv.rocksdb/kv
   :crux.standalone/event-log-sync?       true
   :crux.kv/sync?                         true})

(defn put-async
  ([data] (put-async crux-node data))
  ([crux-node data]
   (crux/submit-tx crux-node [[:crux.tx/put data]])))

(defn put
  ([data] (put crux-node data))
  ([crux-node doc]
   (crux/await-tx crux-node (put-async crux-node doc))))
In my code I only use put now (for my use case).

Eric Ihli23:05:06

Wrapping in crux/await-tx didn't work. I haven't tried changing the RocksDB config yet.