xtdb 2021-01-25 | Slack Archive

Steven Deobald01:01:50

@coyotesqrl You mean a list which contains both carrots and peas but not a list which contains only one or the other?

R.A. Porter01:01:59

Correct. The intersection of my query list and the list in the docs should be the entirety of my query list. Basically a "has all these elements" request.

Steven Deobald01:01:37

As far as I can tell, there's no built-in way to perform that query.

R.A. Porter01:01:14

That's what I thought. Tried using a predicate but that didn't cut it. I think I'll have to post-process my results to filter.
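A minimal sketch of the post-processing approach mentioned above, assuming each query result has already been resolved to a document map whose :list attribute holds a collection. The name has-all? and the sample documents are hypothetical, not part of Crux.

```clojure
(require '[clojure.set :as set])

(defn has-all?
  "True when every element of `wanted` appears in the doc's :list value."
  [wanted doc]
  (set/subset? (set wanted) (set (:list doc))))

;; Hypothetical sample documents standing in for query results.
(def docs
  [{:crux.db/id :a :list ["carrots" "peas" "corn"]}
   {:crux.db/id :b :list ["carrots"]}
   {:crux.db/id :c :list ["peas"]}])

;; Keep only the ids of docs that contain both "carrots" and "peas".
(def matching-ids
  (mapv :crux.db/id
        (filter (partial has-all? ["carrots" "peas"]) docs)))
```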

Steven Deobald01:01:05

When the folks in the UK wake up, I'd certainly be curious if there's any way to combine predicates and vector queries.

Jacob O'Bryant01:01:12

I think :where '[[doc :list "carrots"] [doc :list "peas"]] should work, right?

✅ 3

R.A. Porter02:01:42

It does; I guess I’ll just have to generate those clauses dynamically.

👍 3
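Generating those clauses dynamically could look roughly like the sketch below. It only builds the query map as plain data; the function name has-all-query is hypothetical, and passing the result to crux/q is left to the caller.

```clojure
(defn has-all-query
  "Builds a query with one [doc attr value] clause per wanted value,
  so a document must match every clause to be returned."
  [attr values]
  {:find '[doc]
   :where (mapv (fn [v] ['doc attr v]) values)})

(def example-query
  (has-all-query :list ["carrots" "peas"]))
```

Note that an empty values collection produces an empty :where vector, so a caller would probably want to guard against that case before running the query.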

mmer09:01:06

I am having issues using arguments in my queries. The first query returns values as I would expect, but the second does not. As you can see, I copied it from the docs, as I have been struggling to get it to work. What am I missing? :chapter is a string, not a number.

refset09:01:32

What happens when you use a literal in the query? i.e.:

(crux/q
  (crux/db conn)
  '{:find [p1]
    :where [[p1 :chapter "9"]]})

refset09:01:13

is "9" definitely a value for chpt that is returned in your first query?

mmer12:01:47

Literal works. So what is going on?

mmer13:01:40

This is what my data looks like: {:crux.db/id :web/myBook_1, :index 1, :source "Text", :lang :english, :book "MyBook", :chapter "9", :lines [{:line-no 1, :line "Line one"}]}

mmer14:01:51

I am using the crux api over the http connection, could this be the issue?

refset14:01:12

I'm not certain what the issue could be. Are you able to see the same behaviour locally (not over http)? Is this using 1.14.0 ?

mmer14:01:58

I am using 1.13. You can use external vars, so maybe it is not such an issue.

refset16:01:47

Thanks for confirming the version. I'm not sure what you mean by external vars in this context though

refset20:01:38

Hi again, has there been any joy since? If not I will attempt to reproduce what you're seeing tomorrow

Aleksander Rendtslev11:01:33

IDs of documents: I’ve noticed three different trends in different repos here:
1. expose crux.db/id all the way to the client
2. convert back and forth, e.g. crux->id and id->crux
3. save an extra field in the document called id (or a namespaced version of it)
Any thoughts on that? I’m leaning towards number two. I don’t want db implementation details on my client, but number 3 also seems redundant.
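For reference, option 2 could be sketched as a pair of key-renaming helpers. This assumes the client-facing key is simply :id; the function names follow the crux->id and id->crux naming in the message, and the actual helpers in those repos may differ.

```clojure
(require '[clojure.set :as set])

(defn crux->id
  "Renames :crux.db/id to :id before a document leaves the backend."
  [doc]
  (set/rename-keys doc {:crux.db/id :id}))

(defn id->crux
  "Renames :id back to :crux.db/id before a document is written."
  [doc]
  (set/rename-keys doc {:id :crux.db/id}))

(def client-doc
  (crux->id {:crux.db/id :web/myBook_1 :chapter "9"}))
```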

nivekuil12:01:14

I don't use crux.db/id at all in application logic. The reason is that it complects an operational concern (the document) with a logical concern (the triples). So humans reason in terms of triples, and the document is just a grouping of triples whose shape is purely a performance concern (bigger docs = faster queries on related attributes, but worse write characteristics).

nivekuil12:01:37

biggest downside is you don't get to use eql/project to navigate around since it only joins on ?e

Steven Deobald12:01:32

@U797MAJ8M If you don't expose :crux.db/id to your app logic at all, do you tend to do your lookups by some natural key?

nivekuil12:01:45

yeah I do 3: above, a logical entity id which doubles as the type. Although I think [?e :mytype/id] is a bit slower than [?e :type :mytype]

nivekuil13:01:05

oh, if you mean entity lookups I have a sort of hacky schema-ish thing built on malli that standardizes the :crux.db/id generation. So the rest of the app doesn't think about it.

Steven Deobald13:01:13

Oh, gotcha. Makes sense.

Aleksander Rendtslev13:01:35

Thank you @U797MAJ8M! I’m processing what you’re saying about triples and it makes sense (it very much seems to be the way Datomic is designed?). I’m curious about your “hacky schema-ish thing”. I’m doing something similar (with malli) but would love to see how you’re doing it, if you have something you can share?

nivekuil16:01:48

basically what I do is have :crux.db/id reference keys in the rest of the document. so I have a helper macro to 1. generate doc from schema (currently very slow, using malli/decode) 2. fill in :crux.db/id with specter, which looks like

(defn fill-id [doc]
  (t/transform [:crux.db/id t/ALL]
               (fn [[k v]] [k (get doc k v)])
               doc))

3. pass in a map representing fields of the doc I want to set explicitly, which is merged with the malli generated doc and then gets passed to fill-id. It's very rough, did it in one sitting with no real thought before or after, so take it with a grain of salt. Mostly gets the API I want though

nivekuil16:01:47

that api being something like (doc/mytype {:mytype/id 1 :mytype/other-field :foo})
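To make the fill-id idea concrete without the specter dependency, here is a rough plain-Clojure sketch. It assumes :crux.db/id is a map whose keys name other attributes of the same document, and the name fill-id-plain and the sample document are hypothetical.

```clojure
(defn fill-id-plain
  "For each key in the :crux.db/id map, copies the value stored under the
  same key in the rest of the document, keeping the existing value when
  the document has no such key. Assumes :crux.db/id is present and a map."
  [doc]
  (update doc :crux.db/id
          (fn [id]
            (into {} (map (fn [[k v]] [k (get doc k v)])) id))))

(def filled
  (fill-id-plain {:crux.db/id {:mytype/id nil}
                  :mytype/id 1
                  :mytype/other-field :foo}))
```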

zclj15:01:39

I have been doing my dev work with an in-memory config. I now wanted to run with persistence and thought I'd use SQLite. What I have observed is that 1. Sometimes when doing a start/stop/start in the REPL, where start starts the crux node and stop calls close, the SQLite file will be locked and start fails. 2. If the running service is killed, the same file locked error can show itself. Is this inherent to SQLite? At least in 1. I assumed crux would do a "clean" shutdown, since close is called. Finally, if getting into this state, how should the crux node recover?

nivekuil16:01:01

are you using (or is crux setting) the WAL pragma? iirc the default journal gets really annoying with the locks. I think WAL is pretty much a mandatory setting with sqlite

zclj15:01:42

I'm just using the crux config, don't know what crux is doing internally

markaddleman17:01:59

Is there a way to exclude an attribute from lucene indexing? I'm looking to ingest json documents into crux. For traceability purposes, I'd like to record the json document as a string within a crux document but there is no need for full text search on the raw json.

refset17:01:49

Hi 🙂 there's no explicit mechanism for doing this, but you can use the principle that Crux only indexes top-level attributes to your advantage by nesting the json string within another map

👍 3
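The nesting suggestion might look like the sketch below. The attribute names :source and :raw-json and the helper with-raw-json are hypothetical; the only point is that the raw string sits one level down rather than as a top-level attribute value.

```clojure
(defn with-raw-json
  "Stores the raw JSON string inside a nested map under :source, so the
  string itself is not a top-level attribute value of the document."
  [doc json-str]
  (assoc doc :source {:raw-json json-str}))

(def ingested-doc
  (with-raw-json {:crux.db/id :order/o-1 :customer "acme"}
                 "{\"customer\": \"acme\"}"))
```

Reading the string back is then a matter of (get-in doc [:source :raw-json]).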

chromalchemy17:01:57

I am trying to normalize data across a set of local csv files, and running into memory errors trying to do all the ETL manually, and have been wanting to get into datalog query style. Is there a simple way to use Crux locally, and have persistence? I am not advanced at config boilerplate stuff, or network deployment. Or should I be looking more at something like https://github.com/juji-io/datalevin (Datascript + persistence on LMDB)?

Jorin17:01:18

A crux config to run it locally can be as simple as this:

{:crux/tx-log {:kv-store {:crux/module crux.rocksdb/->kv-store
                          :db-dir "tx-log"}}
 :crux/document-store {:kv-store {:crux/module crux.rocksdb/->kv-store
                                  :db-dir "docs"}}
 :crux/index-store {:kv-store {:crux/module crux.rocksdb/->kv-store
                               :db-dir "indexes"}}}

You can store all data in RocksDB: events, documents, and indexes.

✅ 3
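As a rough sketch of using that config from a REPL: this assumes the crux-core and crux-rocksdb dependencies are on the classpath, and the sample document is made up. In Clojure code, as opposed to an EDN config file, the module symbol needs to be quoted.

```clojure
(require '[crux.api :as crux])

;; Same shape as the config above; directories are relative to the
;; working directory of the JVM process.
(def config
  {:crux/tx-log {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                            :db-dir "tx-log"}}
   :crux/document-store {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                                    :db-dir "docs"}}
   :crux/index-store {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                                 :db-dir "indexes"}}})

;; with-open closes the node when the body finishes, releasing the
;; RocksDB files.
(with-open [node (crux/start-node config)]
  ;; Hypothetical document, only for illustration.
  (crux/await-tx node
                 (crux/submit-tx node
                                 [[:crux.tx/put {:crux.db/id :example
                                                 :name "test"}]]))
  (crux/q (crux/db node)
          '{:find [e]
            :where [[e :name "test"]]}))
```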

Jorin17:01:29

You can run the Crux instance right in the same JVM where you want to do your queries or you can query Crux from another process if you enable the HTTP API.

Jorin17:01:32

Hope that helps 🙂

Jorin17:01:42

RocksDB does all the caching for you. So you can definitely query more data than fits into memory. If it's the right tool for your use case, I cannot tell.

chromalchemy18:01:20

Ok, that doesn't look too hard. So RocksDB handles the local file-level persistence? Thanks, I will try it. 👍

refset21:01:53

Hi @U09D96P9B yep that's right, RocksDB handles local persistence. If you need high-availability then you'll want to look at using e.g. crux-jdbc with Postgres

chromalchemy19:01:26

It's working now. I'm giving it a go. Thanks!

🙏 6
