#xtdb
2021-03-26
R.A. Porter15:03:22

I’m currently writing a ticket for my project, in advance of the next Crux release, to update a few of our queries that currently use eql/project. Is the new pull going to be considered more stable going forward or is it slated to be alpha and subject to change again? If it’s the former, I’ll just plan on us fixing/tweaking it for the new api; if the latter, then I at least want to consider the option of removing the use of project/pull from queries so we don’t churn on future updates.

refset17:03:50

Hey @U01GXCWSRMW we're definitely dropping the "ALPHA" warning with the rename, as per https://github.com/juxt/crux/commit/a0dc81f41e21542b1e370f37a20dbfec2d811309#diff-868d3597c7324a08da9c6f15712d4d98972d2f582e9b1314dcc0c53b5c096fc5L130 ...so you should be able to depend on it even more confidently 🙂

R.A. Porter17:03:14

Awesome! Can’t wait for the new release!

☺️ 3
seancorfield20:03:58

@U899JBRPF Any idea when the next release will be? Looks like lots of changes since the last one…

refset20:03:01

The timetable is currently looking like Wednesday or Thursday next week - although I can't confirm until Tuesday. Not long though!

👍 3
kongeor19:03:00

Hello! I'm playing with the lucene search (great feature btw! 🙂), and having some issues. Not sure if this should work:

(defn lucene-text-query [title]
  (crux/q
    (crux/db (-> system :db :db))
    '{:find [?e ?s ?t]
      :in [?q]
      :where [[(lucene-text-search "mext.headline\\/title:%s" ?q) [[?e ?s]]]
              [?e :mext.headline/title ?t]]
      :order-by [[?s :desc]]}
    title))

kongeor19:03:07

(it doesn't)

kongeor19:03:36

neither does the following, I would expect that it should, but maybe I'm wrong:

kongeor19:03:37

(defn lucene-text-query [title]
  (crux/q
    (crux/db (-> system :db :db))
    '{:find [?e ?s ?t]
      :in [?q]
      :where [[(lucene-text-search "mext.headline\\/title:%s" "cov*") [[?e ?s]]]
              [?e :mext.headline/title ?t]]
      :order-by [[?s :desc]]}
    title))

kongeor19:03:15

just ignore ?q in this case; I'm passing the value directly. The docs say it will be passed via format

kongeor19:03:16

the query does work if I do the formatting before the param is passed to the query, which is what I would do anyway:

kongeor19:03:17

(defn lucene-text-query [title]
  (crux/q
    (crux/db (-> system :db :db))
    '{:find [?e ?s ?t]
      :in [?q]
      :where [[(lucene-text-search ?q) [[?e ?s]]]
              [?e :mext.headline/title ?t]]
      :order-by [[?s :desc]]}
    (format "mext.headline\\/title:%s" title)))

kongeor19:03:34

was just curious if the other cases should or shouldn't work
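
For reference, the variant that works builds the Lucene query string with plain `format` before the query runs. A minimal sketch of that step on its own, in plain Clojure with no Crux dependency. `lucene-field` and `title-query` are hypothetical helper names, and the backslash escaping of `/` simply mirrors the string used in the snippets above:

```clojure
(defn lucene-field
  "Turns a keyword into a Lucene field name with the `/` escaped,
  e.g. :mext.headline/title becomes mext.headline\\/title
  (the resulting string holds a single backslash)."
  [k]
  (if-let [ns-part (namespace k)]
    (str ns-part "\\/" (name k))
    (name k)))

(defn title-query
  "Builds the query string for the :mext.headline/title field."
  [term]
  (format "%s:%s" (lucene-field :mext.headline/title) term))

(title-query "cov*")
;; => "mext.headline\\/title:cov*"  (printed form; one backslash in the string)
```

The resulting string can then be passed as the `?q` argument, as in the last `lucene-text-query` definition above.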

Steven Deobald20:03:36

I had initially tried the Lucene multi-field stuff before switching to wildcard and I had a similar issue (though I wasn't sure if it was the way I was using it or not)... from what I remember of digging through the Lucene docs, that Lucene multi-field string is surprisingly finicky.

Steven Deobald20:03:12

The test doesn't actually cover the internal formatting case described in the docs, I just noticed: https://github.com/juxt/crux/blob/master/crux-lucene/test/crux/lucene/multi_field_test.clj

kongeor08:03:59

thanks! yes, I also had to consult the test cases to resolve this.

Steven Deobald18:03:42

Did you find a test case that provided you an example that helped you get the built-in formatter to work?

kongeor06:03:55

I thought it's just the standard clojure format (and I guess it is) but you are right, there are no tests that cover the same example that exists in the docs

refset09:03:23

Hey @U050BA9V5 thanks for reporting this, I've now fixed it ahead of the new release today 🙂 https://github.com/juxt/crux/commit/8d07a45deaee2b64f029ba2eadc6c3a23cb597ed

kongeor09:03:48

hello! nice! thanks! btw, I also noticed something weird when I was playing with this. I didn't have time to dig deeper into it, so I'll just quickly explain it before I forget. I was poking at this for a few hours on a db that really didn't have much data, but I noticed my disk led constantly flashing, and I realized that it was this java process that was constantly doing I/O. This process had around 10G of I/O over a few hours, and I just had 30-50 documents with a few fields. Didn't seem normal 🙂

refset11:03:26

Oh, hmm, that does sound weird! Which OS are you using? Is that with Rocks for tx-log and doc-store?

kongeor15:03:21

I'm on Arch linux. It's Rocks for index and postgres for tx and doc stores. I only noticed that when I was poking with the text search.

kongeor15:03:29

ok I ran the project again and I see that this happens when I reset my system.

kongeor15:03:38

Exception in thread "crux-polling-tx-consumer" java.nio.channels.ClosedByInterruptException
	at java.base/java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:199)
	at java.base/sun.nio.ch.FileChannelImpl.endBlocking(FileChannelImpl.java:162)
	at java.base/sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:388)
	at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:182)
	at org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43)
	at org.apache.lucene.index.SegmentInfos.write(SegmentInfos.java:484)
	at org.apache.lucene.index.SegmentInfos.prepareCommit(SegmentInfos.java:804)
	at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4914)
	at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3308)
	at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3597)
	at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1099)
	at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1140)
	at crux.lucene$__GT_lucene_store$fn__6357.invoke(lucene.clj:239)
	at crux.bus.EventBus$fn__19497.invoke(bus.clj:104)
	at clojure.lang.AFn.run(AFn.java:22)
	at crux.lucene$$reify__6355.execute(lucene.clj:237)
	at crux.bus.EventBus.send(bus.clj:104)
	at crux.tx.InFlightTx.commit(tx.clj:343)
	at crux.tx$index_tx_log$fn__20371$fn__20376.invoke(tx.clj:440)
	at crux.tx$index_tx_log$fn__20371.invoke(tx.clj:429)
	at crux.tx$index_tx_log.invokeStatic(tx.clj:421)
	at crux.tx$index_tx_log.invoke(tx.clj:419)
	at crux.tx$__GT_polling_tx_consumer$fn__20392.invoke(tx.clj:464)
	at clojure.lang.AFn.run(AFn.java:22)
	at java.base/java.lang.Thread.run(Thread.java:834)
	Suppressed: org.apache.lucene.util.ThreadInterruptedException: java.lang.InterruptedException
		at org.apache.lucene.index.IndexWriter$EventQueue.close(IndexWriter.java:369)
		at org.apache.lucene.index.IndexWriter.rollbackInternalNoCommit(IndexWriter.java:2300)
		at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2267)
		at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1104)
		... 14 more
	Caused by: java.lang.InterruptedException
		at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1343)
		at java.base/java.util.concurrent.Semaphore.acquire(Semaphore.java:475)
		at org.apache.lucene.index.IndexWriter$EventQueue.close(IndexWriter.java:367)
		... 17 more

kongeor15:03:43

(when I start the system initially it's ok, this happens after the reset)

refset15:03:14

thanks for sharing that detail. It might now be resolved by this change I made anyway https://github.com/juxt/crux/pull/1468

refset15:03:10

if you have 5m spare you could try to test against the latest dev-SNAPSHOT we released yesterday, but no pressure 🙂

kongeor15:03:10

I'll give it a spin 🙂 is changing just the lucene dependency ok?

refset15:03:44

I think that would work yep, but it's safest to change all of the Crux deps

kongeor15:03:53

ok, let me do that then

kongeor15:03:36

no, it's the same

refset15:03:00

damn, thanks for trying though. I'll try to repro this today. Can you .close the node before doing the reset as a workaround?

refset15:03:48

did it give the ~same stack trace btw?

kongeor15:03:01

it didn't crash

kongeor15:03:42

let me try it a few more times

kongeor15:03:10

I'm closing the node already

kongeor15:03:23

ah! now it's hanging

kongeor15:03:28

when I do the reset 🙂

kongeor15:03:37

this project is open source and it should be relatively easy to bootstrap: https://github.com/kongeor/mext/blob/alpine/src/clj/mext/systems.clj if that saves you some time in order to set things up.

🙏 3
kongeor15:03:44

btw, that's a toy - almost like a scratchpad - project, I just do experimentation on it 🙂

Steven Deobald15:03:56

> disk led constantly flashing
That caught me off guard, leading me to realize I haven't actually had a desktop computer in a decade. 😳

🙂 3
R.A. Porter21:03:23

Playing around with crux-sql for the first time today and I’m curious if there’s a way to work around a data problem. The query in the :crux.sql.table/query needs to include all the attributes of the document I want to express out in the :crux.sql.table/columns but if I have some records that don’t have one of the columns, they get excluded from the results. Makes perfect sense. I’m not surprised it would do that, since my datalog query doesn’t match the entity. But nullable data is pretty common in sql-land. I’m admittedly still quite weak at datalog and still feeling my way around with Crux so maybe there’s a simple or clever solution to this. Or maybe not?

refset23:03:56

That's right, the SQL columns require there to be some value in the index under the given attribute, so you would need to explicitly store nil if you want the entity to appear in the table. The nil values are then treated the same as SQL's NULL, see https://github.com/juxt/crux/blob/master/crux-sql/test/crux/calcite_test.clj#L361-L368 This is pretty much the case for modelling with Datalog also, i.e. explicitly storing nils is usually the right strategy. Otherwise certain shapes of queries can require ~exhaustive scanning of indexes
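
One way to follow this advice is to fill in the missing attributes with nil before the document is submitted. A minimal sketch in plain Clojure; `with-explicit-nils` is a hypothetical helper, and the attribute names are made up for illustration:

```clojure
(defn with-explicit-nils
  "Returns doc with every attribute in attrs present,
  using nil for the ones the document does not already have."
  [attrs doc]
  ;; the nil defaults come first so that real values in doc win
  (merge (zipmap attrs (repeat nil)) doc))

(with-explicit-nils [:name :email]
                    {:crux.db/id :person/ada :name "Ada"})
;; => {:name "Ada", :email nil, :crux.db/id :person/ada}
```

The returned map is what would go into the put operation, so that every column named in the table definition has a value in the index.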

R.A. Porter00:03:36

Makes sense. And workable. Thanks.

nivekuil00:03:01

> This is pretty much the case for modelling with Datalog also, i.e. explicitly storing nils is usually the right strategy. Otherwise certain shapes of queries can require ~exhaustive scanning of indexes
that's surprising, I thought clojure usually doesn't like to model data this way. Could you give an example of a query that would cause scanning?

refset18:03:31

This is fast when explicit nils are stored, because it can look up the :att nil combination in the AVE index:

{:find [e]
 :where [[e :att nil]]}

However, this version, where nils are not stored in the documents or indexes, has to scan through all e's to look for ones that don't contain :att values:

{:find [e]
 :where [[e :crux.db/id]
         (not [e :att])]}
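
The difference between the two query shapes can be illustrated with a toy model in plain Clojure. This is only a sketch: the map-based `ave-index` below is an assumption made for illustration and is not how Crux actually lays out its indexes, and `ids-missing` is a hypothetical helper.

```clojure
(def docs
  [{:crux.db/id 1 :att nil}   ; nil stored explicitly
   {:crux.db/id 2 :att "x"}
   {:crux.db/id 3}])          ; :att absent

(defn ave-index
  "Toy attribute-value-entity index: {[attribute value] #{entity-ids}}."
  [docs]
  (reduce (fn [idx doc]
            (reduce-kv (fn [idx a v]
                         (update idx [a v] (fnil conj #{}) (:crux.db/id doc)))
                       idx
                       (dissoc doc :crux.db/id)))
          {}
          docs))

;; An explicit nil is one direct lookup on the [:att nil] key.
(get (ave-index docs) [:att nil])
;; => #{1}

(defn ids-missing
  "Finds entities lacking attr; there is no index key for absence,
  so every document has to be visited."
  [attr docs]
  (into #{}
        (comp (remove #(contains? % attr))
              (map :crux.db/id))
        docs))

(ids-missing :att docs)
;; => #{3}
```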

nivekuil12:03:11

ah, right, if you want to explicitly ask questions about the absence of data

nivekuil14:03:03

@U899JBRPF is binding to nil supposed to work? It seems like it is equivalent to not including the clause at all:

(c/q (:app.crux/node integrant.repl.state/system)
     '{:find  [?link]
       :in    [x]
       :where [[?e2 :embed/content x]
               [?e2 :embed/id ?link]]}
     nil)

returns empty set, while

(c/q (:app.crux/node integrant.repl.state/system)
     '{:find  [?link]
       :in    [x]
       :where [[?e2 :embed/content nil]
               [?e2 :embed/id ?link]]})

returns everything

nivekuil15:03:55

@U899JBRPF bumping this, not sure if you saw. quick repro:

(c/put node {:crux.db/id 1 :foo nil})
(c/put node {:crux.db/id 2 :foo 2})
@(c/q node '{:find  [?e]
             :where [[?e :foo nil]]})

;; #{[2] [1]}

refset15:03:52

thanks for the bump, indeed I missed it! The repro is very appreciated

refset23:03:50

@U797MAJ8M I'm opening an issue for this tomorrow, but essentially nil is being treated like _ in your query. As a workaround you can wrap the nil in a literal set #{nil} and it should work as you were expecting
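
Applied to the quick repro above, the suggested workaround would look roughly like this. The sketch only defines the query as data so it can be read without a running node; `foo-is-nil-query` is a hypothetical name, and the commented call assumes `c` aliases `crux.api` and `node` is a started node:

```clojure
;; Same query as the repro, with the nil wrapped in a literal set
;; so that it is no longer treated like the _ wildcard.
(def foo-is-nil-query
  '{:find  [?e]
    :where [[?e :foo #{nil}]]})

;; Hypothetical usage against a started node:
;; (c/q (c/db node) foo-is-nil-query)
```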