#xtdb
2020-05-15
Eric Ihli 02:05:57

I submitted a put transaction of a map that didn't have a :crux.db/id key. Now, further submits throw an IllegalArgumentException "Missing required attribute :crux.db/id". It's as if that transaction is still in the queue but can't be processed because it lacks a required attribute, and I can't figure out how to remove it from the queue.

Eric Ihli 03:05:24

The problem transaction is stuck in the cache. It's persisted across REPL restarts.

Eric Ihli 03:05:26

(get-in adb/conn [:document-store :event-log-object-store :cache])
;; => #object[crux.lru$new_cache$reify__45334 0x3160d9b0 "{UnsafeBuffer{addressOffset=139787965043216, capacity=21, byteArray=null, byteBuffer=java.nio.DirectByteBuffer[pos=0 lim=21 cap=21]}=#:person{:id 1, :name \"Sally\", :age 32}, UnsafeBuffer{addressOffset=139787965043248, capacity=21, byteArray=null, byteBuffer=java.nio.DirectByteBuffer[pos=0 lim=21 cap=21]}=#:person{:id 2, :name \"Joe\", :age 23}, UnsafeBuffer{addressOffset=139787965368000, capacity=21, byteArray=null, byteBuffer=java.nio.DirectByteBuffer[pos=0 lim=21 cap=21]}=#:person{:id 3, :name \"Fred\", :age 11}, UnsafeBuffer{addressOffset=139787965372064, capacity=21, byteArray=null, byteBuffer=java.nio.DirectByteBuffer[pos=0 lim=21 cap=21]}=#:person{:id 4, :name \"Bobby\", :age 55}, UnsafeBuffer{addressOffset=139787965372192, capacity=21, byteArray=null, byteBuffer=java.nio.DirectByteBuffer[pos=0 lim=21 cap=21]}={:crux.db/id :person-1, :person/id 1, :person/name \"Sally\", :person/age 32}, UnsafeBuffer{addressOffset=139787965372224, capacity=21, byteArray=null, byteBuffer=java.nio.DirectByteBuffer[pos=0 lim=21 cap=21]}={:crux.db/id :person-2, :person/id 2, :person/name \"Joe\", :person/age 23}, UnsafeBuffer{addressOffset=139787965372256, capacity=21, byteArray=null, byteBuffer=java.nio.DirectByteBuffer[pos=0 lim=21 cap=21]}={:crux.db/id :person-3, :person/id 3, :person/name \"Fred\", :person/age 11}, UnsafeBuffer{addressOffset=139787965372288, capacity=21, byteArray=null, byteBuffer=java.nio.DirectByteBuffer[pos=0 lim=21 cap=21]}={:crux.db/id :person-4, :person/id 4, :person/name \"Bobby\", :person/age 55}, UnsafeBuffer{addressOffset=139787965372320, capacity=21, byteArray=null, byteBuffer=java.nio.DirectByteBuffer[pos=0 lim=21 cap=21]}={:crux.db/id :foobar, :val 1}}"]

Eric Ihli 03:05:28

It's hard to see in that noise, but the problem document is right at the start.

{UnsafeBuffer{addressOffset=139787965043216, capacity=21, byteArray=null, byteBuffer=java.nio.DirectByteBuffer[pos=0 lim=21 cap=21]}=#:person{:id 1, :name \"Sally\", :age 32},

Eric Ihli 03:05:47

I can't figure out how to get that value out of the cache.

Eric Ihli 03:05:48

Well, I added a clear method to the LRU cache that delegates to Java's HashMap clear. That got it out of the cache, but I'm still getting the same error about the missing id. I guess it's still on disk.

ordnungswidrig 07:05:43

Should this be possible at all? (crux/submit-tx @node [[:crux.tx/put {}]])

jarohen 08:05:33

yep - can confirm that this shouldn't be possible - we'll put a fix through
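Until such a fix lands, one way to keep id-less documents off the transaction log is to check the ops on the client before calling crux/submit-tx. This is a minimal sketch: puts-missing-id and checked-tx-ops are hypothetical helpers, not part of the Crux API, and the only assumption is that tx ops are vectors whose first element is the operation keyword and, for :crux.tx/put, whose second element is the document.

```clojure
(defn puts-missing-id
  "Returns the :crux.tx/put ops in tx-ops whose document has no :crux.db/id.
  Hypothetical helper, not part of the Crux API."
  [tx-ops]
  (filterv (fn [[op doc]]
             ;; only puts carry a document; deletes etc. are left alone
             (and (= op :crux.tx/put)
                  (not (and (map? doc) (contains? doc :crux.db/id)))))
           tx-ops))

(defn checked-tx-ops
  "Throws before anything is submitted if a put lacks :crux.db/id;
  otherwise returns tx-ops unchanged."
  [tx-ops]
  (let [bad (puts-missing-id tx-ops)]
    (when (seq bad)
      (throw (ex-info "put document missing :crux.db/id" {:bad-ops bad})))
    tx-ops))
```

Wrapping submissions as (crux/submit-tx node (checked-tx-ops ops)) would then make (checked-tx-ops [[:crux.tx/put {}]]) throw on the client instead of submitting the op.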

Eric Ihli 13:05:47

Thanks for clarifying. Any guidance on removing the invalid document from the queue?

jarohen 13:05:35

nothing that I can think of using the public API I'm afraid, will have a think on the easiest way 🤔

jarohen 13:05:35

how many corrupt documents are we talking? (roughly's fine, interested in whether it's 1-5 vs hundreds)

jarohen 13:05:01

right. assuming you've got a handful, I'd do it by hand in a REPL. let's assume you know which documents are corrupt (looks like you're getting them in the error message) - let's use {:person/id 1, :person/name "Sally", :person/age 32} as an example. we can get the Crux content-hash for this document with (def doc-hash (crux.codec/new-id {:person/id 1, :person/name "Sally", :person/age 32}))

jarohen 13:05:21

we then need to re-submit that doc directly to the doc-store with the :crux.db/id added: (crux.db/submit-docs (:document-store node) {doc-hash {:crux.db/id :sally, :person/id 1, :person/name "Sally", :person/age 32}})

jarohen 13:05:39

bouncing the Crux node should then retry to index the problematic transaction

jarohen 13:05:49

one potential impact of this workaround would be that the content-hash doesn't strictly match the content. there aren't many places we rely on this assumption, but one is match - until you overwrite the document, you may find that match operations don't match

jarohen 13:05:54

in fact, yes - I'd be tempted to proactively follow it up with a corrected transaction to avoid this - eg another [:crux.tx/put {:crux.db/id :sally, ...}]
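The repair steps above, pulled together as a sketch. repair-plan is a hypothetical helper, and the hash function is passed in so the logic can be shown without a running node; in the workaround it would be crux.codec/new-id, an internal namespace rather than the public API. What it makes explicit is that the doc-store key is the hash of the original, corrupt document while the value is the corrected one, which is why the follow-up put is needed to make content and hash agree again.

```clojure
(defn repair-plan
  "hash-fn: computes a document's content hash (crux.codec/new-id in the
  workaround). corrections: map of corrupt document -> the :crux.db/id it
  should have had. Returns
    :doc-store  hash of the ORIGINAL doc -> corrected doc, the map shape
                passed to crux.db/submit-docs
    :follow-up  corrective put ops to submit after restarting the node,
                so the stored content and its hash agree again."
  [hash-fn corrections]
  {:doc-store (into {}
                    (map (fn [[doc id]]
                           [(hash-fn doc) (assoc doc :crux.db/id id)]))
                    corrections)
   :follow-up (vec (for [[doc id] corrections]
                     [:crux.tx/put (assoc doc :crux.db/id id)]))})

(comment
  ;; REPL usage against a node, assuming the internal fns named above:
  (let [{:keys [doc-store follow-up]}
        (repair-plan crux.codec/new-id
                     {{:person/id 1, :person/name "Sally", :person/age 32} :sally})]
    (crux.db/submit-docs (:document-store node) doc-store)
    ;; ...restart the node so it retries the stuck transaction, then:
    (crux/submit-tx node follow-up)))
```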

Eric Ihli 14:05:15

Rad. 👍 That did it.

Eric Ihli 19:05:08

I'm forgetting what my "schema" looks like. I submitted some transactions but forgot what attributes I used. What's the best way to do exploratory browsing of data in Crux?

refset 19:05:08

There's the attribute-stats API, which will show you the full list of attributes across all time along with exact cardinalities
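A small sketch for turning that output into something schema-like, assuming (as in the Crux releases of the time) that crux/attribute-stats returns a map from attribute keyword to count. attrs-by-namespace is a hypothetical helper; the counts in sample-stats are made up for illustration, while the attribute names are the ones seen earlier in this thread.

```clojure
(defn attrs-by-namespace
  "Groups an attribute -> count map by keyword namespace, sorted, so related
  attributes (all the :person/* keys, say) sit together."
  [stats]
  (->> stats
       (group-by (fn [[attr _]] (or (namespace attr) "<none>")))
       (into (sorted-map)
             (map (fn [[nspace entries]] [nspace (into (sorted-map) entries)])))))

;; Made-up counts; in a REPL this would be (crux/attribute-stats node).
(def sample-stats
  {:crux.db/id 5, :person/id 4, :person/name 4, :person/age 4, :val 1})

;; (attrs-by-namespace sample-stats) returns a value equal to
;; {"<none>" {:val 1}, "crux.db" {:crux.db/id 5},
;;  "person" {:person/age 4, :person/id 4, :person/name 4}}
```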

refset 19:05:47

We have an upcoming HTML-based explorer UI baked in to the HTTP API on master which you'd be welcome to test-drive

refset 19:05:27

...but it's still a week or so away from seeing prime time!

Lu 19:05:49

@ericihli One way would be to add :full-results? true to the query map

Lu 19:05:04

Specifically:

{:find [e]
 :where [[e :crux.db/id _]]
 :full-results? true}
This will return all your documents in full :)

refset 19:05:46

Kind of, but only the documents that are valid at the given tx+valid times!

Lu 19:05:03

Good point! 👌
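Building on Lu's query (and on refset's caveat that it only sees documents valid at the queried tx and valid times), a sketch that collects every attribute key appearing in the results. It assumes that with :full-results? true each result tuple holds the whole document in place of the entity id; attributes-in-results is a hypothetical helper, and the sample documents are the ones from earlier in this thread.

```clojure
(defn attributes-in-results
  "Sorted set of every key found in the documents of a :full-results? true
  result set (a collection of tuples, each holding documents)."
  [results]
  (->> results
       (mapcat identity) ; flatten each tuple into its documents
       (filter map?)
       (mapcat keys)
       (into (sorted-set))))

;; Shaped like the result of
;; (crux/q (crux/db node) '{:find [e] :where [[e :crux.db/id _]] :full-results? true})
(def sample-results
  #{[{:crux.db/id :person-1, :person/id 1, :person/name "Sally", :person/age 32}]
    [{:crux.db/id :foobar, :val 1}]})
```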

Eric Ihli 21:05:51

Big thanks to the community here. There's not nearly as much reference material for this kind of database and it's so different from SQL that it's a bit overwhelming to pick up.

refset 21:05:12

Yes, there's never enough documentation, but it's great to get good questions like these!

Eric Ihli 21:05:02

Something I'm just now thinking about is how to handle what is typically handled by cascading deletes in SQL. I create two users, Fred and Bob, and I make an association: Fred has a friend, Bob. Then I delete Bob. Then I want to list all of Fred's non-deleted friends. 1. How is this typically handled? 2. The following query apparently doesn't work since ?friend can't be evaluated inside ~(keyword ,,, (name ?friend)). Is there a way to use a unification variable like ?friend inside an evaluated expression, like I'm trying to do?

(crux/submit-tx
 adb/conn
 [[:crux.tx/put
   {:crux.db/id :friends/fred->bob
    :friends/from :fred
    :friends/to :bob}]
  [:crux.tx/put
   {:crux.db/id :users/fred}]
  [:crux.tx/put
   {:crux.db/id :users/bob}]])

(crux/submit-tx
 adb/conn
 [[:crux.tx/delete :users/bob]])

(crux/q
 (crux/db adb/conn)
 `{:find [?friend-entity]
   :where [[?assoc :friends/from :fred]
           [?assoc :friends/to ?friend]
           ;; fails: ~(...) is evaluated when the query map is built,
           ;; before ?friend is bound to anything
           [?friend-entity :crux.db/id ~(keyword "friends" (name ?friend))]]})

refset 21:05:10

I think you should be able to handle it as simply as this (haven't tested):

(crux/q
 (crux/db adb/conn)
 '{:find [?friend-entity]
   :where [[?assoc :friends/from :fred]
           [?assoc :friends/to ?friend]
           [?friend :crux.db/id ?friend]]})

Eric Ihli 22:05:59

Well, the value that :friends/to points to is :fred, but the :crux.db/id of "Fred" is :users/fred.

Eric Ihli 22:05:23

Of course I could change it so the association points to the actual :crux.db/id, but I'm still curious whether it's possible to build a value dynamically from a unification variable.

refset 22:05:29

Ahhhh, my apologies, I didn't study your example hard enough. I think you can do it by introducing intermediate vars like so (again, untested):

(crux/q
 (crux/db adb/conn)
 '{:find [?friend-entity]
   :where [[?assoc :friends/from :fred]
           [?assoc :friends/to ?friend]
           [?friend-entity :crux.db/id ?kw]
           [(keyword "friends" ?name) ?kw]
           [(name ?friend) ?name]]})

refset 22:05:53

you could also create a custom Clojure predicate for something more convenient, like:

[(some.namespace/path+name-to-keyword "friends" ?friend) ?kw]
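some.namespace/path+name-to-keyword is a placeholder from the message; below is a minimal sketch of what such a function could look like, keeping that hypothetical namespace and name. It assumes Crux resolves fully qualified function symbols in query clauses, so the namespace would need to be loaded on the node that runs the query.

```clojure
(ns some.namespace)

(defn path+name-to-keyword
  "Builds a keyword in namespace ns-str from the name of k,
  e.g. (path+name-to-keyword \"users\" :bob) returns :users/bob."
  [ns-str k]
  (keyword ns-str (name k)))
```

Used as [(some.namespace/path+name-to-keyword "users" ?friend) ?kw] alongside [?friend-entity :crux.db/id ?kw], this would bind ?kw to :users/bob when ?friend is :bob.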

jarohen 06:05:16

> Well, the value that :friends/to points to is :fred, but the :crux.db/id of "Fred" is :users/fred.

tbh I'd address this first - if you want a reference between two entities it'll save a fair bit of effort if you can make sure their ids are the same

refset 07:05:51

^ yes, I think I would generally agree, as otherwise there will be a lot of "scanning" going on, making poor use of the indexes. Also, I think the ns string arg for the keyword fn (both in the original example and my suggestions) needs to be "users"
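On the original cascading-delete question: if :friends/to stores the friend's actual :crux.db/id (e.g. :users/bob), no cascade is needed, because a clause like [?friend :crux.db/id _] only matches entities that still have a document at the queried time, so a deleted Bob simply drops out of the join. The block below is a plain-Clojure simulation of that join over a hypothetical snapshot of visible documents, not Crux itself; live-friends is an illustrative name, and the query it mimics is in the leading comment.

```clojure
;; Query being simulated (assumes :friends/to now holds :users/bob):
;; '{:find [?friend]
;;   :where [[?assoc :friends/from :users/fred]
;;           [?assoc :friends/to ?friend]
;;           [?friend :crux.db/id _]]}

(defn live-friends
  "Targets of from's friend associations that still have a document in docs,
  mirroring the three clauses above."
  [docs from]
  (let [live-ids (set (map :crux.db/id docs))]
    (->> docs
         (filter #(= from (:friends/from %)))
         (map :friends/to)
         (filter live-ids) ; the [?friend :crux.db/id _] join
         set)))

;; Hypothetical snapshot after [:crux.tx/delete :users/bob]: Bob's document
;; is gone, but the association document still points at him.
(def docs-after-delete
  [{:crux.db/id :friends/fred->bob, :friends/from :users/fred, :friends/to :users/bob}
   {:crux.db/id :users/fred}])
```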

dvingo 22:05:56

i'm pretty sure you can do whatever you want using a function clause, this is from the tests file:

{:find [e]
 :where [[e :last-name last-name]
         [(= "Ivanov" last-name)]]}
