
has anyone else run into this segfault? it's happened a few times now with RocksDB (6.12.7). bumping it to see if a newer version is any better

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f707d067bd0, pid=1455309, tid=2007832
#
# JRE version: OpenJDK Runtime Environment (15.0.1+9) (build 15.0.1+9)
# Java VM: OpenJDK 64-Bit Server VM (15.0.1+9, mixed mode, tiered, z gc, linux-amd64)
# Problematic frame:
# C  []  rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::DBImpl::GetImplOptions&)+0x5f0


I'm guessing it is correlated with a burst of (crux/entity) calls


nope, I can actually repro it on 6.15.2 reliably, it seems. It really doesn't like project-many calls being made


Hey @U797MAJ8M, sounds like something's not quite right 😞 Would you be able to submit a repro?


I can try -- given the nature of these things it might be hard to do without packaging up my whole system. Are you interested in a coredump? not sure how JNI works wrt debug symbols etc. In general I'm pretty sure I am triggering it with pathom making a bunch of crux/entity and now crux/project-many calls, which I guess hammers the index/doc/tx store (rocks is being used as all 3 locally) quite hard


Are you using open-q or open-db?


oh, yes I am actually.


one open-db for a HTTP request, that gets passed to the aforementioned pathom resolvers which then make a bunch of crux calls to build the response


are those calls in serial for any one opened DB, or are you calling the same DB instance from multiple threads?


good question, I would have assumed that a single request is handled entirely by one thread (server is aleph) but I am not actually sure what promesa's default execution model looks like or what pathom3 is doing behind the scenes with it


> Are you interested in a coredump? not sure how JNI works wrt debug symbols etc.

There should be a file dumped by the JVM when it exits that way - hs_err_pid_<>.log ?


cheers! 🙏


@U050V1N74: actually, ignore 7141 -- that one seems to be from a long time ago, got swallowed up by the glob


thanks, received 🙂


At first glance, it looks like each of the segfaults is at one of the query's first calls into RocksDB, so I'd hypothesise that it's unrelated to project-many and would happen on any query at that point. We've seen this happen when trying to make accesses on closed DBs - could this be a possibility? Often it's when a with-open returns (and hence closes its resources) immediately, while its work continues on a different thread.
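The failure mode described here can be sketched in Clojure (hypothetical names; assumes `node` is an already-started Crux node):

```clojure
;; Anti-pattern sketch: with-open closes the DB as soon as its body
;; returns, but the future it kicked off may still be running queries
;; against the now-closed RocksDB handle -- a likely source of the SIGSEGV.
(with-open [db (crux/open-db node)]
  (future (crux/entity db :some-entity-id)))
;; body has returned => db is closed, while the future may still be using it
```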


Unfortunately Rocks isn't very friendly when this happens - AFAIK there's not a lot Crux can do once this seg-fault is thrown


yup, I think this is definitely the case... my big refactor moved things around to close the db before the request was fully handled!
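For reference, the safe shape might look like this sketch (with a hypothetical `build-response` function standing in for the pathom work): block inside `with-open` until everything that uses the DB has finished.

```clojure
;; Safe sketch: deref blocks until the async work completes, so the DB
;; stays open for the full lifetime of the request handling.
(with-open [db (crux/open-db node)]
  @(future (build-response db)))  ; hypothetical fn making crux/entity calls
```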


apologies for the noise! and project-many works great as well


thanks 🙏 and no worries about the noise, keep them coming! 🙂


if I can add one point of feedback: the docstring mentions that open-db must be closed, but doesn't mention the consequences of using a closed one. It's of course a bad idea, but a mention of the consequences would help readers make this connection faster


absolutely - we're on it 🙂

Aleksander Rendtslev18:02:50

Just wanted to ask about this issue: I’m hitting the same thing. Even after a complete index/log/doc wipe Crux crashes after a few writes with:

"Lucene store latest tx mismatch"
On the latest version:
juxt/crux-lucene      {:mvn/version "21.01-1.14.0-alpha"}
And my crux config:
(let [kv-store (fn [dir] {:crux/module 'crux.rocksdb/->kv-store
                          :db-dir      (io/file (str "data/crux-db/" dir))})
      node     (crux/start-node
                 {:crux.lucene/lucene-store {:db-dir "data/crux-db/lucene"}
                  :crux.http-server/server  {:port 4000}
                  :crux/index-store         {:kv-store (kv-store "index")}
                  :rocksdb-golden           (kv-store "tx-log-and-doc")
                  :crux/document-store      {:kv-store :rocksdb-golden}
                  :crux/tx-log              {:kv-store :rocksdb-golden}})]
  (log/info "DB Started")
  ;; Synchronize the node
  (crux/sync node)
  ;; Register our transactions
  (register-transactions node)
  ;; Return the node
  node)
Has any of you seen this before?


Thanks @aleksander990 We are looking into it

Steven Deobald19:02:14

@aleksander990 It does appear there is probably a bug in the current Lucene module for old stores. However, when I filed that bug I was able to get around it in dev by cleaning up the tx/doc/index stores + lucene store, then restarting my service. When you do this:

> Even after a complete index/log/doc wipe Crux crashes after a few writes

...are you also wiping the lucene-store?

Aleksander Rendtslev20:02:33

Yeah I did wipe all of them. But It’s been working for the last 30 minutes or so. I’ll keep an eye on it over the next few days though

Steven Deobald21:02:10

Hmm. Curious. If you do find you get a rogue tx mismatch after wiping all the stores, that's a separate (though perhaps related) bug from #1399 so it would be good to capture it as well.


Hi @aleksander990 this is fixed; we will get a release out soon that addresses it.


thanks for reporting.