2021-02-01
Channels
- # announcements (11)
- # babashka (71)
- # beginners (34)
- # calva (25)
- # chlorine-clover (38)
- # cider (13)
- # clj-kondo (1)
- # cljsrn (2)
- # clojure (40)
- # clojure-australia (4)
- # clojure-europe (16)
- # clojure-france (3)
- # clojure-nl (4)
- # clojure-uk (16)
- # clojurescript (27)
- # conjure (2)
- # core-async (41)
- # core-logic (3)
- # cursive (1)
- # data-science (1)
- # datomic (16)
- # depstar (19)
- # emacs (7)
- # fulcro (33)
- # graalvm (4)
- # honeysql (20)
- # hugsql (4)
- # jobs (1)
- # juxt (4)
- # off-topic (48)
- # pathom (41)
- # reagent (9)
- # reitit (19)
- # remote-jobs (1)
- # shadow-cljs (20)
- # startup-in-a-month (2)
- # tools-deps (29)
- # vim (3)
- # xtdb (30)
has anyone else run into this segfault? has happened a few times now with rocksdb (6.12.7). bumping it to see if a newer version is any better
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f707d067bd0, pid=1455309, tid=2007832
#
# JRE version: OpenJDK Runtime Environment (15.0.1+9) (build 15.0.1+9)
# Java VM: OpenJDK 64-Bit Server VM (15.0.1+9, mixed mode, tiered, z gc, linux-amd64)
# Problematic frame:
# C  [librocksdbjni-linux64.so+0x2afbd0]  rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::DBImpl::GetImplOptions&)+0x5f0
nope, I can actually repro it on 6.15.2 reliably it seems. It really doesn't like project-many calls being made
Hey @U797MAJ8M, sounds like something's not quite right 😞 Would you be able to submit a repro?
I can try -- given the nature of these things it might be hard to do without packaging up my whole system. Are you interested in a coredump? not sure how JNI works wrt debug symbols etc. In general I'm pretty sure I am triggering it with pathom making a bunch of crux/entity and now crux/project-many calls, which I guess hammers the index/doc/tx store (rocks is being used as all 3 locally) quite hard
one open-db for a HTTP request, that gets passed to the aforementioned pathom resolvers which then make a bunch of crux calls to build the response
are those calls in serial for any one opened DB, or are you calling the same DB instance from multiple threads?
good question, I would have assumed that a single request is handled entirely by one thread (server is aleph) but I am not actually sure what promesa's default execution model looks like or what pathom3 is doing behind the scenes with it
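One way to sanity-check that empirically, without digging into promesa or pathom3 internals (a hedged sketch; the wrapper name is made up for illustration): log the current thread name around each Crux call and compare it against the thread that opened the db.
(require '[crux.api :as crux])

;; Hypothetical wrapper: prints which thread each read runs on, so you can
;; see whether all calls against one opened db stay on a single thread.
(defn entity-logged [db eid]
  (println "crux/entity on thread:" (.getName (Thread/currentThread)))
  (crux/entity db eid))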
> Are you interested in a coredump? not sure how JNI works wrt debug symbols etc.
There should be a file dumped by the JVM when it exits that way - hs_err_pid_<>.log?
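As an aside (a hedged sketch; the alias name and path are made up), the JVM's -XX:ErrorFile flag controls where those crash reports get written, which can make them easier to collect, e.g. via a deps.edn alias:
{:aliases
 {:crash-logs
  ;; %p is expanded by the JVM to the process id
  {:jvm-opts ["-XX:ErrorFile=logs/hs_err_pid%p.log"]}}}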
yup, I have 6 of them. [email protected]?
@U050V1N74: actually, ignore 7141 -- that one seems to be from a long time ago, got swallowed up by the glob
At first glance, it looks like each of the segfaults is at one of the query's first calls into RocksDB, so I'd hypothesise that it's unrelated to project-many, and would happen on any query at that point
We've seen this happen when trying to make accesses on closed DBs - could this be a possibility? Often when we've done a with-open that returns (and hence closes its resources) immediately, while its work continues on a different thread.
Unfortunately Rocks isn't very friendly when this happens - AFAIK there's not a lot Crux can do once this seg-fault is thrown
yup, I think this is definitely the case.. my big refactor moved things around to close the db before the request was fully handled!
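For what it's worth, a minimal sketch of the failure mode being described (the handler names and request shape are hypothetical, not taken from the project above): the with-open block returns before the work that reads from the db has run, so Rocks is already closed by the time the read happens.
(require '[crux.api :as crux])

;; Broken: the deferred work escapes the with-open scope, so the db is
;; closed before crux/entity runs - which can surface as a native crash.
(defn handle-request-broken [node req]
  (with-open [db (crux/open-db node)]
    (future (crux/entity db (:eid req)))))

;; Safer: force every read to complete while the db is still open.
(defn handle-request [node req]
  (with-open [db (crux/open-db node)]
    (doall (map #(crux/entity db %) (:eids req)))))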
if I can add one point of feedback: the docstring mentions that open-db must be closed, but doesn't mention the consequences of using a closed one. Doing so is of course a bad idea, but spelling out what happens would have helped me make this connection faster
Just wanted to ask about this issue: https://github.com/juxt/crux/issues/1399 I’m hitting the same thing. Even after a complete index/log/doc wipe Crux crashes after a few writes with:
"Lucene store latest tx mismatch"
On the latest version:
juxt/crux-lucene {:mvn/version "21.01-1.14.0-alpha"}
And my crux config:
(let [kv-store (fn [dir] {:crux/module 'crux.rocksdb/->kv-store
                          :db-dir (io/file (str "data/crux-db/" dir))})
      node (crux/start-node
            (merge
             {:crux.lucene/lucene-store {:db-dir "data/crux-db/lucene"}
              :crux.http-server/server {:port 4000}
              :crux/index-store {:kv-store (kv-store "index")}
              :rocksdb-golden (kv-store "tx-log-and-doc")
              :crux/document-store {:kv-store :rocksdb-golden}
              :crux/tx-log {:kv-store :rocksdb-golden}}))]
(log/info "DB Started")
;; Synchronize the node
(crux/sync node)
;; Register our transactions
(register-transactions node)
;; Return the node
node)
Have any of you seen this before?
@aleksander990 It does appear there is probably a bug in the current Lucene module for old stores. However, when I filed that bug I was able to get around it in dev by cleaning up the tx/doc/index stores + lucene store, then restarting my service. When you do this:
> Even after a complete index/log/doc wipe Crux crashes after a few writes
...are you also wiping the lucene-store?
Yeah I did wipe all of them. But it's been working for the last 30 minutes or so. I'll keep an eye on it over the next few days though
Hmm. Curious. If you do find you get a rogue tx mismatch after wiping all the stores, that's a separate (though perhaps related) bug from #1399, so it would be good to capture it as well.
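For anyone wiping the stores by hand, a rough sketch against the directory layout in the config above (the delete helper is my own, not part of the Crux API; stop the node before deleting): the index, tx-log/doc, and Lucene directories should go together, otherwise they can end up out of sync with each other.
(require '[clojure.java.io :as io])

(defn delete-recursively! [dir]
  (let [f (io/file dir)]
    (when (.exists f)
      ;; file-seq is parent-first, so reverse it to delete children first
      (doseq [child (reverse (file-seq f))]
        (io/delete-file child true)))))

;; Wipe all three stores from the config above in one go.
(doseq [dir ["data/crux-db/index"
             "data/crux-db/tx-log-and-doc"
             "data/crux-db/lucene"]]
  (delete-recursively! dir))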
Hi @aleksander990 this is fixed in https://github.com/juxt/crux/pull/1406, we will get a release out soon that addresses this.