So I’ve encountered a weird issue: if I run a query on an empty database before running a transaction, subsequent queries return nil for that attribute. This should work, but doesn’t.
(comment
  (def schema
    {:transaction/signature
     {:db/unique :db.unique/identity
      :db/valueType :db.type/string
      :db/cardinality :db.cardinality/one}
     :transaction/block-time
     {:db/valueType :db.type/long
      :db/cardinality :db.cardinality/one}})
  (def conn
    (d/get-conn "db1" schema
                {:validate-data? true
                 :closed-schema? true
                 :auto-entity-time? true}))
  (d/q '[:find [?block-time ?signature]
         :where
         [?t :transaction/signature ?signature]
         [?t :transaction/block-time ?block-time]]
       @conn)
  (d/transact! conn [{:transaction/signature "foo"
                      :transaction/block-time 234324324}])
  (d/q '[:find [(max ?bt)]
         :where
         [?t :transaction/block-time ?bt]]
       @conn)
  ;; => nil
  )
This does work.
(comment
  (def schema
    {:transaction/signature
     {:db/unique :db.unique/identity
      :db/valueType :db.type/string
      :db/cardinality :db.cardinality/one}
     :transaction/block-time
     {:db/valueType :db.type/long
      :db/cardinality :db.cardinality/one}})
  (def conn
    (d/get-conn "db2" schema
                {:validate-data? true
                 :closed-schema? true
                 :auto-entity-time? true}))
  (d/transact! conn [{:transaction/signature "foo"
                      :transaction/block-time 234324324}])
  (d/q '[:find [?block-time ?signature]
         :where
         [?t :transaction/signature ?signature]
         [?t :transaction/block-time ?block-time]]
       @conn)
  ;; => [234324324 "foo"]
  (d/q '[:find [(max ?bt)]
         :where
         [?t :transaction/block-time ?bt]]
       @conn)
  ;; => [234324324]
  )
Struggling to understand what the issue could be. Any pointers?
So it looks like this issue was introduced in 0.9.9. I’ll try to narrow down the commit.
So as far as I can tell it’s this commit that introduces the issue
https://github.com/juji-io/datalevin/commit/720a7d79b8800e58fb6136c76698b102ec77afdc Going to take a bash at fixing this.
So this seems to be caused by this line: https://github.com/juji-io/datalevin/blob/master/src/datalevin/query.clj#L1468
sort-by (fn [[_ attr _]] (db/-count db [nil attr nil]))
My guess is db/-count is stateful and is now getting called in both the reduce and the sort, whereas before it was only being called in the reduce.
Looks like a caching problem, will investigate when I have time
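To make the caching hypothesis concrete, here is a minimal sketch in plain Clojure. It is not Datalevin's code: `datoms` and `cached-count` are hypothetical names, and `memoize` stands in for whatever cache the real `db/-count` may use. It only shows how a count computed on an empty store can stay cached after data arrives, which would match the symptom of a query on an empty database affecting later queries.

```clojure
;; Illustration only: a stand-in for a cached count, NOT Datalevin's implementation.
(def datoms (atom []))          ; hypothetical in-memory store of [e a v] triples

(defn fresh-count
  "Counts the datoms for attr by scanning the current store."
  [attr]
  (count (filter #(= attr (second %)) @datoms)))

;; memoize caches by argument, so it never notices that the store has changed.
(def cached-count (memoize fresh-count))

(cached-count :transaction/block-time)
;; => 0   computed against the empty store and cached

(swap! datoms conj [1 :transaction/block-time 234324324])

(cached-count :transaction/block-time)
;; => 0   stale: the cached value is returned

(fresh-count :transaction/block-time)
;; => 1   the uncached scan sees the new datom
```

If something like this is what happens, a planner that sorts clauses by a stale count of zero could conclude that a clause matches nothing. That is only a guess until the cache code has been checked.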
Thanks for narrowing it down.
Awesome! I’ve opened a PR that reverts the sort with all the detail (short term fix). Feel free to merge it or close it when you work out the caching issue. I tried to delve into the cache stuff but that is going to take me a fair bit longer to get to grips with. https://github.com/juji-io/datalevin/pull/269 Thanks again for the awesome project. We’ve recently started using it in production in embedded mode and it’s an absolute joy. No connection pool, no SQL query builder, no db management.
We have a topology of a few VMs connected (`d/get-conn...`) to a dtlv server, version 0.9.8. Every so often the clients start throwing "The client is disconnected" when querying, pulling, or transacting.
The current design has get-conn being long-lived on the client. Is that a safe approach? Should we reconnect per request or should 0.9.8 reconnect automatically on demand?
Client is supposed to reconnect automatically.
Recommend to run JVM version as the server. Native version cannot really handle many concurrent connections well, as the community version of GraalVM has only serial GC.
If you could reproduce this somehow, it would be awesome.
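Until the disconnects can be reproduced, one possible client-side stopgap is to retry a failed call a few times. The sketch below is plain Clojure and not a Datalevin feature: `with-retry` is a hypothetical helper, it retries on any `Exception` rather than matching the specific disconnect error, and the attempt count and delay are arbitrary.

```clojure
(defn with-retry
  "Hypothetical helper: calls the zero-argument function f, retrying up to
  `attempts` times in total and sleeping `delay-ms` between tries.
  Rethrows the last exception once the attempts are used up."
  [attempts delay-ms f]
  (loop [n 1]
    (let [result (try
                   {:value (f)}
                   (catch Exception e
                     ;; recur is not allowed inside catch, so signal a retry instead
                     (if (< n attempts)
                       ::retry
                       (throw e))))]
      (if (= ::retry result)
        (do (Thread/sleep (long delay-ms))
            (recur (inc n)))
        (:value result)))))

(comment
  ;; Usage sketch: wrap a query made through a long-lived conn.
  (with-retry 3 500 #(d/q '[:find ?e :where [?e :transaction/signature]] @conn)))
```

Retrying a transaction is only safe if the transaction is idempotent, so this fits reads better than writes.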
FYI @huahaiy. The db files are corrupt themselves. I am trying to find something creative here 🙂
Can you read the db files and export them as txt file?
dtlv dump -d . -a is working
I mean, as datalog
no. -g throws the same nippy thaw exception of when I try to open the db
If only :datalevin/opts dbi is bad, you can dump the other dbi, and import them into a new dir
An empty opts dbi is probably okay
These are the ones I have:
#{"datalevin/meta"
"datalevin/opts"
"datalevin/schema"
"datalevin/giants"
"datalevin/ave"
"datalevin/vae"
"datalevin/eav"}
I would dump all of them but /opts?
Yeah
They all throw the same exception
You said dump is working
dump -a works no problem... it is dump -g that throws
You can dump individual dbi
This is what I am trying. Let me know where I am getting it wrong: dtlv -d . -g dump datalevin/schemas
Then load each into the same db file
No
Dump them as kv
-g is for datalog, which is file scoped
Kv is dbi scoped
to dump as kv, should I use -a instead then? I am not familiar with kv stores
Or dump them with -a, and edit the file, remove the opts content
for instance.... I am uncertain what I would do with the exported file from -a
Read the docs
Import them
My confusion is that I am not using the kv-store options and just datalog instead.
The docs for dtlv only show options for -g (which is datalog and what I am using). There are no references to kv (or why it would be relevant in this particular case).
I am happy to improve the docs once I wrap my head around this.
Datalog store is on top of kv store
Everything is stored in a kv store
What’s confusing about it?
Got you. So, for dtlv dump and dtlv load the absence of -g means just kv store, correct?
correct
Perfect! Thank you. This is a crucial detail that completely escaped me.
Each dbi is a kv map
The datalog store is implemented with the 7 dbi you listed above
Only opts is using nippy encoding completely
The hope is that only that dbi is damaged
That's reasonable and makes total sense. And opts should be ok recycling from an empty db, right? That's your hypothesis
Right
If others are also damaged, it would not be able to recover
Once I am done with this, I will open a PR with a few reminders for the dtlv docs page for your consideration. Keep up the good work! What you are up to is fantastic!
When using -g to dump, only opts, schema, eav and maybe giants dbi are used.
Your hypothesis is sound as it's throwing from load-opts from storage/open
But maybe others are also damaged, opts are read first
Anyway, this is something I will need to look into and test for
I will gladly put a reproducible report once I find a way to consistently reproduce it
Thanks
Thank YOU!
You were totally right @huahaiy. Dumping took a long time (~20 min for 64MB worth of data). Loading back onto a new db seemed to be very slow - left it running for 2h and gave up.
Ultimately I ran the following on a copy of the db: dtlv -d . exec '(let [db (open-kv ".")] (drop-dbi db "datalevin/opts") (list-dbis db))' and when I reconnected, opts was recreated successfully.
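For anyone repeating this from a Clojure REPL instead of `dtlv exec`, the same steps might look like the sketch below. It uses `open-kv`, `drop-dbi`, `list-dbis` and `close-kv` from `datalevin.core`; the function name `drop-opts-dbi!` and the path in the usage note are hypothetical. As in the message above, it should only be run against a copy of the db.

```clojure
(require '[datalevin.core :as d])

(defn drop-opts-dbi!
  "Hypothetical helper: opens the directory as a plain KV store, drops the
  datalevin/opts dbi, and returns the names of the remaining dbis.
  Run this only against a COPY of the database directory."
  [dir]
  (let [kv (d/open-kv dir)]
    (try
      (d/drop-dbi kv "datalevin/opts")
      ;; realize the list before the store is closed
      (vec (d/list-dbis kv))
      (finally
        (d/close-kv kv)))))

(comment
  ;; The path is a placeholder for wherever the copy lives.
  (drop-opts-dbi! "/path/to/copy-of-db"))
```

This relies on the behaviour reported in the thread, where opts was recreated on the next connection; it has not been checked against other versions.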
A slightly uncommon thing I have on this app is a :db.type/bytes node that receives a lot of reads with concurrent writes. Is that also possibly an area of concern?
It seems to be when the server throws this:
[info] Exception in thread "main" clojure.lang.ExceptionInfo: Error creating server:"Fail to get-range: #error {\n :cause \"No reader provided for custom type id: -63\"\n :data {:type-id -63, :prefixed? nil}\n :via\n [{:type clojure.lang.ExceptionInfo\n :message \"Thaw failed against type-id: -63\"\n :data {:type-id -63}\n :at [taoensso.nippy$thaw_from_in_BANG_ invokeStatic \"nippy.clj\" 1722]}\n {:type clojure.lang.ExceptionInfo\n :message \"No reader provided for custom type id: -63\"\n :data {:type-id -63, :prefixed? nil}\n :at [taoensso.nippy$read_custom_BANG_ invokeStatic \"nippy.clj\" 1390]}]\n :trace\n [[taoensso.nippy$read_custom_BANG_ invokeStatic \"nippy.clj\" 1390]\n [taoensso.nippy$thaw_from_in_BANG_ invokeStatic \"nippy.clj\" 1722]\n [taoensso.nippy$fast_thaw invokeStatic \"nippy.clj\" 1781]\n [datalevin.bits$deserialize invokeStatic \"bits.clj\" 98]\n [datalevin.bits$get_data invokeStatic \"bits.clj\" 179]\n [datalevin.bits$read_buffer invokeStatic \"bits.clj\" 994]\n [datalevin.scan$get_range invokeStatic \"scan.clj\" 79]\n [datalevin.binding.java.LMDB get_range \"java.clj\" 545]\n [datalevin.binding.java.LMDB get_range \"java.clj\" 821]\n [datalevin.storage$load_opts invokeStatic \"storage.clj\" 1188]\n [datalevin.storage$open invokeStatic \"storage.clj\" 1265]\n [datalevin.storage$open invoke \"storage.clj\" 1255]\n [datalevin.storage$open invokeStatic \"storage.clj\" 1262]\n [datalevin.storage$open invoke \"storage.clj\" 1255]\n [datalevin.storage$open invokeStatic \"storage.clj\" 1260]\n [datalevin.server$open_store invokeStatic \"server.clj\" 913]\n [datalevin.server$reopen_dbs invokeStatic \"server.clj\" 1001]\n [datalevin.server$create invokeStatic \"server.clj\" 2388]\n [datalevin.main$_main invokeStatic \"main.clj\" 527]\n [datalevin.main$_main doInvoke \"main.clj\" 513]\n [clojure.lang.RestFn applyTo \"RestFn.java\" 137]\n [datalevin.main main nil -1]]}" {}
Actually, this looks like our DB got corrupt. Any thoughts why @huahaiy?
Don’t think so. You won’t be able to start if a db is corrupt
Yes. That's what looks like happened
Any idea why it would get corrupt? Or how to fix it? 🙂
Prior to 0.9, a db may be corrupt if the program is killed. Shouldn’t happen after 0.9.0. So I don’t think it is the issue
I am running 0.9.8
I saw your thread about kv-opts prior to 0.9
You have not described what you are doing.
This is a java server running datalevin in server mode with a few clients connected to it. Clients run a pretty standard web app that writes (transacts) and reads (queries and pulls) from this server.
Every few days this server throws this error. Last few times we restarted with a fresh DB.
This error happens when reopening the DB, specifically when reading the options
I will need to know the history of these dbs, are they upgraded from older versions, etc
They are fresh DBs. Never upgraded. We have a sibling environment where this has never happened. The difference between these two environments is the amount of concurrent clients. The environment where it never crashed has just one client for one server (both running java on different vms.) This environment that crashes every few days has one server and three clients (all on separate vms). Both connect to the server through the wire.
Ok, sounds like a concurrency problem, it seems that we need to lock the system db when reconnecting
Clients do fail and get restarted. They are behind an auto-scaler as well so I did have peaks of 13 client vms.
If it's relevant
If you could file an issue, it would be great
In terms of restoring this db, what are my options here? I can't connect to it but can dtlv copy
Sounds like only the server system db is having problems
You can create a new server instance with a new root directory, recreate dbs, shut down server, and copy the old db files over
System db only needs the db names
Awesome! I will try. Thank you very much 🙇
Also, using the latest version helps, as it does improve concurrent reading during transaction