2022-04-20
hi, is there a swap! or swap-vals! function for datalevin in kv mode?
Can we / does it make sense to add one?
I am not sure the semantics of swap! would be a good fit here. A map-specific API might be more appropriate?
if I read then update, there is a chance another thread might come in and change the data
This is my attempt at an issue for CAS https://github.com/juji-io/datalevin/issues/110 . I hope it makes some sense
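For illustration, a rough sketch of the read-then-update race in KV mode (the path, DBI name, and key below are made-up examples; the interleaving between the read and the write is the problem, not the individual calls):
(require '[datalevin.core :as d])

(def kvdb (d/open-kv "/tmp/kv-example"))   ; hypothetical path
(d/open-dbi kvdb "counters")               ; hypothetical DBI

(defn unsafe-inc! [db]
  ;; read the current value
  (let [v (or (d/get-value db "counters" :n) 0)]
    ;; another thread may write :n between the read above and the put below,
    ;; and its update would be silently overwritten
    (d/transact-kv db [[:put "counters" :n (inc v)]])))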
Also, an opinion - I find it confusing to keep both the KV API and the Datalog API in the same ns. Can I work with both on the same Datalevin instance?
Yes, you can work with both APIs in the same dir. The default number of DBIs (sub-DBs, or maps) supported is 128 (we can make it configurable in the future). The Datalog DB takes up 10 of them if full-text search is enabled, so the rest are for your own KV DBs. The intention is for users to use both APIs in the same DB environment, hence they are all in datalevin.core.
Hi, for this problem http://www.learndatalogtoday.org/chapter/8, why is this not a valid rule:
[[(sequels ?m1 ?m2)
[?m1 :movie/sequel ?m2]]
[(sequels ?m1 ?m3)
(sequels ?m1 ?m2)
(sequels ?m2 ?m3)]]
but this is a valid rule:
[[(sequels ?m1 ?m2)
[?m1 :movie/sequel ?m2]]
[(sequels ?m1 ?m2)
[?m :movie/sequel ?m2]
(sequels ?m1 ?m)]]
Both work on the site, which I assume runs Datomic?
Does this work in DataScript? If so, it’s easy to port it over. If not, then there will be more work. Please file an issue if possible.
Nope, does not work in DataScript. It just hangs (infinite loop?). Will file an issue. Thanks!
I have a question, too. I got this error while trying to load a 600 MB CSV into Datalevin:
{:type datalevin.ni.Lib$LMDBException
:message MDB_MAP_RESIZED: Database contents grew beyond environment mapsize
:at [datalevin.ni.Lib checkRc Lib.java 630]}
I received that error in Babashka; I tried reading the database from Babashka because my REPL had been trying to ingest that file for ten hours and I wanted to see what the database had in its 21 GB.
After that error, I now get this error:
{:cause Fail to get-value: "MDB_CORRUPTED: Located page was wrong type"
:data {:dbi datalevin/meta, :k :last-modified, :k-type :attr, :v-type :long}
...}
I’ve only ever used toy datasets with Datalevin. Can anyone point me to what I’m doing wrong here? Is a 600 MB file too much for Datalevin/LMDB?
You will have to be more specific about what you did, so we can potentially tell you the possible cause of your problem. For example, maybe show some code of your “loading a 600 MB CSV”? My guess is that you are not doing transactions in large batches. You should. The best would be to create the datoms yourself and use init-db to load them directly, avoiding transactions.
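For reference, a rough sketch of batched loading along those lines (clojure.data.csv for parsing and the row->entity helper are assumptions, not the actual code under discussion):
(require '[clojure.data.csv :as csv]
         '[clojure.java.io :as io]
         '[datalevin.core :as d])

;; row->entity is a hypothetical helper that turns one CSV row into an entity map
(defn load-csv! [conn path row->entity]
  (with-open [rdr (io/reader path)]
    (doseq [batch (partition-all 100000 (rest (csv/read-csv rdr)))]
      (d/transact! conn (mapv row->entity batch)))))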
This is the code that reads the CSV and loads it into Datalevin. I am chunking 1k rows at a time. Each row has… ~200 columns. This worked when I ingested only ten rows. Actually, I was wrong: it is a 1.2 GB file.
Yes, each row is a map.
To manually create datoms, would I use datalevin.datom/datom? I can think of how to do that naively.
No need to chunk? d/transact! can handle an arbitrarily large lazy seq?
I just wasn’t sure how transact! would react to being given millions of maps.
Do you think removing that chunking would solve this? Or should I still look at init-db and manually creating datoms?
That’s great. The rest of Datalevin seems so well-built that I should have trusted that transact! could handle large inputs.
Thank you for your help!
LMDB is great at handling large data sets in big batches, not so great at doing tons of small writes while doing tons of reads at the same time (which is what Datalog transactions do), because it does MVCC but maintains only 2 copies of the DB, so a lot of dirty pages have to be kept around.
@huahaiy I have received two OOM exceptions now — first trying to load the 1.2 GB file, and then another trying to load a 600 MB file. I’ve pasted the 600 MB failure’s stack trace here.
Do you have suggestions about how to load this data? Datalevin can handle this much data, right? Should I try the init-db approach you suggested, or will that face OOM as well?
Data size is not a problem, but each map has 200 keys; that is probably not a usual case. You can use init-db, which bypasses the transaction logic.
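A minimal sketch of that init-db route, with a made-up schema and a single attribute for brevity (the real data would produce roughly 200 datoms per row, one per column; the conn-from-datoms arity shown here is an assumption):
(require '[datalevin.core :as d])

(def schema {:row/col-1 {:db/valueType :db.type/string}})

;; one datom per (entity id, attribute, value)
(def datoms
  (map-indexed (fn [i row] (d/datom (inc i) :row/col-1 (first row)))
               [["a"] ["b"] ["c"]]))

;; loads the datoms directly, bypassing transact!
(def conn (d/conn-from-datoms datoms "/tmp/csv-db" schema))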
If you don’t mind, I have a few more questions for you.
I tried transacting rows one at a time but that was very slow (10k rows/min), and the queries were very slow too (something like 30s when I had 100k rows * 200 fields = 20MM datoms). I will probably have billions of datoms if I get this file loaded; I hope Datalevin can handle that scale.
So I pivoted to manually creating the datoms like you suggested. Wasn’t too hard, except that now I am not getting anything in my database. I see data.mdb is 100 MB on disk, which tells me LMDB thinks it should be storing something. And (d/schema conn) works. But I can’t get any entities out of the database.
Any suggestions of what I’m doing wrong?
@huahaiy No rush for you to respond, just wanted to ping you in case you hadn’t seen my latest question ^^.
Probably your code is not adding anything to the DB. Unless you are using a transducer, the sequence function is rarely used in Clojure. You need to force the lazy seq; otherwise, nothing is probably done.
As to the slowness of queries, that’s the current state. It is why I am working on the query engine rewrite.
BTW, all Datalog DBs in the Clojure world are slow like that. In fact, I would venture to say that others will be even slower, since you are talking about billions of datoms and each entity has 200 attributes. My goal for the query engine rewrite is to bring query performance close to that of a relational DB. Basically, a relational DB stores a row together, whereas in a triple store the row is broken down into individual datoms. A triple store affords maximal flexibility, but demands that the query engine do a lot more work, hence it is much slower. It is a well-known problem that I am attempting to solve.
Makes sense about the query time. I ran into that with Crux (XTDB?) when I played with it a while back. It's a bummer because I much prefer Datalog to something like SQL. I'm interested to see what your query engine rewrite can do.
I used sequence because I was using a transducer with mapcat, and I assumed conn-from-datoms would realize the lazy seq for me so I wouldn’t need doall. Perhaps that assumption is wrong. I will play around with it and see what I can do.
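To make the lazy-seq point concrete, a small sketch with placeholder rows and attributes: sequence with a transducer still yields a lazy seq, so unless something realizes it, the datoms may never be produced, whereas forcing it eagerly removes that doubt.
(require '[datalevin.core :as d])

(defn row->datoms [[eid m]]
  (map (fn [[attr v]] (d/datom eid attr v)) m))

(def rows [[1 {:a "x", :b "y"}] [2 {:a "z"}]])

;; lazy - items are only produced when something consumes the seq
(def lazy-datoms (sequence (mapcat row->datoms) rows))

;; eager - fully realized before handing it to conn-from-datoms
(def eager-datoms (into [] (mapcat row->datoms) rows))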
0.6.7 is released; it fixed the float data type bug and allows all classes in Babashka pods.
Hi @huahaiy The latest version 0.6.7 produces this warning. Is this expected?
user=> (require '[datalevin.core :as d])
WARNING: abs already refers to: #'clojure.core/abs in namespace: taoensso.encore, being replaced by: #'taoensso.encore/abs
nil
@huahaiy: would something like Polylith make sense for Datalevin?
Also having the ability to build multiple artifacts from the same code base is a nice feature.
I think it does provide benefits, especially in clarifying which namespaces are API interfaces and which are implementation (something that the core ns already does - kind of).
That could be used even without adopting polylith.
The gist of it is: have an interface NS (can be named whatever) that imports the implementations and exports the public API.
An example here https://github.com/furkan3ayraktar/clojure-polylith-realworld-example-app/blob/master/components/database/src/clojure/realworld/database/interface.clj .
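A stripped-down sketch of that pattern (the namespace and function names are made up for illustration):
(ns myapp.storage.interface
  "Public API of the component; everything in myapp.storage.impl is implementation detail."
  (:require [myapp.storage.impl :as impl]))

(defn open-store
  "Delegates to the implementation namespace."
  [config]
  (impl/open-store config))

(defn put!
  [store k v]
  (impl/put! store k v))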