This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2023-04-26
Channels
- # announcements (1)
- # atom-editor (7)
- # babashka (9)
- # beginners (46)
- # cider (1)
- # circleci (2)
- # clj-on-windows (1)
- # cljdoc (5)
- # cljsrn (2)
- # clojure (25)
- # clojure-austin (8)
- # clojure-brasil (4)
- # clojure-europe (52)
- # clojure-nl (1)
- # clojure-norway (162)
- # clojure-uk (2)
- # cursive (3)
- # datalevin (134)
- # datomic (16)
- # defnpodcast (8)
- # graphql (9)
- # honeysql (5)
- # hoplon (26)
- # hyperfiddle (18)
- # introduce-yourself (1)
- # lsp (4)
- # malli (19)
- # nbb (16)
- # nrepl (1)
- # practicalli (3)
- # releases (3)
- # shadow-cljs (36)
- # tools-deps (7)
- # vim (2)
- # xtdb (9)
that is stateful though and might not always work. I don't always control who calls entrySet
that’s why you don’t use range-seq, entrySet will return the whole thing in one go, that’s the contract
if I have functions to get the first key from the db, and to get the next key after a given key, then this could be solved in a cleaner way IMO
you are talking about the map API, which is not stateful. The stateful things are already implemented; what you do will not be different, you are not buying anything by doing your own
I know get-range can get keys using some filters
> range-type can be one of :all, :at-least, :at-most, :closed, :closed-open, :greater-than, :less-than
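For intuition, these range types correspond to the bound combinations of Java's `NavigableMap` sub-map views. This is just an illustration of the bound semantics, not Datalevin's API:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class RangeTypes {
    public static void main(String[] args) {
        NavigableMap<Integer, String> m = new TreeMap<>();
        for (int k = 1; k <= 5; k++) m.put(k, "v" + k);

        // :at-least 3     -> keys >= 3
        System.out.println(m.tailMap(3, true).keySet());          // [3, 4, 5]
        // :greater-than 3 -> keys > 3
        System.out.println(m.tailMap(3, false).keySet());         // [4, 5]
        // :closed 2 4     -> 2 <= keys <= 4
        System.out.println(m.subMap(2, true, 4, true).keySet());  // [2, 3, 4]
        // :closed-open 2 4 -> 2 <= keys < 4
        System.out.println(m.subMap(2, true, 4, false).keySet()); // [2, 3]
        // :all            -> the whole map; :at-most / :less-than mirror
        // the two tailMap cases with headMap
    }
}
```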
don’t think that’s something you should be concerned about, you either use it, or not, there’s no point trying to implement it
I will see if I can get something that for a given key "a" I can get the next key in DB after it.
You are not doing anything useful by doing that. If you want an iterator, it is in there
what you describe is an iterator, which itself requires state that is independent from data
(defn entry-set-iterator
  "Implements ^java.util.Iterator for use in an entrySet implementation."
  [db dbi & _opts]
  (let [state (atom {:current-key nil})] ; will track the iteration position
    (reify java.util.Iterator
      (hasNext [_this]
        (throw (UnsupportedOperationException. "Not implemented")))
      (next [_this]
        (throw (UnsupportedOperationException. "Not implemented")))
      (remove [_this]
        (throw (UnsupportedOperationException. "Not implemented"))))))
you are doing unnecessary work, because we already have an iterator, you can just use it
Of course, you can layer things on top of things, but you gain nothing and lose performance. I don’t see why you would want to do that
ok, for some context. I pass my map implementation to a function that uses this code:
(doseq [x coll]
(.write wr (print* x))
(.write wr (int \newline)))
the above code called entrySet on my map and failed.
If I can provide entrySet it will work out of the box and iterate over the collection. I understand from you that I can use:
• get-range - loads everything in memory and spills to disk if OOM
• range-seq - opens a transaction that needs to be closed
I mean, you say you have big data; how does implementing your own iterator avoid this? You still have big data.
yes, but I don't need to keep a transaction open if I open a transaction only when next() is called
sure, you can do that. it is doable, the performance will suffer, but it’s a trade off
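The trade-off can be sketched with a plain `java.util.Iterator` whose only retained state is the last key seen, re-probing the store on every call instead of holding a cursor open. `TreeMap` stands in for the database here, and `lookup()` plays the role of a hypothetical short-lived read transaction; this is an illustration of the cost, not Datalevin code:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical sketch: iterate by "get first key" / "get next key after k",
// paying one short-lived "transaction" per probe instead of keeping one open.
public class TxPerCallIterator implements Iterator<Map.Entry<String, String>> {
    private final TreeMap<String, String> db; // stands in for the KV store
    private String currentKey;                // the only state between calls
    private int txOpened = 0;                 // short-lived "transactions" paid for

    public TxPerCallIterator(TreeMap<String, String> db) { this.db = db; }

    private Map.Entry<String, String> lookup() {
        txOpened++; // each probe opens (and closes) its own transaction
        return currentKey == null ? db.firstEntry() : db.higherEntry(currentKey);
    }

    @Override public boolean hasNext() { return lookup() != null; }

    @Override public Map.Entry<String, String> next() {
        Map.Entry<String, String> e = lookup();
        if (e == null) throw new NoSuchElementException();
        currentKey = e.getKey();
        return e;
    }

    public int transactionsOpened() { return txOpened; }

    public static void main(String[] args) {
        TreeMap<String, String> db = new TreeMap<>();
        db.put("a", "1"); db.put("b", "2"); db.put("c", "3");
        TxPerCallIterator it = new TxPerCallIterator(db);
        while (it.hasNext()) System.out.println(it.next());
        System.out.println("transactions opened: " + it.transactionsOpened());
    }
}
```

A full hasNext/next loop over n entries pays 2n+1 probes, roughly two transactions per element, which is where the performance hit comes from compared to one long-lived cursor.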
it should be ok. you don’t have to read the buffer, and when you do, you don’t have to read the whole buffer
you can do whatever with the buffer in fact. you can write the buffer as one data type, but read it as another, or partially. I do these in writing the search engine, for example.
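The write-as-one-type, read-as-another idea can be seen with a plain `java.nio.ByteBuffer`; a generic illustration, not the search-engine code:

```java
import java.nio.ByteBuffer;

public class BufferViews {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.putLong(0x4142434445464748L); // write the buffer as one long...
        buf.flip();                       // ...then switch to reading

        byte b = buf.get();       // read it back partially: first one byte
        short s = buf.getShort(); // then the next two bytes as a short
        System.out.println(b);    // 65 (0x41, default byte order is big-endian)
        System.out.println(s);    // 16963 (0x4243)
    }
}
```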
i would still not deal with these low level things for this purpose though, I would use get-some, etc.
the purpose of with-transaction-kv is to make sure reads during write see what’s just being written
when you use with-transaction-kv, you are blocking writes, which is not what users would expect
btw, I did find guava-testlib, which has unit tests for collection implementations. Planning to use those to check the implementation. Added a comment on the issue with links
I wouldn’t think it is user friendly for a java.util.Map to be blocking writes to the whole DB.
btw, for
> 0.11.0 A new Datalog query engine with improved performance.
I have an idea to implement Datalog over Apache Calcite relational algebra. I think that might be a better query engine.
I did a small lib that allows you to use Calcite from Clojure: https://github.com/ieugen/calcite-clj
I believe it is possible to have Datalog over relational algebra: https://calcite.apache.org/docs/algebra.html
Well, I tend to not believe marketing words. I think these things do need innovation from research, which is what I intend to do.
These things are not easily improved without doing innovative work, which most open source projects do not do
I came from IBM Research - Almaden; I worked on the same floor as the people who invented SQL
my intention is to bring Datalog query processing performance on par with RDBMS, which is doable.
Clojure is good for experimentation. If the experiment is successful, it is possible to rewrite it in a lower level language like Rust, Zig, or whatever; that’s my long term plan
the benefit of a triple store is the unrolled storage, which makes the most difficult part of query optimization easy: cardinality estimation. This is what I see as the breakthrough.
ok, Calcite has an SQL parser. it translates the SQL query to relational algebra, which it uses to run the query. there could be a Datalog -> relational algebra conversion
import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.RelBuilder;

final FrameworkConfig config; // construction elided in the original snippet
final RelBuilder builder = RelBuilder.create(config);
final RelNode node = builder
    .scan("EMP") // scan the EMP table of the sample scott schema
    .build();
System.out.println(RelOptUtil.toString(node));
> LogicalTableScan(table=[[scott, EMP]])
anyway, that’s why i am doing this: databases are so central to programming. a little improvement will have huge impact.
I would like to explore Datalog on top of Calcite relational algebra when I get to that part
another vision is to bring most data access into an integrated story for application development.