This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-09-23
Channels
- # announcements (8)
- # babashka (12)
- # babashka-sci-dev (6)
- # beginners (62)
- # biff (5)
- # calva (4)
- # cider (2)
- # clj-commons (8)
- # clj-kondo (17)
- # clj-yaml (40)
- # clojars (3)
- # clojure (117)
- # clojure-europe (122)
- # clojure-nl (5)
- # clojure-norway (20)
- # clojurescript (10)
- # consulting (1)
- # datomic (65)
- # events (15)
- # figwheel (1)
- # fulcro (4)
- # lsp (15)
- # mount (15)
- # music (1)
- # off-topic (53)
- # polylith (12)
- # releases (3)
- # shadow-cljs (13)
- # sql (1)
- # test-check (8)
- # xtdb (31)
Latest datomic release has ‘High’ CVEs for the h2 database dependency, any solutions to this? It’s a pretty significant issue for deploying it at my company
apparently i can just upgrade the h2 database dependency to the latest version, so long as i’m using the postgresql driver; didn’t think it’d be that easy reading through chat logs
Note that the difficulty is that h2 itself is not compatible with its own db files across the major releases. I suspect this is why datomic has not bumped it: suddenly no one would be able to open their existing dev dbs
There’s also some API compatibility/class errors if you try to use h2database v2.x with a dev/mem instance of datomic too, from what i saw; i’m still using v1.x on dev builds
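for reference, the upgrade described above might look like this in deps.edn (a sketch; the Datomic and h2 versions shown are illustrative, and this is only safe when your storage is PostgreSQL or similar rather than h2-backed dev storage, since h2 2.x can’t read 1.x database files):

```clojure
;; deps.edn sketch: exclude Datomic's transitive h2 and pin a newer one.
;; Versions are illustrative -- verify against your own setup before using.
{:deps {com.datomic/datomic-pro {:mvn/version "1.0.6397"
                                 :exclusions  [com.h2database/h2]}
        com.h2database/h2       {:mvn/version "2.1.214"}}}
```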
Hey there!
I have a development Transactor (1.0.6397) running in a docker container with the datomic dev protocol. I’ve set the transactor and peer passwords, and set it to allow remote connections.
When the docker container starts, it creates the data folder with the h2 database.
I try to connect from my repl (same datomic version) with connection string ..?password=[the-password], but I just get this error, no matter what I try:
1. Caused by org.h2.jdbc.JdbcSQLException
Wrong user name or password [28000-171]
SessionRemote.java: 568 org.h2.engine.SessionRemote/done
...
JdbcConnection.java: 109 org.h2.jdbc.JdbcConnection/<init>
JdbcConnection.java: 93 org.h2.jdbc.JdbcConnection/<init>
Driver.java: 72 org.h2.Driver/connect
PooledConnection.java: 266
...
sql.clj: 16 datomic.sql/connect
Any help is appreciated!!
It sounds like you're doing things right. To double check, have you set two passwords in the transactor properties?
storage-admin-password=admin
storage-datomic-password=use-this-password
storage-access=remote
And you're using the datomic one in your connection string?
https://docs.datomic.com/on-prem/configuration/configuring-embedded-storage.html
yeah, I’m doing it exactly the same. The passwords you wrote above, are they default passwords, or just random? This is from my config file:
storage-admin-password=pwd
storage-datomic-password=pwd
storage-access=remote
Just random passwords. The config looks right, but are you exposing both the transactor and dev storage ports from your container? e.g. the defaults are the transactor on 4334 and H2 on 4335 (usually transactor port + 1).
Alright. Yes, I’m starting the container like this:
docker run -p 4334-4336:4334-4336 transactor-dev:latest
Looking in the logs, it also seems to start fine, this is currently the last entry:
2022-09-23 20:52:41.642 INFO default datomic.lifecycle - {:tid 25, :username "asdfasdf", :port 4334, :rev 59, :host "0.0.0.0", :pid 17, :event :transactor/heartbeat, :version "1.0.6397", :timestamp 1663966361621, :encrypt-channel true}
OHH!! It’s working now! Initially I set different passwords. Then I changed both to pwd, and then changed other things (that presumably were wrong).
Then I tried to change the passwords back to being different - and now it works! 😄
So my conclusion is - the 2 admin/datomic passwords have to be different. This could be documented.
Thanks for your time @U02EP7NKPAL!
Is there ever a reason to prefer q over qseq? Seems like qseq is strictly better (more powerful/expressive) in every way (zero loss in expressive power).
datomic.api, datomic.client.api, and datomic.client.api.async all seem to have slightly different versions of q and qseq (they are all documented to accept some subset of query-list, query-map, and query-string).
One thing I am aware of - from a strict expressiveness view - is that the query-map form does not support returning a collection or scalar value in the find spec (i.e. :find [?a ...] or :find ?a .).
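to make that concrete, here’s a sketch (the :user/email attribute is made up, and db is assumed to be a connected database value). The list form supports the scalar find spec directly, while the arg-map form needs post-processing:

```clojure
;; List form: the trailing `.` makes q return a single scalar
;; instead of a set of tuples.
(d/q '[:find ?e .
       :where [?e :user/email "a@example.com"]]
     db)

;; Arg-map form (datomic.client.api style): :find is a relation,
;; so unwrap the single result yourself.
(ffirst
 (d/q {:query '[:find ?e
                :where [?e :user/email "a@example.com"]]
       :args  [db]}))
```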
I do wonder if the internal implementations are different as to have different performance characteristics in greedy queries (e.g. when returning just a scalar or computing aggregates).
im also curious about the answer to the q vs qseq question.
and i also miss the scalar find spec a lot...
makes me wonder that im missing something...
the datalog query would be so nice and declarative, but the lack of these scalar find specs is like a fly in the soup. they are just so useful, so often, especially during interactive repl work.
to remedy the situation, i was considering writing some https://github.com/thunknyc/richelieu advice around q & qseq, which would rewrite the datalog query and do the necessary post-processing on the result.
im already advising d/transact, d/with & d/pull to convert back and forth between java.util.Date & java.time.Instant, using [tick.core :as t]:
(defn maybe-instant->inst [maybe-convertable-to-inst]
  (if (or (t/instant? maybe-convertable-to-inst)
          (t/zoned-date-time? maybe-convertable-to-inst)
          (t/offset-date-time? maybe-convertable-to-inst))
    (t/inst maybe-convertable-to-inst)
    maybe-convertable-to-inst))

(defadvice ^:private transact-instants
  "Replace java.time.Instants with Clojure instants (which are java.util.Date)
   before transacting."
  [transact conn arg-map]
  (-> arg-map
      (update :tx-data (partial walk/postwalk maybe-instant->inst))
      (->> (transact conn))))

(defonce _transact-instants (advise-var #'d/transact #'transact-instants))
(defonce _with-instants (advise-var #'d/with #'transact-instants))

(defadvice ^:private transact-throw-txd
  "Like d/transact, but attaches the tx-arg to its exceptions."
  [transact conn arg-map]
  (try (transact conn arg-map)
       (catch Exception ex
         (-> "Transaction failed"
             (ex-info arg-map ex)
             throw))))

(comment
  (advise-var #'d/transact #'transact-throw-txd))

(defn- ^:deprecated maybe-inst->instant [i] (if (inst? i) (t/instant i) i))

(defadvice ^:private pull-instants
  ([pull db arg-map]
   (->> (pull db arg-map)
        (walk/postwalk maybe-inst->instant)))
  ([pull db selector eid]
   (->> (pull db selector eid)
        (walk/postwalk maybe-inst->instant))))

(defonce _pull-instants (advise-var #'d/pull #'pull-instants))
Is there now someone sufficiently knowledgeable here to provide an answer to this question?
> Is there ever a reason to prefer q over qseq?
>
> seems like qseq is strictly better (more powerful/expressive) in every way (zero loss in expressive power)
I find https://docs.datomic.com/cloud/query/query-executing.html#qseq a bit lacking, and searching past discussions in this forum for qseq seems to reflect that.
To observe differences, I put in place an A-B test in our codebase that blindly runs and compares all our q calls to qseq.
For now I observed two functional differences:
1. As stated, the fact that qseq returns a seq implies that empty results are not [] like q, but nil instead. So I wrapped qseq in e.g. (or (qseq ...) []) to not affect the codebase that depends on results always being vectorized (in some areas), and continued observing.
2. I observed one case where a query returns results in a different order.
And some qualitative differences:
• Time: results get back significantly faster, as shown in screenshots.
• I didn't yet observe if our app's usage of these results is slowed down by lazy realization or not, though. I suppose I should.
• Didn't see an impact in CloudWatch metrics of the Datomic Cloud servers, but I didn't activate the detailed metrics to see memory. I suppose I should. :)
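a tiny helper capturing the normalization from point 1, for anyone who wants q-compatible empty results (the name is made up):

```clojure
(defn nil->empty
  "Normalize a possibly-nil qseq result to [], matching q's behavior
   on empty results."
  [result]
  (or result []))

;; usage sketch: (nil->empty (d/qseq arg-map))
```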
Do you mean that by simply accessing the returned results, it might actually sometimes trigger further queries??
Wow, curious... nav & datafy are probably in play here?
Haha ok. So you seem to mean that it is only when the pull is executed that a further automatic query will happen.
It’s not the IO in the datalog query (the where clauses and the final result set). That is always finished when q or qseq returns
But if the find has pulls in it, q will evaluate them eagerly but qseq will delay to the time the entry is accessed
The data to satisfy the pulls is not guaranteed to be loaded, so you may incur additional IO
And if the time the pull entry is accessed is much later, then we might be holding on to a much larger set of the DB's persistent representation in memory, but if we access it quite soon and are done with it, then that's not an issue.
Phew, thanks a lot for clarifying my wrong intuitions about this! 😅
(->> (d/q '[:find ?x ...] db)
     (map (fn [result-tuple]
            (update result-tuple 0 #(d/pull db pull-expr %)))))

(->> (d/q '[:find ?x ...] db)
     (mapv (fn [result-tuple]
             (update result-tuple 0 #(d/pull db pull-expr %)))))
so you want qseq if you are memory constrained; q if you want to avoid blocking when processing the result
but most of the time if you just want a subset I’d say you want q, then sort, then get your subset, then pull
unfortunately qseq realizes the pull when the result item is realized, not when the individual slot in the item is accessed, so it’s not ok for getting the entire result set, doing something with the other fields, then looking at the pulled fields
in the peer model, decoupling all this for more control is usually fine; but in the client model each decoupling incurs another network hop, or has to keep a bunch of stuff retained in the peer-server.
Ok, so laziness' comeback (if we ever thought it was starting to be abandoned by e.g. more use of transducers), with some tradeoffs based on the specific implementation details.
While you were writing these last posts, I was writing this below, and I now realize how naïve the below is:
Then the following end-user advice would seem practical, IIUW:
• When loading info where the user might want to explore just some subparts, always use qseq.
• When loading info where the user will definitely see all of its subparts, prefer q.
so you either are partially consuming results and don’t care about order, or you are consuming all results but incrementally without head-holding.
again if your query has no pull in its find, there is no difference between q and qseq, they are exactly the same
pulls often take a long time and their results take a lot of memory relative to the datalog query evaluation and results. qseq lets you defer that work to when the result entry is read
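in code, deferring pull work with qseq might look like this sketch (the :order/* attributes and process-order! are hypothetical); each pull is realized only as doseq walks the seq, instead of all at once:

```clojure
;; qseq returns entries lazily; the pull for each entity happens only
;; when the corresponding entry is consumed in the doseq below.
(doseq [[order] (d/qseq {:query '[:find (pull ?e [:order/id :order/total])
                                  :where [?e :order/id]]
                         :args [db]})]
  (process-order! order))
```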
oh, and on the client, they also take a lot of network and marshalling-unmarshalling all at once, because of those maps
so client api has an additional advantage of smoothing that out into more smaller payloads
Ok, thanks a lot, this will be quite useful to refer to!
As for observing the impact on our app's e.g. API handler performance in time and space, I think the best way to do it would be to bring in cpu and heap metrics to compare when we toggle to q or to qseq (instead of running both blindly), grouped not by API handler but by its constituent parts, because one handler might call many functions making many requests, each having its own distinct impact on the system. This should provide solid feedback to tune our intuitions and refine our choices. And you might say we'll have much more important optimization opportunities before that, like tuning our actual finds and pull patterns.
I should think to report back about this when priorities allow this (I'll have to enable OpenTelemetry host metrics before I can make these kinds of correlations with confidence).
I wouldn’t overthink the difference here. This is a throughput vs latency, IO vs memory trade off
If you default to qseq, you are probably fine unless you process the result somewhere using some threadpool meant for non-blocking cpu workloads
IME that’s not what most web apps do and they would rather reduce the working set size, put less stress on the gc, etc
@U0514DPR7 thanks for resurrecting this thread and @U09R86PA4 for the clarification.
One last question to check my understanding: is there a tradeoff between doing a pull inside a qseq (vs first doing a q for eids followed by a pull-many)?
It sounds like in both cases the eids query is greedy and the pull would be lazy (unless I'm misremembering that pull-many is lazy). In the case of the client, I guess this would be one more network hop, but on-prem this should be equivalent. Correct?
Pull-many is not lazy. When I say “query then use pull-many” I’m imagining some kind of chunking
Pull-many seems to prefer aevt indexes as a data source when it can, and I’m not sure pull-inside-q does.
So it’s exactly as you describe except for the clarification about pull-many laziness and the uncertainty about which indexes will end up being used
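the "q for eids, then pull-many in chunks" pattern mentioned above might be sketched like this (the :order/* attributes and process-order! are hypothetical; the chunk size of 100 is arbitrary):

```clojure
;; Eagerly fetch eids (cheap), then pull in bounded batches so only
;; one chunk of pulled maps is in memory at a time.
(let [eids (map first (d/q '[:find ?e :where [?e :order/id]] db))]
  (doseq [batch (partition-all 100 eids)]
    (run! process-order!
          (d/pull-many db [:order/id :order/total] batch))))
```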
This sounds like yet another knob one can twiddle when you’re not quite getting the performance you expect on your workload.
Copyright related question - if I implement a spec of the datomic query and pull API abstract syntax as appears in the official documentation, do I need to do anything regarding licensing, attribution, copyrights assignment, mentions, etc?