2020-04-22
# datomic
I have an 8-core iMac with 80GB RAM. I'm trying to import a large amount of data on it into an on-prem Datomic dev storage, but I see very little CPU utilization (~10-20%). What can I do to make better use of the machine? I'm already doing this:
Launching with Java options -server -Xms4g -Xmx16g -XX:+UseG1GC -XX:MaxGCPauseMillis=50
and in my properties file for the transactor, this:
## Guessed settings for -Xmx16g production usage.
memory-index-threshold=256m
memory-index-max=4g
object-cache-max=8g
I'm also not deref-ing the d/transact calls.
I saw on the https://docs.datomic.com/on-prem/capacity.html#data-imports page that I should use the async API and do some pipelining, but I'm not sure how.
Is there any example of such pipelining somewhere?
Am I hitting some limitation of the H2 store somehow?
I checked one import example:
https://github.com/Datomic/codeq/blob/master/src/datomic/codeq/core.clj#L466
but this doesn't use the async API; it's just not dereffing the d/transact call...
I'm trying with d/transact-async now and the utilization is slightly better, but then I'm not sure how to determine when the import has completed.
You get max utilization with pipelining plus backpressure. You achieve pipelining by using transact-async with a bounded number of transactions in flight (not dereffed), and backpressure by dereffing them in order of submission.
https://docs.datomic.com/cloud/best.html#pipeline-transactions explains and links to examples
Be warned that the impl they show there assumes no interdependence between transactions (core.async pipeline-blocking executes its parallel work in no particular order, but results are in the same order as input)
ah, I see! The on-prem docs also have that page: https://docs.datomic.com/on-prem/best-practices.html#pipeline-transactions thanks, @favila!
Look here for a project to study that includes retry and backpressure: https://github.com/Datomic/mbrainz-importer
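For illustration, here is a minimal sketch of that bounded in-flight pattern against the on-prem peer API (pipeline-import, conn, in-flight, and batches are hypothetical names, not from the thread):
(require '[datomic.api :as d])
;; Submit batches with d/transact-async, keep at most `in-flight` futures
;; pending, and deref them in submission order for backpressure.
(defn pipeline-import
  [conn in-flight batches]
  (loop [pending clojure.lang.PersistentQueue/EMPTY
         batches batches]
    (cond
      ;; pipelining: keep submitting until `in-flight` txes are outstanding
      (and (seq batches) (< (count pending) in-flight))
      (recur (conj pending (d/transact-async conn (first batches)))
             (rest batches))
      ;; backpressure: deref the oldest pending future before submitting more
      (seq pending)
      (do @(peek pending)
          (recur (pop pending) batches))
      :else :done)))
Dereffing the oldest future first preserves submission order, so a failing transaction surfaces before many more are queued behind it.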
I’m having a problem creating a database when running the Datomic transactor in a Docker container. I created the container as described at https://hub.docker.com/r/pointslope/datomic-pro-starter/. Since I’d also like to run a peer server and a dockerized datomic-console, I configured the transactor with storage-access=remote and set storage-datomic-password=a-secret. The container exposes ports 4334-4336.
When connecting from the host via REPL to the transactor (Docker) I get an error:
datomic-pro-0.9.6045 defa$ ./bin/repl-jline
Clojure 1.10.1
user=> (require '[datomic.api :as d])
nil
user=> (d/create-database "datomic:")
Execution error (ActiveMQNotConnectedException) at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl/createSessionFactory (ServerLocatorImpl.java:787).
AMQ119007: Cannot connect to server(s). Tried with all available servers.
What does this error mean? With the wrong password I get:
Execution error (Exceptions$IllegalArgumentExceptionInfo) at datomic.error/arg (error.clj:79).
:db.error/read-transactor-location-failed Could not read transactor location from storage
the datomic transactor’s properties file needs a host= or alt-host= with a name that other docker containers can resolve to the storage container
(in the dev storage case, the storage and transactor happen to be the same process, but this is the general principle)
so connecting to “localhost” connects to the peer container’s localhost, which is not correct
datomic connection works like: 1) transactor writes its hostname into storage 2) d/connect on a peer connects to storage, retrieves transactor hostname 3) peer connects to transactor hostname
@favila not sure if I understand correctly… I changed host=localhost to host=datomic-transactor and the log now says:
Launching with Java options -server -Xms1g -Xmx1g -XX:+UseG1GC -XX:MaxGCPauseMillis=50
Starting datomic: <DB-NAME>, storing data in: data ...
System started datomic: <DB-NAME>, storing data in: data
Since I’m connecting from the docker host, I altered /etc/hosts to map datomic-transactor to 127.0.0.1 (localhost) … same problem when connecting to `datomic:`
…I will try from my docker peer server, but I thought that I had to create a database first (before launching the peer)
try nc -zv datomic-transactor 4334 from a terminal running in the same context as your peer
$ nc -zv datomic-transactor 4334
found 0 associations
found 1 connections:
1: flags=82<CONNECTED,PREFERRED>
outif lo0
src 127.0.0.1 port 52204
dst 127.0.0.1 port 4334
rank info not available
TCP aux info available
Connection to datomic-transactor port 4334 [tcp/*] succeeded!
Just to see if I understand peer-servers correctly… can I start a peer-server without (d/create-database <URI>) first? Because I get:
Execution error at datomic.peer/get-connection$fn (peer.clj:661).
Could not find my-db in catalog
Full report at:
/tmp/clojure-3528411252793798518.edn
where my-db has not been created before.
$ nc -zv datomic-transactor 4335
found 0 associations
found 1 connections:
1: flags=82<CONNECTED,PREFERRED>
outif lo0
src 127.0.0.1 port 52844
dst 127.0.0.1 port 4335
rank info not available
TCP aux info available
Connection to datomic-transactor port 4335 [tcp/*] succeeded!
if both of these work, your bin/repl-jline should succeed if you run it from the same terminal
so that means the transactor bound to the docker container’s localhost, 127.0.0.1; probably not the same as the peer’s?
Not sure but it does work now. Thank you very much @favila for your quick response and fruitful help!
I usually see and use host=0.0.0.0 alt-host=something-resolveable so I don’t have to worry about how the host= resolves on both transactor and peer
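For reference, a transactor properties sketch along those lines (hedged guesswork, not from the thread; datomic-transactor is the container hostname used earlier, and the password matches the storage-datomic-password mentioned above):
protocol=dev
host=0.0.0.0
alt-host=datomic-transactor
port=4334
storage-access=remote
storage-datomic-password=a-secret
Binding host=0.0.0.0 listens on all interfaces inside the container, while peers find the transactor through the alt-host name it writes into storage, so only alt-host has to be resolvable from peer containers.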
I'm trying to query out datom changes between a start and end date under a cardinality-many attribute by doing this:
'[:find ?date ?tx ?w ?attr ?v ?op
:keys date tx db/id attr v op
:in $ ?container ?start ?stop
:where
[?container :my-ref-many ?w]
[?w ?a ?v ?tx ?op]
[?a :db/ident ?attr]
[?tx :db/txInstant ?date]
[(.before ^Date ?date ?stop)]
[(.after ^Date ?date ?start)]]
The query always times out. I assume it must be doing something very inefficient (e.g., a full db scan). Is there a more efficient way to get this sort of data out?
https://github.com/cognitect-labs/day-of-datomic-cloud/blob/master/tutorial/filters.repl#L102
I'm struggling figuring out how I'm supposed to join across these dbs. I'm trying:
'[:find #_?date ?tx ?w ?a ?v ?op
:keys #_date tx db/id attr v op
:in $as-of $since ?workspaces-group ?start ?stop
:where
[$as-of ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w]
[$as-of ?w _ ]
[$since ?w _]
[?w ?a ?v ?op ?tx]
#_[?tx :db/txInstant ?date]
#_[?a :db/ident ?attr]]
and get
Nil or missing data source. Did you forget to pass a database argument?
Is there an example of this somewhere?
This?
'[:find ?tx ?w ?a ?v ?tx ?op
:in $as-of $since ?workspaces-group
:where
[$as-of ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w]
[$since ?w ?a ?v ?tx ?op]
[$as-of ?w ?a ?v ?tx ?op]]
Doesn't that only return datoms where ?a ?v ?tx ?op in both since and as-of are the same?
I'm pretty sure this is what I want:
'[:find ?tx ?w ?a ?v ?tx ?op
:in $as-of $since ?workspaces-group
:where
[$since ?w ?a ?v ?tx ?op]
[$as-of ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w]]
But I get
Execution error (ExceptionInfo) at datomic.client.api.async/ares (async.clj:58).
processing clause: (?w ?a ?v ?tx ?op), message: java.lang.ArrayIndexOutOfBoundsException
Not really sure what that exception means. Here's a larger stacktrace:
clojure.lang.ExceptionInfo: processing clause: (?w ?a ?v ?tx ?op), message: java.lang.ArrayIndexOutOfBoundsException {:cognitect.anomalies/category :cognitect.anomalies/incorrect, :cognitect.anomalies/message "processing clause: (?w ?a ?v ?tx ?op), message: java.lang.ArrayIndexOutOfBoundsException", :dbs [{:database-id "f3253b1f-f5d1-4abd-8c8e-91f50033f6d9", :t 105925, :next-t 105926, :history false}]}
at datomic.client.api.async$ares.invokeStatic(async.clj:58)
at datomic.client.api.async$ares.invoke(async.clj:54)
at datomic.client.api.sync$unchunk.invokeStatic(sync.clj:47)
at datomic.client.api.sync$unchunk.invoke(sync.clj:45)
at datomic.client.api.sync$eval50206$fn__50227.invoke(sync.clj:101)
at datomic.client.api.impl$fn__11664$G__11659__11671.invoke(impl.clj:33)
at datomic.client.api$q.invokeStatic(api.clj:350)
at datomic.client.api$q.invoke(api.clj:321)
at datomic.client.api$q.invokeStatic(api.clj:353)
at datomic.client.api$q.doInvoke(api.clj:321)
Got it. See the duplicated ?tx in the :find here:
'[:find ?tx ?w ?a ?v ?tx ?op
:in $as-of $since ?workspaces-group
:where
[$since ?w ?a ?v ?tx ?op]
[$as-of ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w]]
That's a nasty error message though 🙂
I want all ?w added or retracted between 2 dates that were on the :aws-workspaces-group/monitored-workspaces card-many ref attr.
This query gives me some results
[:find ?w ?a ?v ?tx ?op
:in $as-of $since ?workspaces-group
:where
[$as-of ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w]
[$since ?w ?a ?v ?tx ?op]]
It appears to be missing retractions.
No. Called like this:
(d/q
'[:find ?w ?a ?v ?tx ?op
:in $as-of $since ?workspaces-group
:where
[$as-of ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w]
[$since ?w ?a ?v ?tx ?op]]
(d/as-of db stop-date)
(d/since db start-date)
[:application-spec/id workspaces-group-id])
so this gives you ?w that were monitored at the moment of stop-date, then looks for datoms on those ?w entities since start-date (if you make that $since a history-db)
in particular, if there’s a ?w that used to be monitored between start and stop, you won’t see it
you want ones that started to be monitored after start, or those that were monitored at start or any time between start and stop?
(d/q '[:find ?w ?a ?v ?tx ?op
:in $as-of $since ?workspaces-group
:where
[$as-of ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w]
[$since ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w _ true]
[$since ?w ?a ?v ?tx ?op]]
(d/as-of db start)
(-> db (d/history) (d/as-of end) (d/since start))
workspaces-group)
then you look for groups again in $since for any that began to be monitored between start and end
then you look for any datoms added to ?w between start (not-inclusive) and end (inclusive)
it’s possible you want to include ?start there too, in which case you need to decrement start-t of $since by one
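As a sketch, assuming start-t is the transaction t corresponding to start-date:
;; widen the window by one t so datoms asserted exactly at start are included
(-> db (d/history) (d/as-of end) (d/since (dec start-t)))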
Won't [$since ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w _ true] not work if ?workspaces-group was not transacted within start and stop?
Why would this not work?
(d/q
'[:find ?w ?a ?v ?tx ?op
:in $as-of $since ?workspaces-group
:where
[$as-of ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w]
[$since ?w ?a ?v ?tx ?op]]
(d/as-of db stop-date)
(-> (d/history db) (d/as-of stop-date) (d/since start-date))
[:application-spec/id workspaces-group-id])
So perhaps query for all ?w at start-date and any added up to end-date. Pass that to a second query that uses (-> (d/history db) (d/as-of stop-date) (d/since start-date)) to get all the datoms.
(d/q '[:find ?w ?a ?v ?tx ?op
:in $as-of $since ?workspaces-group
:where
[$as-of ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w-at]
[$since ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w-since _ true]
(or-join [?w-at ?w-since ?w]
[(identity ?w-at) ?w]
[(identity ?w-since) ?w])
[$since ?w ?a ?v ?tx ?op]]
(d/as-of db start)
(-> db (d/history) (d/as-of end) (d/since start))
workspaces-group)
if you know that set will be small across all time, you could filter by ?tx like you were doing before
so once asserted on an entity, it is never retracted and never asserted on a different entity
With 3 queries I'd do: 1. Query for all ?w that are monitored in as-of. 2. Query for all ?w added to monitored in since. 3. Pass the union of ?w in 1 and 2 to a history db and get all the datoms
correct; the db in 3 is either the same as 2 or just with a since adjusted 1 tx backward
If none are added then that query will throw. Guess I just catch that and return an empty set.
> the db in 3 is either the same as 2 or just with a since adjusted 1 tx backward Oh, right it would be the same. Since now we know all the ?w it's easy to search for the matching datoms.
this difference should only matter if you ever change ?w and group membership in the same tx
Wait, the db for 2 needs to include retractions. If a workspace was retracted between start and end, it would not be included in query 3.
I think that just means changing the passed in db to be (-> (d/history db) (d/as-of stop-date) (d/since start-date))
I also don't think the lookup ref for :application-spec/id will be present in that db, so I'll need to have the db/id for ?workspace-group
I could do it in query 1. Since query 2 is filtered by as-of and since, I don't think the :application-spec/id attribute will be included, since it would have been transacted before the since filter.
i.e., this query would never return any results, given :application-spec/id was transacted before start-date:
(d/q '[:find ?w
:in $ ?workspaces-group-id
:where
[?workspace-group :application-spec/id ?workspaces-group-id]
[?workspace-group :aws-workspaces-group/monitored-workspaces ?w]]
(-> (d/history db) (d/as-of stop-date) (d/since start-date))
workspaces-group-id)
And this throws:
(d/q '[:find ?w
:in $ ?workspaces-group
:where
[?workspace-group :aws-workspaces-group/monitored-workspaces ?w]]
(-> (d/history db) (d/as-of stop-date) (d/since start-date))
[:application-spec/id workspaces-group-id])
Landed here:
(defn get-workspaces-over-time2
  [db workspaces-group-id start-date stop-date]
  (let [group-db-id (:db/id (d/pull db [:db/id] [:application-spec/id workspaces-group-id]))
        ;; 1. all ?w monitored as of start-date
        cur-ws (->> (d/q '[:find ?w
                           :in $ ?workspace-group
                           :where
                           [?workspace-group :aws-workspaces-group/monitored-workspaces ?w]]
                         (d/as-of db start-date) [:application-spec/id workspaces-group-id])
                    (map first))
        ;; 2. all ?w whose membership was asserted between start-date and stop-date
        added-ws (->> (d/q '[:find ?w
                             :in $ ?workspace-group
                             :where
                             [?workspace-group :aws-workspaces-group/monitored-workspaces ?w]]
                           (-> (d/history db) (d/as-of stop-date) (d/since start-date))
                           group-db-id)
                      (map first))
        all-ws (set (concat cur-ws added-ws))
        ;; 3. all datoms on the union of ?w, over the full history
        datoms (d/q '[:find ?w ?a ?v ?tx ?op
                      :in $ [?w ...]
                      :where
                      [?w ?a ?v ?tx ?op]]
                    (d/history db) all-ws)]
    datoms))
But I'm back to where I started 😞
processing clause: [?w ?a ?v ?tx ?op], message: java.util.concurrent.TimeoutException: Query canceled: timeout elapsed
Using (-> (d/history db) (d/as-of stop-date) (d/since start-date)) hangs "forever". I've been letting it run since I sent the 874 message
Hmm, ok. That is a potential solution. Thank you for working with me on this. It's been incredibly insightful. Any idea why that last query is so expensive?
it probably won’t make a difference, but it increases the chance the next segment (in between datom calls) is already loaded
Interesting. A bit surprised by that. Would really like to know what's in there that would cause it to be so big 🙂 In this case it shouldn't be that big.
Oh wow, there is definitely an attribute in there that gets updated all the time that is useless here.
Would need to pull the db-ids of all the attrs to filter since those are also transacted outside the between-db.
Weird error doing that:
processing clause: {:argvars nil, :fn #object[datomic.core.datalog$expr_clause$fn__23535 0x11f3ef5d "datomic.core.datalog$expr_clause$fn__23535@11f3ef5d"], :clause [(ground $__in__3) [?a ...]], :binds [?a], :bind-type :list, :needs-source true}, message: java.util.concurrent.TimeoutException: Query canceled: timeout elapsed
I would like to find entities with a (card-many) attribute with more than one value. A theoretical example is finding customers with more than n orders. What's the best way to go about this? Note - using Cloud
I just get [] when trying this, so maybe I'm misunderstanding something. I just tried with the mbrainz database (to use a public dataset) to do something like finding tracks with multiple artists (:track/artists is a card-many ref).
(d/q '[:find ?e
:where
[?e :track/artists ?a]
[?e :track/artists ?a2]
[(!= ?a ?a2)]]
db)
I'm new to Datomic and trying to learn, so I believe I am missing some knowledge here maybe?
are you sure db is what you think it is? are you sure any track actually has multiple artists?
Here’s a minimal example:
(d/q '[:find ?e
:where
[?e :artist ?v]
[?e :artist ?v2]
[(!= ?v ?v2)]]
[[1 :artist "foo"]
[2 :artist "bar"]
[2 :artist "baz"]])
I'm sure there are multiple artists on some tracks, and I know of a few tracks specifically.
(d/q '[:find ?e :where [?e :artist ?v] [?e :artist ?v2] [(!= ?v ?v2)]] [[1 :artist "foo"] [2 :artist "bar"] [2 :artist "baz"]]) => #{[2]}
oh, I bet it needs some kind of db somewhere in the data sources to know where to send the query
(d/q '[:find ?e :in $ $db :where [?e :artist ?v] [?e :artist ?v2] [(!= ?v ?v2)]] [[1 :artist "foo"] [2 :artist "bar"] [2 :artist "baz"]] some-db)
I was just trying to demonstrate in a low-effort, db-agnostic way that the self-join should work
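As an aside, the original "more than n" question can also be expressed with a count aggregate; a sketch (not from the thread), reusing the mbrainz db from above:
;; group by ?track, count the bound ?artist values, then post-filter
(->> (d/q '[:find ?track (count ?artist)
            :where [?track :track/artists ?artist]]
          db)
     (filter (fn [[_track n]] (> n 1))))
Raise the 1 to n for the "more than n orders" variant of the question.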
Am I right that datomic cloud query doesn’t let you look at the log? (tx-ids, tx-data)
log-in-query is not in the client API. You can use tx-range, however: https://github.com/cognitect-labs/day-of-datomic-cloud/blob/master/tutorial/log.clj
Hmm. So this would require some sort of iterative approach? I'd need to query for the tx ids of my start and end dates and then filter the :aevt index for datoms within the tx id range. Using that result, for all entity ids returned, I'd filter :eavt for tx ids between my start and end dates. I would then resolve all attribute ids, giving me my list. Is this what you were thinking @ghadi?
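For what it's worth, a rough tx-range starting point for that plan (a sketch against the client API; conn, start-t, and end-t are assumed to already be in hand):
(require '[datomic.client.api :as d])
;; pull every transaction in the t range and flatten to datoms
(def txs (d/tx-range conn {:start start-t :end end-t}))
(def datoms (mapcat :data txs))
;; entity ids touched in the window, for the follow-up :eavt filtering
(def eids (distinct (map :e datoms)))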