2020-04-22
# datomic
I have an 8-core iMac with 80GB RAM. I'm trying to import a large amount of data on it into an on-prem Datomic dev storage, but I see very little CPU utilization (~10-20%). What can I do to make better use of the machine? I'm already doing this:
Launching with Java options -server -Xms4g -Xmx16g -XX:+UseG1GC -XX:MaxGCPauseMillis=50
and in my properties file for the transactor, this:
## Guessed settings for -Xmx16g production usage.
memory-index-threshold=256m
memory-index-max=4g
object-cache-max=8g
I'm also not deref-ing the d/transact calls.
I saw on the https://docs.datomic.com/on-prem/capacity.html#data-imports page that I should use the async API and do some pipelining, but I'm not sure how.
Is there any example of such pipelining somewhere?
Am I hitting some limitation of the H2 store somehow?
I checked one import example:
https://github.com/Datomic/codeq/blob/master/src/datomic/codeq/core.clj#L466
but this doesn't use the async API; it's just not dereffing the d/transact call...
I'm trying with d/transact-async now and the utilization is slightly better, but then I'm not sure how to determine when the import has completed.
You get max utilization with pipelining plus backpressure. You achieve pipelining by using transact-async with a bounded number of transactions in flight (not dereffed), and backpressure by dereffing them in order of submission.
https://docs.datomic.com/cloud/best.html#pipeline-transactions explains and links to examples
Be warned that the impl they show there assumes no interdependence between transactions (core.async pipeline-blocking executes its parallel work in no particular order, but results are in the same order as input)
ah, I see! The on-prem docs also have that page: https://docs.datomic.com/on-prem/best-practices.html#pipeline-transactions thanks, @favila!
Look here for a project to study that includes retry and backpressure: https://github.com/Datomic/mbrainz-importer
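For illustration, here is a minimal sketch of that bounded in-flight pattern against the on-prem peer API (pipeline-import, conn, in-flight, and batches are hypothetical names, not from the thread):
(require '[datomic.api :as d])
;; Submit batches with d/transact-async, keep at most `in-flight` futures
;; pending, and deref them in submission order for backpressure.
(defn pipeline-import
  [conn in-flight batches]
  (loop [pending clojure.lang.PersistentQueue/EMPTY
         batches batches]
    (cond
      ;; pipelining: keep submitting until `in-flight` txes are outstanding
      (and (seq batches) (< (count pending) in-flight))
      (recur (conj pending (d/transact-async conn (first batches)))
             (rest batches))
      ;; backpressure: deref the oldest pending future before submitting more
      (seq pending)
      (do @(peek pending)
          (recur (pop pending) batches))
      :else :done)))
Dereffing the oldest future first preserves submission order, so a failing transaction surfaces before many more are queued behind it.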
I’m having a problem creating a database when running the Datomic transactor in a Docker container. I created the container as described at https://hub.docker.com/r/pointslope/datomic-pro-starter/. Since I’d also like to run a peer server and a dockerized datomic-console, I configured the transactor with storage-access=remote and set storage-datomic-password=a-secret. The container exposes ports 4334-4336.
When connecting from the host via REPL to the transactor (Docker) I get an error:
datomic-pro-0.9.6045 defa$ ./bin/repl-jline
Clojure 1.10.1
user=> (require '[datomic.api :as d])
nil
user=> (d/create-database "datomic:")
Execution error (ActiveMQNotConnectedException) at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl/createSessionFactory (ServerLocatorImpl.java:787).
AMQ119007: Cannot connect to server(s). Tried with all available servers.
What does this error mean? With the wrong password I get:
Execution error (Exceptions$IllegalArgumentExceptionInfo) at datomic.error/arg (error.clj:79).
:db.error/read-transactor-location-failed Could not read transactor location from storage
the datomic transactor’s properties file needs a host= or alt-host= with a name that other docker containers can resolve to the storage container
(in the dev storage case, the storage and transactor happen to be the same process, but this is the general principle)
so connecting to “localhost” connects to the peer container’s localhost, which is not correct
datomic connection works like: 1) transactor writes its hostname into storage 2) d/connect on a peer connects to storage, retrieves transactor hostname 3) peer connects to transactor hostname
@favila not sure if I understand correctly… I changed host=localhost to host=datomic-transactor and the log now says:
Launching with Java options -server -Xms1g -Xmx1g -XX:+UseG1GC -XX:MaxGCPauseMillis=50
Starting datomic: <DB-NAME>, storing data in: data ...
System started datomic: <DB-NAME>, storing data in: data
Since I’m connecting from the docker host, I altered /etc/hosts to map datomic-transactor to 127.0.0.1 (localhost) … same problem when connecting to `datomic:`
…I will try from my docker peer server, but I thought that I had to create a database first (before launching the peer)
try nc -zv datomic-transactor 4334 from a terminal running in the same context as your peer
$ nc -zv datomic-transactor 4334
found 0 associations
found 1 connections:
1: flags=82<CONNECTED,PREFERRED>
outif lo0
src 127.0.0.1 port 52204
dst 127.0.0.1 port 4334
rank info not available
TCP aux info available
Connection to datomic-transactor port 4334 [tcp/*] succeeded!
Just to see if I understand peer-servers correctly… can I start a peer-server without (d/create-database <URI>) first? Because I get:
Execution error at datomic.peer/get-connection$fn (peer.clj:661).
Could not find my-db in catalog
Full report at:
/tmp/clojure-3528411252793798518.edn
where my-db has not been created before.
$ nc -zv datomic-transactor 4335
found 0 associations
found 1 connections:
1: flags=82<CONNECTED,PREFERRED>
outif lo0
src 127.0.0.1 port 52844
dst 127.0.0.1 port 4335
rank info not available
TCP aux info available
Connection to datomic-transactor port 4335 [tcp/*] succeeded!
if both of these work, your bin/repl-jline should succeed if you run it from the same terminal
so that means the transactor bound to the docker container’s localhost, 127.0.0.1; probably not the same as the peer’s?
Not sure but it does work now. Thank you very much @favila for your quick response and fruitful help!
I usually see and use host=0.0.0.0 alt-host=something-resolveable so I don’t have to worry about how the host= resolves on both transactor and peer
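For reference, a transactor properties sketch along those lines (hedged guesswork, not from the thread; datomic-transactor is the container hostname used earlier, and the password matches the storage-datomic-password mentioned above):
protocol=dev
host=0.0.0.0
alt-host=datomic-transactor
port=4334
storage-access=remote
storage-datomic-password=a-secret
Binding host=0.0.0.0 listens on all interfaces inside the container, while peers find the transactor through the alt-host name it writes into storage, so only alt-host has to be resolvable from peer containers.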
I'm trying to query out datom changes between a start and end date under a cardinality-many attribute by doing this:
'[:find ?date ?tx ?w ?attr ?v ?op
:keys date tx db/id attr v op
:in $ ?container ?start ?stop
:where
[?container :my-ref-many ?w]
[?w ?a ?v ?tx ?op]
[?a :db/ident ?attr]
[?tx :db/txInstant ?date]
[(.before ^Date ?date ?stop)]
[(.after ^Date ?date ?start)]]
The query always times out. I assume it must be doing something very inefficient (e.g., a full db scan). Is there a more efficient way to get this sort of data out?
https://github.com/cognitect-labs/day-of-datomic-cloud/blob/master/tutorial/filters.repl#L102
I'm struggling figuring out how I'm supposed to join across these dbs. I'm trying:
'[:find #_?date ?tx ?w ?a ?v ?op
:keys #_date tx db/id attr v op
:in $as-of $since ?workspaces-group ?start ?stop
:where
[$as-of ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w]
[$as-of ?w _ ]
[$since ?w _]
[?w ?a ?v ?op ?tx]
#_[?tx :db/txInstant ?date]
#_[?a :db/ident ?attr]]
and get
Nil or missing data source. Did you forget to pass a database argument?
Is there an example of this somewhere?
This?
'[:find ?tx ?w ?a ?v ?tx ?op
:in $as-of $since ?workspaces-group
:where
[$as-of ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w]
[$since ?w ?a ?v ?tx ?op]
[$as-of ?w ?a ?v ?tx ?op]]
Doesn't that only return datoms where ?a ?v ?tx ?op in both since and as-of are the same?
I'm pretty sure this is what I want:
'[:find ?tx ?w ?a ?v ?tx ?op
:in $as-of $since ?workspaces-group
:where
[$since ?w ?a ?v ?tx ?op]
[$as-of ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w]]
But I get
Execution error (ExceptionInfo) at datomic.client.api.async/ares (async.clj:58).
processing clause: (?w ?a ?v ?tx ?op), message: java.lang.ArrayIndexOutOfBoundsException
Not really sure what that exception means. Here's a larger stacktrace:
clojure.lang.ExceptionInfo: processing clause: (?w ?a ?v ?tx ?op), message: java.lang.ArrayIndexOutOfBoundsException {:cognitect.anomalies/category :cognitect.anomalies/incorrect, :cognitect.anomalies/message "processing clause: (?w ?a ?v ?tx ?op), message: java.lang.ArrayIndexOutOfBoundsException", :dbs [{:database-id "f3253b1f-f5d1-4abd-8c8e-91f50033f6d9", :t 105925, :next-t 105926, :history false}]}
at datomic.client.api.async$ares.invokeStatic(async.clj:58)
at datomic.client.api.async$ares.invoke(async.clj:54)
at datomic.client.api.sync$unchunk.invokeStatic(sync.clj:47)
at datomic.client.api.sync$unchunk.invoke(sync.clj:45)
at datomic.client.api.sync$eval50206$fn__50227.invoke(sync.clj:101)
at datomic.client.api.impl$fn__11664$G__11659__11671.invoke(impl.clj:33)
at datomic.client.api$q.invokeStatic(api.clj:350)
at datomic.client.api$q.invoke(api.clj:321)
at datomic.client.api$q.invokeStatic(api.clj:353)
at datomic.client.api$q.doInvoke(api.clj:321)
Got it. See the duplicated ?tx in the :find here:
'[:find ?tx ?w ?a ?v ?tx ?op
:in $as-of $since ?workspaces-group
:where
[$since ?w ?a ?v ?tx ?op]
[$as-of ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w]]
That's a nasty error message though 🙂
I want all ?w added or retracted between 2 dates that were on the :aws-workspaces-group/monitored-workspaces card-many ref attr.
This query gives me some results
[:find ?w ?a ?v ?tx ?op
:in $as-of $since ?workspaces-group
:where
[$as-of ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w]
[$since ?w ?a ?v ?tx ?op]]
It appears to be missing retractions.
No. Called like this:
(d/q
'[:find ?w ?a ?v ?tx ?op
:in $as-of $since ?workspaces-group
:where
[$as-of ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w]
[$since ?w ?a ?v ?tx ?op]]
(d/as-of db stop-date)
(d/since db start-date)
[:application-spec/id workspaces-group-id])
so this gives you ?w that were monitored at the moment of stop-date, then looks for datoms on those ?w entities since start-date (if you make that $since a history-db)
in particular, if there’s a ?w that used to be monitored between start and stop, you won’t see it
you want ones that started to be monitored after start, or those that were monitored at start or any time between start and stop?
(d/q '[:find ?w ?a ?v ?tx ?op
:in $as-of $since ?workspaces-group
:where
[$as-of ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w]
[$since ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w _ true]
[$since ?w ?a ?v ?tx ?op]]
(d/as-of db start)
(-> db (d/history) (d/as-of end) (d/since start))
workspaces-group)
then you look for groups again in $since for any that began to be monitored between start and end
then you look for any datoms added to ?w between start (not-inclusive) and end (inclusive)
it’s possible you want to include ?start there too, in which case you need to decrement start-t of $since by one
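As a sketch, assuming start-t is the transaction t corresponding to start-date:
;; widen the window by one t so datoms asserted exactly at start are included
(-> db (d/history) (d/as-of end) (d/since (dec start-t)))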
Won't [$since ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w _ true] not work if ?workspaces-group was not transacted within start and stop?
Why would this not work?
(d/q
'[:find ?w ?a ?v ?tx ?op
:in $as-of $since ?workspaces-group
:where
[$as-of ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w]
[$since ?w ?a ?v ?tx ?op]]
(d/as-of db stop-date)
(-> (d/history db) (d/as-of stop-date) (d/since start-date))
[:application-spec/id workspaces-group-id])
So perhaps query for all ?w at start-date and any added up to end-date. Pass that to a second query that uses (-> (d/history db) (d/as-of stop-date) (d/since start-date)) to get all the datoms.
(d/q '[:find ?w ?a ?v ?tx ?op
:in $as-of $since ?workspaces-group
:where
[$as-of ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w-at]
[$since ?workspaces-group :aws-workspaces-group/monitored-workspaces ?w-since _ true]
(or-join [?w-at ?w-since ?w]
[(identity ?w-at) ?w]
[(identity ?w-since) ?w])
[$since ?w ?a ?v ?tx ?op]]
(d/as-of db start)
(-> db (d/history) (d/as-of end) (d/since start))
workspaces-group)
if you know that set will be small across all time, you could filter by ?tx like you were doing before
so once asserted on an entity, it is never retracted and never asserted on a different entity
With 3 queries I'd do: 1. Query for all ?w that are monitored in as-of. 2. Query for all ?w added to monitored in since. 3. Pass the union of ?w in 1 and 2 to a history db and get all the datoms
correct; the db in 3 is either the same as 2 or just with a since adjusted 1 tx backward
If none are added then that query will throw. Guess I just catch that and return an empty set.
> the db in 3 is either the same as 2 or just with a since adjusted 1 tx backward Oh, right it would be the same. Since now we know all the ?w it's easy to search for the matching datoms.
this difference should only matter if you ever change ?w and group membership in the same tx
Wait, the db for 2 needs to include retractions. If a workspace was retracted between start and end, it would not be included in query 3.
I think that just means changing the passed in db to be (-> (d/history db) (d/as-of stop-date) (d/since start-date))
I also don't think the lookup ref for :application-spec/id will be present in that db, so I'll need to have the db/id for ?workspace-group
I could do it in query 1. Since query 2 is filtered by as-of and since, I don't think the :application-spec/id attribute will be included, since it would have been transacted before the since filter.
i.e., this query would never return any results, given :application-spec/id was transacted before start-date:
(d/q '[:find ?w
:in $ ?workspaces-group-id
:where
[?workspace-group :application-spec/id ?workspaces-group-id]
[?workspace-group :aws-workspaces-group/monitored-workspaces ?w]]
(-> (d/history db) (d/as-of stop-date) (d/since start-date))
workspaces-group-id)
And this throws:
(d/q '[:find ?w
:in $ ?workspaces-group
:where
[?workspace-group :aws-workspaces-group/monitored-workspaces ?w]]
(-> (d/history db) (d/as-of stop-date) (d/since start-date))
[:application-spec/id workspaces-group-id])
Landed here:
(defn get-workspaces-over-time2
  [db workspaces-group-id start-date stop-date]
  (let [group-db-id (:db/id (d/pull db [:db/id] [:application-spec/id workspaces-group-id]))
        ;; 1. all ?w monitored as of start-date
        cur-ws (->> (d/q '[:find ?w
                           :in $ ?workspace-group
                           :where
                           [?workspace-group :aws-workspaces-group/monitored-workspaces ?w]]
                         (d/as-of db start-date) [:application-spec/id workspaces-group-id])
                    (map first))
        ;; 2. all ?w whose membership was asserted between start-date and stop-date
        added-ws (->> (d/q '[:find ?w
                             :in $ ?workspace-group
                             :where
                             [?workspace-group :aws-workspaces-group/monitored-workspaces ?w]]
                           (-> (d/history db) (d/as-of stop-date) (d/since start-date))
                           group-db-id)
                      (map first))
        all-ws (set (concat cur-ws added-ws))
        ;; 3. all datoms on the union of ?w, over the full history
        datoms (d/q '[:find ?w ?a ?v ?tx ?op
                      :in $ [?w ...]
                      :where
                      [?w ?a ?v ?tx ?op]]
                    (d/history db) all-ws)]
    datoms))
But I'm back to where I started 😞
processing clause: [?w ?a ?v ?tx ?op], message: java.util.concurrent.TimeoutException: Query canceled: timeout elapsed
Using (-> (d/history db) (d/as-of stop-date) (d/since start-date)) hangs "forever". I've been letting it run since I sent the 874 message
Hmm, ok. That is a potential solution. Thank you for working with me on this. It's been incredibly insightful. Any idea why that last query is so expensive?
it probably won’t make a difference, but it increases the chance the next segment (in between datom calls) is already loaded
Interesting. A bit surprised by that. Would really like to know what's in there that would cause it to be so big 🙂 In this case it shouldn't be that big.
Oh wow, there is definitely an attribute in there that gets updated all the time that is useless here.
Would need to pull the db-ids of all the attrs to filter since those are also transacted outside the between-db.
Weird error doing that:
processing clause: {:argvars nil, :fn #object[datomic.core.datalog$expr_clause$fn__23535 0x11f3ef5d "datomic.core.datalog$expr_clause$fn__23535@11f3ef5d"], :clause [(ground $__in__3) [?a ...]], :binds [?a], :bind-type :list, :needs-source true}, message: java.util.concurrent.TimeoutException: Query canceled: timeout elapsed
I would like to find entities with a (card-many) attribute with more than one value. A theoretical example is finding customers with more than n orders. What's the best way to go about this? Note - using Cloud
I just get [] when trying this, so maybe I'm misunderstanding something. I just tried with the mbrainz database (to use a public dataset) to do something like finding tracks with multiple artists (:track/artists is a card-many ref).
(d/q '[:find ?e
:where
[?e :track/artists ?a]
[?e :track/artists ?a2]
[(!= ?a ?a2)]]
db)
I'm new to Datomic and trying to learn, so I believe I am missing some knowledge here maybe?
are you sure db is what you think it is? are you sure any track actually has multiple artists?
Here’s a minimal example:
(d/q '[:find ?e
:where
[?e :artist ?v]
[?e :artist ?v2]
[(!= ?v ?v2)]]
[[1 :artist "foo"]
[2 :artist "bar"]
[2 :artist "baz"]])
I'm sure there are multiple artists on some tracks, and I know of a few tracks specifically.
(d/q '[:find ?e :where [?e :artist ?v] [?e :artist ?v2] [(!= ?v ?v2)]] [[1 :artist "foo"] [2 :artist "bar"] [2 :artist "baz"]]) => #{[2]}
oh, I bet it needs some kind of db somewhere in the data sources to know where to send the query
(d/q '[:find ?e :in $ $db :where [?e :artist ?v] [?e :artist ?v2] [(!= ?v ?v2)]] [[1 :artist "foo"] [2 :artist "bar"] [2 :artist "baz"]] some-db)
I was just trying to demonstrate in a low-effort, db-agnostic way that the self-join should work
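As an aside, the original "more than n" question can also be expressed with a count aggregate; a sketch (not from the thread), reusing the mbrainz db from above:
;; group by ?track, count the bound ?artist values, then post-filter
(->> (d/q '[:find ?track (count ?artist)
            :where [?track :track/artists ?artist]]
          db)
     (filter (fn [[_track n]] (> n 1))))
Raise the 1 to n for the "more than n orders" variant of the question.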
Am I right that datomic cloud query doesn’t let you look at the log? (tx-ids, tx-data)
log-in-query is not in the client API. You can use tx-range, however: https://github.com/cognitect-labs/day-of-datomic-cloud/blob/master/tutorial/log.clj
Hmm. So this would require some sort of iterative approach? I'd need to query for the tx ids of my start and end dates and then filter the :aevt index for datoms within the tx id range. Using that result, for all entity ids returned, I'd filter :eavt for tx ids between my start and end dates. I would then resolve all attribute ids, giving me my list. Is this what you were thinking @ghadi?
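For what it's worth, a rough tx-range starting point for that plan (a sketch against the client API; conn, start-t, and end-t are assumed to already be in hand):
(require '[datomic.client.api :as d])
;; pull every transaction in the t range and flatten to datoms
(def txs (d/tx-range conn {:start start-t :end end-t}))
(def datoms (mapcat :data txs))
;; entity ids touched in the window, for the follow-up :eavt filtering
(def eids (distinct (map :e datoms)))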