
it would be nice if datomic provided a reader literal for squuids #squuid "etc"


aren't they just uuids when you're reading them?


if you already have them, what part needs to know whether they were generated sequentially or not?


gosh. you're right. i'm a dork. i guess what i meant is it would be nice to generate squuids via a tag in edn


kinda like temp ids


no, i thought that's what you meant, just got confused by the "etc" i think


yeah, that was incorrect


I was curious, looks like these are the data readers defined by datomic {db/id datomic.db/id-literal db/fn datomic.function/construct base64 datomic.codec/base-64-literal}


#squuid might be handy too, I don't see why not


if we could do the same as we do with temp ids, e.g. #squuid[1], so that you could create relationships with squuids the same way you can with db ids


that would be awesome


Anyone done a query/pull-exp cheat sheet? Because I would use that thing every day...

robert-stuttaford13:08:16

and are pretty comprehensive. i found that good old practice embedded the concepts quickly


Yeah, those are my go-tos. Still, it'd be nice to have a one-pager to quickly refer back to, especially if I haven't been doing it for a while & I'm forgetting particular details of syntax.


@robert-stuttaford: the trouble with a squuid tag is that the generated uuids would not be deterministic... at this point I'd say the content of the EDN file has stopped being 'just data'.


sounds like a good opportunity to contribute 🙂


@val_waeselynck: true, but this is already the case with #db/id


IF you supported determinism with e.g. #squuid -1 so that multiple uses of the token resolve to the same value


using a basic cache
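
something like this could work as a sketch — hypothetical names, and Datomic doesn't ship it:

```clojure
(ns user
  (:require [datomic.api :as d]))

;; hypothetical #squuid reader: cache squuids by tag argument so that
;; repeated uses of e.g. #squuid -1 resolve to the same value
(def squuid-cache (atom {}))

(defn squuid-literal [k]
  ;; swap! is atomic, so concurrent reads of the same key still
  ;; settle on a single generated squuid
  (get (swap! squuid-cache
              (fn [m] (if (contains? m k) m (assoc m k (d/squuid)))))
       k))

;; and register it in data_readers.clj: {squuid user/squuid-literal}
```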


I know, I feel there's a difference with tempids though, not sure how to express it


at least with tempids it's always the same datoms that end up in storage, not so with random uuids


so it's kinda more deterministic


which is probably why we don't have a reader tag 🙂


@robert-stuttaford: I guess so. Even the #db/id tag felt weird to me in the beginning TBH


@robert-stuttaford: btw, I recently stumbled on your podcast about Datomic and Onyx, I really liked it


thanks! which one? on


yeah that's the one


that was a fun chat. Vijay and Ray are a blast


I'm looking for solutions to make my analytics faster and more scalable, so definitely looking into tools like Onyx


@iwillig: enjoying your episode 🙂


you mentioned how you're having to think differently about historical data


have you started to realise the difficulty of technical debt in your data ? 🙂


oh man. I am already stressing out about this, and I haven't had any problems yet.


i'm busy working on an epic to rebuild our database, transaction by transaction


initial analysis of the first 2mil txes yields ±120k txes i want to keep. the rest is either schema, data we no longer want, or bad programming


the bad programming and old data are about equal!


I'm worried about the ever-increasing complexity of historical queries that have to deal with schema changes


you mean having to query across all the versions of the schema?


yes, correct


which exacerbates my tendency to bikeshed such things


yeah. we've handled that in a couple ways. small data sets, we just re-transact and lose the time information. larger ones, we've continued to query across


It may not even become a problem in practice, I'm not yet sure how far back we'll need to go. But here I am worrying about it 🙂


i'm looking forward to unifying all that in the rebuild


the primary driver for doing this is to be prepared to shard in future, by building good tooling now


10 billion datoms is the theoretical upper limit for a db. we're at around 100mil, which means we have 99 copies to go. that's what's worrying me 🙂


I have a looooooooong way to go before I'm there 🙂


also, i get to re-partition the data according to the read patterns we've since discovered we have


that kind of thing keeps me from worrying about partitioning too much - I just don't know how it'll be. For some reason, that reasoning works on me for partitions, but not for future schema changes.


@bhagany: curious about your specific problem. Is it that you are querying on asOf dbs and need to compensate for "future schema change" in your queries that go too far in the past ?


yes, that's right


I don't actually have that problem yet. But I am trying to anticipate future schema needs now (and at the same time trying to not try, because that kind of thing can get you in trouble too)


@bhagany: my take on this was to actually stop using asOf in application code


history is not programmable


I can see your point there. I may come around to endorsing it, depending on how this goes.


@bhagany: That's a very interesting problem actually. I think what you could do in a technology like Apache Samza is derive a new Log of facts from an old Log of facts, adding the migration, and using the new Log as the data source in the application code.


That'd be an indirection between facts-recording and querying which Datomic does not have (yet)


interesting idea. I'll have to give that some thought.

Ben Kamphaus14:08:00

@robert-stuttaford: have you found in testing how much re-partitioning could possibly speed up query patterns that are currently problematic for you? Just curious.


@bkamphaus: not yet. i haven't managed to actually rebuild the db yet. it's a big task -- 58mil txes, ~4 years worth. the first 2mil txes yielded over 100 transaction shapes to reason through alone


i'll be certain to share any findings, though. this gon' be fun!

Ben Kamphaus15:08:56

yeah, one of the big value props for Datomic for me is being able to support arbitrary query without over-engineering any particular aspect of the schema/model for particular query patterns. Obviously you never quite hit that point 100% with any database, but I’m curious if in the wild people end up needing to solve pain points with partitioning, or whether a reasonable set of partitions across a few logically grouped domains is usually sufficient.


@val_waeselynck im not sure that's true.. #db/id[:db.part/user] isn't deterministic, it's using a counter behind the scenes which is increased on each transaction


#db/id[:db.part/user -1] would be deterministic


or it depends on the basis-t of the db-after the transaction, im not sure it uses a counter or not


@danielstockton: yes but I would argue that it's the same datoms [eav] that end up in storage, so it's more deterministic in a way


e.g you can rely on transacting your edn file being idempotent


but the tempid determines the e in a datom, which can be different?


it depends when you transact and against what db


but if you're importing one edn file on a fresh database, then i guess it is...

Ben Kamphaus15:08:06

the idempotent aspect of the schema comes from upsert for a :db/ident att/val pair (which is a unique identity) and special rules for tempid resolution in that case.

Ben Kamphaus15:08:34

which is dependent on the fact that an entity id is not an attr/val pair but its own thing, and which is not true of e.g. a uuid attribute.

Ben Kamphaus15:08:18

it’s less about what #db/id[:db.part/user -1] resolves to across all invocations than the fact that multiple uses resolve to the same tempid within a particular transaction, meaning the implied link/relation/join for tempids is fulfilled by the resulting entity id generation.
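
e.g. (attribute names made up) — the repeated tempid is what wires up the relation inside one transaction:

```clojure
;; -1 resolves to the same entity id in both maps of this transaction,
;; so the ref below points at the entity created above
[{:db/id #db/id[:db.part/user -1]
  :person/name "Ada"}
 {:db/id #db/id[:db.part/user -2]
  :person/friend #db/id[:db.part/user -1]}]
```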


is it possible to have a function to get-or-create an entity? I wanted to write a function that does a lookup, if it finds the criteria, it returns the ent-id of the match otherwise it creates the entity and returns the ent-id of the newly created entity. I can do the lookup in a regular datalog query but I believe that would not be thread-safe, e.g. if I'm importing big datasets and the query is run on multiple machines, it will rapidly create multiple duplicate entities. My function looks like this:


my function returns tx-info like so {:db-before datomic.db.Db@2f39cc32, :db-after datomic.db.Db@4be3c61a, :tx-data #object[java.util.ArrayList 0x787bfc5c [datomic.db.Datum@1953ce9d]], :tempids {}}.... but i didn't actually transact anything... do i access the returned query via tx-data ?


also... im calling the function like so... @(d/transact @db/conn [[:person/namer oid name]])... so i guess i am transacting... i'm a bit confused obviously on this subject

Ben Kamphaus17:08:27

@jdkealy: you’re crossing a couple of concerns that are decoupled in Datomic. I would split the logic somewhat.

Ben Kamphaus17:08:29

Do the query to see if what you’re looking for exists yet, if not, go through either a transaction function to create it or rely on assigning entity a unique identity so you can rely on Datomic’s upsert behavior

Ben Kamphaus17:08:44

eventually to figure out the outcome of the transaction to get the entity that was created you’ll want:


right.. but datomic's uniqueness constraint is only on a single attribute as far as i know

Ben Kamphaus17:08:38

If something has a unique identity in Datomic, it will handle that race for you, i.e. it will resolve the transaction to the existing entity

Ben Kamphaus17:08:45

composite uniqueness isn’t a thing in Datomic at this point in time, yeah.


i thought that this kind of thing was the point of datomic functions

Ben Kamphaus17:08:33

yes, it is, though there’s an advantage to taking opportunities to rely on predefined behaviors rather than explicitly program your own with transaction functions.

Ben Kamphaus17:08:47

but composite uniqueness would preclude being able to rely on the default behavior for this case.


indeed 🙂 so is there any way to do what i'm trying to do ?


like... return the entity id or else create it in a single-threaded way? i'm worried about creating dozens of dupes as i'm going to be running this code on like 4 servers


@stuartsierra: hi 🙂 in the latest Cognicast, Craig mentioned your predilection for "decanting databases". it sounds like you've done this a couple times. i'm embarking on a rather large decanting of my own soon, and i wonder if you have any tips, or perhaps even generalised code that may be useful?


i.e. can the datomic function return the result of a query or does it only return data related to a transaction

Ben Kamphaus17:08:08

if this is basically a big import and you can provide a unique identifier from the domain or by pre-generating uuids for everything prior to import, the default unique identity upsert behavior gets you there for free.
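
e.g. a schema sketch along these lines (attribute names assumed):

```clojure
;; a unique-identity attribute for pre-generated uuids
{:db/id #db/id[:db.part/db]
 :db/ident :thing/uuid
 :db/valueType :db.type/uuid
 :db/cardinality :db.cardinality/one
 :db/unique :db.unique/identity
 :db.install/_attribute :db.part/db}

;; transacting this map twice -- even from different peers -- upserts
;; to the same entity instead of creating a duplicate
{:thing/uuid #uuid "54b7ae8a-0000-4000-8000-000000000000"
 :thing/name "example"}
```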

Ben Kamphaus17:08:19

a transaction function (note this isn’t the only kind of database function but the typical one) returns transaction data that are then transacted on the transactor (provided it doesn’t throw an exception), but the results are standard transaction result maps. I.e. you can’t change the behavior of what happens on the other side.

Ben Kamphaus17:08:16

but you could define things like, for example: attempt to create the thing if it doesn’t exist, throw an exception if it does, and rely on that exception on the peer to know that if you get/sync a database value after your attempted transaction you can get the entity via query.


ok... so perhaps instead of returning the entity id and then transacting with the id i should focus on doing the full transaction in the function ?


or... another way would be ... if i do call the thread-safe transact function, i can do a lookup directly after and it's guaranteed to be unique right ?

Ben Kamphaus17:08:50

yes though that implies a blocking deref on the transaction and inspecting the :db-after, which is fine but may slow down import logic considerably if you’re doing this e.g. on every typical transaction.
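
roughly this shape, for reference (conn, tx-data, and the tempid are assumed):

```clojure
;; blocking deref of the transaction, then resolve the tempid
;; you used in tx-data against the :db-after value
(let [{:keys [db-after tempids]} @(d/transact conn tx-data)]
  (d/resolve-tempid db-after tempids (d/tempid :db.part/user -1)))
```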


it would be like.... 20k times a day maybe ? tops


i'm not as worried about slowness as i am about my app crashing 😕

Ben Kamphaus17:08:36

My first pass (knowing nothing else of the domain) would probably be the transaction function that tries to transact the thing and if it already exists, aborts the transaction via exception, and then either uses A. tempid resolution for a successful transaction result and a query to find it if the transaction aborts, or possibly B. just query to find it on a database value after the transaction attempt (successful or not) since it should be there either way.


awesome... i think B sounds pretty straightforward... many thanks!


basically: find, or try: create-via-tx-fn, catch: find


(or (d/q ...)
    (try @(d/transact ... [[:your-make-fn-which-first-also-does-the-d/q-thing ...]])
         (d/q ...)
         (catch ... (d/q ...))))


you'd move the query bit to a function of its own to keep things DRY of course


distributed systems are hard 🙂


if i have multiple (pull) expressions inside of the :find clause in a query, is it possible for Datomic to not return nil if one of the pull queries doesn’t return anything?


no. nil is not a thing that datalog does at all


sorry, not nil, in this case just []


strangely enough the value the find returns is nil


sounds like a good candidate for breaking your code apart


i may not fully understand how you're getting an empty vec though


[:find (pull ?e […]) (pull ?e […]) :where [?e …]]


if one of those pulls doesn’t return any data, even if the other one does, the query will return [[nil]]


i'd put the pulls outside of d/q in a separate fn call


and deal just with ids in d/q
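
i.e. something like (attribute name assumed):

```clojure
;; datalog for ids only...
(let [ids (d/q '[:find [?e ...]
                 :where [?e :some/attr]]
               db)]
  ;; ...then pull the shapes you need, decoupled from the query
  (mapv #(d/pull db '[*] %) ids))
```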


i don't know the answer to your actual question, though


what happens if you explicitly include :db/id in your pull expressions?


yeah, the underlying problem is a complex query for various data and metadata associated with certain entities, some of which can be potentially missing, and i still want to get all the data back, instead of constraining the result set


i feel like i might have to settle for making n separate queries


nothing wrong with separate queries


it's all in local memory anyway 🙂


yeah, maybe not the first request but perhaps it’s not such a big deal


it's a non issue; datalog is working with sorted sets of datoms in local memory, always


very often it's better to decouple things!


thanks! makes sense...


thanks for your help @bkamphaus... i went with solution B... it appears to work on


What is the S3 backup-uri format? I tried and s3p://bucket-name.


Ah, found it. Never mind 😛

Ben Kamphaus18:08:46

for reference, which is probably what you just found 🙂


Yes. Where backup-name is a folder or an actual backup?


Hmm.. Is it possible to use backup/restore to copy one DB to another? I tried, however, I got this exception:

java.lang.IllegalArgumentException: :restore/collision The database already exists under the name '...'

Ben Kamphaus18:08:27

Can’t copy one db to two different names in the same underlying storage. You can overwrite a db by restoring to the same name, or restore that db to a new name on a different storage.
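
for reference, the two allowed shapes look roughly like this (URIs made up):

```shell
# overwrite: restore to the same name on the same storage
bin/datomic backup-db "datomic:dev://localhost:4334/my-db" "file:/tmp/backups/my-db"
bin/datomic restore-db "file:/tmp/backups/my-db" "datomic:dev://localhost:4334/my-db"

# or restore under a new name on a *different* storage
bin/datomic restore-db "file:/tmp/backups/my-db" "datomic:dev://other-host:4334/my-db-copy"
```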


Ah I see, thanks


[:find (min ?e) (max ?e)
 :in $
 :where [?e :some/attr "some-value"]]
does it make sense to interpret the two values returned above as the earliest entity associated with that value vs. the latest entity associated with that value, assuming that there are multiple entities that share that same attribute-value pair?

Ben Kamphaus19:08:38

@pheuter: leading part of entity id is from partition, so multiple partitions can break that strategy.

Ben Kamphaus19:08:10

I would bind the 4th position (tx) and use that if it’s what you mean specifically. Also note that unless the parameter $ is a history db, you won’t find the earliest association if it has since been retracted.


Good points, thanks for the heads up!


so my question is then how does it resolve aggregating multiple entities, and then for each entity multiple tx-entities?


if i do a max on the ?tx, will that be across all entities?

Ben Kamphaus19:08:57

If you do a (max ?tx) on [?e :some/attr "some-value" ?tx] it will be the most recent tx in which the datom matching the leading portion [?e :some/attr "some-value" ...] was asserted (and is still true as of the most recent database value). If you pass a history db, it will be the most recent tx to touch it (even a retraction) unless you also bind the 5th position to true, i.e. [?e :some/attr "some-value" ?tx true].
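
so e.g. (attribute and value assumed):

```clojure
;; most recent tx in which the assertion was made, including ones
;; later retracted -- note the history db and the 5th position bound to true
(d/q '[:find (max ?tx)
       :in $
       :where [?e :some/attr "some-value" ?tx true]]
     (d/history db))
```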


in my particular case i’m looking to use ?e in a subsequent :where clause to get a related entity, how can i know that i’m getting the entity associated with the latest tx?


basically, there are two entities, a and b. b has an attribute that’s a ref to a, and it’s possible to have multiple b entities that ref to the same a


given a, i’d like to get the latest transacted b that links to a


that’s the general problem

Ben Kamphaus19:08:11

I’m not sure I follow what you’re asking as it looks like your concern is covered. The where clause limits the results, so you only get the relation from entity to transaction constrained by the presence of that attribute and value, for the most recent transaction.

Ben Kamphaus19:08:18

:where [?b :some/ref ?a ?tx] aggregated on the max value of the ?tx returns that datom. I guess you could get a set of matches in the event that there are multiple b entities which assert :some/ref a-id in one transaction.

Ben Kamphaus19:08:20

Oh, grouping behavior.


what i need is something like: :where [?e :some/attr "some-value" (max ?tx-id)]


where ?e would represent the entity associated with the latest tx


right, it seems like a workaround now is to manually build a map of tx-ids to entity-ids, find the max, then get the entity-id associated with it

Ben Kamphaus19:08:09

if the grouping behavior runs afoul of what you need, as is the case here (just tested it), I would just return the ?e ?tx tuple and apply max-key second on the result.
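
i.e. roughly (:some/ref and a-id assumed from the a/b example above):

```clojure
;; return [?b ?tx] tuples, then pick the b from the latest tx
(->> (d/q '[:find ?b ?tx
            :in $ ?a
            :where [?b :some/ref ?a ?tx]]
          db a-id)
     (apply max-key second)
     first)
```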


that seems like what i need

Ben Kamphaus19:08:57

the aggregation in query always realizes the intermediate set in memory on the peer anyways, so it doesn’t save you any performance cost to avoid the seq manipulation, really.

Ben Kamphaus19:08:35

sorry for initial detour, forgot that ?e (max ?tx) only shows you max ?tx grouped by e, not what you wanted in this case. It’s also possible to use a subquery, if you’re stuck with REST API or don’t have clojure manipulations and don’t want to realize the whole thing in a query, but if you’re in clojure I’d stick with a single query and a sequence manipulation.


hello! how to recover from a ConnectException? I wanted to create a db if there's none, so I wrote the following:

  (def conn
    (try (d/connect uri)
         (catch ConnectException e (d/create-database uri))))


(:import (java.net ConnectException))


Show: Clojure Java REPL Tooling Duplicates All  (3 frames hidden)

3. Unhandled java.util.concurrent.ExecutionException
2. Caused by org.h2.jdbc.JdbcSQLException
1. Caused by
   Connection refused


@bkamphaus: thanks for the patience and help, makes a lot of sense now 🙂


oh I guess I need to start the transactor


I am having trouble connecting a third peer. We currently have a license for 10 peers. Two of the peers are being used by a staging and production server. I want to query the Datomic instance running in the cloud from the REPL. However, when I try and connect to my transactor running in the cloud from the REPL, I get clojure.lang.ExceptionInfo: Error communicating with HOST ... or ALT_HOST ... on PORT 4334. Both the staging and production servers are able to connect to the Datomic instance. Do I need to set a username and password locally somewhere or change a local license key?


I also cannot connect to the database from the shell on a server running in the cloud


in cases like these, network configuration is always my first stop. have you checked that the transactor is reachable, ports are open, etc?


It is reachable. Both my staging and production servers can connect to it


but is it reachable from the machine you're on?


I mean, obviously you can't connect with the peer library. But can you, say, telnet to it?


telnet ip 4334
Trying ip...
Connected to ip.
Escape character is '^]'.


alright, to be honest, that just about exhausts my advice. it's always the network for me 🙂


It would be nice if there was a different exception thrown if it was a peer problem or a network problem


Shutting down my staging server allows me to connect to the db from the REPL.


Is it possible there is an issue with the license?


that exception would really surprise me, if that's the case


not sure I can explain what you're seeing any other way, though


Hmm.. Will someone from the Datomic team see these messages or should I email them directly?


they're usually on here, but if you're paying, I'm pretty sure that comes with direct support


@kenny: Sent you a private message so we can get a support case going 🙂