
greetings. a very basic, or rather fundamental, question: what are the tradeoffs of using "type as an attribute" vs. "type as an enum value (ident) of an attribute"?

{:some.event/subtype-foo? true
 :some.event/subtype-bar? false}
;; vs.
{:some.event/subtype :some.event.subtype/foo}


i prefer enum entities, @misha, because you can leverage datomic’s VAET index to find items-with-that-value very quickly


(map :e (d/datoms db :vaet (d/entid db [:db/ident :some.event.subtype/foo])))


disadvantage is with d/pull; you get {:db/ident :some.event.subtype/foo} instead of simply :some.event.subtype/foo as a value
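e.g. (a sketch; `event-eid` stands in for some event's entity id):

```clojure
;; with an ident-valued ref attribute, pull returns a nested map
(d/pull db [:some.event/subtype] event-eid)
;; => {:some.event/subtype {:db/ident :some.event.subtype/foo}}

;; flattening it back to a bare keyword takes one extra step
(get-in (d/pull db [:some.event/subtype] event-eid)
        [:some.event/subtype :db/ident])
;; => :some.event.subtype/foo
```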


however, a little extra code is a price worth paying for that perf benefit


important to note that Datalog et al will leverage VAET in this way as well
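e.g. (a sketch using the example attributes above):

```clojure
;; a query clause with a bound ref value; the engine can resolve
;; this via the VAET index instead of scanning every event
(d/q '[:find ?e
       :where [?e :some.event/subtype :some.event.subtype/foo]]
     db)
```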


semantically, it boils down to the idea that the enum value is reified as an entity in its own right, and you can leverage that


so far "a little extra code" is all I see in my project, and I've started to question whether I should prefer "flat is better than nested".


can't I "leverage datomic’s AVET index to find items-with-that-attribute very quickly" though?


in my first example, :some.event/subtype-bar? false would actually be :some.event/subtype-bar? nil, and pretty much absent, resulting in just {:some.event/subtype-foo? true}


(although the absence of other "subtype" attributes will not be enforced as it is in the :db/ident case, which might yield false positives in AVET results)
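for comparison, the flag-style lookup would look something like this (sketch only; the attribute would need :db/index true to appear in AVET at all):

```clojure
;; boolean-flag variant: requires {:db/index true} on the attribute
;; so its datoms are present in the AVET index
(map :e (d/datoms db :avet :some.event/subtype-foo? true))
```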


also saving precious datoms out of that 100B datoms limit :)


i think you know enough to make a value judgment for yourself 🙂


actually, with idents, you’re storing N longs and 1 value. with flat values, you’re re-storing that value N times, which is probably ok for bools, but maybe not so ok for keyword or string values?


i suppose Rich’s advice holds true here: “why worry, when you can measure”


is the current limit 100B datoms or 10B datoms?


I think 10, but I believe you could shard using multiple dbs


No, it's 100B


100B with planning.


If you're going over 10B, call @marshall


"If you're putting more than 100B datoms in datomic, just don't" - Stu Halloway ^^ my favourite quote from the Day of Datomic


@robert-stuttaford ofc, datoms comment is absolutely irrelevant.

Petrus Theron 15:05:59

How to pull :db/txInstant in Datalog query so I can sort entities by transaction insertion date?

Petrus Theron 15:05:30

This seems to work: [:find ?tz (pull ?e [*]) :where [?e :variant/name _ ?tx] [?tx :db/txInstant ?tz]], but is there a way to make it part of the (pull...) result?


@petrus nope. think about it. entities are not related to transactions via an “A”


pull walks EAV, it doesn’t walk T

Petrus Theron 15:05:57

Isn't :db/txInstant an A?


of the transaction entity, yes


but the relation between the datom and the transaction doesn’t follow an A


again, think about it. where would you put it? you have different T values for each EAV combination on the same E


how would you model that in a pull result, which is nested maps?

Petrus Theron 15:05:10

I see - and pull only operates on one entity? So this works: :find (pull ?e [*]) (pull ?tx [:db/txInstant]) ... but they are from separate entities


you can do that yes

Petrus Theron 15:05:43

(but not necessary, can just :find ?tz)

Petrus Theron 15:05:25

I find it awkward to work with vectors coming from Datomic. Keeping track of the order is error-prone. Wish I could get back a hash-map, i.e. :find {:entity (pull ?e [*]) :time ?tz} :where ...


pull can only follow attribute references to other entities. the link between an e+a+v and its tx does not follow an attribute reference


(defn entity-attribute-date [db e attr]
  (let [eavt (comp first #(d/datoms db :eavt %))]
    (-> (eavt (id e) (d/entid db attr))
        (eavt (d/entid db :db/txInstant)))))


tx is out-of-band info, as it were


this code assumes a cardinality/one attr

Petrus Theron 15:05:43

@robert-stuttaford what is symbol id in that snippet?

Petrus Theron 15:05:07

@robert-stuttaford is that (comp first #(d/datoms ...)) right? I'm getting clojure.lang.ArityException: Wrong number of args (2) passed to: handler/entity-attribute-date/fn--43720 when I try your entity-attribute-date fn in the REPL


it’s right; we use it all the time
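for reference, a variant that sidesteps both snags hit above (the project-local `id` helper and the one-arg anonymous fn) could look like this; a sketch only, and it still assumes a cardinality-one attr:

```clojure
(defn entity-attribute-date
  "Returns the :db/txInstant of the datom asserting attr on e.
  Sketch; assumes a cardinality-one attribute."
  [db e attr]
  (let [eavt (fn [& components]
               ;; first datom matching the given :eavt components
               (first (apply d/datoms db :eavt components)))]
    (some-> (eavt (d/entid db e) (d/entid db attr))
            :tx                                ; tx entity id of that datom
            (eavt (d/entid db :db/txInstant))  ; [tx :db/txInstant inst tx]
            :v)))
```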

Lone Ranger 21:05:30

I'm doing some back of the envelope math about whether or not a particular problem will "fit" into a datomic DB (without having to call marshall (as in, am I going over 10B datoms))... hypothetically if I put in, say, 1B datoms (let's say I use appropriate pipelining, respect transactor limits, etc.) is that what it means by 1B datoms, or does it mean X datoms + Y datoms created by datomic for internal purposes == 1B datoms?

Lone Ranger 21:05:23

another way of phrasing the question would be, if I transact X datoms, is there a rule about how many Y datoms are created by datomic (and is that something I should worry about when considering my calculations)?


Silly question: does Datomic need to be installed and/or running in order to run an app that uses Datomic with an in-mem DB? Trying to do some troubleshooting, and I've never actually tried running it on a box that didn't have Datomic installed.

Lone Ranger 21:05:13

@eggsyntax are you talking about proper datomic or DataScript (for front end clojurescript apps?)


Actual Datomic. We typically run in staging with a connection to another server that provides the DB, but we're just doing a quick experiment with running in-mem on the staging box.

Lone Ranger 21:05:31

I love those "quick" experiments 😂


Heh. Part of the troubleshooting process...


afair, in-mem just means no persistent storage, but you still need to run a transactor to write into mem. I might be wrong; so many things have changed in 20 months


I know. Thanks @misha, appreciate it.

Lone Ranger 21:05:08

I feel like I've run some demo apps that have somehow created an in-memory datomic DB without me starting my transactor, but I might be mistaken

Lone Ranger 21:05:55

I think the catalysis demo does it (maybe) but I'm still picking through the codebase to figure out how it works


Cool, I'll look into that a bit.

Lone Ranger 21:05:28

fair warning: it won't make your "quick" experiment any quicker 😂


Possibly not 🙂


I might be mixing up mem and disk, as in "w/o extra sql storage"

Lone Ranger 21:05:08

EDIT: (whoops put wrong one fixed)

Lone Ranger 21:05:41

has anyone here done this? "Web after tomorrow" style syncing client-side datascript-style datomic-db with server-side datomic-db?


I am doing something with datascript-cache/datascript/datomic, but there is so much on my plate that it is fair to say "I just started"


you might want to go through precursor-app's source, as those guys use datomic/datascript. I haven't :)


@misha @eggsyntax in memory datomic DBs do not require a transactor


@favila cool, thanks 🙂


@eggsyntax @misha you may be thinking of "dev" or "free" storages, which do require transactors


"mem" requires nothing


most certainly, yes


@favila and just to get total clarification: Datomic doesn't even need to be installed to run an app with an in-mem DB?


I don't know what "installed" means


You need datomic peer api in your classpath


of whatever process wants an in-mem db
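a minimal sketch of that: with only the peer library on the classpath, an in-mem db works with no transactor or storage process (the db name here is arbitrary):

```clojure
(require '[datomic.api :as d])

;; "mem" storage lives entirely in the peer process
(def uri "datomic:mem://scratch")
(d/create-database uri)
(def conn (d/connect uri))

;; transact directly against it; no transactor involved
@(d/transact conn [{:db/ident :hello}])
```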


@goomba catalysis is too scary for me at the moment. At this point I am much more comfortable reinventing my own wheel

Lone Ranger 22:05:49

@misha I feel you on that. It's a pretty clever piece of engineering, but unfortunately I don't understand enough of the API, and its alpha state means it's not suitable for production unless I make my own changes

Lone Ranger 22:05:33

another way of phrasing the question (that I re-spammed below) would be: if I transact X datoms, is there a rule about how many Y datoms are created by datomic (and is that something I should worry about when considering my calculations)? (sorry to spam this, just afraid it might have gotten buried)


I am not even sure what would be declared "suitable for production" at this point. Therefore I can't even evaluate it to a certain degree

Lone Ranger 22:05:28

"suitable for production" means I'm not going to have to worry about getting a call in the middle of the night that our servers are down because API changed/dependencies broke etc etc 😅


@goomba I think "10 billion datoms" refers to number of unique datoms, not number of datoms counted by all indexes


The soft limit of 10 billion is because the index structure segments get big


so having a datom in multiple indexes doesn't matter


it's how many are in a single index that matters


I'm sure partitioning and/or creating entities in a specific order that increases read locality would significantly aid performance at large numbers of datoms

Lone Ranger 22:05:11

okay... so when doing capacity planning I shouldn't worry about the indexes datomic creates and just worry about the ones I create?


well, not exactly


eavt and aevt are created for every datom


vaet is created for datoms with ref attributes


avet is created for datoms with index=true


and a fulltext index is created for fulltext=true


the presence/absence of these indexes affects storage space and query speed


noHistory=true is another consideration, since changes to those e+a datoms are not stored
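putting those rules together in one place, a hypothetical schema sketch (attribute names invented for illustration):

```clojure
;; every datom lands in EAVT and AEVT; these flags add the extras
[{:db/ident       :order/customer
  :db/valueType   :db.type/ref          ; ref type => also in VAET
  :db/cardinality :db.cardinality/one}
 {:db/ident       :order/number
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one
  :db/index       true}                 ; index=true => also in AVET
 {:db/ident       :order/notes
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one
  :db/fulltext    true                  ; => fulltext index
  :db/noHistory   true}]                ; => past values not retained
```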

Lone Ranger 22:05:44

well... might have to do some experimenting and see how it goes

Lone Ranger 22:05:04

I know for my use case the data will never change and don't need full text but query power is a must


import a % of your db, backup-db, measure bytes and multiply


that is the smallest amount of storage you will need

Lone Ranger 22:05:07

storage isn't an issue, just performance

Lone Ranger 22:05:38

and also development ease. don't want to shard the hell out of the DB if I don't have to


The biggest predictor of perf in my exp is datom read locality

Lone Ranger 22:05:10

has cognitect gotten you a job application yet @favila ? 😄


if you have a static dataset and can predict locality, you can customize your import process to maximize locality


this will improve peer performance significantly

Lone Ranger 22:05:54

locality as in data locality or geographic locality?


data locality


i.e., when you read from an index, you want your reads to cluster together


this means the peer doesn't need to pull+decompress as many index segments


partitions were a feature designed to aid locality
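a sketch of using them at import time (the partition name and attribute are hypothetical):

```clojure
;; install a custom partition, then create related entities in it
;; so their datoms sort near each other in the E-leading indexes
(let [p (d/tempid :db.part/db)]
  @(d/transact conn
     [{:db/id p :db/ident :orders}
      [:db/add :db.part/db :db.install/partition p]]))

;; entities created with a tempid in :orders share that partition
@(d/transact conn [{:db/id        (d/tempid :orders)
                    :order/number "A-1001"}])
```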