#datomic
2017-05-30
misha11:05:05

greetings. a very basic, or rather, fundamental question: what are the tradeoffs of using "type as an attribute" vs. "type as an enum value (ident) of an attribute"?

{:some.event/subtype-foo? true
 :some.event/subtype-bar? false}
;; vs.
{:some.event/subtype :some.event.subtype/foo}

robert-stuttaford11:05:33

i prefer enum entities, @misha, because you can leverage datomic’s VAET index to find items-with-that-value very quickly

robert-stuttaford11:05:18

(map :e (d/datoms db :vaet (d/entid db [:db/ident :some.event.subtype/foo])))

robert-stuttaford11:05:50

disadvantage is with d/pull; you get {:db/ident :some.event.subtype/foo} instead of simply :some.event.subtype/foo as a value

robert-stuttaford11:05:06

however, a little extra code is worth paying for that perf benefit
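The "little extra code" usually amounts to collapsing those nested ident maps after a pull. A minimal sketch (the function name and the walk-based approach are my own, not from the thread):

```clojure
(require 'clojure.walk)

;; collapse any nested {:db/ident <kw>} map in a pull result
;; down to the bare keyword
(defn flatten-idents [pull-result]
  (clojure.walk/postwalk
    (fn [x]
      (if (and (map? x) (= [:db/ident] (keys x)))
        (:db/ident x)
        x))
    pull-result))

(flatten-idents {:some.event/subtype {:db/ident :some.event.subtype/foo}})
;; => {:some.event/subtype :some.event.subtype/foo}
```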

robert-stuttaford11:05:34

important to note that Datalog et al will leverage VAET in this way as well
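For example, a hedged sketch of the Datalog form, assuming :some.event/subtype is a :db.type/ref attribute pointing at the enum ident, as in misha's second snippet:

```clojure
;; datalog resolves the ident to its entity id and walks VAET
;; to find every event with that subtype
(d/q '[:find [?e ...]
       :where [?e :some.event/subtype :some.event.subtype/foo]]
     db)
```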

robert-stuttaford11:05:16

semantically, it boils down to the idea that that enum value is reified as an entity in its own right, and you can leverage that

misha11:05:48

so far "a little extra code" is all I see in my project, and I started to question whether I should prefer "flat is better than nested".

misha11:05:42

can't I "leverage datomic’s ~VAET~ AVET index to find items-with-that-~value~-attribute very quickly" though?

misha11:05:54

in my first example, :some.event/subtype-bar? false would actually be :some.event/subtype-bar? nil, and pretty much absent, resulting in just {:some.event/subtype-foo? true}

misha11:05:02

(although, absence of other "subtype" attributes will not be enforced as in :db/ident case, which might yield false-positives in AVET-results)

misha11:05:46

also saving precious datoms out of that 100B datoms limit :)

robert-stuttaford12:05:28

i think you know enough to make a value judgment for yourself 🙂

robert-stuttaford12:05:55

actually, with idents, you’re storing N longs and 1 value. with flat values, you’re re-storing that value N times, which is probably ok for bools, but maybe not so ok for keyword or string values?

robert-stuttaford12:05:44

i suppose Rich’s advice holds true here : “why worry, when you can measure”

kschrader13:05:50

is the current limit 100B datoms or 10B datoms?

mpenet13:05:05

I think 10, but I believe you could shard using multiple dbs

dominicm14:05:00

No, it's 100B

dominicm14:05:04

100B with planning.

dominicm14:05:13

If you're going over 10B, call @marshall

dominicm14:05:52

If you're putting more than 100B datoms in datomic, just don't - Stu Halloway ^^ my favourite quote from Day of Datomic

misha14:05:34

@robert-stuttaford ofc, my datoms comment is absolutely irrelevant.

Petrus Theron15:05:59

How to pull :db/txInstant in Datalog query so I can sort entities by transaction insertion date?

Petrus Theron15:05:30

This seems to work: [:find ?tz (pull ?e [*]) :where [?e :variant/name _ ?tx] [?tx :db/txInstant ?tz]], but is there a way to make it part of the (pull...) result?

robert-stuttaford15:05:29

@petrus nope. think about it. entities are not related to transactions via an “A”

robert-stuttaford15:05:43

pull walks EAV, it doesn’t walk T

Petrus Theron15:05:57

Isn't :db/txInstant an A?

robert-stuttaford15:05:03

of the transaction entity, yes

robert-stuttaford15:05:21

but the relation between the datom and the transaction doesn’t follow an A

robert-stuttaford15:05:46

again, think about it. where would you put it? you have different T values for each EAV combination on the same E

robert-stuttaford15:05:59

how would you model that in a pull result, which is nested maps?

Petrus Theron15:05:10

I see - and pull only operates on one entity? So this works: :find (pull ?e [*]) (pull ?tx [:db/txInstant]) ... but they are from separate entities

robert-stuttaford15:05:26

you can do that yes

Petrus Theron15:05:43

(but not necessary, can just :find ?tz)

Petrus Theron15:05:25

I find it awkward to work with the vectors coming back from Datomic; keeping positional order straight is error-prone. Wish I could get back a hash-map, i.e. :find {:entity (pull ?e [*]) :time ?tz} :where ...

favila15:05:54

pull can only follow attribute references to other entities. the link between an e+a+v and its tx does not follow an attribute reference

robert-stuttaford15:05:35

(defn entity-attribute-date [db e attr]
  (let [eavt (comp first #(d/datoms db :eavt %))]
    (-> (eavt (id e) (d/entid db attr))
        :tx
        (eavt (d/entid db :db/txInstant))
        :v)))

favila15:05:48

tx is out-of-band info, as it were

robert-stuttaford15:05:55

this code assumes a cardinality/one attr

Petrus Theron15:05:43

@robert-stuttaford what is symbol id in that snippet?

Petrus Theron15:05:07

@robert-stuttaford is that (comp first #(d/datoms ...)) right? I'm getting clojure.lang.ArityException: Wrong number of args (2) passed to: handler/entity-attribute-date/fn--43720 when I try your entity-attribute-date fn in the REPL

robert-stuttaford17:05:18

it’s right; we use it all the time
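For reference, the pasted version does throw that arity error: (comp first #(d/datoms db :eavt %)) yields a one-argument fn, yet it is called with two index components. A corrected sketch, assuming the undefined id was meant to resolve an entity id (d/entid stands in for it here), and still assuming a cardinality-one attr:

```clojure
(defn entity-attribute-date [db e attr]
  (let [eavt (fn [& components]
               ;; d/datoms takes the index components as varargs,
               ;; so accept them all and apply
               (first (apply d/datoms db :eavt components)))]
    (-> (eavt (d/entid db e) (d/entid db attr))
        :tx
        (eavt (d/entid db :db/txInstant))
        :v)))
```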

Lone Ranger21:05:30

I'm doing some back of the envelope math about whether or not a particular problem will "fit" into a datomic DB (without having to call marshall (as in, am I going over 10B datoms))... hypothetically if I put in, say, 1B datoms (let's say I use appropriate pipelining, respect transactor limits, etc) is that what it means by 1B datoms or does it mean X datoms + Y datoms created by datomic for internal purposes == 1B datoms?

Lone Ranger21:05:23

another way of phrasing the question would be, if I transact X datoms, is there a rule about how many Y datoms are created by datomic (and is that something I should worry about when considering my calculations)?

eggsyntax21:05:09

Silly question: does Datomic need to be installed and/or running in order to run an app that uses Datomic with an in-mem DB? Trying to do some troubleshooting, and I've never actually tried running it on a box that didn't have Datomic installed.

Lone Ranger21:05:13

@eggsyntax are you talking about proper datomic or DataScript (for front end clojurescript apps?)

eggsyntax21:05:05

Actual Datomic. We typically run in staging with a connection to another server that provides the DB, but we're just doing a quick experiment with running in-mem on the staging box.

Lone Ranger21:05:31

I love those "quick" experiments 😂

eggsyntax21:05:59

Heh. Part of the troubleshooting process...

misha21:05:09

afair, in-mem just means no persistent storage, but you still need to run a transactor to write into mem. I might be wrong, so many things changed in 20 months

eggsyntax21:05:26

I know. Thanks @misha, appreciate it.

Lone Ranger21:05:08

I feel like I've run some demo apps that have somehow created an in-memory datomic DB without me starting my transactor, but I might be mistaken

Lone Ranger21:05:55

I think the catalysis demo does it (maybe) but I'm still picking through the codebase to figure out how it works

eggsyntax21:05:11

Cool, I'll look into that a bit.

Lone Ranger21:05:28

fair warning: it won't make your "quick" experiment any quicker 😂

eggsyntax21:05:42

Possibly not 🙂

misha21:05:46

I might mix up mem and disk, as in "w/o extra sql storage"

Lone Ranger21:05:08

EDIT: (whoops put wrong one fixed)

Lone Ranger21:05:41

has anyone here done this? "Web after tomorrow" style syncing client-side datascript-style datomic-db with server-side datomic-db?

misha21:05:05

I am doing something with datascript-cache/datascript/datomic, but there is so much on my plate, that it is fair to say "I just started"

misha21:05:54

you might want to go through precursor-app's source, as guys use datomic/datascript. I haven't :)

favila21:05:00

@misha @eggsyntax in memory datomic DBs do not require a transactor

eggsyntax21:05:28

@favila cool, thanks 🙂

favila21:05:49

@eggsyntax @misha you may be thinking of "dev" or "free" storages, which do require transactors

favila21:05:57

"mem" requires nothing
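A minimal sketch of what "requires nothing" looks like in practice: just the peer library on the classpath, no transactor process (the db name "scratch" is arbitrary):

```clojure
(require '[datomic.api :as d])

;; "mem" storage lives entirely inside this JVM process
(def uri "datomic:mem://scratch")

(d/create-database uri)
(def conn (d/connect uri))

;; transactions work immediately, with no transactor running
@(d/transact conn [{:db/ident       :demo/name
                    :db/valueType   :db.type/string
                    :db/cardinality :db.cardinality/one}])
```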

misha21:05:25

most certainly, yes

eggsyntax21:05:55

@favila and just to get total clarification: Datomic doesn't even need to be installed to run an app with an in-mem DB?

favila21:05:18

I don't know what "installed" means

favila21:05:37

You need datomic peer api in your classpath

favila21:05:46

of whatever process wants an in-mem db

misha21:05:58

@goomba catalysis is too scary for me at the moment. At this point I am much more comfortable reinventing my own wheel

Lone Ranger22:05:49

@misha I feel you on that. It's a pretty clever piece of engineering but unfortunately I don't understand enough of the API and the alpha state of it means not suitable for production unless I make my own changes

Lone Ranger22:05:33

another way of phrasing the question (that I re-spammed below) would be, if I transact X datoms, is there a rule about how many Y datoms are created by datomic (and is that something I should worry about when considering my calculations)? (sorry to spam this, just afraid it might have gotten buried)

misha22:05:43

I am not even sure what would be declared "suitable for production" at this point. Therefore I can't even evaluate it to a certain degree

Lone Ranger22:05:28

"suitable for production" means I'm not going to have to worry about getting a call in the middle of the night that our servers are down because API changed/dependencies broke etc etc 😅

favila22:05:41

@goomba I think "10 billion datoms" refers to number of unique datoms, not number of datoms counted by all indexes

favila22:05:20

The soft limit of 10 billion is because the index structure segments get big

favila22:05:33

so having a datom in multiple indexes doesn't matter

favila22:05:49

it's how many are in a single index that matters

favila22:05:41

I'm sure partitioning and/or creating entities in a specific order that increases read locality would significantly aid performance at large numbers of datoms

Lone Ranger22:05:11

okay... so when doing capacity planning I shouldn't worry about the indexes datomic creates and just worry about the ones I create?

favila22:05:15

well, not exactly

favila22:05:35

eavt and aevt are created for every datom

favila22:05:55

vaet is created for datoms with ref attributes

favila22:05:10

avet is created for datoms with index=true

favila22:05:37

and a fulltext index is created for fulltext=true

favila22:05:09

the presence/absence of these indexes affects storage space and query speed

favila22:05:02

noHistory=true is another consideration, since changes to those e+a datoms are not stored
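For illustration, here is how those per-attribute flags are declared in schema; the attribute names are made up for this sketch:

```clojure
[{:db/ident       :note/author
  :db/valueType   :db.type/ref        ; ref => datom also lands in VAET
  :db/cardinality :db.cardinality/one}
 {:db/ident       :note/title
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one
  :db/index       true}               ; index=true => AVET entry
 {:db/ident       :note/body
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one
  :db/fulltext    true}               ; fulltext=true => fulltext index
 {:db/ident       :note/views
  :db/valueType   :db.type/long
  :db/cardinality :db.cardinality/one
  :db/noHistory   true}]              ; noHistory=true => past values dropped
```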

Lone Ranger22:05:44

well... might have to do some experimenting and see how it goes

Lone Ranger22:05:04

I know for my use case the data will never change and don't need full text but query power is a must

favila22:05:37

import a % of your db, backup-db, measure bytes and multiply

favila22:05:49

that is the smallest amount of storage you will need
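A hedged sketch of that measurement; the db URI and backup path are placeholders:

```shell
# back up the partially-imported db, then check the backup's size
# on disk as a lower bound to extrapolate from
bin/datomic backup-db datomic:dev://localhost:4334/sample-import file:/tmp/sample-backup
du -sh /tmp/sample-backup
```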

Lone Ranger22:05:07

storage isn't an issue, just performance

Lone Ranger22:05:38

and also development ease. don't want to shard the hell out of the DB if I don't have to

favila22:05:55

The biggest predictor of perf in my exp is datom read locality

Lone Ranger22:05:10

has cognitect gotten you a job application yet @favila ? 😄

favila22:05:22

if you have a static dataset and can predict locality, you can customize your import process to maximize locality

favila22:05:36

this will improve peer performance significantly

Lone Ranger22:05:54

locality as in data locality or geographic locality?

favila22:05:16

data locality

favila22:05:52

i.e., when you read from an index, you want your reads to cluster together

favila22:05:34

this means the peer doesn't need to pull+decompress as many index segments

favila22:05:05

partitions were a feature designed to aid locality