#datomic
2018-07-30
curtosis19:07:37

I’m working on adding a list of “tagged values” as a value and am looking for advice on how to structure it most “datomically”. Essentially, the tagged value entity has tagged-value/key and tagged-value/value, both Strings.
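
A minimal schema sketch for the entity described above; the attribute names come from the message, and :db/index is an assumption added here to make key and value searchable:

  [{:db/ident       :tagged-value/key
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/index       true}
   {:db/ident       :tagged-value/value
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/index       true}]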

curtosis19:07:02

I hypothetically want to be able to search by key or value, but they’re otherwise essentially freeform. They’re not logically components of the parent object.

curtosis19:07:26

So far I’ve thought of either a) making them all unique — “shared” only by key equality, or b) building them as full-fledged entities, possibly via a tx-fn that looks for an existing one.

curtosis19:07:44

Anyone solved something similar, or have suggestions for what will bite me least downstream?

Mark Addleman20:07:55

Hi - We have a similar requirement. We have a “key” attribute and 6 “value” attributes (string-value, keyword-value, boolean-value, etc) and a value-type attribute which tells us which value type was written
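
Read as a schema sketch, that shape might look roughly like the following (the :kv/* idents are invented for illustration, and only three of the six typed value attributes are shown):

  [{:db/ident       :kv/key
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}
   {:db/ident       :kv/value-type
    :db/valueType   :db.type/keyword
    :db/cardinality :db.cardinality/one}
   {:db/ident       :kv/string-value
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}
   {:db/ident       :kv/keyword-value
    :db/valueType   :db.type/keyword
    :db/cardinality :db.cardinality/one}
   {:db/ident       :kv/boolean-value
    :db/valueType   :db.type/boolean
    :db/cardinality :db.cardinality/one}]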

Mark Addleman20:07:46

we then wrote a rule named kv-value which makes querying against that data structure not too painful
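
One plausible shape for such a kv-value rule, written against the hypothetical :kv/* attributes above, with one rule body per value type:

  [[(kv-value ?kv ?k ?v)
    [?kv :kv/key ?k]
    [?kv :kv/value-type :string]
    [?kv :kv/string-value ?v]]
   [(kv-value ?kv ?k ?v)
    [?kv :kv/key ?k]
    [?kv :kv/value-type :keyword]
    [?kv :kv/keyword-value ?v]]
   [(kv-value ?kv ?k ?v)
    [?kv :kv/key ?k]
    [?kv :kv/value-type :boolean]
    [?kv :kv/boolean-value ?v]]]

A query would pass the rule set in via :in $ % and call (kv-value ?kv ?k ?v) in its :where clauses.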

curtosis21:07:03

ah, that makes sense

curtosis21:07:23

do you create each instance as a distinct entity and then link by value in your kv-value?

Mark Addleman21:07:40

yes. we have a parent entity which has a cardinality/many ref to the individual key - value pairs
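
As a sketch, that parent-to-pairs shape plus a query using the rule above might look like this (the :parent/tagged-values ident and the literal key/value are invented, kv-rules is the rule set sketched earlier, and d is assumed to be an alias for the Datomic API namespace):

  {:db/ident       :parent/tagged-values
   :db/valueType   :db.type/ref
   :db/cardinality :db.cardinality/many}

  ;; find parent entities carrying a given key/value pair
  (d/q '[:find ?parent
         :in $ % ?k ?v
         :where
         [?parent :parent/tagged-values ?kv]
         (kv-value ?kv ?k ?v)]
       db kv-rules "env" "production")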

curtosis12:07:33

Super, that’s exactly how I have it modeled. Thanks!

👍 4
eraserhd20:07:17

Well, it's probably not true, but just in case, first consider that your keys are really attributes. You can add a :mything/is-tagged? true attribute to tag attributes.
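
In this approach the tag marker is itself an attribute asserted on attribute entities. A sketch, with :order/color standing in for an arbitrary existing attribute:

  ;; install the marker attribute once
  {:db/ident       :mything/is-tagged?
   :db/valueType   :db.type/boolean
   :db/cardinality :db.cardinality/one}

  ;; then assert it on any schema attribute that should count as a "tag"
  {:db/ident           :order/color
   :mything/is-tagged? true}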

Mark Addleman20:07:07

I have a performance monitoring question: In our workload, we transact 200k datoms every 30 minutes. We expect this to increase over time. I wouldn’t be surprised if we are in a position to transact 2 million datoms every 30 minutes in a month or two. My understanding of the Datomic Cloud transact pipeline is that clients transact data into a single node. The transacting node must perform some CPU operation on the tx-data and then it writes into EFS, S3 and DynamoDB simultaneously. The transact operation completes when the transacting node has finished writing to all three storage surfaces. Is my understanding correct?

curtosis21:07:28

@eraserhd yeah, that would be the easy solution, but the idea here is specifically to capture user-defined “stuff” that’s outside the formal schema attributes.

johnj22:07:52

@mark340 curious, at ~110 writes per second (your current load), how is datomic handling it? That already seems like too much for a single db.
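
(For reference, the arithmetic behind that figure: 200,000 datoms / (30 × 60 s) ≈ 111 datoms per second, and the projected 2 million per 30 minutes would be roughly 1,100 per second.)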

Mark Addleman22:07:05

After a little back and forth with Datomic support (great dealing with them, BTW!), it’s handling it just fine. The trick was to raise our DynamoDB provisioning to 250 write units.

Mark Addleman22:07:35

I’m still not sure I understand the performance relationship between the Datomic log, EFS, S3 and DynamoDB. I have a question in with Support about it and I expect an answer soon.

Mark Addleman22:07:53

I was hoping for an answer from the community as well 🙂

johnj22:07:18

Ok, nice, I haven't tried the cloud stuff, but isn't it supposed to scale automatically for you for stuff like write units?

johnj22:07:15

also, is this for a single db ?

Mark Addleman22:07:00

it will scale automatically within limits that you get to set. the default limits are pretty good to get started and can be easily changed

Mark Addleman22:07:11

yes, it is for a single db.

Mark Addleman22:07:29

fyi - in Datomic Cloud, you are still limited to a single transactor per database but you can have multiple databases and thus multiple transactor nodes. we might end up scaling that way but, as it stands right now, the single transactor seems to be holding up

👍 4
johnj22:07:21

will you be able to run a single query across multiple databases if you scale up that way? or do you not need that?

Mark Addleman23:07:30

yes, datalog allows you to query multiple databases without much difficulty. i don’t know the performance implications of it, though
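
For reference, a multi-database query with the on-prem peer API looks roughly like this (the database names and attributes are invented; whether the Cloud client API accepts multiple database inputs is exactly the question raised next):

  (d/q '[:find ?email
         :in $users $orders ?order-id
         :where
         [$orders ?o :order/id ?order-id]
         [$orders ?o :order/user-email ?email]
         [$users  ?u :user/email ?email]]
       users-db orders-db 1234)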

octahedrion09:07:07

but Datomic Cloud doesn't support multiple dbs for query yet, does it?

Mark Addleman13:07:22

I don’t know if Cloud supports it yet. I don’t remember reading about the limitation in the docs

octahedrion14:07:12

last time I tried it, it didn't work

Mark Addleman23:07:08

Oh, interesting. I’ll add it to my list of things to investigate 🙂

bmabey22:07:39

Does anyone know if the recommended "10 billion datom" limit has increased or if datomic cloud changes this? Trying to figure out if datomic will scale to our problem.

Mark Addleman23:07:59

I had a similar question a couple of weeks ago. My memory of the answer: Cognitect tests up to 10 billion datoms. Performance implications vary widely based on the structure of your data.

Mark Addleman23:07:41

The 10 billion is not a limit

jeroenvandijk08:07:58

Yeah +1. We have reached 40 billion or so with on-premise. We do see increasing issues with transactor timeouts at this level, and we haven't tried to fully understand whether we can circumvent them. Part of the reason might be that we also have extra indices on most of these datoms. For us it's not that big of an issue, as we solve the problem by starting with a fresh db when the problems get too bad (or the DynamoDB costs get too high). I would like to spend some more time figuring out how to scale properly to that size some day, though

👍 4
jd-white15:07:32

💪 Thanks guys! This helps greatly with our capacity planning.