#datomic
2017-07-17
danielcompton00:07:15

@matan http://docs.datomic.com/monitoring.html#transactor-metrics lists the transactor metrics; the peer metrics (a little further down the page) include object cache hit rate and a few others. That might be an OK proxy for the memory use of a single query

val_waeselynck07:07:56

@matan If I wanted to get a precise measurement, I'd just use any tool which can monitor the memory usage of a JVM method call

matan09:07:54

thanks @danielcompton! @val_waeselynck those tools are notoriously hard to configure, expensive, or unsafe in production, but thanks anyway. I guess this is a general JVM thing; Datomic just uses plain objects rather than managing its own memory like e.g. Spark does. I guess I'd spin up an extra peer node to do this kind of instrumentation on.

daveliepmann10:07:05

Count me as a +1 on the "Retract with no value supplied" feature request, nearly old enough for kindergarten: https://groups.google.com/d/msg/datomic/MgsbeFzBilQ/NBXjQEDRzk4J

devth17:07:54

trying to figure out what to do when you need a lookup ref that's based on 2 attributes instead of 1.. not supported of course, so I need a workaround.

devth17:07:18

can't make either attribute :db.unique/value because the uniqueness is composite

devth17:07:33

add an extra attribute that's the combination of the other 2 and set it :db.unique/value, I guess.
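
A sketch of that workaround (attribute and value names here are made up, not from the thread): define a derived attribute whose value concatenates the two source attributes, make it unique, and always write it alongside them so it can serve as a lookup-ref target.

```clojure
;; Hypothetical schema: :user/tenant+email combines two attributes
;; into one unique value so lookup refs can target it.
(def composite-key-schema
  [{:db/ident       :user/tenant+email
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/unique      :db.unique/value
    :db/doc         "Derived key: <tenant>|<email>. Must be kept in sync with the source attributes on every transaction."}])

;; When transacting, always write the combined value too:
;; {:user/tenant       "acme"
;;  :user/email        "a@b.com"
;;  :user/tenant+email "acme|a@b.com"}
;;
;; Then an ordinary lookup ref works:
;; [:user/tenant+email "acme|a@b.com"]
```

The weak point of this approach is that nothing enforces the derived attribute staying in sync; wrapping writes in a helper function (or a transaction function) that computes it is one way to reduce that risk.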

matthavener17:07:16

is it possible to use :db/excise on an in-memory db?

matthavener17:07:16

ah, totally missed that, thank you

matan19:07:24

A twofold question about caching and pushing data: 1. Am I in trouble if my database no longer fits in memory? Will the peers constantly thrash, or is it a normal situation where one relies on the most recent data being the most relevant, with the majority of the data seldom needing to sit in the peers' memory?

matan19:07:07

And 2) I'm not sure I follow on the optional role of memcached. I mean, Datomic already caches data on the peers, so why would it help much? What am I missing?

marshall19:07:59

@matan 1: generally you don’t have to worry about thrashing of the peer cache; most use cases don’t rely on having the full dataset in memory. You can also treat multiple peers as a heterogeneous set: if you segment your requests to various peers, each peer’s individual cache will then reflect the workloads it handles

marshall19:07:34

so, for instance, you can have one peer for ‘back office’ analytics, a separate peer (or set of peers) for your web app, and maybe a third set for batch processing

marshall19:07:23

you can get even fancier if you want to, for example, route customer traffic through a smart LB that can segment your incoming traffic to multiple peers (either by load or, even better, by something like customer ID)

marshall19:07:54

question 2: reads from memcached are an order of magnitude faster than reads from storage (in general)

marshall19:07:25

so, if a read is satisfied by a segment in memory you’re looking at ns to fetch it. memcached would be order of 1ms to fetch. a storage read is order of 10ms

marshall19:07:19

and if you end up having to do a storage read, you’re effectively now at the “same” latency as a traditional RDB, which always has to do a roundtrip
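
For context, adding that memcached layer is configuration-only on both sides; something like the following (host names and ports are placeholders), per the Datomic caching docs:

```
# transactor properties file
memcached=m1.example.com:11211,m2.example.com:11211

# JVM option on each peer process
-Ddatomic.memcachedServers=m1.example.com:11211,m2.example.com:11211
```

Because segments are immutable, no invalidation protocol is needed; the cache layers simply stack in front of storage.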

apsey20:07:49

Does anyone know about issues regarding storage when running Datomic backed by Postgres?

apsey20:07:29

We suddenly used our last 87 GB of storage in only 4 days

marshall20:07:01

@apsey are you running gc-storage regularly?

apsey20:07:31

Peers didn't change significantly, but I was wondering if someone doing lots of queries could explain that?

marshall20:07:52

queries will not affect storage use at all

marshall20:07:00

only transactions

apsey20:07:12

No temporary tables are created?

apsey20:07:34

AFAIK, we run gc periodically (every week or every month, I will have to double-check this)

marshall20:07:40

no. Datomic only creates and uses a single table in postgres - the one you create with the setup scripts

marshall20:07:59

you may have to use postgres-level vacuum to reclaim space that is released by gc-storage

marshall20:07:24

likely depends on the version of PG you’re running and what your autovacuum settings are
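
A sketch of running gc-storage from a peer (the connection URI is a placeholder, and the 30-day cutoff is an arbitrary example): `older-than` is a `java.util.Date`, and segments that became garbage before that instant are deleted from storage.

```clojure
(require '[datomic.api :as d])

;; Delete storage segments that were made garbage (by indexing,
;; excision, etc.) before the given instant. This frees rows in the
;; Datomic table; Postgres-level VACUUM then reclaims the disk space.
(let [conn       (d/connect "datomic:sql://mydb?jdbc:postgresql://...") ; placeholder URI
      older-than (java.util.Date.
                  (- (System/currentTimeMillis)
                     (* 1000 60 60 24 30)))]                            ; ~30 days ago
  (d/gc-storage conn older-than))
```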

apsey20:07:46

so, I checked whether there was anything different in the number of datoms, but the slope is the same

apsey20:07:59

the weirdest thing is this storage being claimed all of a sudden

marshall20:07:01

how many datoms?

apsey20:07:26

around 1 billion

marshall20:07:00

and is it possible another system is using the same postgres instance? you might want to check pg-level metrics (i.e. the stuff in the postgres internal catalog tables)

apsey20:07:57

write IOPS look stable in RDS

danielcompton22:07:53

How do people here deal with schema migrations in dev, where you may want to choose a different schema approach after already transacting one? I'm using conformity which is nice, but doesn't have any way to "roll-back" the schema (because Datomic doesn't have this feature either).
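
For readers unfamiliar with conformity, its core idea is named, idempotent schema "norms" that it records as transacted, so re-running a migration is safe (the norm name and attribute below are hypothetical):

```clojure
(require '[io.rkn.conformity :as c])

;; Each norm is a named set of transactions; conformity tracks which
;; norms a database already conforms to and skips them on re-run.
(def norms
  {:my-app/add-user-email            ; hypothetical norm name
   {:txes [[{:db/ident       :user/email
             :db/valueType   :db.type/string
             :db/cardinality :db.cardinality/one
             :db/unique      :db.unique/identity}]]}})

(defn migrate!
  "Idempotently bring conn's database up to date with all norms."
  [conn]
  (c/ensure-conforms conn norms))
```

As the question notes, this is forward-only: there is no built-in rollback, which is what the in-memory-fork workflow below works around.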

val_waeselynck05:07:05

We do it all in memory, including using a fork of a dev-hosted production backup, so we pretty much never need to retract

danielcompton08:07:58

Can you explain this setup a bit more? If you make a bad schema migration, do you just restore from a backup?

val_waeselynck22:07:20

essentially, we're pretty much always developing against either a mem conn (we call it 'lab'), or an in-memory fork (using Datomock) of a dev conn which is a recent-enough backup of our production database ('dev-fork'); sometimes, in order to get the freshest data, we just use a fork of our production database directly ('prod-fork')

val_waeselynck22:07:04

So the connections we use for development are not durable; the only time we commit a migration durably is to production.

danielcompton22:07:35

hmm, I think the part I was missing was the in-memory fork of a dev conn

val_waeselynck22:07:46

(The exception is to our staging environment, which is obtained by restoring backups of our production environment periodically)

val_waeselynck22:07:29

@danielcompton preaching for my own church here, but I believe it is a powerful tool indeed 🙂 https://github.com/vvvvalvalval/datomock
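
The basic Datomock usage is a one-liner: `fork-conn` returns a connection whose writes go to a local in-memory log, leaving the underlying connection untouched (URI and attribute below are placeholders):

```clojure
(require '[datomock.core :as dm]
         '[datomic.api :as d])

;; Fork a durable connection; transacting against the fork never
;; affects the original database.
(def dev-conn (d/connect "datomic:dev://localhost:4334/my-db")) ; placeholder URI
(def forked   (dm/fork-conn dev-conn))

;; Experiment with a schema change; discarding `forked` "rolls back".
@(d/transact forked
             [{:db/ident       :scratch/note    ; hypothetical attribute
               :db/valueType   :db.type/string
               :db/cardinality :db.cardinality/one}])
```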

val_waeselynck22:07:05

Final tip: use a local memcached to have all these environments share a common cache on your machine; you may see pretty good speedups when going from one conn to the other

devth22:07:35

we drop and recreate the db in staging fairly often

danielcompton22:07:48

ah, I thought that might have been the case

danielcompton22:07:01

when you recreate, are you restoring from a backup, or from fixtures, or?

devth22:07:25

generators or importers

devth22:07:10

we use clojure.spec to infer Datomic schema via generators, so it's easy to use those generators to create test entities to play with
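
A minimal sketch of that idea (the specs here are invented examples, and generation requires test.check on the classpath): the same specs that describe your entities can hand you sample data to transact into a dev database.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen]) ; needs org.clojure/test.check

;; Hypothetical entity specs, keyed the same way as Datomic attributes.
(s/def :user/name string?)
(s/def :user/age  (s/int-in 0 120))
(s/def ::user (s/keys :req [:user/name :user/age]))

;; Generate a handful of test entities to play with:
(gen/sample (s/gen ::user) 5)
;; => seq of 5 maps like {:user/name "..." :user/age 42}
```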

devth22:07:44

and importers are workers that load data in hourly, so it just depends on what data we need for the feature being worked on

danielcompton22:07:16

importers are loading data in from the prod db?

devth22:07:27

no, external data sources (apis) in this case

hmaurer23:07:57

@marshall Are there any plans to allow for community-developed storage and backup implementations? (to support arbitrary storages and arbitrary backup targets)