
@matan peer metrics (a little further down the page) have some metrics for object cache hit rate and a few others. That might be an OK proxy for memory use of a single query


@matan If I wanted to get a precise measurement, I'd just use any tool which can monitor the memory usage of a JVM method call


thanks @danielcompton! @val_waeselynck these tools are kind of notorious for being hard to configure, expensive, or unsafe in production, but thanks anyway. I guess this is a general JVM thing: Datomic just uses plain objects rather than managing its memory the way e.g. Spark does. I'd probably spin up an extra peer node to do this kind of instrumentation on.


Count me as a +1 on the "Retract with no value supplied" feature request, nearly old enough for kindergarten:


trying to figure out what to do when you need a lookup ref that's based on 2 attributes instead of 1. Not supported, of course, so I need a workaround.


can't make either attribute :db.unique/value, because the uniqueness is composite


add an extra composite attribute that's the combination of the other 2 and make that :db.unique/value, I guess.
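That workaround can be sketched as a derived attribute that carries the composite uniqueness constraint, kept in sync by application code (all attribute names here are hypothetical). Note that later Datomic releases also support composite uniqueness natively via tuple attributes (:db/tupleAttrs).

```clojure
;; Hypothetical schema: :order/line-id is derived from the two real
;; attributes and carries the composite uniqueness constraint.
(def schema
  [{:db/ident       :order/line-id        ; e.g. "order-123|item-456"
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/unique      :db.unique/value}])

;; App code must keep the derived value in sync on every write:
(defn line-id [order-id item-id]
  (str order-id "|" item-id))

;; ...after which an ordinary lookup ref works on the composite:
;; [:order/line-id (line-id order-id item-id)]
```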


is it possible to use :db/excise on an in-memory db?


ah, totally missed that, thank you


A twofold question about caching and pushing data: 1. Am I in trouble if my database no longer fits in memory? Will the peers constantly thrash, or is it a normal situation where one relies on the most recent data being the most relevant, with the majority of the data seldom needing to sit in peers' memory?


And 2) I'm not sure I follow on the optional role of memcached. Datomic already caches data on the peers, so why would it help much? What am I missing?


@matan 1: generally you don’t have to worry about thrashing of the peer cache, most use cases don’t rely on having the full dataset in memory; you can also treat multiple peers as a heterogeneous set. if you segment your requests to various peers, each peer’s individual cache will then reflect the workloads it handles


so, for instance, you can have one peer for ‘back office’ analytics, a separate peer (or set of peers) for your web app, and maybe a third set for batch processing


you can get even fancier if you want to, for example, route customer traffic through a smart LB that can segment your incoming traffic to multiple peers (either by load or, even better, by something like customer ID)


question 2: reads from memcached are an order of magnitude faster than reads from storage (in general)


so, if a read is satisfied by a segment in memory you’re looking at nanoseconds to fetch it; from memcached it’s on the order of 1 ms; a storage read is on the order of 10 ms


and if you end up having to do a storage read, you’re effectively now at the “same” latency as a traditional RDB, which always has to do a roundtrip


Does anyone know about issues regarding storage when running Datomic backed by Postgres?


We suddenly used our last 87 GB of storage in only 4 days


@apsey are you running gc-storage regularly?
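For reference, gc-storage is requested from a peer with the datomic.api call below; it deletes storage segments that became garbage before the given instant (the connection and the cutoff date here are placeholders):

```clojure
(require '[datomic.api :as d])

;; conn is an existing peer connection; reclaim storage garbage
;; older than the given instant
(d/gc-storage conn #inst "2018-01-01")
```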


Peers didn't change significantly, but I was wondering if someone doing lots of queries could explain that?


queries will not affect storage use at all


only transactions


No temporary tables are created?


AFAIK, we run gc periodically (every week or every month, I will have to double-check this)


no. Datomic only creates and uses a single table in postgres - the one you create with the setup scripts


you may have to use postgres-level vacuum to reclaim space that is released by gc-storage


likely depends on the version of PG you’re running and what your autovacuum settings are


so, I checked if there was anything different regarding the amount of datoms, but the slope is the same


the weirdest thing is this storage being claimed all of a sudden


how many datoms?


around 1 billion


and is it possible another system is using the same postgres instance? you might want to check pg-level metrics (i.e. the stuff in the postgres internal catalog tables)


write iops look stable in rds


How do people here deal with schema migrations in dev, where you may want to choose a different schema approach after already transacting one? I'm using conformity which is nice, but doesn't have any way to "roll-back" the schema (because Datomic doesn't have this feature either).
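For context, conformity usage looks roughly like this (norm and attribute names are hypothetical); ensure-conforms only ever rolls forward, transacting each norm at most once, which is why rolling back a bad schema choice is the missing piece:

```clojure
(require '[io.rkn.conformity :as c])

(def norms
  {:my-app/add-user-email               ; hypothetical norm name
   {:txes [[{:db/ident       :user/email
             :db/valueType   :db.type/string
             :db/cardinality :db.cardinality/one
             :db/unique      :db.unique/identity}]]}})

;; conn is an existing peer connection; idempotent per norm
(c/ensure-conforms conn norms)
```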


We do it all in memory, including using a fork of a dev-hosted production backup, so we pretty much never need to retract


Can you explain this setup a bit more? If you make a bad schema migration, do you just restore from a backup?


essentially, we're pretty-much always developing against either an mem conn (we call it 'lab'), or an in-memory fork (using Datomock) of a dev conn which is a recent enough backup of our production database ('dev-fork'); sometimes, in order to get the freshest data, we just use a fork of our production database directly ('prod-fork')
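A minimal sketch of that setup using Datomock (the connection URI is illustrative):

```clojure
(require '[datomic.api :as d]
         '[datomock.core :as dm])

;; 'lab': a blank, non-durable in-memory connection
(def lab-conn (dm/mock-conn))

;; 'dev-fork': an in-memory fork of a durable dev connection;
;; writes go to the fork only, the underlying dev db is untouched
(def dev-fork
  (dm/fork-conn (d/connect "datomic:dev://localhost:4334/dev")))
```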


So the connections we use for development are not durable; the only time we commit a migration durably is to production.


hmm, I think the part I was missing was the in memory fork of a dev conn


(The exception is to our staging environment, which is obtained by restoring backups of our production environment periodically)


@danielcompton preaching for my own church here, but I believe it is a powerful tool indeed 🙂


Final tip: use a local memcached so all these environments share a common cache on your machine; you may see pretty good speedups when moving from one conn to the other
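Assuming a memcached instance running locally, the peer library picks it up via the datomic.memcachedServers system property, set before the first connect (a sketch; the address is a placeholder):

```clojure
;; point every conn created in this JVM at the shared local cache
(System/setProperty "datomic.memcachedServers" "127.0.0.1:11211")
```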


we drop and recreate the db in staging fairly often


ah, I thought that might have been the case


when you recreate, are you restoring from a backup, or from fixtures, or?


generators or importers


we use clojure spec to infer datomic schema via generators, so it's easy to use those generators to create test entities to play with
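That generator-driven approach can be sketched with plain clojure.spec (the spec names here are hypothetical):

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

(s/def :user/name string?)
(s/def :user/age (s/int-in 0 120))
(s/def ::user (s/keys :req [:user/name :user/age]))

;; ten generated entity maps, ready to transact for local testing
(gen/sample (s/gen ::user) 10)
```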


and importers are workers that are loading data in hourly, so just depends on what data we need for a feature that's being worked on


importers are loading data in from the prod db?


no, external data sources (apis) in this case


@marshall Are there any plans to allow for community-developed storage and backup implementations? (to support arbitrary storages and arbitrary backup targets)