This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-01-20
Channels
- # announcements (30)
- # babashka (118)
- # beginners (23)
- # calva (68)
- # cljdoc (10)
- # clojars (13)
- # clojure (90)
- # clojure-bangladesh (1)
- # clojure-europe (27)
- # clojure-gamedev (1)
- # clojure-nl (11)
- # clojure-uk (4)
- # clojurescript (59)
- # community-development (3)
- # cursive (13)
- # datomic (39)
- # defnpodcast (1)
- # emacs (10)
- # figwheel-main (1)
- # fulcro (18)
- # graalvm (21)
- # honeysql (1)
- # introduce-yourself (1)
- # juxt (1)
- # lsp (197)
- # malli (19)
- # off-topic (28)
- # practicalli (2)
- # re-frame (42)
- # reagent (4)
- # reitit (7)
- # releases (2)
- # sci (35)
- # shadow-cljs (13)
- # spacemacs (4)
- # vim (3)
another metaschema question... It seems like the metaschema data is not actually ETL'd out of Datomic, but rather queried from Datomic directly - if this is the case, is there a way to specify a specific value of the db you want to query? So if I want to run a SQL query against yesterday's db state, for example...
Yes, it is querying, and no, there is no way. They must be getting that feature request a lot, though
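(Outside the SQL/metaschema layer, the Datomic peer API itself can do this with `d/as-of`. A minimal sketch; the connection URI and the `:order/id` attribute are assumptions:)

```clojure
;; Sketch only: the metaschema/SQL tool can't target a past db value,
;; but the peer API can. URI and attribute names are assumptions.
(require '[datomic.api :as d])

(def conn (d/connect "datomic:dev://localhost:4334/my-db"))

(let [yesterday (java.util.Date.
                  (- (System/currentTimeMillis) (* 24 60 60 1000)))
      db        (d/as-of (d/db conn) yesterday)]
  ;; Any query against this db value sees yesterday's state.
  (d/q '[:find ?e :where [?e :order/id]] db))
```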
Has anyone here had to "decant" a Datomic database to clear out some bad data in the tree?
If you can tolerate it being in history, just retraction should be fine. (If you can't, make sure you're not falling into this trap https://vvvvalvalval.github.io/posts/2017-07-08-Datomic-this-is-not-the-history-youre-looking-for.html)
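(For reference, a plain retraction removes the value from the current db while keeping it in history. A sketch; the entity id, attribute, and value are assumptions:)

```clojure
;; Sketch: retract one offending value. conn, the entity id, and
;; :note/big-string are assumptions.
(require '[datomic.api :as d])

@(d/transact conn
  [[:db/retract 17592186045418 :note/big-string
    "the offending multi-megabyte value"]])
;; Recent Datomic versions also accept the two-element form
;; [:db/retract e a] to retract all of an attribute's values at once.
```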
if excision isn't possible, then iterating and transforming the transaction log is your only option
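(The tx-log iteration looks roughly like this. A heavily hedged sketch: `transform-datom`, `old-eid->tempid`, and both connection vars are assumptions, and real code must also remap ref values and resolve tempids from each tx result:)

```clojure
;; Sketch of a tx-log "decant": walk the source log with d/tx-range,
;; rewrite each datom, and re-transact into a fresh database.
(require '[datomic.api :as d])

(defn replay! [src-conn dest-conn transform-datom]
  (let [src-db (d/db src-conn)]
    (doseq [{:keys [data]} (d/tx-range (d/log src-conn) nil nil)]
      (let [tx-ops (for [[e a v _tx added?] data
                         ;; skip tx metadata like :db/txInstant; the
                         ;; destination transactor assigns its own
                         :when (not= :db/txInstant (d/ident src-db a))
                         :let [op (transform-datom
                                    [(if added? :db/add :db/retract)
                                     (old-eid->tempid e)   ; assumed mapping fn
                                     (d/ident src-db a)
                                     v])]
                         :when op]
                     op)]
        (when (seq tx-ops)
          @(d/transact dest-conn tx-ops))))))
```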
Thanks @U09R86PA4, the angle here is not data privacy but rather performance. We've identified some nodes in our tree that have some unfortunately large string values (megabytes+), and we're observing very high read and write impact in DynamoDB. I've read that excision does not necessarily help in this case. We do not need the history for business logic
but fortunately the decant should be relatively straightforward if you have a solid plan for putting those strings somewhere else
we have a rule in our database (checked by test suite) that every string must have an attribute predicate that limits length
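(Datomic supports this kind of rule via attribute predicates, `:db.attr/preds`, which name a fully qualified predicate function. A sketch; the namespace, the 1 KB limit, the attribute name, and `conn` are all assumptions:)

```clojure
;; Sketch: myapp.preds must be on the classpath of the transactor/peer.
(ns myapp.preds)

(defn short-enough? [s]
  (<= (count s) 1024)) ; the 1 KB limit is an assumption

;; Attach the predicate to a string attribute (conn is assumed):
@(d/transact conn
  [{:db/ident       :note/body
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db.attr/preds  'myapp.preds/short-enough?}])

;; Test-suite check: flag any string attribute without a predicate.
(d/q '[:find ?ident
       :where
       [?a :db/valueType :db.type/string]
       [?a :db/ident ?ident]
       (not [?a :db.attr/preds])]
     (d/db conn))
```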
When you had this issue, where did you see most of the impact? What we're seeing is periodic storage IO spikes that align with the transactor index rebuilds
Unfortunately it was really difficult for us to put our finger on it. It manifested as drag on the entire system
the biggest problem we had was that fulltext indexes would occasionally produce huge index merges, but that's only for the fulltext-indexed values
there's no way to drop fulltext from an attribute, so we had to actually physically move those values to a different attribute
but we also saw unpredictably big index sizes, large uncacheable segments in memcached, and on inspecting the segments they were often DirNodes > 1MB
we just stopped putting them to stop future writes. Excision at our scale would have been untenable
on the decant, I'm assuming you're using some global identifier to negotiate the entity IDs?
Yeah, fortunately we did a decant ~2-3 years ago, the purpose of which was to renumber entity ids with partitioning for performance (I wasn't there). During that time all :db/id dependencies were shaken out of the code, so it's resilient to decants using ids we control ourselves
the decant itself would just keep track of the entity id mapping; we also injected an additional assertion (long-type not ref type) with the entity-id of the entity in the previous system, which was good for correlating later
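(A sketch of that bookkeeping; `old->new`, `entity-tx`, and the `:migration/old-eid` attribute are assumed names, and the extra long-typed assertion is the correlation record described above:)

```clojure
;; Sketch: track old-eid -> tempid during the decant, and assert the
;; source system's entity id as a plain long for later correlation.
(require '[datomic.api :as d])

(def old->new (atom {}))

(defn tempid-for [old-eid]
  (or (@old->new old-eid)
      (let [t (d/tempid :db.part/user)] ; or a partition you control
        (swap! old->new assoc old-eid t)
        t)))

(defn entity-tx [old-eid attr-vals]
  ;; attr-vals: map of attribute -> value copied from the source db;
  ;; ref values must themselves go through tempid-for first.
  (merge {:db/id             (tempid-for old-eid)
          :migration/old-eid old-eid}   ; long-typed, not a ref
         attr-vals))

;; After each transact, resolve tempids to permanent ids, e.g.:
;; (d/resolve-tempid (:db-after tx-result) (:tempids tx-result) t)
```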
do 100% of your entities have unique IDs? we have some obvious high-level domain entities which get UUIDs, but there are certain things, like a referenced list of settings, that do not. Seems like those would need them, or should be refactored to be flattened onto the main entities
Something you might consider if you really don't need history is to copy every assertion at a time T, then do a partial decant from T to now. That might be better or worse depending on your circumstances
The "copy assertions at time T" part is to throw away history. You get a smaller target db, it's faster than replaying every tx, and you avoid having to deal with any weirdness in the distant past of your tx log. The partial decant is just to reduce downtime: whatever happened in the db while you were doing the bulk copy. If you can tolerate downtime you don't need it.
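(In outline, a sketch of the two phases; the cutover instant, connection vars, and batching details are all assumptions:)

```clojure
;; Sketch of "copy assertions at time T, then partial decant":
;; 1. snapshot the source db as of T and bulk-copy only its current
;;    assertions (no history) into a fresh db;
;; 2. replay just the log from T to now to catch up, minimizing downtime.
(require '[datomic.api :as d])

(let [t      #inst "2022-01-01"            ; assumed cutover point
      src-db (d/as-of (d/db src-conn) t)]
  ;; Phase 1: iterate current datoms (d/datoms src-db :eavt) and
  ;; transact them into dest-conn in batches, remapping entity ids.
  ;; Phase 2: replay (d/tx-range (d/log src-conn) t nil) into dest-conn
  ;; to pick up whatever was written during the bulk copy.
  )
```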