#datomic
2022-06-06
pieterbreed10:06:04

Hi all 👋 - I'm looking for guidance on setting up datomic cloud, esp in environments where prod and dev are separated into different AWS accounts. Anybody here have something for me to look at, or wanting to offer advice?

Daniel Jomphe13:06:02

👋 Hi Pieter, that's our setup here too. It's very easy because each account is isolated, so it's straightforward to bring symmetry to your tooling. When you have specific questions, I invite you to post them separately in the channel (just like you did about the GitHub Action), and some of us will definitely be happy to step in with advice.

👍 1
pieterbreed13:06:39

Also - I found this GitHub Action in the marketplace for deploying ions (https://github.com/marketplace/actions/datomic-ions-deploy). Is there somebody here who uses it and can vouch for it?

Daniel Jomphe13:06:47

I don't remember finding this one a few years ago when we started with Datomic Cloud and GitHub Actions. They use a custom Docker image that they own. Here I preferred to follow GitHub's advice to keep our Actions in their own syntax where possible, to benefit from quicker actions (if I remember well). This adds a small learning curve around GitHub's YAML config, though. If you already set up your developers with Docker, you might profitably go ahead with a Dockerfile in GitHub Actions. (We do, but with VS Code dev containers.)
What we ended up doing is writing a wrapper script to handle deployments from our local machines, and then calling that same script from the GitHub Action, using GitHub Secrets to authorize those calls. We needed a fair amount of AWS IAM config to authorize GitHub's machines to do that, though, and we had to repeat it for each of our AWS accounts, since each of our Datomic Cloud environments lives in a distinct AWS account.

pieterbreed13:06:06

OK, thank you for the feedback. I guess I have one more question related to this: I am used to the idea of "promoting" artifacts between environments. I'm curious how you guys set up your flow. Specifically, if you deploy from github how do you decide which branches/tags go to which environments?

pieterbreed13:06:36

(From what I understand of datomic cloud; the artifact is part of the system, so is stored in the same AWS account as the ions, thereby making "promoting" of artifacts difficult)

Daniel Jomphe13:06:10

Yeah, we don't store artifacts to test, vet and promote. We start a new build for each environment/account. So one could say we test-vet-promote git commits:
• the development branch is auto-deployed to one env
• the development branch is nightly-deployed to another env
• the main branch is auto-deployed to the main env (after merges from the development branch)

pieterbreed13:06:46

awesome, thank you! :thumbsup:

Daniel Jomphe13:06:50

If you want to eliminate builds, you could copy from the S3 code bucket of Datomic in one account to the other one, I suppose, but this is clearly out of supported territory. You'd need to make sure you understand how what you do plays in AWS Code Build territory. But I think it's workable and completely possible to make it practical.

Daniel Jomphe14:06:33

From my perspective, Datomic Cloud's tools do just that, starting from a git repo. Connect them to this or that AWS account, and they perform the copy themselves, if you look at it this way.

pieterbreed14:06:08

Yeah - that was my understanding too. Every combination of tools and service providers is ever-so-slightly different, and hearing you describe your setup affirmed that 1) I wasn't going off track (much) and 2) the solution I have in mind is at least workable.

Kris C14:06:04

Are any of you using Datomic in production? Could you please share your experiences? I'm primarily interested in negative experiences, since I came upon this comment on HN:

> I went to a Clojure meetup one time and they all went on about how using Datomic in production is a nightmare and it's generally an over-engineered product that isn't worth the trouble in the end. Do most people who have dealt with Datomic in production feel this way?
We have adopted Datomic (on-prem) for a project and, so far, I really like it a lot, but I want to prepare myself for any future problems...

👀 1
favila15:06:55

The biggest gotchas I’ve had (on-prem) are all operational; “over-engineered” is definitely not what I would call it. A big strength of datomic is also a big headache: it’s difficult to get rid of stuff (bad schema, too-large values, mis-partitioned data, too much data, etc) as you scale; network consumption and storage consumption and object-cache locality from index churn (or just sheer volume of data) become big problems and cost drivers that can’t be solved easily. Often you have to essentially start over (i.e. decant), which is not a casual operation.

dvingo15:06:22

I've worked with Datomic Cloud at two jobs and the overall experience was not positive. Major cons:
• Closed source: your hands are tied when trying to solve your own problems.
  ◦ All you can do is reach out for support, and in the places I've worked the feedback cycles were very slow (multiple days).
• No introspection into the query engine.
  ◦ Query performance problems are incredibly difficult to analyze because you have no data; you have to guess and check.
  ◦ Related to this: clause order matters in the version of datalog used by Datomic, so your query performance may depend (very significantly) on clause order. And because it's not just clause order but also the number of datoms matching each clause that affects performance, a query that is performant today can become slow as the distribution of datoms changes. This means you have to worry about it all the time when writing queries, as a sort of low-level background paranoia.
• This is not really about the design per se but perhaps the marketing and messaging:
  ◦ I've come to believe that the history API is best avoided for application features. It is wonderful for operational insights and post-hoc investigations of provenance, but because all you have is transaction time and no concept of "valid time", you're screwed if you want to, for example, migrate a DB using a tx import (your tx times will be mutated).
Those are the really big ones.
The wonderful gift that Datomic brought to us (well, me at least) was reviving datalog as a query language combined with the attribute model of representing information. I am completely confused by the closed-source nature of Datomic, especially when there are lots of examples of a dual open/closed setup (mongo, neo4j, cockroachdb). At this point though, we have XTDB, datalevin, asami, and datahike, which provide us with lots of (open source and gratis) options for using datalog and attribute modeling in our software.
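To make the clause-order point above concrete, here is a minimal sketch of two logically equivalent queries. The attribute names (:order/status, :order/customer, :customer/email) and the guesses about how many datoms each clause matches are hypothetical, not taken from anyone's schema in this thread.

```clojure
;; Hypothetical attributes, for illustration only.
;; Both queries describe the same result set; only the :where order differs.

(def broad-clause-first
  '{:find  [?order]
    :in    [$ ?email]
    :where [[?order :order/status :order.status/open] ; assumed to match many datoms
            [?order :order/customer ?cust]
            [?cust :customer/email ?email]]})         ; assumed to match very few

(def selective-clause-first
  '{:find  [?order]
    :in    [$ ?email]
    :where [[?cust :customer/email ?email]            ; narrow the search first
            [?order :order/customer ?cust]
            [?order :order/status :order.status/open]]})

;; Same clauses, different order.
(= (set (:where broad-clause-first))
   (set (:where selective-clause-first)))
```

Either map could be passed to d/q together with a db value and an email string. Which ordering is faster depends on the data distribution at the time, which is exactly the moving target described above.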

favila16:06:29

Things I wish someone told me years ago (on-prem-specific to some degree):
• Do not ever put any largish string into datomic (4k is the largest I would ever contemplate, preferably much shorter), and especially do not fulltext index them. (Maybe don’t fulltext index anything, because you can’t drop it!) These are hard to get rid of once you put them in.
• Pay attention to identifier schemes and entity id partitioning to increase data locality; it will save you later.
• Pay attention to routing in a partition-aware way to increase peer locality.
• Do not rely on history features for customer-facing functionality on a long timescale: materialize your history (the problem here is being tied to old schema and “fixing the past”).
• Have a plan for regular (but targeted) excision to control history size once storage gets expensive (this may take years, though). Not all attributes have equal churn, and the value of history often decays over time and even becomes a liability (cost, compliance, exposure to breaches, etc).
• Avoid d/entity (prefer query or pull) unless you know what you are doing.
• Use attribute predicates and entity predicates early on.
• Think carefully about how you design your transactions for data races: any dependent read in a peer is a race waiting to happen, and datomic doesn’t have many tools out of the box for managing this. This is a warning especially for those used to traditional single-process, blocking-transaction databases (i.e. any SQL db).
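On the last point above, one of the few built-in tools is the :db/cas (compare-and-swap) transaction function, which makes a write conditional on the value that was read earlier. A minimal sketch, assuming a hypothetical :account/balance attribute and a made-up entity id:

```clojure
;; Build transaction data that only applies if the value read earlier is
;; still the current one. :db/cas is Datomic's built-in compare-and-swap;
;; the :account/balance attribute and the entity id below are hypothetical.
(defn withdraw-tx
  [account-eid balance-read amount]
  (let [new-balance (- balance-read amount)]
    (when (neg? new-balance)
      (throw (ex-info "Insufficient funds"
                      {:balance balance-read :amount amount})))
    ;; [:db/cas entity attribute expected-old-value new-value]
    [[:db/cas account-eid :account/balance balance-read new-balance]]))

(withdraw-tx 17592186045418 100 30)
;; => [[:db/cas 17592186045418 :account/balance 100 70]]
```

If another transaction changed the balance between the read and the write, the transaction carrying this data fails instead of silently overwriting it, and the caller can re-read and retry.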

🙏 1
favila16:06:05

I concur with @U051V5LLP pretty much.

favila16:06:38

The good parts:
• Attribute-centric modeling plus datalog querying is amazing, even with the occasional badly-ordered query. It’s at least good to know that what you write is what you’ll get, but even accepting that, there’s room for improvement: knowing what index a clause will use, or knowing what clauses/rules are contributing most to a result set size or CPU time.
• Having an “easy” transaction queue for stream-based processing.
• The peer model for scaling reads.
• History for internal-facing auditing and debugging. It’s a blessing and a curse. I really wish there were more knobs here other than history/no-history. Some attributes you really want everything forever, some are valuable for a few weeks or months and then just contribute to history index cost and churn. But I don’t agree with e.g. datalevin that it shouldn’t exist.

favila16:06:51

BTW, when I say “at scale”, this is from maintaining a (now) 16 billion datom database over 7+ years with a multi-tenant workload.

Kris C18:06:34

Thanks for great info, guys 🙏

Kris C07:06:09

Anyone else care to share their experience with Datomic (on-prem) in production?

Ivar Refsdal07:06:58

I'll share some during the day 🙂

👍 1
octahedrion08:06:07

@U09R86PA4 what do you mean by "materialize your history" ?

octahedrion10:06:57

WRT long texts, is there a case to be made for storing long texts as structures of smaller texts ? For example, storing a document as entities like nested HTML elements (paragraphs, lists etc), or even going so far as to represent texts as structures of words.

favila12:06:24

“Materialize your history” = represent history that is customer facing explicitly with schema and data you design. You would read this “history” data using the current database, not a history database. (Or you could represent it out of datomic entirely)
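As a rough illustration of the idea above, customer-facing history can be modeled as ordinary entities with their own schema. All attribute names here (:price-change/product and so on) are hypothetical; this is a sketch of the approach, not anyone's actual schema.

```clojure
;; Hypothetical schema: a customer-facing price history stored as ordinary
;; entities, so it is read from the current db value, not a history db.
(def price-change-schema
  [{:db/ident       :price-change/product
    :db/valueType   :db.type/ref
    :db/cardinality :db.cardinality/one}
   {:db/ident       :price-change/amount
    :db/valueType   :db.type/bigdec
    :db/cardinality :db.cardinality/one}
   {:db/ident       :price-change/effective-at ; explicit, application-owned time
    :db/valueType   :db.type/instant
    :db/cardinality :db.cardinality/one}])

;; A plain query against the current database returns the "history".
(def price-history-query
  '{:find  [?amount ?at]
    :in    [$ ?product]
    :where [[?change :price-change/product ?product]
            [?change :price-change/amount ?amount]
            [?change :price-change/effective-at ?at]]})
```

Because the timestamp is an attribute the application controls, this data survives schema changes and re-imports in a way that transaction time does not.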

👍 1
favila12:06:00

Re: storing large texts as structures of smaller texts: unless you have some use for that, probably not. Semi structured text doesn’t usually assign identity to its elements so updates would be hard.

favila12:06:15

Just seems like unnecessary complexity most of the time

dvingo12:06:29

@U09R86PA4 i'm interested why you suggest not using entity API (I'm curious, I don't have any strong opinions here)? is it due to performance?

favila12:06:07

It encourages code patterns that don’t have an easy to re-examine boundary between a “data access” layer (where you can put an interface you change at a different cadence from the schema) and the data consuming code. This also makes people use entity walking with Clojure code to implement queries instead of actual queries (more familiar but usually less clear and inefficient because it always uses EAVT indexes). And it’s an invisible source of lazy IO which makes reasoning about performance and profiling hard

favila12:06:41

And it makes it impossible to use the client api

dvingo12:06:18

very useful info - thanks for explaining

favila12:06:11

It’s sometimes exactly what you need though. Eg it’s a good replacement for any time someone would normally be reaching for a data source pattern

favila12:06:42

Eg it’s a great fit for lacinia resolvers

favila12:06:15

Where it’s difficult to predict what you will need

favila12:06:06

If you can’t predict what you’ll need it’s a very performant alternative to what is usually done, which is n+1 madness

👍 1
favila12:06:39

But it’s better to know what you’ll need and make a query or pull expression up front for code organization purposes
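A small sketch of what "a pull expression up front" can look like. The attribute names are hypothetical, and the commented d/pull call assumes :order/id is a unique attribute so that it can be used in a lookup ref.

```clojure
;; Hypothetical attributes. The pattern states up front which data the
;; consuming code needs, instead of discovering it by walking entities.
(def order-summary-pattern
  [:order/id
   :order/status
   {:order/customer [:customer/id :customer/email]}
   {:order/lines [:line/sku :line/quantity]}])

;; With a Datomic db value this would be used roughly as:
;;   (d/pull db order-summary-pattern [:order/id order-id])
```

Keeping patterns like this in one namespace gives the "data access" boundary mentioned earlier a concrete home.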

jdkealy18:06:26

if i set up memcached, can i set my memoryindex and objectcache to 0 ?

favila18:06:12

memcache/valcache is to reduce pressure on your storage, not on your peer size. objectcache must still be big enough for the working set of your queries, and memoryindex controls how frequently you index (you can’t index continually).

Vishal Gautam19:06:13

Hello 👋, I am trying to learn more about Datomic rules by following this tutorial: https://www.youtube.com/watch?v=7lm3K8zVOdY&t=864s&ab_channel=ClojureTV
While invoking the owns? function I am getting this error:

java.lang.IllegalArgumentException: "Cannot resolve key: 24a96e20-f526-4f7f-ba38-4f684caa5607"
Here is the full code. 🙏
(def owner-rules
  '[[(owns? ?cus-id ?e)
     [?e :customer/id ?cus-id]]
    [(owns? ?cus-id ?e)
     [?e ?ref-attr ?r]
     (owns? ?cus-id ?r)]])


(defn owns? [cid pid db]
  (d/q '{:find [?pur]
         :in [$ ?cus-id ?pur %]
         :where
         [(owns? ?cus-id ?pur)]}
    db cid [:purchase/id pid] owner-rules))

;; throws error :(
(comment
  (owns?
    #uuid "0fb7ea94-44af-46fa-98ca-0ddb5eb23123"
    #uuid "24a96e20-f526-4f7f-ba38-4f684caa5607"
    (d/db conn)))

Keith13:06:46

Can you post the full stack trace? It looks like Datomic is failing to resolve your lookup ref [:purchase/id #uuid "24a96e20-f526-4f7f-ba38-4f684caa5607"] to an entity id, but it's hard to tell for sure without the full stack trace. Also:
• Does the :purchase/id attr have a value for :db/unique?
• Does your db have an entity with #uuid "24a96e20-f526-4f7f-ba38-4f684caa5607" for :purchase/id?

Vishal Gautam14:06:09

@U424XHTGT Here is the full source: https://github.com/Novus-School/novus/blob/master/novus/src/main/novus/superpowers.clj#L21
> Does the :purchase/id attr have a value for :db/unique?
Yep, :db/unique :db.unique/identity
> Does your db have an entity with #uuid "24a96e20-f526-4f7f-ba38-4f684caa5607" for :purchase/id?
Yep, if you look at line 88, it is transacted using that ID

Vishal Gautam14:06:06

Error Trace

java.lang.IllegalArgumentException : "Cannot resolve key: 24a96e20-f526-4f7f-ba38-4f684caa5607"
in  datomic.core.datalog/resolve-id (datalog.clj:330)
in datomic.core.datalog/resolve-id (datalog.clj:327)
in datomic.core.datalog/fn--24749/bind--24761 (datalog.clj:442)
in datomic.core.datalog/fn--24749 (datalog.clj:619)
in datomic.core.datalog/fn--24749 (datalog.clj:399)
in datomic.core.datalog/fn--24599/G--24573--24614 (datalog.clj:119)
in datomic.core.datalog/join-project-coll (datalog.clj:184)
in datomic.core.datalog/join-project-coll (datalog.clj:182)
in datomic.core.datalog/fn--24672 (datalog.clj:289)
in datomic.core.datalog/fn--24672 (datalog.clj:285)
in datomic.core.datalog/fn--24578/G--24571--24593 (datalog.clj:119)
in datomic.core.datalog/eval-clause/fn--25333 (datalog.clj:1460)
in datomic.core.datalog/eval-clause (datalog.clj:1455)
in datomic.core.datalog/eval-clause (datalog.clj:1421)
in datomic.core.datalog/eval-rule/fn--25365 (datalog.clj:1541)
in datomic.core.datalog/eval-rule (datalog.clj:1526)
in datomic.core.datalog/eval-rule (datalog.clj:1505)
in datomic.core.datalog/eval-query (datalog.clj:1569)
in datomic.core.datalog/eval-query (datalog.clj:1552)
in datomic.core.datalog/eval-clause/fn--25333 (datalog.clj:1477)
in datomic.core.datalog/eval-clause (datalog.clj:1455)
in datomic.core.datalog/eval-clause (datalog.clj:1421)
in datomic.core.datalog/eval-rule/fn--25365 (datalog.clj:1541)
in datomic.core.datalog/eval-rule (datalog.clj:1526)
in datomic.core.datalog/eval-rule (datalog.clj:1505)
in datomic.core.datalog/eval-query (datalog.clj:1569)
in datomic.core.datalog/eval-query (datalog.clj:1552)
in datomic.core.datalog/eval-clause/fn--25333 (datalog.clj:1477)
in datomic.core.datalog/eval-clause (datalog.clj:1455)
in datomic.core.datalog/eval-clause (datalog.clj:1421)
in datomic.core.datalog/eval-rule/fn--25365 (datalog.clj:1541)
in datomic.core.datalog/eval-rule (datalog.clj:1526)
in datomic.core.datalog/eval-rule (datalog.clj:1505)
in datomic.core.datalog/eval-query (datalog.clj:1569)
in datomic.core.datalog/eval-query (datalog.clj:1552)
in datomic.core.datalog/qsqr (datalog.clj:1658)
in datomic.core.datalog/qsqr (datalog.clj:1597)
in datomic.core.datalog/qsqr (datalog.clj:1615)
in datomic.core.datalog/qsqr (datalog.clj:1597)
in datomic.core.query/q* (query.clj:664)
in datomic.core.query/q* (query.clj:651)
in datomic.core.local-query/local-q (local_query.clj:58)
in datomic.core.local-query/local-q (local_query.clj:52)
in datomic.core.local-db/fn--27457 (local_db.clj:28)
in datomic.core.local-db/fn--27457 (local_db.clj:24)
in datomic.client.api.impl/fn--13153/G--13146--13160 (impl.clj:41)
in datomic.client.api.impl/call-q (impl.clj:150)
in datomic.client.api.impl/call-q (impl.clj:147)
in datomic.client.api/q (api.clj:393)
in datomic.client.api/q (api.clj:365)
in datomic.client.api/q (api.clj:395)
in datomic.client.api/q (api.clj:365)
in clojure.lang.RestFn.invoke (RestFn.java:486)
in novus.superpowers/owns? (superpowers.clj:135)
in novus.superpowers/owns? (superpowers.clj:134)
in novus.superpowers/eval108861 (/Users/vishalgautam/projects/novus/novus-server/novus/src/main/novus/superpowers.clj:157)
in novus.superpowers/eval108861 (/Users/vishalgautam/projects/novus/novus-server/novus/src/main/novus/superpowers.clj:157)
in clojure.lang.Compiler.eval (Compiler.java:7181)
in clojure.lang.Compiler.eval (Compiler.java:7171)
in clojure.lang.Compiler.eval (Compiler.java:7136)
in clojure.core/eval (core.clj:3202)
in clojure.core/eval (core.clj:3198)
in unrepl.repl$i9hjMxfOQ2IzbCA5TVia2QQEJNg$start$interruptible_eval__25579$fn__25580$fn__25581$fn__25582.invoke (NO_SOURCE_FILE:803)
in unrepl.repl$i9hjMxfOQ2IzbCA5TVia2QQEJNg$start$interruptible_eval__25579$fn__25580$fn__25581.invoke (NO_SOURCE_FILE:803)
in clojure.lang.AFn.applyToHelper (AFn.java:152)
in clojure.lang.AFn.applyTo (AFn.java:144)
in clojure.core/apply (core.clj:667)
in clojure.core/with-bindings* (core.clj:1977)
in clojure.core/with-bindings* (core.clj:1977)
in clojure.lang.RestFn.invoke (RestFn.java:425)
in unrepl.repl$i9hjMxfOQ2IzbCA5TVia2QQEJNg$start$interruptible_eval__25579$fn__25580.invoke (NO_SOURCE_FILE:795)
in clojure.core/binding-conveyor-fn/fn--5772 (core.clj:2034)
in clojure.lang.AFn.call (AFn.java:18)
in java.util.concurrent.FutureTask.run (FutureTask.java:264)
in java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1136)
in java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:635)
in java.lang.Thread.run (Thread.java:833)

Vishal Gautam15:06:57

ex-data error message

{:cognitect.anomalies/category :cognitect.anomalies/incorrect, :cognitect.anomalies/message "processing clause: [?e :customer/id ?cus-id], message: Cannot resolve key: 24a96e20-f526-4f7f-ba38-4f684caa5607"}
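One possible reading of the ex-data above, offered as an assumption rather than something confirmed in the thread: the failing clause is [?e :customer/id ?cus-id], and the key it cannot resolve is the bare UUID, not the lookup ref. In the recursive rule, [?e ?ref-attr ?r] matches every attribute of ?e, including :purchase/id itself, so ?r can be bound to the UUID value and then passed back into the entity position of owns?. A hedged, untested sketch of a rule set that follows only ref-typed attributes (the name owner-rules-ref-only is made up here):

```clojure
;; Sketch under the assumption described above; not verified against the
;; poster's database.
(def owner-rules-ref-only
  '[[(owns? ?cus-id ?e)
     [?e :customer/id ?cus-id]]
    [(owns? ?cus-id ?e)
     [?e ?ref-attr ?r]
     ;; Keep only attributes whose values are entity references, so that
     ;; scalar values (UUIDs, strings, ...) never reach the ?e position.
     [?ref-attr :db/valueType :db.type/ref]
     (owns? ?cus-id ?r)]])
```

Passing this in place of owner-rules in the owns? function would be the way to try it; whether it resolves the error depends on the assumption above being correct.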