#datomic
2019-01-08
dmarjenburgh08:01:25

Is there a way to subscribe to the live index feed in Datomic Cloud? I really want this, and I believe it is possible in the on-prem version.

Joe Lane15:01:11

Currently no, not without rolling your own solution. There are several possible ways to do it yourself, e.g. by taking the result of each transaction and placing it onto a queue or topic of some sort.
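A minimal sketch of that roll-your-own approach, assuming the Cloud client API and using a core.async channel as the queue (the channel and wrapper names here are hypothetical; an SQS queue or SNS topic would play the same role):

(require '[clojure.core.async :as async]
         '[datomic.client.api :as d])

;; Hypothetical feed channel for downstream subscribers.
(def tx-feed (async/chan 1024))

(defn transact-and-publish!
  "Transacts tx-data against conn, then publishes the resulting datoms
  (the :tx-data of the result) onto the tx-feed channel."
  [conn tx-data]
  (let [result (d/transact conn {:tx-data tx-data})]
    (async/>!! tx-feed (:tx-data result))
    result))

;; A subscriber simply takes from the channel:
;; (async/go-loop []
;;   (when-some [datoms (async/<! tx-feed)]
;;     (handle-datoms datoms) ; hypothetical handler
;;     (recur)))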

Ben Hammond16:01:38

I'm investigating {:db/error :db.error/transactor-unavailable} errors that seem to be triggered by a series of 9 kB (237-datom) transactions to an on-prem transactor (hosted on a t2.large connecting to a db.t2.small RDS Postgres), and I want to make sure I've covered my bases of Things To Think About. Are there any resources I can be pointed at? So far I've only really found https://groups.google.com/d/msg/datomic/88eo9lV8jXE/NTjuxk1oBwAJ

markbastian16:01:52

Does transaction size affect indexing time? For example, if I have 10,000 datoms, does Datomic perform best when it comes to indexing if I transact 1 datom x 10,000 transactions, 100 datoms x 100 transactions, or a single 10,000-datom transaction?

Alex Miller (Clojure team)16:01:33

I have no knowledge of the correct answer, but I’d place my bet strongly on the middle one :)

Alex Miller (Clojure team)16:01:14

and secondly on the first one

marshall16:01:37

@markbastian yes and you should prefer the middle one
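For illustration, a minimal sketch of the middle option, splitting a large collection of tx-data into ~100-element transactions with partition-all (conn and the tx-data are assumed to already exist; Cloud client API shown):

(require '[datomic.client.api :as d])

(defn transact-in-batches!
  "Transacts tx-data-seq as a series of batch-size transactions."
  [conn tx-data-seq batch-size]
  (doseq [batch (partition-all batch-size tx-data-seq)]
    (d/transact conn {:tx-data (vec batch)})))

;; 10,000 datoms as 100 transactions of 100 datoms each:
;; (transact-in-batches! conn tx-data 100)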

marshall16:01:59

@ben.hammond How many is a series? In general that message indicates the transactor is busy. I would look at metrics to see whether you're hitting storage backpressure; that wouldn't surprise me with a small RDS instance.

Ben Hammond16:01:51

I see a surprising volume of writes on the Postgres end, ~91 MB.

marshall17:01:46

you’re likely in an indexing job, which will write significant amounts of data to storage

Ben Hammond17:01:18

around 3 of these transactions running simultaneously

Ben Hammond17:01:41

twice a minute

markbastian17:01:16

Any sort of a sweet spot? Like "shoot for a few hundred datoms at a time."

markbastian17:01:37

On a related note (and providing some background), I am doing an initial large import of about a million records (each being a map of 50 entries) into a prod Cloud instance. After a few hundred thousand items (~300k) are transacted, I start getting "Busy Indexing" transaction failures. I am applying a 60-second backoff, then retrying, and eventually everything picks up again. My goal would be to write as fast as possible with as few transaction failures as possible. I think I have 2 parameters to play with to smooth this out: 1) the batch size (I'm doing 10 at a time ATM, but can adjust that) and 2) the number of async threads that are working on the job. Is there a good value for #2? It seems like you'd want > 1 to keep the transactor busy, but a much larger number will eventually saturate the transactor and cause indexing or other issues. I'm currently using 4 async threads, but am going to try dialing it down to two. Just wondering if anyone had any particular advice on this as well. Thanks!

Alex Miller (Clojure team)17:01:57

you might try asking on the forums too, I think this has been discussed there in the past

markbastian17:01:57

cool, thanks!

markbastian17:01:52

"Pipelining 20 transactions at a time with 100 datoms per transaction is a good starting point for efficient imports." - I think that's what I'd seen before. I'll try dialing down the size of my writes and do some experiments with number of transactions as well.

marshall18:01:51

Yes, that ^ is a good starting point for Cloud (despite being in on-prem docs)
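A rough sketch of that starting point, keeping ~20 transactions in flight with core.async's pipeline-blocking; the function name and the defaults are illustrative, not official guidance:

(require '[clojure.core.async :as async]
         '[datomic.client.api :as d])

(defn pipeline-import!
  "Imports tx-data-seq as batch-size transactions, keeping up to
  in-flight transactions running concurrently. Returns the number
  of transactions completed."
  [conn tx-data-seq {:keys [batch-size in-flight]
                     :or   {batch-size 100 in-flight 20}}]
  (let [in  (async/to-chan (partition-all batch-size tx-data-seq))
        out (async/chan in-flight)]
    (async/pipeline-blocking
     in-flight
     out
     (map (fn [batch] (d/transact conn {:tx-data (vec batch)})))
     in)
    ;; drain the results channel so the pipeline runs to completion
    (async/<!! (async/reduce (fn [n _] (inc n)) 0 out))))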

tony.kay18:01:09

Hi, I’ve got an existing on-prem Datomic instance, and want to spin up another app that can just use the Datomic client library so that it is a much lighter-weight process. For deployment cost purposes, it would be nice if one of my existing Peers (which is also a running application) could act as the peer server. I can figure out how to start the peer server (just get the classpath right and invoke the peer server main), but I’m wondering if this is going to cause problems. I’m hoping the peer server will share resources with the Peer it is running within (connections are singletons if I remember right), but there is nothing in the docs about running it this way, so I’m wondering if someone with more “internal authority” can tell me if that is a “sane” thing to do. TL;DR: Will a peer server running within another Datomic Peer (application) share the database resources (if the peer server is providing an API for the same db that the in-process Peer application is using)?

marshall18:01:53

@tony.kay do you mean run the peer server and your peer application on the same instance as two separate JVMs?

marshall18:01:13

or run them both in the same JVM?

tony.kay18:01:17

Literally add the peer server stuff to the classpath and run it on an alternate port from whatever else is running in the JVM.

marshall18:01:37

Interesting. Never considered it. I suspect it will work, but it’s an unsupported configuration. Part of the reason for process isolation in Datomic is reliability and problem isolation. If you run them together, then a problem in one will affect the other.

marshall18:01:52

i.e. a runaway query, memory issue, etc

tony.kay18:01:17

sure…in this particular case the API server is going to be very lightly used at first…if it becomes more important, then of course spinning up another VM will make more cost sense.

marshall18:01:59

depending on your instance size/type I might consider at least running 2 separate JVMs

tony.kay18:01:18

that’s the thing…we’re already running 2 for the main app for redundancy…and since the peer server is going to be used very little, spinning up 2 more for it seems like a lot of peers from a provisioning cost perspective

marshall18:01:37

but in answer to your initial question, yes connections are shared and thread-safe

tony.kay18:01:39

esp. since they are high-memory instances

johanatan19:01:06

Does anyone have experience with, or tips for, using a Datomic instance as a substack of another CF template? Is it possible?

marshall20:01:53

Cloud or On-Prem?

markbastian20:01:05

Is index memory (IndexMemMb) in Datomic Cloud only used for generating indexes or for all index storage? Meaning, once all data has been transacted does it ever go down or does it grow with your data? As I write large amounts of data it flatlines at what looks like 1.1GB in the Cloudwatch mgmt console and then returns "Busy Indexing" failures from there on out. I'm wondering if I have a fixed limit on my index capacity at this point or if I can back off for a while and it will recover. The docs at https://docs.datomic.com/cloud/operation/monitoring.html#metrics lead me to believe that's where all the indexes are stored, but that seems like a pretty limiting factor on how much data datomic can store.

marshall20:01:39

The memory index works just like it does in Datomic On-Prem. The memory index holds novelty that has been transacted until it gets merged into the persistent disk index via an indexing job.

marshall20:01:56

You should definitely implement exponential backoff

marshall20:01:35

Datomic will return busy indexing messages until it has merged enough novelty into the persistent disk index to free up memory index space for more transactions
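A minimal sketch of such a backoff loop, retrying when the client throws a :cognitect.anomalies/busy anomaly; the retry parameters are arbitrary placeholders, not a recommendation:

(require '[datomic.client.api :as d])

(defn transact-with-backoff!
  "Transacts tx-data, sleeping and retrying with exponential backoff
  whenever the transactor reports a busy anomaly."
  [conn tx-data {:keys [max-retries initial-ms]
                 :or   {max-retries 8 initial-ms 500}}]
  (loop [attempt 0]
    (let [result (try
                   (d/transact conn {:tx-data tx-data})
                   (catch clojure.lang.ExceptionInfo e
                     (if (and (< attempt max-retries)
                              (= :cognitect.anomalies/busy
                                 (:cognitect.anomalies/category (ex-data e))))
                       ::retry
                       (throw e))))]
      (if (= ::retry result)
        (do (Thread/sleep (* initial-ms (bit-shift-left 1 attempt)))
            (recur (inc attempt)))
        result))))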

markbastian22:01:48

Thanks for the info. Based on what you've said and what I read I think I need to wait longer for the in-mem indexes to be written to disk. Here's a plot of my current index db mem. When I hit the flat line at 1.1GB I reliably stop writing and wait for indexing. I have waited 10s of minutes for that to occur, though. Does that seem reasonable? How long should it take for the indexing process to catch up?

markbastian22:01:58

FYI, the time span from about 19:40-20:05 was to write 100,000 records. The next run was an attempt to write another (separate) 100,000 records after waiting a few minutes in the hopes that indexing would stop.

markbastian22:01:30

Is there a way to determine when indexing has completed? That seems to be my issue. Even an hour or so after my last attempt to transact if I try a new transaction it still reports the "Busy Indexing" anomaly.

markbastian14:01:21

Here's some new data: I ran 2 async threads that write about 150 datoms/transaction. It was able to successfully write ~1.4 million records in a couple hours with very few indexing delays. Thanks again for all the help in getting this to work.

markbastian14:01:13

And for completeness, here's the IndexMemMb profile for the import.

marshall14:01:23

Is this a solo system?

marshall14:01:33

I have a couple of suspicions about why it behaves that way

marshall14:01:13

If it is indeed solo, the initial waiting was probably due to DynamoDB scaling - you can look at your DDB graph to see if that’s the case.

marshall14:01:41

the other thing you can do to speed this up is to move to a production system

markbastian14:01:41

This is a prod system (i3.large).

marshall14:01:06

ok. then yes I think it’s just a matter of providing enough time for the system to do the indexing job(s)

markbastian14:01:27

Yeah, I saw the dynamodb scaling issue on a prior attempt to load the data. I think what I was seeing yesterday that got me confused was some extremely long wait times for the indexing jobs, but I think I was just doing way too many writes so it couldn't catch up.

meanderingcode20:01:57

I am new to Clojure and Datomic, and I am having the darndest time finding information about how to manage schema migrations. The project I am playing with has a function init-database in a file dev/user.clj. This requires running (init-database) in lein repl, and is obviously not ideal for initializing a "production" database or running migrations on a "production" instance. But my google-fu has not surfaced clear guides on how to manage deployments, initialization, and migrations. Do people put migration functions that auto-fire into their core code, like many frameworks in other languages do? Is there a good script-based way to do it? I thought about modifying the code to allow something like lein run -m user --init-database, as I've seen some other functions in this project do [from a different file, env/dev/user.clj].

meanderingcode20:01:12

Does anyone have advice or a simple post/guide that I just missed?

lilactown20:01:51

@meanderingcode I think the 80+% case is you just transact the schema every time your app starts up

lilactown20:01:39

transacting the schema is idempotent; if the schema attributes already exist, nothing happens

lilactown20:01:42

There are a few operations that aren't idempotent. You should endeavor not to use them 🙂 If you do need them, they're pretty special and you should write some special-case code to handle them appropriately.

eraserhd20:01:24

What @lilactown said. We used a migration tool and it caused more pain than it was worth. We now just compute our schema from our internal description and transact it. This sometimes fails, but only when existing data doesn't fit the new schema shape. We fix the data manually and redeploy.
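Concretely, the transact-the-schema-at-startup approach might look like the sketch below (on-prem peer API, since that's what the project in question uses; the attribute is a placeholder):

(require '[datomic.api :as d])

(def schema
  [{:db/ident       :person/name
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}])

(defn init-database!
  "Safe to run on every startup: re-transacting an unchanged schema
  is a no-op."
  [uri]
  (d/create-database uri) ; returns false if the db already exists
  (let [conn (d/connect uri)]
    @(d/transact conn schema)
    conn))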

lilactown20:01:01

I haven't done it

chrisblom21:01:28

I've used CFN's export functionality to expose a datomic instance to other stacks

chrisblom21:01:07

This was with a custom CFN template though, not the one provided with Datomic

chrisblom21:01:46

I'd recommend against using substacks for this.

chrisblom21:01:01

I'd prefer using Exports and Fn::ImportValue for sharing.

meanderingcode20:01:21

Thanks @lilactown @eraserhd. Is there an example or doc about where and how to load that? Completely new to clojure, over here 🙂

johanatan21:01:21

@meanderingcode it's one step in the "getting started" tutorial

meanderingcode22:01:52

@johanatan That makes sense, i'm just not familiar enough with clojure to really understand the application lifecycle and where to call that.

johanatan22:01:52

@meanderingcode call it on startup. You’re going to want your db object to be lazy initted. So call the schema update when your db is constructed

meanderingcode22:01:59

Alright. I think this is where it would go: In src/clj/myapp/system.clj

(defn system [env]
  (component/system-map
    :conn
    (datomic/new-datomic
      (if-let [datomic-url (:datomic-url environ/env)]
        (str datomic-url "?aws_access_key_id=" (environ/env :datomic-access-key) "&aws_secret_key=" (environ/env :datomic-secret-key))
        (when true #_(= :dev env)
          (println "WARN: no :datomic-url environment variable set; using local dev")
          "datomic:)))

meanderingcode22:01:11

hmmm, maybe not

meanderingcode22:01:34

I might go on the next line as a function call, or maybe after the db connection is created in src/clj/myapp/datomic.clj

(ns orcpub.datomic
  (:require [com.stuartsierra.component :as component]
            [datomic.api :as d]))

(defrecord DatomicComponent [uri conn]
  component/Lifecycle
  (start [this]
    (if (:conn this)
      this
      (do
        (assoc this :conn (d/connect uri)))))
        (DB SCHEMA TRANSACTION HERE)
  (stop [this]
    (assoc this :conn nil)))

(defn new-datomic [uri]
  (map->DatomicComponent {:uri uri}))

meanderingcode22:01:15

I really appreciate the guidance. Clojure is my first lisp (other than simple changes to config in Emacs), and I am so new to it.

johanatan22:01:40

Yes that’s right

johanatan22:01:54

Should be fine there

meanderingcode22:01:03

Thanks! Looks like I got the parens wrong, I'm guessing: take two off the preceding line and put them at the end of the transaction line? Going to test when I get a minute.
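For reference, a hedged guess at the corrected start method, assuming a schema var holding the idempotent schema (the placeholder line from the paste above becomes a real transact call inside a let):

(start [this]
  (if (:conn this)
    this
    (let [conn (d/connect uri)]
      @(d/transact conn schema) ; DB SCHEMA TRANSACTION HERE
      (assoc this :conn conn))))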