datahike

silian 2025-09-14T22:40:39.035389Z

From my local machine I connect to live/prod datahike postgres db (Heroku) and establish conn. From local, I transact new schema and new tx against that conn. On Heroku, I need to restart dynos to see most recent version of my datahike [peer? db?]. Can someone point to article/reading so I can understand the architecture?

timo 2025-09-15T08:33:00.839449Z

It is not entirely clear to me what your setup looks like. I am assuming that you have an instance of an application running on Heroku (a dyno?) with an in-process datahike instance running inside the same JVM (the default use case datahike was made for, as a dependency inside your clj app). So this application is running and is connected to a Postgres instance (on Heroku). Now you say that you connect from your local machine to the Postgres db to transact? If you transact from local, the index in the app running on Heroku is not updated, and the app does not know about the newly written data. Datahike has no peers in the sense Datomic has; only with datahike-http-server can you have something similar. Please look at the distributed docs: https://github.com/replikativ/datahike/blob/main/doc/distributed.md#single-writer I don't understand what you're saying regarding the restart of your dyno(??). I don't know anything about Heroku, but do you now see the transacted data after restarting your application on Heroku?

✅ 1
silian 2025-09-15T08:34:31.954869Z

Thanks timo

silian 2025-09-15T08:43:17.499789Z

Heroku dynos are “isolated environments that provide compute, memory, an OS, and an ephemeral filesystem.” Thing is, my cfg (and therefore @conn) only specifies the address of my Heroku-hosted Postgres db (which is separate from the dyno and persists).

silian 2025-09-15T08:46:08.783339Z

Since I am always deref'ing the conn, I don't understand why tx does not immediately reflect in app but requires me to restart the dyno (container is destroyed and everything re-built and evaluated/loaded).

silian 2025-09-15T08:51:55.050279Z

Here is Heroku schematic of simple app structure:

silian 2025-09-15T08:53:21.149199Z

(By "immediately reflected" I mean on web page refresh, apologies.)

timo 2025-09-15T08:53:36.901199Z

in datahike the store is the only centralized part which was an early design-decision. each datahike instance has an index which is updated only when you transact in this very instance. if you want to transact and read from multiple instances of datahike (e.g. local repl and remote app) you will have to use the datahike-http-server which distributes the index-updates to multiple datahike-instances.
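
The behaviour described above can be illustrated with a toy model in plain Clojure. This is not how Datahike is implemented; `store`, `connect`, and `transact!` are hypothetical stand-ins (one atom for the shared Postgres store, one atom per process for that instance's index), written only to show why a write made from one instance stays invisible to another until that other instance reloads from the store.

```clojure
;; Toy model only: plain atoms, no Datahike involved.

;; Stand-in for the shared, persistent store (Postgres in this thread).
(def store (atom []))

(defn connect
  "Hypothetical: builds a process-local 'index' by copying the store once."
  []
  (atom @store))

(defn transact!
  "Hypothetical: writes to the shared store and updates only the caller's index."
  [conn fact]
  (swap! store conj fact)
  (swap! conn conj fact)
  conn)

(def heroku-conn (connect))   ; the app running on the dyno
(def repl-conn   (connect))   ; the local REPL

(transact! repl-conn {:name "Alice"})

@repl-conn      ; => [{:name "Alice"}]  the REPL sees its own write
@heroku-conn    ; => []                 the dyno's index was never updated
@(connect)      ; => [{:name "Alice"}]  a fresh connect (e.g. after a restart) reloads
```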

silian 2025-09-15T08:56:53.811959Z

Wow, I see. So despite the "same" conn, my REPL has one instance of datahike and the remote app has an entirely different instance that only "sees" whatever transactions were available when the dyno or container was formed.

timo 2025-09-15T08:57:54.887459Z

yes, right... I am not entirely up to date with datahike atm but I am pretty sure you would have to use http-server to solve that

timo 2025-09-15T08:58:08.019719Z

maybe @whilo can chime in

silian 2025-09-15T08:58:27.339969Z

When I recreate the dyno (with a restart), the datahike instance reads from the Postgres store again.

timo 2025-09-15T08:59:01.459969Z

I am pretty certain it does a complete reindex but not 100% sure right now

silian 2025-09-15T08:59:49.389379Z

Ok, but this is helpful. Thank you

timo 2025-09-15T09:21:36.370329Z

I will test it myself this afternoon probably

whilo 2025-09-15T09:25:42.169459Z

@feedmyinbox02_clojuri does this graphic make sense https://github.com/replikativ/datahike/blob/main/doc/distributed.md ?

whilo 2025-09-15T09:27:38.328269Z

when working correctly all connections will virtually point to the same db snapshot. in the distributed setup we have so far, each connection reads from the remote store to get the latest snapshot when you deref it.

whilo 2025-09-15T09:27:55.845199Z

the most critical part is that transactions happen in a single writer process

whilo 2025-09-15T09:28:13.315719Z

multiple writers would overwrite each other

whilo 2025-09-15T09:29:35.942839Z

for readers a problem can be if they think they see all changes automatically and don't do a fresh read from the shared store. when you configure a writer process in the config the clients should automatically be configured correctly and always fetch fresh from the store on deref

✅ 1
whilo 2025-09-15T09:31:50.464619Z

i haven't used heroku in a long time and haven't used it with datahike; is there a way to have single worker dyno for the writer?

silian 2025-09-15T09:33:49.657539Z

In the graphic, each black box is a runtime instance of datahike or app containing an instance?

whilo 2025-09-15T09:35:48.365949Z

yes

whilo 2025-09-15T09:36:42.081299Z

the runtime can serve all reads (queries) directly from the store, only write operations are serialized to the writer

silian 2025-09-15T09:42:36.794969Z

I will check to see if I can establish a single worker dyno for a writer, by defining :writer and setting :url to an address provided by Heroku that exclusively locates a worker dyno to handle writes. Interesting!
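
A minimal sketch of what such a configuration might look like, loosely following the `:writer` example in the distributed docs linked earlier in the thread. Every value is a placeholder: the host, credentials, URL, and token are hypothetical, the `:jdbc` store keys assume the datahike-jdbc backend, and the exact keys should be checked against the Datahike version in use.

```clojure
;; Sketch only; every value below is a placeholder, not a real endpoint.
(def cfg
  {:store  {:backend  :jdbc                 ; assumes the datahike-jdbc backend
            :dbtype   "postgresql"
            :host     "example-host"        ; hypothetical
            :port     5432
            :dbname   "example-db"          ; hypothetical
            :user     "example-user"        ; hypothetical
            :password "example-password"}   ; hypothetical
   ;; Writes are sent to the single writer process; reads go to the store.
   :writer {:backend :datahike-server
            :url     "https://writer.example.com" ; hypothetical writer address
            :token   "example-token"}})           ; hypothetical shared secret
```

Under this assumption, the local REPL and the web app would both connect with the same `cfg`, so that neither of them writes to the store directly.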

whilo 2025-09-15T11:17:41.748119Z

yes, that should work. lmk how it goes

silian 2025-09-14T22:42:14.870519Z

(This is the part I understand least; I don't grasp peers vs. conns vs. db vs. what datahike really is maybe.)

silian 2025-09-14T22:44:19.270109Z

After dyno restart, I don't see transacted data in my live web app (though it appears when querying conn locally). In the web app, I d/q against what should be the same @conn I transacted against in my REPL.

silian 2025-09-14T22:50:56.788569Z

Response from chatGPT 5:

> Store: where bits live (you’re using JDBC/Postgres).
>
> Conn (conn): a process-local atom that points at the current db value. Transactions update this atom; deref (@conn or (d/db conn)) to read the latest value known to that process.
>
> Db value (db): an immutable snapshot. If you capture it once (e.g., (def db (d/db conn))) and reuse it, it never “sees” later txs.
>
> Multi-process: Each process has its own conn and caches. You can either let every process connect to the same Postgres store, or funnel all requests through a Datahike server so caches are shared.
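
The conn versus db value distinction in the quoted answer can be shown with a plain Clojure atom standing in for the conn. This only illustrates the description above and is not Datahike's API; `conn`, `captured-db`, and the map contents are hypothetical.

```clojure
;; Plain atom standing in for a conn; no Datahike involved.
(def conn (atom {:tx 0 :facts #{}}))

;; Capturing the value once (e.g. in a top-level def evaluated at startup)
;; freezes it: the captured value is an immutable snapshot.
(def captured-db @conn)

;; A later "transaction" swaps a new immutable value into the atom.
(swap! conn (fn [db]
              (-> db
                  (update :tx inc)
                  (update :facts conj [:person/name "Alice"]))))

(:tx captured-db)  ; => 0, the snapshot never sees later transactions
(:tx @conn)        ; => 1, a fresh deref returns the current value
```

In a web handler this would mean dereferencing inside the request function rather than once at load time.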