#datomic
2024-03-07
Brett Rowberry 04:03:48

We have very little Datomic expertise at my company, but a decent amount of interest. All our current persistence is MySQL and PostgreSQL. Let’s pretend we get a critical mass of developers proficient in Datalog. What else do we need to be successful with Datomic Cloud?

Brett Rowberry 04:03:12

Some things that worry me: needing to replicate data to Snowflake for reporting, and maintaining Datomic operationally.

chrisblom 09:03:16

First of all, make sure Datomic is a good fit for your use case: it's great for complex relations and read-heavy workloads, but less suited for write-heavy workloads, storing large blobs, or number crunching. Replicating data to Snowflake should not be too hard: you can subscribe to all changes easily, or replay the transaction log, and forward these to Snowflake. In Datomic this functionality is part of the API, and much simpler than the change-data-capture solutions for most databases, which require all kinds of ad-hoc solutions. Maintaining Datomic operationally was not that complex in my experience, but I was using DynamoDB for persistence, which is more hands-off than Postgres/MySQL. If you are on AWS, they provide templates to set up everything.

Brett Rowberry 02:03:52

> First of all, make sure Datomic is a good fit for your use case: it's great for complex relations and read-heavy workloads, but less suited for write-heavy workloads, storing large blobs, or number crunching.
We make loans. The frequency of writes is quite low, since almost all actions are triggered by users.

Brett Rowberry 02:03:34

> Replicating data to Snowflake should not be too hard: you can subscribe to all changes easily, or replay the transaction log, and forward these to Snowflake. In Datomic this functionality is part of the API, and much simpler than the change-data-capture solutions for most databases, which require all kinds of ad-hoc solutions.
This kind of seems like a Lisp-curse thing: it’s easy, so there tend not to be ready-made solutions. Are you aware of how this can be done?

Brett Rowberry 02:03:23

> Maintaining Datomic operationally was not that complex in my experience, but I was using DynamoDB for persistence, which is more hands-off than Postgres/MySQL.
Sounds like it should be simpler than PostgreSQL on RDS.

Brett Rowberry 02:03:35

Thanks for your response!

chrisblom 12:03:19

Change capture can be done with d/tx-range and d/tx-report-queue; see https://gist.github.com/ChrisBlom/940fdeab7066802b203be4380d5ea1ae
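A minimal sketch of both approaches with the on-prem peer API (d/tx-report-queue is peer-only; the Cloud client API has tx-range but no report queue). forward! is a hypothetical stand-in for whatever loads the changes into Snowflake:

(require '[datomic.api :as d])

;; replay the whole transaction log from the beginning
(defn replay-log! [conn forward!]
  (doseq [tx (d/tx-range (d/log conn) nil nil)]
    ;; each entry is a map with :t and :data (the datoms added/retracted)
    (forward! tx)))

;; follow new transactions as they commit
(defn follow-live! [conn forward!]
  (let [^java.util.concurrent.BlockingQueue queue (d/tx-report-queue conn)]
    (loop []
      ;; .take blocks until the next tx-report, a map with
      ;; :db-before, :db-after and :tx-data
      (forward! (.take queue))
      (recur))))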

Ivar Refsdal 10:03:56

I got an OutOfMemoryError while doing a Datomic on-prem incremental restore:

...
Copied 12 segments, skipped 342309 segments.
Exception in thread "async-dispatch-3" java.lang.OutOfMemoryError: Java heap space
        at datomic.future$add_bounding_warning$fn__9788.invoke(future.clj:86)
        at clojure.lang.AFn.run(AFn.java:22)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at clojure.core.async.impl.concurrent$counted_thread_factory$reify__765$fn__766.invoke(concurrent.clj:29)
        at clojure.lang.AFn.run(AFn.java:22)
        at java.base/java.lang.Thread.run(Thread.java:840)
Copied 12 segments, skipped 342315 segments.
Exception in thread "async-dispatch-6" java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
Is that a known issue? Is there something other than increasing memory one can do about it?

Joe Lane 12:03:21

I don’t think there is an issue here. Just add RAM. I can’t tell you how much unless I know what you’re restoring (how many billion datoms) and how you have the restore job configured.

Ivar Refsdal 16:03:54

IIRC it's ~300 million datoms. Launching with -Xmx4g -Xms4g

Joe Lane 18:03:06

How much ram does the machine have?

Ivar Refsdal 08:03:26

8 GB total, ~7 GB available. It did work when launching with 6 GB. I still find it strange that it OOMEs, though.

jaret 15:03:47

That is a little strange. My recommendation is that you run restore with the same heap as the production transactor. Is that (4g) what you run your production transactor on? And are you passing any other flags to restore?

Joe Lane 17:03:49

How many CPUs are on the box?

Joe Lane 17:03:28

Could you take a JFR recording if the issue persists?
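(For concreteness, one way to capture one: a sketch that assumes bin/datomic forwards -XX flags to the JVM the same way it forwards the -Xmx/-D flags in the restore command below:

./bin/datomic -Xmx6g -Xms6g \
  -XX:StartFlightRecording=maxsize=256m,filename=restore.jfr \
  restore-db "file:./backup/${DB}" "${DEST_DB_URI}"

Alternatively, jcmd <pid> JFR.start can attach a recording to an already-running process.)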

Ivar Refsdal 13:03:40

> Is that (4g) what you run your production transactor on?
We are migrating from an old db & transactor to a new db & transactor. The old transactor had -Xm(x|s)8g; the new transactor has -Xm(x|s)4g (yes, that's less memory on the new transactor).
> And are you passing any other flags to restore?
None of importance, I believe. The full command is:

./bin/datomic -Xmx6g -Xms6g -DLOG_LEVEL="INFO" restore-db "file:./backup/${DB}" "${DEST_DB_URI}"
> How many CPUs are on the box?
The "box" is an Azure VM of type https://learn.microsoft.com/en-us/azure/virtual-machines/dv3-dsv3-series: 2 vCPU, 8 GiB.
$ cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
stepping        : 7
microcode       : 0xffffffff
cpu MHz         : 2593.904
cache size      : 36608 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 21
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti tpr_shadow ept vpid fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves vnmi md_clear
vmx flags       : vnmi invvpid tsc_offset vtpr ept vpid
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit mmio_stale_data retbleed gds
bogomips        : 5187.80
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
stepping        : 7
microcode       : 0xffffffff
cpu MHz         : 2593.904
cache size      : 36608 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 21
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti tpr_shadow ept vpid fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves vnmi md_clear
vmx flags       : vnmi invvpid tsc_offset vtpr ept vpid
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit mmio_stale_data retbleed gds
bogomips        : 5187.80
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
Anyhow... the restore worked with 6 GiB, and the new transactor is running fine, it seems.

jjttjj 16:03:22

I'm tempted to add schema like this:

{:db/ident       :external/id
 :db/valueType   :db.type/tuple
 :db/tupleTypes  [:db.type/keyword :db.type/string]
 :db/cardinality :db.cardinality/one
 :db/unique      :db.unique/identity}
and then just use, e.g., [:other-company.order/id "order123"] as the tuple value, and use this as basically the "universal" unique-id key for foreign ids in my system. Any particular reasons not to do this?
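For concreteness, a minimal sketch of that usage, assuming conn is a datomic.client.api connection with the schema above installed:

;; assert an entity carrying the composite external id
(d/transact conn {:tx-data [{:external/id [:other-company.order/id "order123"]}]})

;; later, address the entity via a lookup ref on the unique tuple
(d/pull (d/db conn) '[*] [:external/id [:other-company.order/id "order123"]])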

ghadi 16:03:53

it's not a bad idea and I've done stuff like this in the past

ghadi 16:03:18

but I used a one-character string for the external key type, not a keyword

ghadi 16:03:59

but not :db.unique/identity, or card-one

ghadi 16:03:11

I made a card-many tuple of two strings

jjttjj 16:03:08

Hmm, I'm not quite sure how that could represent a similar thing, but am curious. My intent is to have this single schema for all cases where some org I don't control has some unique id, and it's useful to treat those as unique entities in my system. I think I could see why not using :db.unique/identity could have pros and cons, but you lost me with cardinality-many. Why would the same entity in my system have multiple ids in other external systems? [It's possible I'm just too stuck thinking in terms of my particular domain]

ghadi 16:03:19

Yes, that was exactly the use case: the same entity with different ids.
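(A sketch of the variant ghadi describes, reconstructed from the thread rather than from his actual schema; the attribute name :external/ids is made up. A non-unique, cardinality-many tuple of two strings, with a one-character code for the external system, lets one entity carry several foreign ids:)

{:db/ident       :external/ids
 :db/valueType   :db.type/tuple
 :db/tupleTypes  [:db.type/string :db.type/string]
 :db/cardinality :db.cardinality/many}

;; e.g. one entity asserting both ["o" "order123"] and ["s" "ship-42"]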

cch1 17:03:11

I'm trying to read between the lines, but why not "promote" the tuple to a proper attribute (`:other-company.order/id`) and value (`"order123"`)? I see pros and cons...
Tuple pros / db-attr cons:
• One-time setup of schema, and then adapt to many varieties of foreign keys on the fly.
Tuple cons / db-attr pros:
• Semantics live outside the database in your convention... so a little less self-evident.
• All external keys are forced to be strings (very common, but not strictly universal).
All told, I think it's a good paradigm, especially given the painful restrictions on composite tuples.

👍 1
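(The "promote to a proper attribute" alternative, as a sketch: one unique string attribute declared per external system, using the example names from the thread:)

{:db/ident       :other-company.order/id
 :db/valueType   :db.type/string
 :db/cardinality :db.cardinality/one
 :db/unique      :db.unique/identity}

;; entities are then addressed directly: [:other-company.order/id "order123"]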
onetom 20:03:41

Interesting idea. I would add an attribute predicate on :external/id, to reject invalid key-attribute keywords. What if you used :db/tupleTypes [:db.type/ref :db.type/string]? Would that be more or less efficient?
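A minimal sketch of such a predicate via :db.attr/preds; the namespace, function name, and validation rule are made up for illustration:

(ns myapp.preds) ;; hypothetical; must be on the classpath of the transacting process

(defn valid-external-id?
  "Attribute predicate: the tuple must pair a namespaced keyword
  with a non-empty string."
  [[k v]]
  (boolean (and (keyword? k) (namespace k)
                (string? v) (seq v))))

;; attach the predicate to the attribute
{:db/ident      :external/id
 :db.attr/preds 'myapp.preds/valid-external-id?}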

jjttjj 18:03:18

@U086D6TBN Thanks, the attribute predicate is a good idea. I gave the :db.type/ref thing a shot. I can't speak to efficiency, but the way I understand it, it makes it inconvenient to use the external id in a lookup ref, which I would want, because we'd have to look up the :db/id of the ref to use in the lookup ref. Unless there's some way to improve this.

(require '[datomic.client.api :as d])

(let [client (d/client {:server-type :datomic-local
                        :system      "dev"
                        :storage-dir :mem})
      _      (d/delete-database client {:db-name "db1"})
      _      (d/create-database client {:db-name "db1"})
      conn   (d/connect client {:db-name "db1"})
      schema [{:db/ident       :external/id
               :db/valueType   :db.type/tuple
               :db/tupleTypes  [:db.type/ref :db.type/string]
               :db/cardinality :db.cardinality/one
               :db/unique      :db.unique/identity}

              {:db/ident :other-company.order/id}]]
  (d/transact conn {:tx-data schema})
  (d/transact conn {:tx-data [{:external/id [:other-company.order/id "order123"]}]})

  ;; this just returns {:db/id nil}: the keyword is not resolved
  ;; to the referenced attribute's entity id inside the lookup ref
  (d/pull (d/db conn) '[*] [:external/id [:other-company.order/id "order123"]])

  ;; resolving the attribute's :db/id first makes the lookup work
  (let [attr-id (:db/id (d/pull (d/db conn) '[:db/id] :other-company.order/id))]
    (d/pull (d/db conn) '[*] [:external/id [attr-id "order123"]])))

jjttjj 21:03:59

Is io-stats supposed to work for :server-type :datomic-local? Adding :io-context to a query and a transaction isn't doing anything for me (no stats in the result), but io-stats are described in the docstrings, so I think I'm on the right version at least. Using the latest versions:

com.datomic/client-cloud {:mvn/version "1.0.125"}
com.datomic/local        {:mvn/version "1.0.277"}
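(For reference, a sketch of the shape being described: per the io-stats docs, when a call supports them, passing :io-context makes it return {:ret <result>, :io-stats {...}} instead of the bare result. Whether datomic-local populates this is exactly the open question here.)

(d/q {:query      '[:find ?e :where [?e :db/ident]]
      :args       [(d/db conn)]
      :io-context :dev/ident-query})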