This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-07-21
What are the procedure and "costs" of migrating from one storage backend to another? e.g. between postgres and Cassandra or vice versa?
Yep, I guess so. I assume all functionality will be maintained, except that moving from a consistent storage (postgres) to an eventually consistent one (cassandra's default configuration) will introduce application bugs if the application assumed strict consistency before.
@matan application should be completely unchanged by the choice of storage for datomic
@matan Datomic only mutates a tiny number of records. everything else is immutable, so no opportunity for inconsistency
@favila last I asked on the mailing list, my understanding of the response was that query results could differ based on which cassandra node answered datomic. As I recall, the default cassandra setup "commits" a change before all cassandra nodes have been updated (maybe I am wrong there), so with high throughput/activity I might get different query results depending on which cassandra node answered datomic.
what may happen is that NO server available to you has the record (in case of network partition)
that would be some kind of failure or retry, but the application would not keep going silently with different results
So from an application point of view, say I query for all movies (borrowing from the tutorial's minimalist scenario), a movie was just added, and it is not yet present on the cassandra node that datomic used for this query.
If the application can't retrieve the movie, this will reflect a wrong "world" from the user's point of view
if one peer writes while another peer queries, the querying peer may not know about the latest transaction
As an aside, I should de-complect what the user tells the app, and what the app agrees to enter into the world
the key is there is only one transactor. the transactor writes and informs peers of the latest t
oh, right, so a query issued after the data is transacted would not need to go all the way down to the storage layer; it will get the latest as long as the transactor has already finished updating the peer about the transaction? is that it?
Though, what if the peer forgets about T, because it has been evicted from its cache in order to satisfy some larger query?
Imagine the entire database is an atom containing {:current-t T :transaction-log [...] :indexes {:eavt [...] :avet [...] :aevt [...] ...}}
so the result of every transaction is an immutable database value with access to all history too
implementation-wise of course this is not how it's done, but operationally that is the experience you get
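That mental model can be written out as plain Clojure data. This is a sketch of the analogy above, not Datomic's actual internal representation; the keys just mirror the description:

```clojure
;; Sketch of the mental model: the "whole database" as one immutable
;; value held in an atom. Keys are illustrative, not Datomic internals.
(def conn
  (atom {:current-t 1042
         :transaction-log [...]         ; every transaction ever applied
         :indexes {:eavt [...]          ; entity-attribute-value-tx sorts
                   :aevt [...]
                   :avet [...]}}))

;; "Reading" is just dereferencing: you get one immutable snapshot.
;; A transaction swap!s in a *new* value; it never mutates this one,
;; so a snapshot you hold can never change underneath you.
(def db-snapshot @conn)
```

Because each snapshot is immutable and tied to one T, two peers holding the same T see exactly the same data, which is the point @favila makes next.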
Yes, I write clojure code, e.g. https://github.com/Boteval/compare-classifiers
furthermore, that T never changes, so there is no chance that an eventually-consistent storage has different values for the same T
I should revisit after being done with the tutorial and having a complete grasp of the data model (datoms, attributes, etc.). Will be back at that point...
@matan this may interest you if you want to know more about internals: http://tonsky.me/blog/unofficial-guide-to-datomic-internals/
Well I think I get it. So reading the most current data boils down to chasing the latest time handle (if searching for an entity), or having the right transaction id at hand to begin with, or waiting for them to arrive courtesy of the transactor. Framed as such, the notion of "consistency" is reduced to an easy-to-satisfy definition, one that is closer to the ACID definition than the CAP one.
Which does not make an application using datomic behave consistently without some effort in the form of judicious use of the datomic API. I can live with that, maybe, in exchange for not using SQL or a lame NoSQL database.
matan: if you’re worried about a client reading their own writes, you can use d/sync
to make sure you read the basis t of the last write from that user
@matthavener with postgres as a backend this won’t be an issue, right?
@matthavener yep, so I came to gather, thanks.
afaik, you can lag reads on any type of storage
if you write with peer B and read from peer A, there’s no guarantee that the transactor has updated A with the basis T that was just transacted from B
(hence d/sync, which would allow you to force A to sync to some T)
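A minimal read-your-writes sketch with d/sync, assuming a Datomic peer connection named conn and a hypothetical :movie/title attribute:

```clojure
;; Write on one peer, then ensure this (possibly different) peer's
;; database value includes at least that transaction before querying.
(let [tx-result @(d/transact conn [{:db/id "m" :movie/title "Repo Man"}])
      basis-t   (d/basis-t (:db-after tx-result))]
  ;; d/sync returns a future; deref blocks until the peer has caught
  ;; up to basis-t, so the write above is guaranteed to be visible.
  (let [db @(d/sync conn basis-t)]
    (d/q '[:find ?e :where [?e :movie/title "Repo Man"]] db)))
```

In practice the basis-t would travel with the request (e.g. in a cookie or header) so a later read from any peer can sync to it.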
@matthavener but from within the same peer, there is guarantee, right?
I don’t know if that’s a guarantee datomic makes, but it would seem reasonable to me 🙂
if it's the same peer, you can read the value of d/transact's future to get the db-after, which has that guarantee
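For the same-peer case, that looks like this sketch (movie-id is a hypothetical entity id):

```clojure
;; Same-peer read-your-writes: the map returned by the d/transact
;; future carries the database value as of that transaction.
(let [{:keys [db-after]} @(d/transact conn
                            [[:db/add movie-id :movie/title "Alien"]])]
  ;; db-after is guaranteed to include the write above, with no
  ;; round-trip to storage needed.
  (d/pull db-after '[:movie/title] movie-id))
```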
If I were to write a service whose purpose is to follow the transaction log through the log API and update some internal state
is it possible to get, at every point, the transaction ID that immediately preceded it?
so my service can keep track of the "latest transaction ID it processed", and before processing a new transaction it can check that the "latest tx ID" it has recorded matches the "previous transaction ID" of the transaction it's about to process
TL;DR: how do I reliably write a service that follows the transaction log, ensuring that it doesn't miss any transaction
there is unfortunately no cheap way to get the previous tx, because there's no way to walk indexes backward
@favila ok, thanks. I wish there was a way for me to manually check it though, e.g. when getting a new transaction to process, be able to get a reference to the transaction that immediately precedes it
@favila oh wait; records from the report queue contain “db-before”. Would it be expensive to call basis-t on that, then t->tx?
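Concretely, the idea would be something like this sketch (both d/basis-t and d/t->tx are cheap in-memory lookups on the db value, not storage reads):

```clojure
;; Sketch: recover the preceding transaction's entity id from a
;; tx-report-queue entry via its :db-before value.
(let [queue   (d/tx-report-queue conn)
      report  (.take queue)                    ; blocks for next tx report
      prev-t  (d/basis-t (:db-before report))  ; basis t before this tx
      prev-tx (d/t->tx prev-t)]                ; its transaction entity id
  ;; Compare prev-tx against the last tx id the service recorded to
  ;; detect a gap before processing (:tx-data report).
  prev-tx)
```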
@hmaurer you can also use the Log API, for an architecture where the consumer decides when it catches up instead of being notified by the tx-report-queue in real-time
@hmaurer have you looked at the docs e.g. http://docs.datomic.com/log.html ?
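With the Log API, a follower that never skips a transaction can simply resume from the last t it durably recorded; the helper names here are hypothetical:

```clojure
;; Sketch of a log follower: replay every transaction after the last
;; one processed. last-processed-t is the service's persisted
;; high-water mark; handle-tx! applies a tx to its internal state.
(defn process-new-transactions! [conn last-processed-t handle-tx!]
  (let [log (d/log conn)]
    (reduce (fn [_ {:keys [t data]}]
              (handle-tx! t data)  ; apply this transaction
              t)                   ; return t as the new high-water mark
            last-processed-t
            ;; tx-range is inclusive of start and exclusive of end;
            ;; a nil end means "through the most recent transaction".
            (d/tx-range log (inc last-processed-t) nil))))
```

Since tx-range returns transactions in order with no gaps, polling this in a loop gives at-least-once processing without needing the "previous tx id" check at all.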
Thanks, that's clear by now. The ACID definition of consistent is easy to accomplish in this architecture and the paradigm implied by the time-oriented API.
Going through the unofficial internals doc suggested above, my thoughts are twofold: 1. Most of these things should be in the official docs, right after the introductory parts. They give a sense of what performance to expect in different scenarios, and thus whether or how to use datomic for a given scenario. Maybe they are already mentioned there. 2. I think datomic has the upper hand on data modelling compared to what else we have out there, but possibly at a dire cost of being prohibitively slower than other options for some standard scenarios, due to all the translation involved in weaving transactions into non-transactional storage, the unoptimized (?) nature of datalog vs. SQL, and the external storage layers acting as databases rather than mere storage, thus incurring additional overhead. I'd be really happy to see some intelligent benchmarks or refutations of the assumptions sprinkled in this quick note.
“prohibitively slower” and “standard” both would need definition
there is no doubt that SQL query engines have decades of clever performance optimizations
and it almost feels like cheating to win some read scenarios via the architectural advantage of immutability + multi-tier caching
@matan in Datomic, external storages act as block stores, not as databases
@matan Datomic happily accepts the write overhead needed for ACID transactions, and the application fits (and misfits) that this implies
@matan I think the important thing missing from your note is the horizontal scaling and caching advantages enabled by peers + immutability
@stuarthalloway from your experience, has there ever been cases where the single-writer process becomes a performance problem? And if yes, what would be the recommended way to deal with this?
Off the top of my head I thought the application could then be split into multiple databases, which might allow the transactors to operate in parallel
@hmaurer sure! Datomic is not a fit for write scale, as the FAQ states: http://www.datomic.com/faq.html
@stuarthalloway Yep I saw that, and I wouldn’t hold Datomic guilty for not dealing with large volumes of write. I am just wondering if there are potential workarounds
@hmaurer You can split the application into multiple databases with a separate transactor for each. Remember that peers can query (with join!) across databases, and peers do not care what transactor they come from
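As a sketch of that cross-database join (the schemas and attribute names here are hypothetical; each db value could come from a connection to a different transactor):

```clojure
;; One query joining across two databases, e.g. a "users" db and an
;; "orders" db that are written by separate transactors.
(d/q '[:find ?name ?total
       :in $users $orders
       :where
       [$users  ?u :user/email  ?email]
       [$users  ?u :user/name   ?name]
       [$orders ?o :order/email ?email]   ; join key shared across dbs
       [$orders ?o :order/total ?total]]
     (d/db users-conn)
     (d/db orders-conn))
```

The peer does the join locally over the two immutable db values, which is why it does not care which transactor produced each one.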
Also I might just be delusional but I feel I understand its internals much better than other databases I have used before, which is comforting
If you are looking at that kind of scaling trick, please stay in touch here and/or on the mailing list. Happy to help vet and bench your ideas.
Or that, even if I do not understand its internals, I won’t need 10 years of experience to understand them if explained properly
Great, thanks! I don’t have any practical application for which I would need to scale writes though; it was just out of curiosity
Cheers!
Ah, quick other question while you are here @stuarthalloway 🙂