#datomic
2016-05-30
vmarcinko05:05:50

hi, just started playing with Datomic the last few days, and noticed something which IMO is very important but not much discussed... Unlike SQL DBs, where you do a bunch of mutation operations on the DB during a unit of work and these are not "transacted" until the DB commit is called, in Datomic we don't have a TX open/commit/rollback mechanism. Does this mean that it simply forces the developer to structure the code in a functional way, so that every function called during this unit of work "adds" its own Datomic TX data to a "unit-of-work-global vector of TX data", and the top-level function, which should enforce atomicity of the DB change, finally takes this global vector and transacts it to Datomic?

hans05:05:01

@vmarcinko: yes. your application needs to organize transactions into small logical units.

vmarcinko05:05:54

because top-level functions are nothing but compositions of low-level functions, where each of these low-level ones can mutate the DB in its own way, so I must actually prevent the low-level fns from calling d/transact, and instead only collect their TX data and execute the transact in the top-level one at the end?
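
(To make that inversion concrete, here is a minimal sketch with hypothetical names and attributes, assuming (:require [datomic.api :as d]): low-level fns return TX data and never call d/transact; only the top-level fn transacts.)

;; low-level fns are pure with respect to the db: they return TX data
(defn charge-account-tx [account-id amount]
  [[:db/add account-id :account/balance amount]])   ; hypothetical attrs

(defn audit-entry-tx [account-id amount]
  [{:db/id         (d/tempid :db.part/user)
    :audit/account account-id
    :audit/amount  amount}])

;; only the top-level unit of work transacts, so the change is atomic
(defn charge! [conn account-id amount]
  @(d/transact conn (concat (charge-account-tx account-id amount)
                            (audit-entry-tx account-id amount))))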

vmarcinko05:05:48

meaning, this requirement substantially affects the style of code structure one gets used to when working with SQL DBs

hans05:05:25

Yes. and it can be a major obstacle when you're used to dealing with large transactions, because Datomic does not support large transactions.

vmarcinko05:05:08

what is meant by a large TX? even SQL DBs suggest not making a TX too large

vmarcinko05:05:27

approximately how many datoms are considered large?

hans05:05:17

Ten thousand is large. The sweet spot for transaction size is in the low hundreds of datoms, I'd say.

vmarcinko05:05:04

and if I make a TX containing, say, 30,000 datoms, does it mean the transact will take a bit longer, or will something else actually prevent it from executing successfully in the end?

vmarcinko05:05:40

i'm OK if slowness is the only problem in such rare cases

vmarcinko05:05:30

because with SQL DBs I sometimes, rarely, commit 5000 new records, which is roughly the same as 30,000 datoms, and it sometimes takes a few seconds, and I'm OK with that

hans06:05:06

You will need to make sure that you don't run into timeouts, and that will sometimes require tuning.

hans06:05:00

That tuning comes at the expense of increasing failover time in a redundant configuration, i.e. if you allow larger transactions by increasing the heartbeat interval, it takes longer for a new transactor to detect whether another transactor is already active.
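
(For illustration: on the peer side you can keep timeouts explicit by using the async variant of transact and dereferencing with a deadline; the heartbeat setting hans refers to lives in the transactor properties file, heartbeat-interval-msec if memory serves. conn and big-tx-data are assumed to be bound.)

;; d/transact-async returns a future immediately; deref it with a
;; timeout so a very large transaction fails visibly instead of hanging
(let [fut    (d/transact-async conn big-tx-data)
      result (deref fut 60000 ::timed-out)]
  (if (= result ::timed-out)
    (throw (ex-info "transaction timed out" {:datoms (count big-tx-data)}))
    result))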

hans06:05:45

Overall, dealing with large transactions in Datomic is a tricky issue, and the system itself does not do much to help you.

vmarcinko06:05:34

@hans: ok, thanks... Anyway, back to the influence on code structure... Do Datomic users sometimes use a dynamic var to represent the Datomic TX data for a single unit of work, thus relieving the low-level functions of the burden of adding this TX data to their return value (besides other things)?

vmarcinko06:05:25

I know this dynamic var approach is not the cleanest from a functional perspective, but I just want to see if there is some way to offer the old way of structuring code everyone is used to with SQL DBs

hans06:05:55

There are many ways to organize collection of transaction data, and it is certainly possible to use a dynamic variable for that. I'm not a big fan of dynamic variables (anymore) because they make testing difficult and don't play well with threading.
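
(For reference, the dynamic-var approach under discussion might look roughly like this; *tx-data* and with-unit-of-work are illustrative names, and hans's caveats apply, since the binding is not conveyed to other threads.)

(def ^:dynamic *tx-data* nil)   ; bound to a fresh atom per unit of work

(defn add-facts! [facts]
  ;; low-level fns call this instead of d/transact
  (swap! *tx-data* into facts))

(defmacro with-unit-of-work [conn & body]
  `(binding [*tx-data* (atom [])]
     ~@body
     @(d/transact ~conn @*tx-data*)))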

vmarcinko06:05:34

are there any docs around about approaches to collecting TX data? I couldn't find any on the web at a quick glance

hans06:05:36

We collect data that we want in a transaction in a top-level function that transacts it.

potetm06:05:02

@vmarcinko: No, you don't want to build up some global state like that.

potetm06:05:08

That's just asking for a mess.

potetm06:05:13

I'm having trouble seeing why something like

(concat (gen-one-set-of-facts ...)
        (gen-two-set-of-facts ...) ...)
wouldn't work.

potetm06:05:41

If you're worried about transaction size, then you need to break it up, not build it up.
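
(Breaking it up usually just means batching the TX data before transacting; a sketch, where the batch size is something you would tune, and noting that atomicity across batches is lost:)

;; each batch is its own transaction, so a failure midway leaves
;; earlier batches committed; fine for imports, not for invariants
(doseq [batch (partition-all 100 tx-data)]
  @(d/transact conn batch))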

vmarcinko06:05:55

@potetm: Thanks, I understand the benefits of the functional approach, it's just that I sometimes find the functional way difficult with code modularization... Stuart Sierra's Component and the "dependency injection" it provides is mostly used for that, and I sometimes don't know whether some lower-level module (component) and its functions will want to transact some data. So I guess I should make many of these functions that belong to lower-level modules, and can be plugged in polymorphically, return a map that has an optional :datomic.tx-data key, so higher-level code can check whether some TX data from the lower-level function exists and should be conjoined to the global unit-of-work TX data vector

vmarcinko06:05:38

and this looks like a major influence on code structure

vmarcinko06:05:19

in simple code bases it is easy to reason in a purely functional way, but large codebases sometimes bring problems due to modularization requirements

vmarcinko06:05:14

because there can be cases when one impl of some component wants to transact some new data, and some other impl doesn't

potetm06:05:57

Right, so why wouldn't this work:

potetm06:05:16

(defn my-transacting-fn [uri args]
  ;; assumes (:require [datomic.api :as d]); deref to surface any error
  @(d/transact (d/connect uri)
               (concat (ns-1/set-of-facts args)
                       (ns-2/set-of-facts args)
                       (ns-3/set-of-facts args)
                       (ns-4/set-of-facts args))))

potetm06:05:39

Each set-of-facts fn can make whatever decision it wants based on args.

potetm06:05:18

It can return [], it can return [[:db/add 1 :my-attr "val"]]

potetm06:05:36

How is that not modular?

vmarcinko06:05:02

this is your top-level fn that does the transact, right?

potetm06:05:35

Yeah, that's the logical unit of work.

hans06:05:45

I guess what @vmarcinko is looking for is a way to make several distinct modules add to a transaction that is then committed at the end of the request, for example.

potetm06:05:57

that's what that does^

hans06:05:59

Which is a common pattern with SQL databases.

hans06:05:02

@potetm: Not quite, because you're enforcing a direct call relationship between the request handler and the modules that create data to transact.

hans06:05:22

@potetm: All that @vmarcinko wanted to have confirmation for is that it is common to structure applications to cater for Datomic's requirements, which I'd say can be confirmed. Datomic requires significant architectural support from the application. It is not at all a drop-in replacement for an SQL database.

potetm06:05:43

> I find functional way sometimes difficult with code modularization

I think there's a fundamental misunderstanding about the value referential transparency provides. That's what I'm trying to get after.

hans06:05:57

Okay, but that's no longer something that is Datomic specific.

potetm06:05:58

If you go make a global mutable var, you've thrown away all of the leverage Datomic and Clojure have to offer.

potetm06:05:55

You should take that decision seriously. That's why I'm pushing a bit on this.

hans06:05:04

I don't agree at all. We're really talking about databases, and databases by their nature are about effects. It can certainly not be said that the only proper way to deal with effects is to organize them as a call chain.

vmarcinko06:05:10

@potetm: Thanks, though I know the value of referential transparency and the functional way, I sometimes struggle a bit to organize large code bases, which are modular in their very nature, to allow a purely functional approach... SQL DBs don't force me to use the functional way, while Datomic seems to do just that, which, as @hans nicely put it, makes Datomic not a drop-in replacement for SQL DBs in one's code

vmarcinko06:05:12

I have a top-level fn which calls another pluggable module function called resolve-country (phone number), and this fn returns a country code

vmarcinko06:05:54

but you wouldn't believe it: although this polymorphic function isn't supposed to return anything else, there were cases where, in some deployments, meaning other implementations of this function, we had to register something in the DB

hans06:05:55

vmarcinko: Instead of a dynamic var, collecting the transaction data in a stream or queue-like structure might be better.
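
(That could be as simple as an explicit thread-safe buffer passed around instead of a dynamic var; the names here are illustrative.)

(defn new-tx-buffer []
  (java.util.concurrent.ConcurrentLinkedQueue.))

(defn enqueue-facts! [buffer facts]
  ;; low-level code enqueues facts; safe to call from any thread
  (.addAll buffer facts))

(defn commit! [conn buffer]
  @(d/transact conn (vec buffer)))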

vmarcinko06:05:05

meaning, one impl of this function should return Datomic TX data
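
(One way to make that contract explicit is a protocol whose impls return a map with an optional :tx-data key that the caller folds into the unit of work; everything below is a hypothetical sketch, including lookup-prefix.)

(defprotocol CountryResolver
  (resolve-country [this phone-number]
    "Returns {:country-code ...}, optionally with :tx-data."))

(defrecord SimpleResolver []
  CountryResolver
  (resolve-country [_ phone-number]
    {:country-code (lookup-prefix phone-number)}))

(defrecord RegisteringResolver []
  CountryResolver
  (resolve-country [_ phone-number]
    {:country-code (lookup-prefix phone-number)
     ;; this impl also wants something recorded in the db
     :tx-data [{:db/id (d/tempid :db.part/user)
                :lookup/phone phone-number}]}))

;; the caller does (into tx-data (:tx-data result [])) and never
;; calls d/transact at this level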

vmarcinko06:05:20

whereas with a SQL DB, we touch global state (the SQL DB) and nothing of it is seen by the caller

vmarcinko06:05:07

I don't have any problem with Datomic pushing some application structure, it's just that I haven't seen this documented anywhere, in contrast to the all-pervasive SQL DBs

stijn12:05:36

I have a couple of questions about running datomic on google cloud

stijn12:05:47

1/ does anyone have any experience running the transactor in Google Container Engine (or AWS's container service)? HA looks possible to me.

stijn12:05:17

2/ is there any chance that Google's Cloud DataStore will ever become a Datomic backend?

casperc13:05:38

I am wondering, is there any way to have Datomic participate in a two-phase commit? I am writing a file to a file store and some metadata in Datomic, but I want to make sure that the transaction doesn’t commit if the push to file storage fails or vice versa (creating an inconsistency between the two).

marshall13:05:12

@casperc: The Reified Transactions video here http://www.datomic.com/videos.html discusses approaches to solving that issue

marshall13:05:03

@vmarcinko: incidentally, that video ^ also touches on solutions to the large transaction issue

casperc13:05:56

@marshall: Thanks, I’ll take a look at that.

hans13:05:51

@casperc: generally, Datomic's transaction model does not blend well with the transaction model that two-phase commit systems usually prescribe.

hans13:05:51

@casperc: as Datomic's transactions are basically determined by the serialized execution of the transaction code in the transactor, you cannot really wait for other transactional systems to commit their work before you do.

hans13:05:53

@casperc: reified transactions can help grouping large operations together, but that is a very different thing from the isolation that traditional transactional systems provide.

casperc14:05:37

@hans: Yeah, I was afraid of that. I am thinking something along the lines of reverting the transaction if the file storage fails.

hans14:05:32

@casperc: we're committing to our file storage before we transact in datomic and live with the potential garbage that we accumulate that way.

casperc14:05:01

Yeah that is probably a good option

casperc14:05:31

Inconsistencies can still happen though, as we need to change files and metadata, so it is not just a missing reference to a file in our case.

Ben Kamphaus14:05:11

Annotate transactions by file, or by a hash of the state of the file they refer to; if the file store commit fails, then retract all TX data for all transactions with that annotation.

Ben Kamphaus14:05:28

That and other strategies can be trivially adapted from the examples in the second half of that video.
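
(The annotation amounts to asserting extra attributes on the transaction entity itself, via a tempid in the :db.part/tx partition; :file/sha and metadata-facts are assumed names here.)

;; annotate the transaction with the file's hash; if the file store
;; later fails, find transactions by :file/sha and retract their data
@(d/transact conn
   (into [{:db/id    (d/tempid :db.part/tx)   ; the transaction entity
           :file/sha file-sha}]
         metadata-facts))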

casperc14:05:18

Cool, I am looking through the video so I’ll keep watching.

Ben Kamphaus14:05:08

I would say the semantics in the domain and the other file store's consistency have more to do with the problem than Datomic's constructs.

Ben Kamphaus14:05:35

I.e. If you can only transact to Datomic with a push from an event where the file is definitely stored, it's a simple problem.

Ben Kamphaus14:05:35

If you need Datomic to know you tried to commit something elsewhere and then somehow update it when the file is available for sure, and you can only poll after some vector-clock time or something to build an expectation that the file is there for good, that's more complicated.

hans14:05:51

@bkamphaus: The problem is not so much whether one can undo something that is not wanted because of an error that occurs later, it is more that the burden of filtering out "uncommitted" data is on the other readers of the database.

Ben Kamphaus14:05:51

Yep, totally understand that limitation, and it's true for other aspects of use as well. I would say it's a trade-off of the distributed query/read model that applies to other domains as well (e.g. filtering for permission to access)

Ben Kamphaus14:05:23

Depending on the domain as well it may make more sense to just annotate metadata entities as whether their file backing has been verified.

vinnyataide23:05:41

hello, as I was searching for ways to expose my Datomic data through GraphQL, I stumbled upon the REST API. At first I thought it would expose the models, but I saw that it exposes the system itself. So my question is: is it somehow possible to use it as a complete API, or is GraphQL the best option for non-opinionated back-end APIs?