#sql
2021-03-06
athomasoriginal16:03:30

Hello. I ran into a situation where my tests were failing because rollback was not happening. Digging into the code, I realized I was nesting transactions, which, according to my readings, is not a great idea. The code roughly looks like this:

;; the application code
(defn create-product!
  [db ,,,]
  (jdbc/with-transaction [tx db]
    (do-thing-1 tx)
    (do-thing-2 tx)
    (do-thing-3 tx)))

;; the test
(deftest test-product!
  (testing "create product"
    (jdbc/with-transaction [tx testdbs {:rollback-only true}]
      ;; generate mock data
      (create-product! tx)
      ;; assertions
      )))
What are some patterns that others use for dealing with this?

seancorfield17:03:00

There's a dynamic var you can bind to either disallow or ignore nested TX. In your case you want it to ignore nested TX.
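A sketch of that suggestion applied to the test above, assuming next.jdbc 1.1.x, where the dynamic var next.jdbc.transaction/*nested-tx* can be bound to :ignore (nested with-transaction calls become no-ops) or :prohibit (nesting throws); :allow is the default:

(require '[next.jdbc :as jdbc]
         '[next.jdbc.transaction :as tx])

(deftest test-product!
  (testing "create product"
    ;; :ignore turns the with-transaction inside create-product! into a
    ;; no-op, so the outer :rollback-only transaction governs everything
    (binding [tx/*nested-tx* :ignore]
      (jdbc/with-transaction [t testdbs {:rollback-only true}]
        (create-product! t)))))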

athomasoriginal17:03:03

In general, what is the advice on where to use these transactions? I added mine at what I would call the “application” layer, but maybe the transactions are better in general in the handler? Perhaps this is an architecture question though. :thinking_face:

seancorfield18:03:51

@tkjone I almost never use transactions. I use them only if I have a specific sequence of mutations that I need to either all succeed or all fail. In practice, that is pretty rare.

athomasoriginal18:03:10

hmmmm. Okay, I will take note of this. In my use case, it’s being used after a successful sale, where an order is generated, the cart cleared, etc. I wouldn’t want one to succeed while the others fail. Would you say this is a valid use case? (apologies for the limited information)

seancorfield18:03:56

Yeah, e-commerce is sometimes a good case for a transaction. I tend to point folks to https://ieeexplore.ieee.org/document/1407829

seancorfield18:03:41

Transactions work well "in the small" but they don't scale well (and can't be nested in JDBC)

seancorfield18:03:30

clojure.java.jdbc silently ignored nested TX (so your test code would likely have worked with c.j.j), but it also meant that accidental nesting was not detectable.

seancorfield18:03:15

next.jdbc takes the "do what I say" approach and that's why it specifically documents using named and unnamed savepoints on connections instead of simply relying on transactions.
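A minimal sketch of JDBC savepoints via plain interop on java.sql.Connection (not a next.jdbc-specific API; ds and the SQL here are assumptions):

(with-open [con (jdbc/get-connection ds)]
  (.setAutoCommit con false)
  (jdbc/execute! con ["insert into orders (id) values (?)" 1])
  (let [sp (.setSavepoint con "cart-cleanup")]      ; named savepoint
    (try
      (jdbc/execute! con ["delete from cart where order_id = ?" 1])
      (catch Exception _
        (.rollback con sp))))                       ; undo only back to the savepoint
  (.commit con))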

seancorfield18:03:52

And it's also why you can tell it to throw an exception if a nested TX is attempted or to simply ignore nested TX, just like c.j.j silently did.

seancorfield18:03:39

In general, even in e-commerce, I tend to explicitly control mutations, with try/catch, and manually undo things if part of a "transaction" fails -- that way I can control all the logging (as in, to the database) separately from the actual e-commerce actions.
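A minimal sketch of that pattern, with every function name here hypothetical:

(defn complete-sale!
  [db cart]
  (let [order (create-order! db cart)]
    (log-action! db :create-order :ok order)
    (try
      (clear-cart! db cart)
      (log-action! db :clear-cart :ok cart)
      order
      (catch Exception e
        (log-action! db :clear-cart :failed e)
        ;; compensating action: manually undo the earlier step
        (delete-order! db order)
        (log-action! db :delete-order :ok order)
        (throw e)))))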

seancorfield18:03:18

It's more work, but it gives you more control, and you can see a "paper trail" in the logging tables of what was attempted and what failed.

seancorfield18:03:20

After all, what happens if you charge the customer and then fail to insert the records in your local DB? Or you've performed some remote actions and then others fail? Or your e-commerce process is a mixture of several remote interactions and several local database ones?

seancorfield18:03:49

None of those scenarios can be handled just by JDBC transactions.

seancorfield18:03:11

It's why, at work, we have processes that reconcile remote e-commerce logs (PayPal, Braintree, etc) with what ends up in our DB, so we can detect "orphan" transactions (in the financial sense) caused by partial failures during the overall process.

athomasoriginal23:03:06

Thanks, Sean! That was a great response and I see the approach you take makes more sense in many ways :man-bowing:

seancorfield23:03:27

It's tricky because whenever we learn about SQL, RDBMS, and JDBC, it is always emphasized how important transactions are to data integrity -- but the real world has grown a lot more complicated and a simple, in-database transaction just isn't sufficient a lot of the time these days (which in turn also makes our lives so much more complicated too!).

athomasoriginal23:03:12

You mentioned “logging tables”. How do you mean?

athomasoriginal23:03:54

And in your model, if something fails, and you have the logs, at what point do you course correct?

seancorfield00:03:04

Pretty much every action we take is recorded in a database table -- as an append-only log -- in addition to whatever transactional data we record (in the business sense). That includes actions we take on behalf of our members (customers) as well. That means that if we perform cleanup on transactional data, because some part of the process failed -- i.e., we "manually" roll back the changes -- we can still see in the logging tables a) what actions we successfully took up to that point and b) what actions failed (because we log all failures).

seancorfield00:03:33

Some people just stream their logs to some other system, or a non-traditional database, but we've found value in having them all together so we can more easily run reports that contain both transactional and non-transactional data. The "downside" is that we have several tables with hundreds of millions of rows (but MySQL has been surprisingly good, even at that scale).

seancorfield00:03:24

It also makes it easier to design and run batch processes that can identify discrepancies and reconcile them after the fact if needed. We have a number of "data migration" processes we've built over the years that have allowed us to correct transactional data that reflected bugs in our systems, even if the bug wasn't discovered for a couple of years, because we often also have the raw action/event logs, and can then patch the system forward from that point. An ad hoc event streaming system, in effect.

athomasoriginal00:03:15

If I understand correctly, you would have “business tables” (cart, product, etc.) and “log tables”. For every transaction to any of the business tables, you have a corresponding log to the “log tables”? Then, if something fails during a transaction, you manually roll back (thanks to try/catch) but know what you were originally trying to do because it was recorded in the “log tables”. Yes? Would you also log to the log tables when performing a manual cleanup?

athomasoriginal00:03:03

> It also makes it easier to design and run batch processes that can identify discrepancies and reconcile them after the fact if needed
Yes, this makes sense and was one of the Q’s I had left over. With the logs though, it would be easier to fix the issues, I imagine.

seancorfield00:03:31

Right, yes, we log actions we are taking and we log failures of those actions (and successes if we need the result of those actions). The "manual" recovery in the code usually has the data on hand to know what to do in terms of updating the transactional data but we don't rollback the log table data -- we may well log the failure that occurred, but we don't necessarily need to query the log table to figure out how to recover.

seancorfield00:03:20

So we update transactional data but we do not update log data, we just write additional log data saying what update action we took. Does that help?

👍 3
seancorfield00:03:51

So a transaction table may get a row describing a pending transaction and a logtransaction table may get a row describing the transaction we intend to perform. Then the payment settles and we update transaction to settled or whatever and logtransaction gets a new row indicating the settlement information.
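A hypothetical sketch of that flow (column names and txn-id are assumptions), showing that transactional rows are updated in place while log rows are only ever appended:

;; pending transaction + intent log row
(jdbc/execute-one! ds ["insert into transaction (id, status) values (?, ?)" txn-id "pending"])
(jdbc/execute-one! ds ["insert into logtransaction (txn_id, action) values (?, ?)" txn-id "intent-to-charge"])
;; later, the payment settles: update the one, append to the other
(jdbc/execute-one! ds ["update transaction set status = ? where id = ?" "settled" txn-id])
(jdbc/execute-one! ds ["insert into logtransaction (txn_id, action) values (?, ?)" txn-id "settled"])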

seancorfield00:03:10

(reduce transactional-change initial-state event-stream) 🙂

seancorfield00:03:01

(our system isn't quite like that but that's the parallel I would draw -- we're not a true event sourcing setup, because we got started with this a long time ago: the v1 was built in 2001, v2 broke ground in 2009, and I'd say we're most of the way into v3 at this point, having migrated and transformed all that data from v1 to v2 to v3 -- the joy of long-lived systems)

athomasoriginal01:03:23

Yes, I believe that’s enough to go on 🙏

mg22:03:14

Just want to say, I used c.j.jdbc for a long time, but started switching to next.jdbc in recent projects, and I'm absolutely loving it. Reducing over plan is super powerful. I haven't dived into the nav/datafy parts yet but I'm very excited about it
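For context, a small example of reducing over plan (ds, table, and column names are assumptions); plan streams rows without realizing them as full Clojure hash maps:

(transduce (map :amount)
           +
           0
           (jdbc/plan ds ["select amount from sales where status = ?" "settled"]))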

mg22:03:35

So far, the one thing I haven't liked is the way connection/component works. Having to call the resulting component as a function to get the datasource is a bit awkward, as it means callers need to be aware of the difference between using that or using something that has value semantics. I worked around this with my own version:

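;; Assumed requires for this snippet (not shown in the original message):
;;   [clojure.set :as set]
;;   [com.stuartsierra.component :as component]
;;   [next.jdbc.connection :as connection]
;;   [next.jdbc.protocols :as p]
;; plus (:import (com.zaxxer.hikari HikariDataSource))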
(defn new-hikari-connection-pool
  "Returns new hikari connection pool component from db spec."
  [db-spec]
  (let [;; Hikari wants :username key rather than :user
        db-spec (set/rename-keys db-spec {:user :username})]
    (with-meta {}
      {`component/start
       (fn [_]
         (let [pool (connection/->pool HikariDataSource db-spec)]
           (reify
             component/Lifecycle
             (start [this] this)
             (stop [this]
               (.close pool)
               (new-hikari-connection-pool db-spec))

             p/Sourceable
             (get-datasource [_] pool))))})))

mg22:03:36

At the same time, this probably reflects well on the core design - the fact that the central next.jdbc protocols made this so easy

seancorfield23:03:09

@michael.gaare Glad you're liking it. I would note that a connection pool is a DataSource already, so perhaps implementing p/Connectable would be more natural here? The function call idiom is something we've been using extensively at work, where Component-compatible things are callable to return the thing they wrap, and we really like that.

lukasz18:03:37

@U04V70XH6 this sounds really interesting - do you have some sample code you could share?

seancorfield18:03:42

@U0JEFEZH6 Beyond what's in next.jdbc you mean?

lukasz19:03:16

@U04V70XH6 yes, just the idea that "Component-compatible things are callable to return the thing they wrap and we really like that."

seancorfield20:03:00

OK, some examples from our code base at work:
1. our configuration Component: reads and merges EDN files at startup; function invocation acts like key access into the config: (cfg :messaging :datasource)
2. our JWT Component: reads a key file at startup and generates a secure seed for our tokens; function invocation returns the underlying seed
3. our Redis pooling library: creates and starts a Jedis Pool at startup; function invocation returns the pool
4. our HTTP server Component: starts the web server and compiles the routes; function invocation decodes the route and runs the associated handler
... and many, many more

lukasz20:03:54

Gotcha - this is really cool: it fixes the issue we have with components, where you have to know the internal structure to dig out the wrapped object/connection/etc. Thank you for the details!

seancorfield20:03:48

And it's nice that a function can carry metadata so your component start can return a fn with metadata for stop.

seancorfield20:03:30

(which is specifically what next.jdbc does for Component for the pooled datasource)
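A sketch of that idiom (hypothetical names; assumes Component 1.0+ metadata-based protocol extension, as in the snippet earlier in this thread): start returns a function that closes over the pool, calling it yields the pool, and the fn's own metadata implements stop:

(defn pool-component
  [db-spec]
  (with-meta {}
    {`component/start
     (fn [_]
       (let [pool (connection/->pool HikariDataSource db-spec)]
         (with-meta (fn [] pool)          ; callable: ((:db system)) => pool
           {`component/stop
            (fn [_]
              (.close pool)
              (pool-component db-spec))})))}))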

lukasz15:03:51

Indeed :thumbsup: I'll study the component function in next.jdbc a bit more; it does open some interesting possibilities

mg23:03:42

Having a component that implements p/Sourceable works well in my case because I inject it into other service-level components. Those components then use it directly: (jdbc/plan datasource ...) or (jdbc/execute-one! datasource ...) or whatever. For that pattern, it's nice to be able to just inject a db-spec map as the datasource for dev/testing, and have that work exactly the same as the connection pool

seancorfield23:03:36

Fair enough. We use connection pools in dev so the same code works everywhere for us.

mg23:03:07

I can see the advantages of the unwrap-with-function pattern for components more broadly though. Not everything has a nice set of protocols to hook into

seancorfield23:03:18

I just announced com.github.seancorfield/next.jdbc {:mvn/version "1.1.643"} (in #announcements) and I wanted to call out the GraalVM compatibility change here in particular: I've copied next.jdbc.prepare/execute-batch! to next.jdbc/execute-batch! and would encourage anyone using execute-batch! to switch over to the next.jdbc version. The original next.jdbc.prepare version ended up depending on next.jdbc.result-set (which already depended on next.jdbc.prepare) and, to break that circular dependency, I used requiring-resolve at runtime. Clearly, execute-batch! should never have been added to next.jdbc.prepare in the first place -- a bad decision on my part -- but, in order to avoid breaking changes, I have modified next.jdbc.prepare/execute-batch! to rely on a volatile! (held in a Var) that is set up by next.jdbc.result-set -- a workaround suggested by @borkdude as a way to get GraalVM compatibility when you would otherwise have a dynamic runtime dependency. It shouldn't be a breaking change: you would normally require next.jdbc to call prepare to get the PreparedStatement object that execute-batch! requires, and next.jdbc requires next.jdbc.result-set, so the new Var should always be set up. However, any code that somehow managed not to cause next.jdbc.result-set to be required, yet was requiring next.jdbc.prepare and calling execute-batch!, may now fail. Like I say, I don't think any real-world code could call execute-batch! without somehow causing next.jdbc.result-set to be required, but I wanted to post this caution, just in case you update to 1.1.643 and your code breaks.
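A before/after sketch of that migration (ds and the SQL are assumptions; only the namespace of execute-batch! changes):

(require '[next.jdbc :as jdbc])

(with-open [con (jdbc/get-connection ds)
            ps  (jdbc/prepare con ["insert into logtransaction (txn_id, action) values (?, ?)"])]
  ;; previously next.jdbc.prepare/execute-batch!; as of 1.1.643 prefer:
  (jdbc/execute-batch! ps [[1 "intent-to-charge"]
                           [2 "settled"]]))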

seancorfield23:03:37

In addition, I'm double-publishing all my libraries to both the existing seancorfield group and the new com.github.seancorfield group so that I can switch them all over to verified group names per the recently-announced Clojars policy. depstar and clj-new already started doing that last week (and the depstar README now explains how I do it). I plan to keep the double-publishing going until download stats on Clojars indicate dropping the non-verified group will cause very little inconvenience -- but, obviously, I'd like folks to switch over to using the new group whenever they next update their dependencies. Any new libraries I publish will be under com.github.seancorfield.