code-reviews 2016-01-25 | Slack Archive

tomc15:01:31

hey @jonahbenton, I suppose I should explain my application a bit more to clarify why I'm manually closing pools. The application is for moving data between jdbc compliant databases - my use case is postgresql to mysql. The entry point accepts a configuration map that describes the databases and tables (a source-to-target mapping), as well as how the transfer can be parallelized. Each mapping can depend on any two databases the user can access, and not all mappings will necessarily be transferred at once, so assuming there's more than 2 databases involved, some connections might not be needed until some point (long) after application startup, and some might cease to be needed (long) before application shut down. In my initial pass of the code I created and closed a connection pool when transferring the data for each mapping. When parallelized, this resulted in multiple identical connection pools being created; I ran out of mysql connections. My solution to that was to manually manage the deduplication of the connection pools, hence the code I posted earlier. If I'm understanding your suggestion correctly, I should just create a pool for each DB at startup and only close those pools at shut down. That would definitely work, but I don't like that this could result in idle connections on databases that are not yet/are no longer needed. Once again, I really appreciate you taking the time to help me out.

jonahbenton16:01:06

hey @tomc thanks for the additional context. will say it back- it sounds like users can on demand initiate workloads that involve the asynchronous parallelized transfer of data from one database to another- pg to mysql. because users can initiate these transfers and the specific kind of transfer may have a kind of "fan-out" behavior in terms of database resource consumption, one understandably wants to be careful about how db resources are consumed. is that fair? the main architectural question that comes to mind is- are the data being transferred all living under the same database user account? or does each user have their own database credentials? this dictates architecturally whether there is one pool per database, or one pool per user. in the former case, the solution still looks the same to me- have a single pool per database created at startup and allow these user-initiated transfers to just consume and release connections from those pools. in terms of sizing the pools, use the tunables as aggressively as needed, and because only one pool is in use, settings like maximumPoolSize will not be exceeded, so the server side resources will be appropriately protected. If the work is asynchronous, have a higher connectionTimeout and lower idleTimeout and can consider reducing minimumIdle, though Hikari's logic for recommending idle and max be the same in most cases is sound. in the latter case, there is a need for a pool per user, because a pool can only utilize a single set of credentials. In this circumstance, more aggressiveness is warranted with pool tunables because now a single pool does not have exclusive use of the database- set max connections per pool to a small number, min to 0, and have connectionTimeout high and idle low so the imposition of the user-specific parallelization will be limited and the utilization of resources by idle users will be minimized. it may be that there is other useful context but that's how it sounds to me from what's been shared so far, hope that's helpful.

tomc18:01:23

@jonahbenton: that is helpful. I'm going to switch to creating every pool at startup. That code will be easier since I won't have to do any counting or even involve any reference types. Thanks for all the help.

2016-01-25

Channels