#xtdb
2021-04-02
Jacob O'Bryant03:04:59

if you had multiple nodes using jdbc for tx log and doc store, would it be possible/not terribly difficult to share the connection pool between nodes (in addition to sharing between tx log and doc store in the individual nodes)?

adamfeldman03:04:07

I can’t speak to the ease of doing that with Crux. For Postgres (and other systems), there are also connection poolers external to the application https://pgdash.io/blog/pgbouncer-connection-pool.html

👍 3
ordnungswidrig15:04:19

Would the nodes actually have varying number of connections?

refset14:04:42

I've not tried this myself yet, but I believe you should be able to create a non-closing alternative to HikariConnectionPool, using your own record & constructor, similar to what is seen here https://github.com/juxt/crux/blob/master/crux-jdbc/src/crux/jdbc.clj#L34-L52 You can then create your CustomHikariConnectionPool outside of Crux's system.clj lifecycle management and simply pass it in to as many nodes or independent Crux instances as you need, by overriding it in the start-node configuration, similar to the instructions here: https://opencrux.com/reference/jdbc.html#_sharing_connection_pools (where you specify it as a value for the :connection-pool key in the modules)
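To make that concrete, a rough (untested) sketch -- the namespace and record fields are made up, loosely following the linked crux.jdbc source:
(ns my-app.shared-pool
  (:import (com.zaxxer.hikari HikariConfig HikariDataSource)))

;; like crux.jdbc's HikariConnectionPool, but close is a no-op, so that
;; stopping any one Crux node can't shut down the shared datasource
(defrecord NonClosingConnectionPool [dialect pool]
  java.io.Closeable
  (close [_]))

(defn shared-pool [dialect jdbc-url]
  (->NonClosingConnectionPool
    dialect
    (HikariDataSource. (doto (HikariConfig.)
                         (.setJdbcUrl jdbc-url)))))

;; the same instance can then be passed to every node as the value of the
;; :connection-pool key, per the sharing-connection-pools docs linked above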

Jacob O'Bryant02:04:22

Thanks, this looks helpful. After some more googling it looks like it's possible to set the db schema to use when retrieving a connection also, which I think would be the only other piece needed. (I'm not expecting significantly varying numbers of connections; it's just that there could potentially be a lot of nodes)

To take a step back--I'm building a self-hosted app that'll let you use Crux as a personal data store, with plugins for importing and exporting data in various ways (e.g. import your tweets and substack posts, write a blog post on the app, export it all to a static site somewhere). As a convenience I'd like the app to accommodate multiple users so that you don't have to self-host; self-hosting would only be necessary if you want to decide which plugins to install.

So from a performance perspective, a single node would be fine. But for convenience/simplicity/making it harder for plugins to screw things up, I'd rather not introduce an e.g. :user/id key to every document. Hence multi-tenancy.

I'm thinking the best option for now will be to just use standalone nodes with either a cron job or a tx listener for backups, then revisit postgres down the road if needed.

refset11:04:16

Yes, from a conversation on Zulip in Feb I gather setting the db schema in the connection works as you expect:
> it "just works" by setting the `:currentSchema` property of the db-spec passed to connection-pool. sweet!
https://juxt-oss.zulipchat.com/#narrow/stream/194466-crux/topic/Using.20multiple.20postgresql.20schemas.20for.20same.20DB
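For reference, a config sketch along those lines (untested; it follows the sharing-connection-pools layout from the jdbc docs, with the db and schema names made up):
;; per-tenant schema set via the db-spec
{:crux.jdbc/connection-pool {:dialect {:crux/module 'crux.jdbc.psql/->dialect}
                             :db-spec {:dbname "crux_app"
                                       :host "localhost"
                                       :user "crux"
                                       :password "secret"
                                       :currentSchema "tenant_alice"}}
 :crux/tx-log {:crux/module 'crux.jdbc/->tx-log
               :connection-pool :crux.jdbc/connection-pool}
 :crux/document-store {:crux/module 'crux.jdbc/->document-store
                       :connection-pool :crux.jdbc/connection-pool}}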

refset11:04:19

It's hard to judge the effort/time it takes to set these things up, but I'd expect you'd have an easier time with crux-jdbc here rather than rolling your own multi-tenancy control plane thing on top of Rocks-only nodes. That's just a guess really though 🙂

refset11:04:44

Adding a :user/id to every document isn't the worst idea either; we're building a system like this with Crux at the moment, but it takes some careful design & testing to guarantee isolation in the queries and writes
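A minimal illustration of that pattern (document and key names made up; assumes a started node bound to `node`):
(require '[crux.api :as crux])

;; every write carries the tenant key...
(crux/submit-tx node
                [[:crux.tx/put {:crux.db/id :post-1
                                :user/id "alice"
                                :post/title "hello world"}]])

;; ...and every query has to constrain on it, or documents leak across tenants
(crux/q (crux/db node)
        '{:find [post title]
          :where [[post :user/id "alice"]
                  [post :post/title title]]})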

Jacob O'Bryant13:04:07

From what I can tell (which could be inaccurate; I don't have much experience in this area), that won't work for this situation--the difference is that he has a small, fixed number of schemas (2), so he's just making two separate connection pools, each with its own schema. Whereas I need a schema per user, and thus need to reuse the same connection pool across multiple users/tenants, and when a Crux node retrieves a connection from Hikari, the schema for that connection needs to be switched to the node's schema. (right?) I've found several blog posts that describe doing something similar with Spring Boot, for example http://stuartingram.com/2016/10/02/spring-boot-schema-based-multi-tenancy/:

// From a Hibernate MultiTenantConnectionProvider: switch the schema on each
// connection handed out for the given tenant.
@Override
public Connection getConnection(String tenantIdentifier) throws SQLException {
  final Connection connection = getAnyConnection();
  try {
    // "USE <schema>" is MySQL syntax; on Postgres this would be e.g. "SET search_path TO <schema>"
    connection.createStatement().execute("USE " + tenantIdentifier);
  }
  catch (SQLException e) {
    throw new HibernateException(
        "Could not alter JDBC connection to specified schema [" + tenantIdentifier + "]",
        e);
  }
  return connection;
}
So I think I'd just need to make a non-closing version of HikariConnectionPool like you suggested, and then figure out how to map these Spring Boot examples onto that. It might not be that difficult--it's just an unfamiliar area for me... which impacts my time estimation ha ha.

Whereas I already know exactly how to get the standalone version set up: just make a unique db dir for each user on a volume and then e.g. have a cron job rsync the volume somewhere. It would be nice to have the stronger durability from jdbc/somewhat lower ops burden, not to mention the option of scaling past a single machine (though I don't expect to need that anytime soon, if ever).

I'm thinking of going ahead with standalone, and then once everything else is ready for release, take a stab at getting jdbc to work and hopefully it goes smoothly. But let me know if you think I'm overestimating the complexity of getting jdbc to work.
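(For reference, the standalone setup I'm picturing is roughly this -- one set of RocksDB dirs per user under a common volume; untested sketch, names made up:)
(require '[clojure.java.io :as io]
         '[crux.api :as crux])

(defn start-user-node [data-root user-id]
  (let [kv (fn [d] {:kv-store {:crux/module 'crux.rocksdb/->kv-store
                               :db-dir (io/file data-root user-id d)}})]
    (crux/start-node
      {:crux/tx-log (kv "tx-log")
       :crux/document-store (kv "docs")
       :crux/index-store (kv "indexes")})))

;; backup: a cron job rsyncs data-root off the box (or a tx listener per node)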

refset15:04:53

Right, yes that makes sense. Hmm. I guess another solution could be to modify crux-jdbc to use provided table names from the node config, then tenants could all live in the same schema. I can't promise that we would have appetite for doing that work on crux-jdbc ourselves, but we are due to work on the module again soon to address https://github.com/juxt/crux/issues/1463 - so we could probably spare some brain cycles to consider your options at the same time. I'll bring it up with the team tomorrow. Are you working to an ASAP timeline? Could you wait a month?
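(Purely to sketch the idea -- the :table-name option below is made up and doesn't exist in crux-jdbc today:)
{:crux.jdbc/connection-pool {:dialect {:crux/module 'crux.jdbc.psql/->dialect}
                             :db-spec {:dbname "shared_db"}}
 :crux/tx-log {:crux/module 'crux.jdbc/->tx-log
               :connection-pool :crux.jdbc/connection-pool
               :table-name "tx_events_alice"}          ;; hypothetical
 :crux/document-store {:crux/module 'crux.jdbc/->document-store
                       :connection-pool :crux.jdbc/connection-pool
                       :table-name "documents_alice"}} ;; hypothetical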

Jacob O'Bryant16:04:43

that would be fabulous if you did implement that/if that's something I could feasibly implement. no time pressure on my end, a month is fine. I'm happy to just use standalone for the near future

👌 3
refset11:04:10

Please don't actually use this, as it's flawed for several reasons (described within) https://gist.github.com/refset/a5f810f18590a6f0d00630b102b78d00 ...however it might give an idea of just how easy (or not) it could be to create your perfect low-ops multi-tenant setup using whatever backend you like. For JDBC specifically it might be that adding a node ID column is the best way to go (rather than multiple schemas or tables), but managing the indexing from a single consumer/listener, as above, is likely to be most efficient

👀 3
refset12:04:08

I know you wouldn't want to use a cloud service, as with https://github.com/avisi-apps/crux-datastore, but maybe there's something for you in the sweet spot between JDBC and Kafka

Jacob O'Bryant00:04:43

Thanks! this is helpful

Jacob O'Bryant03:04:30

by the way--after some more thought I've decided to come to my senses and just add a node/user id to each document. separate dbs would be nice to have but probably aren't an option without me violating the KISS principle 🙂

refset13:04:48

Aha, that seems fair, thanks for keeping us posted. Adding some sort of multi-instance control plane layer to Crux does seem worth us figuring out eventually, or we could write a k8s operator!

Jacob O'Bryant22:04:23

yeah, if you do add something in that domain I'd definitely be interested!

👌 4