
Pushed some ideas about fast JDBC access for Clojure (related to the next.jdbc discussion) into a small experimental library, metosin/porsas. It leans on query compilation (like many libs in Java/Scala), supports records and both simple & qualified keys for maps, and has initial support for streaming results. Code can be found here: Comments welcome.




small nitpick after reading the readme: the :con parameter is a bit surprising, it certainly sounds odd in english or french 🙂


one big thing that's missing is IReduceInit support ("reducible result-sets"); it lets you bolt a result-set into a transformation chain at (kinda|sorta) zero cost. I guess it adds some overhead to get some flexibility
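For illustration, a minimal pure-Clojure sketch of the pattern (a plain Iterable stands in for a JDBC ResultSet here; this is not squee's or next.jdbc's actual code):

```clojure
;; A minimal sketch of the IReduceInit pattern: the source itself knows
;; how to reduce, so no intermediate lazy seq is ever built.
(defn reducible-rows [^Iterable coll]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (let [it (.iterator coll)]
        (loop [acc init]
          (if (.hasNext it)
            (let [acc' (f acc (.next it))]
              (if (reduced? acc')
                @acc'           ;; early termination requested by the rf
                (recur acc')))
            acc))))))

;; Bolts straight into a transformation chain:
(transduce (map inc) + 0 (reducible-rows [1 2 3]))
;; => 9
```

With a real ResultSet the loop would call `.next` on the statement's result set instead of an iterator, and close resources when done.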


I think squee is the "first" jdbc clj client library that used it: -


fyi: Ghadi worked on generalisations of that pattern


@U050SC7SV :con, oh, should it be :conn or :connection?


there's no rule, but one of these 2 is better yeah


thanks, it's now a connection. Need to read about reducibles and check the perf. next.jdbc has those too.


Finally getting around to adding c3p0 to my app. The one thing that is confusing me is the need to specify the database name upfront. Can I just specify a default database (i.e. postgres)? Or is it better to specify a database name used within the app? What if multiple databases are used?


There are three key use cases (for queries) in my mind: process a large result set (as fast as possible), fetch a single row as a (fully realized) hash map (or record or...), fetch multiple rows as a sequence of (fully realized) hash maps.


The reducible-query in clojure.java.jdbc and the reducible! in next.jdbc aim to tackle the first use case -- where no hash maps need to be created in many cases, since access by key uses the raw Java interop of JDBC itself.


porsas is tackling the third use case primarily (right @ikitommi?) and is a variant of some of squee and of what I've been discussing with @ghadi for next.jdbc as well. Making it possible to "plug in" different strategies for building rows from ResultSet objects to allow for "just hash maps", "arrays of row values", and possibly the record-based approach of porsas.


next.jdbc has evolved a lot from the very early performance-focused code fragments -- which @ikitommi spurred me to think about after he contributed some benchmarking code to


I think the use case of not making a map is largely artificial


@seancorfield my primary motivation was just general performance. I think porsas actually tackles all three, to some extent: 1) the batching api allows reading stuff in configurable chunks, e.g. 100 rows before flushing to a callback; 2 & 3) porsas has a RowCompiler protocol and impls for both records & maps (qualified & simple keys).
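A rough illustration of the row-compiler idea (a hypothetical simplification, not porsas's actual RowCompiler API): pick the row-building strategy once per query, then reuse that function for every row. A "row" here is just a vector of column values:

```clojure
;; Hypothetical sketch: pluggable row-building strategies chosen up front.
(defrecord Fruit [id cost])

(defn compile-map-row
  "Returns a row-builder producing hash maps with the given keys."
  [ks]
  (fn [row] (zipmap ks row)))

(defn compile-record-row
  "Returns a row-builder delegating to a record constructor."
  [ctor]
  (fn [row] (apply ctor row)))

;; qualified-key map strategy:
((compile-map-row [:fruit/id :fruit/cost]) [1 42M])
;; record strategy, same row data:
((compile-record-row ->Fruit) [1 42M])
```

A real implementation would build these functions from ResultSet metadata and read columns by index, but the shape of the idea is the same.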


It doesn't allow a custom reducer to be used, e.g. "read rows until total fruit.cost > 100".
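For reference, that kind of custom reducer is exactly what reduce + `reduced` gives you over a reducible source (plain data stands in for result-set rows here):

```clojure
;; Stop consuming rows as soon as the accumulated cost passes 100.
(def rows [{:fruit/cost 40M} {:fruit/cost 35M} {:fruit/cost 50M} {:fruit/cost 20M}])

(defn total-until-100 [rows]
  (reduce (fn [total row]
            (let [total' (+ total (:fruit/cost row))]
              (if (> total' 100M)
                (reduced total')  ;; short-circuits the reduction
                total')))
          0M
          rows))

(total-until-100 rows)
;; => 125M  (the fourth row is never touched)
```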


Are there any examples where the reducible queries are a really good fit?


Control over when consumption of the result-set ends (`reduced`); not creating intermediary garbage when you just want to reduce over the result-set (basically using it as an iterator, kinda), or consuming it for side effects, for instance, or via transduce/educe/sequence etc etc


I am surely forgetting a few. It's useful for sure but it might not be the thing you leverage in "normal" use often.


A few core types/ds are built that way, via IReduceInit or similar means, like cycle, range, iterate, repeat and a few others I think. When they are used via reduce/into, for instance, they hit that fastpath and no intermediary (lazy) seqs are created; it becomes a simple "iteration"
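For instance (plain core functions, nothing JDBC-specific):

```clojure
;; iterate and cycle are directly reducible: used with into/transduce
;; they skip lazy-seq allocation entirely, even though both are infinite.
(into [] (take 5) (iterate inc 0))
;; => [0 1 2 3 4]

(transduce (take 4) + (cycle [1 2]))
;; => 6
```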


We use them at work with reduce and transduce as part of large, complex pipelines.


I know, based on JIRA tickets and other feedback, that folks "out there" in the wild rely on reducible-query and streaming result sets to process data that is much larger than can fit in memory.


So, yeah, there are definitely examples where reducible-query is the right approach.


And, to be honest, we could use it a lot more than we do -- it's mostly because we have a large code base that grew up before reducible queries existed -- if we'd had the feature long ago, we'd have a huge amount of usage.


thanks for the explanation. If one has a complex pipeline together with the reducible query, isn't the statement & connection kept open as long as the pipeline is running? e.g. for longer than needed?


I see the value of being able to say "stop" when needed by running custom code within the pipeline, pulling different things from the result-sets based on the current accumulation of data. But a real-world use case would help to see if I'm just inventing imaginary use cases..


If you're streaming results from the DB, the connection and statement must stay open until you're done reading. But a reduce-based pipeline can signal completion at any point via reduced, which will cause the statement etc to be closed. If you can't get the results all in memory, that's really your only option -- aside from reading explicitly paginated results and potentially using multiple connections (from a pool, presumably).
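A sketch of that resource story, with an invented `open-mock-statement` standing in for a real statement: the reducible closes it in a `finally`, so it is released whether the reduction runs to the end or stops early via `reduced`:

```clojure
;; Mock "statement": an atom tracking open/closed state plus some rows.
(defn open-mock-statement []
  (atom {:open? true :rows [1 2 3 4 5]}))

(defn close! [stmt] (swap! stmt assoc :open? false))

(defn reducible-result-set [stmt]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (try
        (loop [acc init, rows (:rows @stmt)]
          (if-let [[row & more] (seq rows)]
            (let [acc' (f acc row)]
              (if (reduced? acc') @acc' (recur acc' more)))
            acc))
        (finally
          (close! stmt))))))   ;; closed even on early termination

(let [stmt  (open-mock-statement)
      total (reduce (fn [acc row]
                      (if (>= acc 3) (reduced acc) (+ acc row)))
                    0
                    (reducible-result-set stmt))]
  [total (:open? @stmt)])
;; => [3 false]  (stopped early, statement closed anyway)
```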


Think of it as being akin to attaching a transducer to a core.async channel: data is transformed as it is read through the pipeline. You don't just read everything off the channel into a vector and then run a transform on that (potentially large) vector.


In some ways, I'd like to "encourage" more users onto the reduce/transduce path. And that's really why the execute! part of the API in next.jdbc is implemented in terms of reducible! as I do see some value in having query results that are datafiable and navigable, looking forward into Clojure's future.


I'd want datafy/nav to be opt-in


Even if it's just a with-meta call on each row?
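That could look something like this metadata-based `nav` sketch (the `navize` helper and its `[:would-fetch ...]` payload are invented for illustration; next.jdbc's actual implementation may differ):

```clojure
;; datafy/nav support via metadata on each row (Clojure 1.10+):
;; the Navigable protocol is extendable through metadata.
(require '[clojure.core.protocols :as p]
         '[clojure.datafy :as d])

(defn navize [row]
  (with-meta row
    {`p/nav (fn [_coll k v]
              ;; a real implementation would fetch the related row here
              [:would-fetch k v])}))

(d/nav (navize {:fruit/id 1}) :fruit/id 1)
;; => [:would-fetch :fruit/id 1]
```

The cost being debated is that `with-meta` call on every row, paid whether or not the caller ever navigates.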


non-negligible overhead for something 99% of users will not care about, especially in production


isn't it something optional in your lib already?