This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2019-04-09
Pushed some ideas about fast JDBC access for Clojure (related to the next.jdbc discussion) into a small experimental library, metosin/porsas. It leans on query compilation (like many libs in Java/Scala), supports records, simple & qualified keys for maps, and has initial support for streaming results. Code is here: https://github.com/metosin/porsas. Comments welcome.
small nitpick after reading the readme: the :con parameter is a bit surprising, it certainly sounds odd in English or French 🙂
one big thing that's missing is IReduceInit support ("reducible result-sets"), it lets you bolt a result-set into a transformation chain at (kinda|sorta) zero cost. I guess it adds some overhead to get some flexibility
I think squee is the "first" JDBC clj client library that used it: https://github.com/ghadishayban/squee
fyi: Ghadi worked on generalisations of that pattern https://gist.github.com/ghadishayban/902373e247e920855139902912d237f0
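The IReduceInit pattern mentioned above can be sketched in a few lines. This is a hypothetical illustration (the name `iterator->reducible` is made up, not from squee or next.jdbc): wrap any java.util.Iterator so reduce/transduce can consume it directly, with no lazy seq in between. A JDBC version would step a ResultSet with .next and build a row per step instead.

```clojure
(import '(clojure.lang IReduceInit))

;; Hypothetical sketch: expose an Iterator as a reducible source.
(defn iterator->reducible [^java.util.Iterator it]
  (reify IReduceInit
    (reduce [_ f init]
      (loop [acc init]
        (if (.hasNext it)
          (let [acc' (f acc (.next it))]
            (if (reduced? acc')
              @acc'          ;; honor early termination via `reduced`
              (recur acc')))
          acc)))))

(reduce + 0 (iterator->reducible (.iterator [1 2 3 4 5])))
;; => 15
```

Because clojure.core/reduce dispatches on IReduceInit directly, no intermediate seq is allocated between the iterator and the reducing function.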
@U050SC7SV :con, oh, should it be :conn or :connection?
thanks, now a connection. Need to read about reducibles and check the perf too. next.jdbc has those too.
Finally getting around to adding c3p0 to my app. The one thing that is confusing me is the need to specify the database name upfront. Can I just specify a default database (i.e. postgres)? Or is it better to specify a database name used within the app? What if multiple databases are used?
There are three key use cases (for queries) in my mind: process a large result set (as fast as possible), fetch a single row as a (fully realized) hash map (or record or...), fetch multiple rows as a sequence of (fully realized) hash maps.
The reducible-query in clojure.java.jdbc and the reducible! in next.jdbc aim to tackle the first use case -- where no hash maps need to be created in many cases, since access by key uses the raw Java interop of JDBC itself.
porsas is tackling the third use case primarily (right @ikitommi?) and is a variant of some of squee and what I've been discussing with @ghadi for next.jdbc as well. Making it possible to "plug in" different strategies for building rows from ResultSet objects, to allow for "just hash maps", "arrays of row values", and possibly the record-based approach of porsas.
next.jdbc has evolved a lot from the very early performance-focused code fragments -- which @ikitommi spurred me to think about after he contributed some benchmarking code to clojure.java.jdbc.
@seancorfield my primary motivation was just general performance. I think porsas actually tackles all three, to some extent: 1) the batching api allows reading stuff in configurable chunks, e.g. 100 rows before flushing to a callback; 2 & 3) porsas has a RowCompiler protocol and impls for both records & maps (qualified & simple keys).
It doesn't allow a custom reducer to be used, e.g. "read rows until total fruit.cost > 100".
Control over when consumption of the result-set ends (`reduced`), not creating intermediary garbage when you just want to reduce over the result-set (basically using it as an iterator, kinda), consuming it for side effects, or feeding it through transduce/educe/sequence etc etc
I am surely forgetting a few. It's useful for sure, but it might not be the thing you leverage often in "normal" use.
A few core types/data structures are built that way, via IReduceInit or similar means: cycle, range, iterate, repeat, and a few others I think. When they are used via reduce/into, for instance, they hit that fast path and no intermediary (lazy) seqs are created; it becomes a simple "iteration"
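That fast path is easy to see with nothing but core functions. A small sketch: both sources below are infinite, yet reduce-based consumption terminates via `reduced` and allocates no intermediate lazy seq cells.

```clojure
;; (range) is infinite, but transduce consumes it through its reduce
;; fast path, and (take 5) stops it with `reduced`.
(transduce (comp (map inc) (take 5)) + 0 (range))
;; => 15  (1 + 2 + 3 + 4 + 5)

;; Same story for cycle: into uses reduce/transduce under the hood.
(into [] (take 3) (cycle [:a :b]))
;; => [:a :b :a]
```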
We use them at work with reduce and transduce as part of large, complex pipelines.
I know, based on JIRA tickets and other feedback, that folks "out there" in the wild rely on reducible-query and streaming result sets to process data that is much larger than can fit in memory.
So, yeah, there are definitely examples where reducible-query is the right approach.
And, to be honest, we could use it a lot more than we do -- it's mostly because we have a large code base that grew up before reducible queries existed -- if we'd had the feature long ago, we'd have a huge amount of usage.
thanks for the explanation. If one has a complex pipeline together with the reducible query, isn't the statement & connection kept open as long as the pipeline is running? e.g. for longer than needed?
I see the value of being able to say "stop" when needed by running custom code within the pipeline, pulling different things from the result sets based on the current accumulation of data. But a real-world use case would help to see if I'm just inventing imaginary use cases..
If you're streaming results from the DB, the connection and statement must stay open until you're done reading. But a reduce-based pipeline can signal completion at any point via reduced, which will cause the statement etc. to be closed. If you can't get the results all in memory, that's really your only option -- aside from reading explicitly paginated results and potentially using multiple connections (from a pool, presumably).
Think of it as being akin to attaching a transducer to a core.async channel: data is transformed as it is read through the pipeline. You don't just read everything off the channel into a vector and then run a transform on that (potentially large) vector.
In some ways, I'd like to "encourage" more users onto the reduce/transduce path. And that's really why the execute! part of the API in next.jdbc is implemented in terms of reducible!, as I do see some value in having query results that are datafiable and navigable, looking forward into Clojure's future.
Even if it's just a with-meta call on each row?