hi everyone, I'm using ring + reitit + muuntaja and am trying to construct a response where I stream a clojure.java.jdbc query result as json, I was wondering if there's any prior art for that somewhere?
next.jdbc has nicer streaming support
Jsonista has three open PRs trying to expose streaming API to write (and read): https://github.com/metosin/jsonista/pulls
sweet - yeah the stack i'm working in uses a lot of older libraries, I need to be judicious in one parts I upgrade because each upgrade can have a wide-ranging impact. would be great to use next.jdbc though
So right now you pretty much need to just take the object-mapper from Jsonista, and then use Jackson directly
Muuntaja probably isn't really useful here
If you need a real streaming response, you might use to get a OStream where you can write the encoded JSON
So you'll build the ring response map directly in the handler code and Muuntaja isn't really used to encode Clj -> JSON
I think I've also previously just called JDBC directly, going around clojure.java.jdbc to get better performance in a such case. I don't think I have public example anywhere.
Next.jdbc has some notes about connection options that are required for streaming result sets (it depends on the DB): https://cljdoc.org/d/com.github.seancorfield/next.jdbc/1.3.939/doc/getting-started/tips-tricks#streaming-result-sets-1 Even if you do everything else, without these options the DB driver will load full result set from DB and you might run out of memory
Writing streaming JSON looks something like this:
(def ^com.fasterxml.jackson.databind.ObjectMapper mapper (json/object-mapper))
(defn streaming-json-array-response [db _opts]
(ring-io/piped-input-stream
(fn [^OutputStream output-stream]
(let [^com.fasterxml.jackson.databind.ObjectWriter writer (.writerFor mapper Object)
^com.fasterxml.jackson.databind.SequenceWriter seq-writer (.writeValuesAsArray writer output-stream)]
(...
(.write seq-writer (into {} row)))
(.close seq-writer)
))))
{:status 200
:headers {"Content-Type" "application/json;charset=utf-8"}
:body (streaming-json-array-response db {})}
With next.jdbc you can use plan function to get a Reducible you can go over with reduce and then call .write with each row.
And you probably should catch exceptions, because when you are streaming the response, 200 OK is already sent if you get an exception when writing JSON for some item, so you can change the response status.
I think we got this working with ~2GB JSON output with -Xmx 100M or somethingwow this is perfect, thank you!
It might be also important to create a transaction for the query, even though you only run one read-only query. Might be specific to Postgres.
I have the pleasure of using oracle π
It can be done without Jackson quite easily IF you have a lot of JSON documents that you need to stream and the result can be line-delimited JSON (JSON-LD). However, if you need a valid (single) JSON value, then Jackson is the way to go. π
Also perhaps worth mentioning: next.jdbc doesnβt conflict with clojure.java.jdbc so you can use both in the same project.
Maybe you'll find this useful https://github.com/bsless/jsonista.streams/
that looks useful! thank you. what i'm hoping for, which might not be realistic, is to stream a data structure that looks something like
{:key-1 "some-value"
:key-2 "some-value"
:database-records streaming-sequence-of-values-as-array}
I can kinda see how this would be possible using the jackson API, but haven't gotten there yetthat's possible but hard you can do it manually with the httpkit channel api
also, this response format would probably be harder for clients to stream process
hm that's a good point π
It's probably better to return a JSON array or JSON-LD
you can refer to how analytic databases like Druid or Clickhouse allow users to format the responses
It can be done with the Jackson API. I might have an example somewhere.
On read side you can use .readTree to read JSON object with some large arrays:
(defn read-geojson-features
"Try reading geojson file without loading full features array to memory"
[^Reader f]
(let [^ObjectMapper om json/default-object-mapper
^JsonNode tree (.readTree om f)
^JsonNode node (.get tree "features")]
(map (fn [node]
(.treeToValue om ^JsonNode node ^Class Object))
node)))I don't think switching to StreamableResponses alone is enough to stream JDBC results to the response, without having the full collection in memory
But you could probably replace piped-output-stream with custom StreamableResponseBody implementation, that does the streaming JDBC query
Because you need to keep the DB TX open until everything is written to the client, which doesn't work if you just return a lazy seq from the handler
I got something working π streaming a large data set placed in an arbitrary position in a clojure data structure - will share soon
I think writing can be done with ObjectWriter, you can get from the object-mapper
I can't found a example. Maybe I tested it somewhere, but it didn't end up being used. I think it is something similar to that read example, you need to write the top level properties yourself, but you can use jsonista to convert the values of those values. Then you use some calls to stream the large array into the JSON output.
If I put something together I'll be sure to share it π
MGSAlert.mp3 ! https://github.com/metosin/muuntaja/blob/master/doc/Configuration.md#using-streaming-json-and-transit-encoders
ooooh!
your sound effect for this post https://youtu.be/2P5qbcRAXVk