This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2023-12-24
Have there been any considerations to add caching for collections? For example, consider this:
(let [m (create-a-huge-map)
      a {:a 1}
      b {:b 2}]
  {:init m
   :before-items [b m]
   :after-items [m a]})
In this scenario, m would be serialized 3 times. It seems that the writer + cache could be clever enough to serialize m only once. Something like a poor man's structural sharing.

Have you tested this in a real-world scenario? I think that https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding, which should be used by default in most JSON APIs, should handle this kind of duplication.
> Have you tested this in a real-world scenario?

Well, I tested how serializing a relatively small data structure into Transit crashes the tab. :) The data is small in terms of memory thanks to structural sharing, but traversing it completely takes a huge amount of time. Using GZIP won't help because the serialization itself takes too long. Even if the result fit into memory, the browser would kill the tab for being unresponsive. I don't know why the browser sometimes shows the "Do you want to wait or kill this tab?" dialog and sometimes the "Oops, the tab crashed" message. Probably that's also memory-related, but it would have to be some limit imposed by the browser on the tab itself, not by the system.
Maybe GZIP can help with the crashing if the compression is done during serialization. No clue by how much, though, given that keys and some other things are already cached and compressed.

But it certainly won't help with serialization time. In my particular case, a hypothetical perfect map cache would reduce the time it takes to serialize a particular map down to a second or so. Without such a cache, the tab crashes after a minute or so, and I have no idea how much was left to serialize.
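For what it's worth, a "poor man's structural sharing" can be hand-rolled on top of any writer: hoist the shared subtree out of the value before serializing, and splice it back in after reading. A rough sketch in plain Clojure (no Transit involved; `extract-shared`, `restore-shared`, and the `::ref` placeholder are invented names, and identity comparison plus a maps-and-vectors-only walk are simplifying assumptions):

```clojure
;; Replace every occurrence of `shared` (found by identity) with a small
;; placeholder map, so the big subtree is serialized only once, via the
;; accompanying table.
(defn extract-shared [value shared]
  (letfn [(walk [v]
            (cond
              (identical? v shared) {::ref 0}
              (map? v)    (into {} (map (fn [[k x]] [k (walk x)])) v)
              (vector? v) (mapv walk v)
              :else       v))]
    {:value (walk value)
     :table {0 shared}}))

;; Inverse: splice the shared subtree back in after deserialization.
(defn restore-shared [{:keys [value table]}]
  (letfn [(walk [v]
            (cond
              (and (map? v) (contains? v ::ref)) (get table (::ref v))
              (map? v)    (into {} (map (fn [[k x]] [k (walk x)])) v)
              (vector? v) (mapv walk v)
              :else       v))]
    (walk value)))

(let [m      {:huge :map}
      packed (extract-shared {:init m :before-items [m] :after-items [m]} m)]
  (= {:init m :before-items [m] :after-items [m]}
     (restore-shared packed)))
;; => true
```

Serializing the packed form writes the big subtree once; the trade-off is an extra full walk of the value on each side, so it only pays off when the shared subtree dominates the payload.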
Answering myself with data: gzip doesn't help in this scenario. Some years ago I tested long.qualified/keywords vs plain keywords, and in that scenario gzip did help. But that was the same data structure.
(require '[cheshire.core]
         '[clojure.java.io :as io]
         '[cognitect.transit])

(import '(java.io ByteArrayOutputStream)
        '(java.util.zip GZIPOutputStream))

(let [t-sizes (fn [value]
                (let [gzip   (ByteArrayOutputStream.)
                      gzip*  (GZIPOutputStream. gzip)
                      json   (ByteArrayOutputStream.)
                      gjson  (ByteArrayOutputStream.)
                      gjson* (GZIPOutputStream. gjson)
                      raw    (ByteArrayOutputStream.)]
                  (with-open [w       (io/writer json)
                              gjson-w (io/writer gjson*)]
                    (cheshire.core/generate-stream value w)
                    (cheshire.core/generate-stream value gjson-w))
                  (cognitect.transit/write
                    (cognitect.transit/writer gzip* :json) value)
                  (cognitect.transit/write
                    (cognitect.transit/writer raw :json) value)
                  (doto gzip* .flush .close)
                  (doto gjson .flush .close)
                  (doto gzip .flush .close)
                  (doto raw .flush .close)
                  {:json   (count (.toByteArray json))
                   :g-json (count (.toByteArray gjson))
                   :traw   (count (.toByteArray raw))
                   :tgzip  (count (.toByteArray gzip))}))
      big-value (into (sorted-map)
                      (for [i (range 1e3)]
                        [(keyword (str "a-very-big-keyword-" i))
                         {:some-random i
                          :values      (* i i)}]))]
  {:x1 (t-sizes {:a big-value})
   :x5 (t-sizes {:a big-value
                 :b big-value
                 :c big-value
                 :d big-value
                 :e big-value})})
{:x1 {:json   60324
      :g-json 9133
      :traw   54353
      :tgzip  8842},
 :x5 {:json   301616
      :g-json 45419
      :traw   187945
      :tgzip  43195}}
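Reading those numbers: if any layer were exploiting the fact that x5 contains five identical copies of big-value, its size would grow sublinearly from x1 to x5. A quick check of the growth factors (just arithmetic on the measurements above):

```clojure
;; Growth factor from x1 to x5 for each encoding; a factor close to 5
;; means the five identical copies of big-value were serialized
;; independently, with no cross-copy deduplication.
(let [x1 {:json 60324  :g-json 9133  :traw 54353  :tgzip 8842}
      x5 {:json 301616 :g-json 45419 :traw 187945 :tgzip 43195}]
  (into {} (for [[k v] x5] [k (/ (double v) (x1 k))])))
;; roughly {:json 5.0, :g-json 4.97, :traw 3.46, :tgzip 4.89}
```

Only raw Transit grows sublinearly (≈3.5×), presumably thanks to its built-in cache of repeated keys; gzip on top of either format still pays almost the full ≈5× for the duplicated maps.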