This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2023-12-24
Have there been any considerations to add caching for collections? For example, consider this:
(let [m (create-a-huge-map)
      a {:a 1}
      b {:b 2}]
  {:init m
   :before-items [b m]
   :after-items [m a]})
In this scenario, m would be serialized 3 times. It seems that the writer + cache could be clever enough to serialize m only once. Something like a poor man's structural sharing.

Have you tested this in a real-world scenario? I think that https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding, which should be used by default in most JSON APIs, should handle this kind of duplication.
> Have you tested this in a real-world scenario?

Well, I tested how serializing a relatively small data structure into Transit crashes the tab. :) The data is small in terms of memory thanks to structural sharing, but traversing it completely takes a huge amount of time. Using GZIP won't help because the serialization itself takes too long. Even if the result fit into memory, the browser would kill the tab for being unresponsive. I don't know why the browser sometimes shows the "Do you want to wait or kill this tab?" dialog and sometimes the "Oops, the tab crashed" message. Probably that's also memory-related, but it would have to be some limit imposed by the browser on the tab itself, not by the system.
Maybe GZIP can help with the crashing if the compression is done during serialization. No clue by how much, though, given that keys and some other things are already cached and compressed.

But it certainly won't help with serialization time. In my particular case, a hypothetical perfect map cache would reduce the time it takes to serialize a particular map down to a second or so. Without such a cache, the tab crashes after a minute or so, and I have no idea how much was left to serialize.
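For what it's worth, a "poor man's structural sharing" can be hand-rolled on top of any writer: hoist the shared subtree out of the value before serializing, and splice it back in after reading. A rough sketch in plain Clojure (no Transit involved; `extract-shared`, `restore-shared`, and the `::ref` placeholder are invented names, and identity comparison plus a maps-and-vectors-only walk are simplifying assumptions):

```clojure
;; Replace every occurrence of `shared` (found by identity) with a small
;; placeholder map, so the big subtree is serialized only once, via the
;; accompanying table.
(defn extract-shared [value shared]
  (letfn [(walk [v]
            (cond
              (identical? v shared) {::ref 0}
              (map? v)    (into {} (map (fn [[k x]] [k (walk x)])) v)
              (vector? v) (mapv walk v)
              :else       v))]
    {:value (walk value)
     :table {0 shared}}))

;; Inverse: splice the shared subtree back in after deserialization.
(defn restore-shared [{:keys [value table]}]
  (letfn [(walk [v]
            (cond
              (and (map? v) (contains? v ::ref)) (get table (::ref v))
              (map? v)    (into {} (map (fn [[k x]] [k (walk x)])) v)
              (vector? v) (mapv walk v)
              :else       v))]
    (walk value)))

(let [m      {:huge :map}
      packed (extract-shared {:init m :before-items [m] :after-items [m]} m)]
  (= {:init m :before-items [m] :after-items [m]}
     (restore-shared packed)))
;; => true
```

Serializing the packed form writes the big subtree once; the trade-off is an extra full walk of the value on each side, so it only pays off when the shared subtree dominates the payload.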
Answering myself with data: gzip doesn't help in this scenario. Some years ago I tested long.qualified/keywords vs plain keywords, and in that scenario gzip did help. But that was the same data structure.
(require '[cheshire.core]
         '[clojure.java.io :as io]
         '[cognitect.transit])

(import '(java.io ByteArrayOutputStream)
        '(java.util.zip GZIPOutputStream))

(let [t-sizes (fn [value]
                (let [gzip   (ByteArrayOutputStream.)
                      gzip*  (GZIPOutputStream. gzip)
                      json   (ByteArrayOutputStream.)
                      gjson  (ByteArrayOutputStream.)
                      gjson* (GZIPOutputStream. gjson)
                      raw    (ByteArrayOutputStream.)]
                  (with-open [w       (io/writer json)
                              gjson-w (io/writer gjson*)]
                    (cheshire.core/generate-stream value w)
                    (cheshire.core/generate-stream value gjson-w))
                  (cognitect.transit/write
                    (cognitect.transit/writer gzip* :json) value)
                  (cognitect.transit/write
                    (cognitect.transit/writer raw :json) value)
                  (doto gzip* .flush .close)
                  (doto gjson .flush .close)
                  (doto gzip .flush .close)
                  (doto raw .flush .close)
                  {:json   (count (.toByteArray json))
                   :g-json (count (.toByteArray gjson))
                   :traw   (count (.toByteArray raw))
                   :tgzip  (count (.toByteArray gzip))}))
      big-value (into (sorted-map)
                      (for [i (range 1e3)]
                        [(keyword (str "a-very-big-keyword-" i))
                         {:some-random i
                          :values      (* i i)}]))]
  {:x1 (t-sizes {:a big-value})
   :x5 (t-sizes {:a big-value
                 :b big-value
                 :c big-value
                 :d big-value
                 :e big-value})})
{:x1 {:json   60324
      :g-json 9133
      :traw   54353
      :tgzip  8842},
 :x5 {:json   301616
      :g-json 45419
      :traw   187945
      :tgzip  43195}}
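Reading those numbers: if any layer were exploiting the fact that x5 contains five identical copies of big-value, its size would grow sublinearly from x1 to x5. A quick check of the growth factors (just arithmetic on the measurements above):

```clojure
;; Growth factor from x1 to x5 for each encoding; a factor close to 5
;; means the five identical copies of big-value were serialized
;; independently, with no cross-copy deduplication.
(let [x1 {:json 60324  :g-json 9133  :traw 54353  :tgzip 8842}
      x5 {:json 301616 :g-json 45419 :traw 187945 :tgzip 43195}]
  (into {} (for [[k v] x5] [k (/ (double v) (x1 k))])))
;; roughly {:json 5.0, :g-json 4.97, :traw 3.46, :tgzip 4.89}
```

Only raw Transit grows sublinearly (≈3.5×), presumably thanks to its built-in cache of repeated keys; gzip on top of either format still pays almost the full ≈5× for the duplicated maps.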