#clojurescript
2022-03-29
tabidots02:03:25

I’m working on a client-side app where one big EDN data blob (~10MB) is stored in the data-attribute of a div in the HTML. However, since the data grab is synchronous, it locks the browser during that time and I can’t implement any kind of “Loading…” screen. I have tried
• making it an atom and reset!ting it in the init function
• using add-watch to detect when the atom is updated
• using a button to trigger it, just as a test
but I can’t get around the browser locking up temporarily. What is the conventional solution for this?

;; in "data" namespace

(defn get-data
  [tag]
  (-> (.getElementById js/document "my-app-data")
      (.getAttribute (str "data-" tag))
      (reader/read-string)))

;; old way
(defonce all-nouns (data/get-data "nouns"))

;; current way
(defonce all-nouns (atom {}))
;; in "app" namespace
(reset! data/all-nouns (data/get-data "nouns"))

Cora (she/her)02:03:11

maybe try parsing the edn in a web worker?

tabidots03:03:21

Thanks, I’ll give it a try. Have absolutely no clue what I’m doing though 😅

tabidots03:03:56

I’ve kinda started working on it but I guess I’m not really understanding the concept. You can only send little messages back and forth?

tabidots03:03:32

I just want the web worker to store the data blob. The actual processing I’m doing is really complicated and spread out over multiple namespaces already

Cora (she/her)03:03:23

you'd pass the data string to the web worker by passing it as a message. the web worker would parse it on its thread so it doesn't block the main thread. after it's done it would pass the parsed edn back to the main thread
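(A sketch of the worker side of that flow, assuming shadow-cljs’s :web-worker target compiles this namespace to its own script; the namespace and message shape are hypothetical:)

```clojure
;; Worker namespace, compiled to its own script
;; (e.g. via shadow-cljs's :web-worker target).
(ns my-app.worker
  (:require [cljs.reader :as reader]))

(.addEventListener js/self "message"
  (fn [e]
    ;; e.data is the raw EDN string sent from the main thread;
    ;; parsing it here keeps the main thread free to paint a spinner.
    (let [parsed (reader/read-string (.-data e))]
      ;; Caveat: CLJS persistent structures don't survive postMessage's
      ;; structured clone, so the trip back needs its own serialization
      ;; (pr-str here; transit would be faster) -- which the main thread
      ;; then has to re-parse.
      (.postMessage js/self (pr-str parsed)))))
```

That re-serialization caveat is why it can be cheaper to keep the parsed blob inside the worker and only send small query results back.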

tabidots03:03:29

The app is kind of a dictionary, so I want the worker to contain the blob with all the word data. Then the app would fetch the word data from the worker, do all the processing, then show the result

Cora (she/her)03:03:16

personally I'd just move the parsing over into the web worker and then keep as much as I can in the main thread since that's so much easier to deal with

Cora (she/her)03:03:26

there's a cost to communication between the main thread and the web worker, you know? may as well not pay that cost unless you absolutely need to

Cora (she/her)03:03:34

all of this is predicated on the parsing of the edn being the slow part. you should benchmark to see if that's the case

tabidots03:03:08

Yeah, I’m using shadow-cljs. But I still don’t quite follow what this is supposed to look like (their example is way too bare-bones)

tabidots03:03:05

The EDN is definitely the slow part, the subsequent processing is nearly instantaneous. The bottleneck just happens once at the beginning

tabidots03:03:01

So I should be able to send a message like “violin” to the worker and it gives me back the EDN data about “violin” right?

Cora (she/her)03:03:27

I think so? I've never used web workers either but it sounds like the right thing to use here

tabidots03:03:43

This is what I have in the worker file so far, but I don’t think this is what it’s supposed to be

(ns slovarish.frontend.data
  (:require [cljs.reader :as reader]))

(defn get-data
  [tag]
  (-> (.getElementById js/document "slovarish-data")
      (.getAttribute (str "data-" tag))
      (reader/read-string)))

(defonce all-nouns (atom {}))

(defn init []
  (reset! all-nouns (get-data "nouns")))

tabidots04:03:36

because I can’t figure out how to call the web worker from my main app namespace in order to know when the EDN blob is loaded
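(The main-thread side might look roughly like this; the worker script path and message protocol are hypothetical, assuming the worker parses the string it receives and posts back an EDN string:)

```clojure
;; Main app namespace -- the compiled worker script path is hypothetical.
(ns slovarish.frontend.app
  (:require [cljs.reader :as reader]))

(defonce worker (js/Worker. "/js/worker.js"))

(defonce all-nouns (atom {}))

(defn init []
  ;; Fires when the worker posts the parsed data back.
  (.addEventListener worker "message"
    (fn [e]
      (reset! all-nouns (reader/read-string (.-data e)))
      ;; data is ready; hide the loading screen here
      ))
  ;; Hand the raw attribute string to the worker and return immediately,
  ;; so the browser can render a "Loading..." state in the meantime.
  (.postMessage worker
    (-> (.getElementById js/document "slovarish-data")
        (.getAttribute "data-nouns"))))
```

Note that if the worker sends the whole blob back as a string, the main thread still pays for a read-string on it; that is the argument for keeping the blob in the worker and returning only small per-word results.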

craftybones04:03:50

One other option is to see if you can process the data as a stream

craftybones04:03:28

Given that it is a large edn blob, this might not be possible

craftybones04:03:16

Is there any way for you to not send such a large EDN blob? And if you need to, is there any way you can do it asynchronously instead of embedding it into the HTML doc?

tabidots04:03:16

The main reason is that I want to just host this on GitHub pages without setting up a backend. I originally thought I might need to use a DB but when I asked for advice a couple months ago here, people told me to just embed the EDN in a data attribute

tabidots04:03:58

Because a full-on DB is overkill for what I’m doing. 10MB blob that doesn’t need to be written to

craftybones04:03:29

Then web workers are your best way out

craftybones04:03:44

Don’t embed the data in the doc. Keep it as a separate file, use a web worker to fetch it in the background

tabidots04:03:11

The Streams solution would be great if I could get it to work—I could even implement a real loading bar that way. But yeah it looks like that would require the resource to exist externally somewhere. Would CORS be an issue if I just put the EDN somewhere else in the repo?

craftybones04:03:30

Why can’t your edn blob be a separate file?

craftybones04:03:43

No, it would just be like fetching an image

craftybones04:03:55

If your data is as large as 10MB and processing it synchronously is not performant, then hosting that resource externally seems to be the right way

tabidots04:03:01

Ah, cool. This is my first project in a while and when I asked about it originally I had no idea that you could even embed EDN/JSON in data attrs like that

tabidots04:03:30

so I just went with it. I figured having it be external would cause CORS issues

tabidots04:03:27

I’m still not clear about how to get the web worker to do what I want. Right now I load the EDN blob into an atom and query it from a couple different namespaces in my app. The blob is a map, so I’m working with Clojure objects from beginning to end. Would I have to sacrifice some of these ergonomics in order to use a web worker?

pithyless05:03:15

> The main reason is that I want to just host this on GitHub pages without setting up a backend.
It's early morning, so I apologize if I'm missing the obvious point - but why can't you save the data as a separate file in git (under the correct public and/or resources dir where GitHub Pages looks for images, etc.) and just fetch it asynchronously via JavaScript? That should unblock the browser thread and it can render a loading spinner, etc.
Also, if it turns out the actual parsing of the EDN blob is an issue - I would consider testing with an alternative format like transit+json. It would require an additional step in the build (before pushing to git), but parsing via transit+json may turn out to be enough faster in your specific case that it's worth the hassle (browsers have very fast JSON parsers, and you still get regular Clojure data on both ends).
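(A hedged sketch of the transit+json round trip, assuming the cognitect.transit dependency; in practice the write side would run once in the build step, producing a file the app fetches:)

```clojure
(ns my-app.transit-demo
  (:require [cognitect.transit :as t]))

;; Write side (would normally run in the build, not the browser).
(def encoded
  (t/write (t/writer :json)
           {:violin [{:pos :noun}]}))

;; Read side (in the browser): transit rides on the native JSON parser,
;; which is often noticeably faster than cljs.reader/read-string
;; on payloads this large.
(def decoded
  (t/read (t/reader :json) encoded))
;; decoded is ordinary CLJS data again, keywords included
```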

tabidots05:03:29

Cool, thanks. I think this is the way to go. I think my next step will be to try to put the data in an external file and implement something like this https://javascript.info/fetch-progress … However, I am a beginner with interop so it will take me a very long time to translate this. The only question on SO I found related to fetch and CLJS is this, which is pretty minimal https://stackoverflow.com/questions/58665259/how-to-i-get-the-body-text-of-a-response-object-returned-by-the-fetch-api-in-clo
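(A minimal js/fetch interop sketch, without the progress-monitoring part; the file path is hypothetical:)

```clojure
(ns slovarish.frontend.fetch-demo
  (:require [cljs.reader :as reader]))

(defonce all-nouns (atom {}))

(defn load-data! []
  ;; js/fetch returns a Promise, so the browser stays responsive
  ;; (and can render a spinner) while the download is in flight.
  (-> (js/fetch "data/nouns.edn")
      (.then (fn [resp] (.text resp)))
      (.then (fn [text]
               ;; read-string itself still runs on the main thread,
               ;; so the tab may hitch briefly once the file arrives.
               (reset! all-nouns (reader/read-string text))))))
```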

ahungry05:03:35

if you wanted a single html file model, and not making special web-worker.js runners, or changing data format, maybe you don't need the 10MB edn to be one single edn object (I'm guessing it's a big map with many keys, or a collection of maps etc.?). It looks like, bubbling down to the edn implementation, it reads a char at a time - if you can't partition your data because it's giant strings of binary/data in a few objects, maybe it makes more sense to keep it as a string and break it into two data attributes - one with the indexes and the other with the string offsets, to pull out just the data as needed?

tabidots05:03:43

Oh right, I didn’t think about partitioning the data. It’s definitely doable. The shape of the data is a map with about 14k keys where each value is a vector of maps. No individual kv pair is overly large, so maybe it’s worth splitting them into chunks of 1k kv pairs and seeing how that works.

ahungry05:03:41

that might let you yield the main thread more easily, especially if you staggered the calls via settimeouts to give all the other calls a moment to run, and as each portion runs, you could recombine into your existing atom format
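(The staggered approach could look something like this; the attribute naming and chunk count are illustrative:)

```clojure
(ns slovarish.frontend.chunks
  (:require [cljs.reader :as reader]))

(defonce all-nouns (atom {}))
(defonce progress  (atom 0))

(defn load-chunks!
  "Parse data-nouns-0 .. data-nouns-(n-1), one per timeout tick,
   merging each into the atom so the browser can repaint (and a
   progress bar can advance) between chunks."
  [n]
  (letfn [(step [i]
            (when (< i n)
              (let [el  (.getElementById js/document "slovarish-data")
                    raw (.getAttribute el (str "data-nouns-" i))]
                (swap! all-nouns merge (reader/read-string raw))
                (reset! progress (/ (inc i) n))
                ;; Yield the main thread before parsing the next chunk.
                (js/setTimeout #(step (inc i)) 0))))]
    (step 0)))
```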

tabidots05:03:50

Yeah that sounds good. Easier to implement while still letting the user see some progress (if choppy) while it loads. I’ll give it a go tomorrow. Thanks!

pithyless05:03:00

> However, I am a beginner with interop so it will take me a very long time to translate this. If you want something easy and don't mind including an external dependency in your build, I'd just pull in something like https://github.com/r0man/cljs-http and call it a day. :) PS. cljs-http also supports progress monitoring

lilactown14:03:27

no matter what, you'll need to partition the data to get the performance you want. whether you're fetching it or reading it from a block on the page

lilactown14:03:36

you might also check to see how long the page takes to load completely - a 10mb page on initial load is pretty big. just loading and parsing the page might take a few seconds

lilactown14:03:22

and in that time, it's difficult to show a loading message

tabidots14:03:49

@U05476190 Oh, nice… I didn’t realize that about the progress monitoring! I’m already using clj-http in the “backend” (data-generating scripts) so using something similar in the frontend seems logical

tabidots14:03:34

@U4YGF4NGM I haven’t deployed the app so I’ve just been interacting with it on my local machine. The wait time doesn’t bother me (it’s just a hobby/portfolio project)—well, I mean, yeah it does take more than a few seconds but just a little progress bar or wheel or something to indicate that the page hasn’t frozen would be fine. Actually when I originally asked on here for advice a couple months ago, people were telling me 10mb is no big deal 😅

lilactown15:03:48

always varies on use case. 10mb over a mobile connection or spotty wifi when I need to do something sucks! but if it doesn't bother you or your audience then no harm, no foul

Robert Brotherus10:03:31

I have a re-frame app created from the re-frame template with the +test option so that I can run tests with the Karma runner. I have made several tests and running them works fine with npm run watch + karma start (continuous) or npm run ci (single run). I would occasionally like to run only a subset or a single test, either continuously or as a single run, but have not found a solution with some amount of googling. What I have googled and tried so far:
• There are general Karma JS instructions for running single tests, but they do not seem to be applicable to the cljs-test + karma combo https://stackoverflow.com/questions/26552729/karma-run-single-test
• Instructions for the cljs karma-reporter https://github.com/honzabrecka/karma-reporter say that tests can be run from a cljs-REPL as well, and that might provide a way. However, when I go to (shadow/repl :app) and (clojure.test/run-all-tests), none of my tests are executed, presumably because they are in a separate test project/folder created by the re-frame template's +test.
• I also tried (shadow/repl :karma-test), but (clojure.test/run-all-tests) there gives "No available JS runtime."

roklenarcic13:03:38

I’ve put a Clojure map into IndexedDB in the browser; when I fetched it back I got a JS obj that I cannot do much with. I have tried calling js->clj on it but I get:

TypeError: coll.cljs$core$IEmptyableCollection$_empty$arity$1 is not a function

roklenarcic13:03:15

Is there some sort of trick to this, or do I have to do my own serialization/deserialization when working with IndexedDB?

p-himik13:03:19

I'm 80% sure it's the latter.

kennytilton14:03:50

@U66G3SGP5 Ugh. I hate it when this happens. 🙂 I do not recognize that error, but in the past I have managed to crack these stubborn objects with goog.object. I start with (goog.object/getKeys some-obj) to see if that works, then (goog.object/get some-obj "someProperty") if so. 🤷 hth
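(For reference, that probing looks like this; it only applies to plain JS objects, not CLJS data structures:)

```clojure
(ns my-app.obj-probe
  (:require [goog.object :as gobj]))

;; Probing an opaque *plain* JS object:
(def o #js {:name "violin" :count 3})

(gobj/getKeys o)    ;; #js ["name" "count"]
(gobj/get o "name") ;; "violin"
```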

p-himik14:03:44

Not sure what you mean, but one should definitely not use goog.object on CLJS data structures, and one should definitely not store CLJS data structures without serialization that explicitly supports them (built-in browser facilities do not support them).

kennytilton14:03:52

I used it on an event handler event.

kennytilton14:03:31

You'd think js->clj would work but nope.

p-himik14:03:50

That's a completely different problem. And yes, js->clj will not work on any objects that aren't plain JS objects/arrays/primitives, by design (there's an exception but it's not important).

p-himik14:03:34

Also, just in case - no need for goog.object unless your keys are dynamic. You can just use interop with externs inference.

kennytilton14:03:29

I saw " I’ve got a JS obj that I cannot do much with" and offered what had worked for me before. And I 🤷ed 🙂.

p-himik14:03:49

The first part of the original sentence is more important. :)

p-himik14:03:38

In short, OP's problem: CLJS data -> put into IndexedDB -> get from IndexedDB -> garbage out. Reason: lack of serialization.

kennytilton16:03:32

Oh, I get it. The problem really was on the put. IndexedDB did its best: accepting the put, storing something that would be unusable when returned. Brave effort by IndexedDB! But then it's really garbage in? Thx for the clarification! 🙏

👍 1
roklenarcic07:04:02

Yes I guess I will use pr-str to print to EDN
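(The round trip is just pr-str on the way in and read-string on the way out; a small sketch, with the IndexedDB calls themselves elided:)

```clojure
(ns my-app.idb-demo
  (:require [cljs.reader :as reader]))

;; Store: serialize CLJS data to an EDN string first --
;; IndexedDB's structured clone handles plain strings fine.
(def to-store (pr-str {:word "violin" :forms [{:case :nom}]}))

;; Fetch: read-string restores real CLJS data; no js->clj needed.
(def restored (reader/read-string to-store))
;; (= restored {:word "violin" :forms [{:case :nom}]}) ;=> true
```

One caveat: js-joda values are plain JS class instances, so pr-str will print them as unreadable #object[...] tags; they would need converting to something readable (e.g. ISO strings) before serializing.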

roklenarcic07:04:27

I guess I will see what happens to js-joda objects