cherry

borkdude 2022-08-05T12:50:15.071199Z

Another experimental thought for another project. What if we forked cherry, didn't use cljs.core at all, and generated immutable JS data structures like this: https://twitter.com/robpalmer2/status/1555495764444545024 Or we could use some immutable-js thing behind the scenes

👍 2
borkdude 2022-08-05T12:57:31.452259Z

or we could use a proxy layer so we can switch implementations later

borkdude 2022-08-05T14:26:14.889219Z

Trying this crazy idea in a branch:

$ ./node_cli.js -e '(prn (str (assoc {} :foo :bar)))'
Map { "foo": "bar" }

borkdude 2022-08-05T14:32:23.198909Z

$ ./node_cli.js -e '(prn (let [[x y] [1 2]] (str [(inc x) (dec y)])))'
List [ 2, 1 ]

mkvlr 2022-08-05T14:46:04.222719Z

didn't push it yet, right?

mkvlr 2022-08-05T14:47:16.724939Z

and then keywords could be Symbols

borkdude 2022-08-05T14:53:58.539799Z

Yes, but what about symbol? ;)

mkvlr 2022-08-05T14:56:11.386209Z

either prefix one, e.g. keywords with ":"
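A minimal sketch of the prefix idea, assuming both keywords and symbols are backed by the global Symbol registry and told apart by a leading ":" in the registry key. The helpers `keyword`, `symbol`, `isKeyword` and `name` are hypothetical illustrations, not cherry's API.

```javascript
// Hypothetical encoding: keywords and symbols both live in the global
// Symbol registry; keywords get a ":" prefix so the two never collide.
function keyword(n) {
  return Symbol.for(":" + n);
}

function symbol(n) {
  return Symbol.for(n);
}

// Symbol.keyFor returns the string a symbol was registered under,
// or undefined for symbols not created via Symbol.for.
function isKeyword(x) {
  if (typeof x !== "symbol") return false;
  const key = Symbol.keyFor(x);
  return key !== undefined && key.startsWith(":");
}

// Roughly like clojure.core/name: strip the keyword prefix if present.
function name(x) {
  const key = Symbol.keyFor(x);
  return isKeyword(x) ? key.slice(1) : key;
}

console.log(keyword("foo") === keyword("foo")); // true: same registry entry
console.log(keyword("foo") === symbol("foo"));  // false: different registry keys
console.log(name(keyword("foo")), name(symbol("foo"))); // foo foo
```

One caveat of this scheme: a symbol created programmatically with a leading ":" in its name would map to the same registry entry as a keyword.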

mkvlr 2022-08-05T14:56:52.837069Z

but I think just using it for keywords would already be a huge speed benefit, as they're much more common at runtime

borkdude 2022-08-05T14:57:20.485019Z

I considered also just using a string for keywords

borkdude 2022-08-05T14:57:27.962309Z

since that has benefits for JS interop

mkvlr 2022-08-05T14:59:30.942829Z

how would that be better for interop than Symbol?

borkdude 2022-08-05T15:00:05.785949Z

returning an immutable-js map with string keys would convert into a mutable JS object more easily than one with Symbol keys
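To illustrate the interop point, here is a small sketch using a native JS `Map` as a stand-in for an immutable-js map (how immutable-js's own `toJS` treats Symbol keys is not shown here). Converting to a plain object works either way, but ordinary JS APIs only see the string keys.

```javascript
// Two maps with the same logical content: one keyed by strings,
// one keyed by registry Symbols (a hypothetical keyword encoding).
const stringKeyed = new Map([["foo", "bar"]]);
const symbolKeyed = new Map([[Symbol.for("foo"), "bar"]]);

// Object.fromEntries accepts any iterable of [key, value] pairs, including a Map.
const fromStrings = Object.fromEntries(stringKeyed);
const fromSymbols = Object.fromEntries(symbolKeyed);

// String keys are visible to ordinary JS APIs...
console.log(Object.keys(fromStrings));    // [ 'foo' ]
console.log(JSON.stringify(fromStrings)); // {"foo":"bar"}

// ...but Symbol-keyed properties are skipped by Object.keys and JSON.stringify.
console.log(Object.keys(fromSymbols));    // []
console.log(JSON.stringify(fromSymbols)); // {}
```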

mkvlr 2022-08-05T15:00:32.768349Z

ah I see

mkvlr 2022-08-05T15:02:06.439089Z

immer js is also something to look at, it uses frozen objects, a trick @jackrusher also applied for doing a js triple store

borkdude 2022-08-05T15:12:07.825169Z

Thanks for sharing

borkdude 2022-08-05T15:12:13.674379Z

(pun intended!)

2022-08-05T15:42:45.349499Z

Could you expand on the difficulties imposed by Symbol.for("foo") as a key?

borkdude 2022-08-05T15:46:51.227079Z

JS APIs and JSON, etc just assume string keys. Why not go with the flow?

mkvlr 2022-08-05T15:51:50.983889Z

because strings aren't keywords? How will you represent string keys then?

borkdude 2022-08-05T15:52:05.142339Z

you don't: just string keys everywhere

borkdude 2022-08-05T15:52:20.130289Z

converting from string keys to keywords is an uphill battle in CLJS programs that do lots of interop

borkdude 2022-08-05T15:53:32.670449Z

Btw, I'm not settled on any approach, but just thinking out loud

mkvlr 2022-08-05T15:54:27.929109Z

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol/for should be heavily optimized by the runtime

2022-08-05T15:56:34.742739Z

No guarantees are made regarding the precise implementation in any given browser, but they should generally be unique lookups, like the JVM ones. They also have semantics that map well to the other keyword manipulating functions in Clojure, like name. I can't really say whether it's worth it from a performance (most importantly: memory consumption) perspective without a bakeoff in the major browsers. 🤷🏻‍♂️

mkvlr 2022-08-05T15:56:38.188629Z

treating keywords as strings also sounds like an uphill battle for any Clojure/Script program

borkdude 2022-08-05T15:57:57.503529Z

@mkvlr Explain?

borkdude 2022-08-05T15:58:25.255379Z

@jackrusher When I generate Symbol.for("foo"), what do I gain? Isn't the string "foo" allocated first anyway?

mkvlr 2022-08-05T16:02:53.191859Z

not sure if you were suggesting this, but to me it sounded like you want to implement keywords as strings when used as keys, for example in maps. Wouldn't that mean I can no longer have a map where a string key and a keyword key with the same name hold different values, breaking Clojure semantics?
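A sketch of the collision being described, using a native JS `Map`: if keywords compile to plain strings, a string key "foo" (say, a raw request param) and the keyword :foo (its coerced counterpart) become the same key. The keyword-as-Symbol variant is a hypothetical alternative encoding.

```javascript
// Keywords compiled to plain strings: :foo and "foo" are the same key,
// so the later entry silently overwrites the earlier one.
const asStrings = new Map([
  ["foo", "raw param"], // the string key "foo"
  ["foo", "coerced"],   // :foo, compiled to the string "foo"
]);
console.log(asStrings.size);       // 1
console.log(asStrings.get("foo")); // coerced

// Keywords as registry Symbols: the two keys stay distinct.
const asSymbols = new Map([
  ["foo", "raw param"],
  [Symbol.for("foo"), "coerced"],
]);
console.log(asSymbols.size); // 2
```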

borkdude 2022-08-05T16:04:48.063619Z

@mkvlr That's currently what happens in the immutable-js-poc branch (as a quick hack) but I haven't fully thought through the consequences

borkdude 2022-08-05T16:05:08.494509Z

Yes, that would be one limitation, but maybe a reasonable one in the name of performance and interop

borkdude 2022-08-05T16:06:11.252479Z

It's fairly cheap to go from an immutable-js object to a mutable JS object. If you go with some non-standard key type then you'd have to convert that yourself (which might be worth it)

mkvlr 2022-08-05T16:13:03.298839Z

I have a feeling this would break a ton of stuff

2022-08-05T16:14:41.535999Z

So far as I know (which might be out of date), JS strings are immutable, but in many browsers not interned. So if I have a large application with loads of :whatever-keyword, I get one string allocation for each one, and value comparison between them is O(N) (N = length of the shorter of the two strings). In contrast, Symbols are interned and compared based on a unique ID in O(1). As I said, I'd need to do some testing to see how this works out in practice.

borkdude 2022-08-05T16:15:17.191359Z

It depends on how you look at what cherry is going to be. If it's a new way to work with CLJS(ish) in JS, bootstrapping a new ecosystem around it, then breaking a ton of stuff isn't a concern.

borkdude 2022-08-05T16:16:39.335119Z

@jackrusher I get that aspect, but if you generate Symbol("foo") you will still generate a "foo" string... right? But after that, they are cheap to compare, sure yes.

2022-08-05T16:16:50.066579Z

I would say that being able to compile whatever cljc files a project has would be the best case, but I'd also be sympathetic to breaking compatibility if the overall payoff were good enough. And, of course, it's ultimately your call 🙂

2022-08-05T16:17:43.094449Z

You would generate only one "foo" string no matter how many times the :foo keyword occurs.

borkdude 2022-08-05T16:17:53.522539Z

Well, the way cherry currently works, it's already different from CLJS. Macros are run in Node.js and are compiled first to .mjs. It supports JS destructuring and async await...

2022-08-05T16:18:21.569899Z

(Or rather, all but one of them would be garbage collected immediately if the allocation wasn't optimized away by the JIT)

borkdude 2022-08-05T16:20:00.082299Z

@jackrusher If you have this code:

{:foo :foo}
this would compile to:
Map([[Symbol.for("foo"), Symbol.for("foo")]])
The string is allocated twice. Or do you mean that the compiler should only emit Symbol.for("foo") once at the top of the file? This is also what we could do in cherry with the cljs.core backend btw, but it would only work within one and the same file (since it's not a whole-program optimizer like Closure). Still a nice optimization.
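A sketch of the per-file hoisting idea, assuming the compiler emits each distinct keyword once as a module-level constant. The constant name `kw_foo` and the function `makeMap` are hypothetical, and a native `Map` stands in for the immutable-js `Map`.

```javascript
// Hypothetical compiler output for a file containing {:foo :foo}:
// each distinct keyword is created once, at the top of the module.
const kw_foo = Symbol.for("foo");

// Every later occurrence of :foo reuses the constant, so these call
// sites do no new string allocation or registry lookup.
function makeMap() {
  return new Map([[kw_foo, kw_foo]]);
}

const m = makeMap();
console.log(m.get(kw_foo) === kw_foo); // true

// Because Symbol.for uses a global registry, code in another module that
// evaluates Symbol.for("foo") gets the same symbol, hoisted or not.
console.log(m.get(Symbol.for("foo")) === kw_foo); // true
```

The registry is what makes the per-file limitation harmless for correctness: hoisting only saves lookups, while cross-file equality still holds.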

borkdude 2022-08-05T16:22:37.719609Z

lol:

> Symbol.for("foo") === Symbol.for("foo")
true
> Symbol("foo") === Symbol("foo")
false

2022-08-05T16:24:02.479389Z

I find that aspect of the API amusing as well 🙂

mkvlr 2022-08-05T16:26:16.554889Z

I can imagine quite a few use cases (one example: an http request map where the request param keys are strings and then coerced params are keywords) where the string/keyword key distinction is useful. I know there are differences in cherry but I'd love there to be none for pure functional code using only the core data structures.

2022-08-05T16:27:08.685159Z

I'm not sure how to phrase what I'm saying more clearly. Another attempt: If you have, say, 100 references to :foo, and strings are not interned, you end up with 100 "foo" strings in memory. If you have 100 Symbol.for("foo") references, you get one string in memory because all the others will get immediately garbage collected after the Symbol lookups have completed. (Or the allocation will be factored out by the JIT, if the code produces loads of the same keyword during runtime).

borkdude 2022-08-05T16:28:33.320459Z

Yeah. JS runtimes might also have an intrinsic for the Symbol.for("foo") expression

2022-08-05T16:28:57.915789Z

but, as before, can't be sure if it's better without measuring 🙂

borkdude 2022-08-05T16:30:07.992139Z

I'm sympathetic to @mkvlr’s argument. The immutable-js-poc branch is a 1-2 hour hack to see if it works (and so far it works quite nicely), but I think if you go further along with this, you may end up re-implementing transducers, chunking, etc. too

💯 1
😅 1
borkdude 2022-08-05T16:31:05.718009Z

So I think going with a JS-backed immutable data structure has benefits if you focus on output size, JS tooling, etc., with the trade-off that it might be a bigger deviation from CLJS

borkdude 2022-08-05T16:31:17.885469Z

cherry could also have multiple output modes where one of them is immutable-js

mkvlr 2022-08-05T16:32:32.881319Z

curious if you can use Symbol to get the best of both worlds, or can that not be put in an immutable-js map?

mkvlr 2022-08-05T16:33:36.662619Z

and curious how it would do in benchmarks. Are there any cljs data structure benchmarks we could reuse?

mkvlr 2022-08-05T16:35:24.367209Z

re implementing transducers: https://github.com/thi-ng/umbrella/tree/develop/packages/transducers

borkdude 2022-08-05T17:28:29.721159Z

yes, you can use anything as a key I think, you just have to make your own pass over the toJS output to convert those symbols back to strings
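A minimal sketch of such a conversion pass. The helper `keywordKeysToStrings` is hypothetical, and it works on a nested native `Map` standing in for the data (immutable-js's own `toJS` behavior with Symbol keys is not assumed). It relies on `Symbol.keyFor` to recover the registered name.

```javascript
// Recursively convert Maps whose keys may be registry Symbols (keywords)
// into plain objects with string keys, suitable for JSON and JS APIs.
function keywordKeysToStrings(value) {
  if (value instanceof Map) {
    const out = {};
    for (const [k, v] of value) {
      // Symbol.keyFor gives the registry name; unregistered symbols
      // fall back to their String() form, e.g. "Symbol(x)".
      const key =
        typeof k === "symbol" ? (Symbol.keyFor(k) ?? String(k)) : String(k);
      out[key] = keywordKeysToStrings(v);
    }
    return out;
  }
  if (Array.isArray(value)) return value.map(keywordKeysToStrings);
  // Keyword values become their names too (JSON.stringify would otherwise
  // turn a Symbol inside an array into null).
  if (typeof value === "symbol") return Symbol.keyFor(value) ?? String(value);
  return value;
}

const data = new Map([
  [Symbol.for("user"), new Map([[Symbol.for("roles"), [Symbol.for("admin")]]])],
]);
console.log(JSON.stringify(keywordKeysToStrings(data)));
// {"user":{"roles":["admin"]}}
```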

borkdude 2022-08-05T17:29:40.905169Z

it's possible to implement transducers yourself on top of immutablejs or whatever, but my main point was: then you're creating more and more your own stuff which is likely to have bugs and deviations from what CLJS does
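To give a sense of what implementing transducers yourself involves, here is a minimal sketch of Clojure-style `map`/`filter` transducers over plain JS arrays. The names `mapping`, `filtering`, `comp`, `conj` and `into` are illustrative. It deliberately ignores early termination (`reduced`), completion arities and stateful transducers, which are exactly the areas where a hand-rolled version could drift from CLJS.

```javascript
// A reducing function has the shape (acc, x) => acc.
// A transducer transforms one reducing function into another.
const mapping = (f) => (rf) => (acc, x) => rf(acc, f(x));
const filtering = (pred) => (rf) => (acc, x) => (pred(x) ? rf(acc, x) : acc);

// Right-to-left composition, like clojure.core/comp; applied to transducers,
// data flows through them left to right.
const comp = (...fns) => (x) => fns.reduceRight((v, f) => f(v), x);

// Mutating push is safe here because into() always starts from a copy.
const conj = (acc, x) => {
  acc.push(x);
  return acc;
};

function into(to, xform, from) {
  return from.reduce(xform(conj), [...to]);
}

const xf = comp(filtering((x) => x % 2 === 1), mapping((x) => x * 10));
console.log(into([], xf, [1, 2, 3, 4, 5])); // [ 10, 30, 50 ]
```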

borkdude 2022-08-05T17:31:00.105299Z

I'll keep pushing forward with the CLJS stuff and keep this branch as an interesting segue

2022-08-07T15:37:13.426739Z

The discussion on how they came to that decision is truly dispiriting and another reminder of why we will never have nice things.

borkdude 2022-08-07T15:41:32.555009Z

do you have a link and/or summary of that?

borkdude 2022-08-05T14:53:35.413279Z

@mkvlr Pushed it now to immutable-js-poc

👍 1
mkvlr 2022-08-05T14:58:03.572419Z

ah so immutable.js, was wondering if the new built-in immutable stuff was already usable in some envs

borkdude 2022-08-05T14:58:48.422449Z

those things aren't persistent data structures I think, so probably not

borkdude 2022-08-05T15:09:50.723659Z

Cool this also works now:

$ ./node_cli.js run corpus/doseq.cljs
[cherry] Running corpus/doseq.cljs
1 hello
1 bye
2 hello
2 bye
3 hello
3 bye
borkdude@m1 ~/dev/cherry (immutable-js-poc) $ cat corpus/doseq.cljs
(ns doseq)

(doseq [x [1 2 3]
        y [:hello :bye]]
  (prn x y))
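For comparison, a hand-written JS equivalent of the nested iteration this `doseq` performs (a sketch, not cherry's actual output). It represents keywords as plain strings, as the POC branch does at this point, which is presumably why the output above shows `hello` rather than `:hello`.

```javascript
// Nested iteration equivalent to
// (doseq [x [1 2 3] y [:hello :bye]] (prn x y)),
// with keywords represented as plain strings as in the POC branch.
const lines = [];
for (const x of [1, 2, 3]) {
  for (const y of ["hello", "bye"]) {
    lines.push(`${x} ${y}`);
  }
}
console.log(lines.join("\n"));
```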

borkdude 2022-08-05T15:14:01.963599Z

$ npx esbuild corpus/doseq.mjs --bundle --minify --platform=node --outfile=dist/index.mjs --format=esm

  dist/index.mjs  65.1kb

borkdude 2022-08-05T15:15:42.032289Z

This is about the size of immutable JS itself, so I'm still not sure if there's any good treeshaking going on, but at least it's a lot smaller

borkdude 2022-08-05T15:37:23.109909Z

The same example produces 97kb with shadow-cljs

ray 2022-08-05T15:50:29.931009Z

🍒 🤛🏼

lilactown 2022-08-05T19:11:48.988259Z

Records and tuples have different semantics than libs like immutablejs and clojure.core

lilactown 2022-08-05T19:11:59.744279Z

They cannot contain mutable objects at all

lilactown 2022-08-05T19:12:20.349019Z

At least, last time I read the proposal

Chris McCormick 2022-08-05T19:45:56.786889Z

@borkdude I assume you know of Wisp but I thought I'd check just in case. It could be an interesting reference point: https://github.com/wisp-lang/wisp/ Specifically the escodegen backend (which I don't know much about myself): https://github.com/wisp-lang/wisp/tree/master/src/backend/escodegen

borkdude 2022-08-05T20:01:55.363009Z

I've seen you reference it before and it surely looks interesting. I've got something on par with this in cherry and it could be easily tweaked to a "pure" JS thing without CLJS stuff in it I think

borkdude 2022-08-05T20:02:03.648539Z

I wonder how they do this: wisp vectors are plain JavaScript arrays, but nevertheless all standard library functions are non-destructive and pure functional as in Clojure.

borkdude 2022-08-05T20:09:23.018339Z

Ah I see, just create new arrays all the time ;)

Chris McCormick 2022-08-05T20:18:53.174589Z

The thing I always found annoying about Wisp were the places where it varies from Clojure. It was "clojure-like" but not quite clojure-like enough in many instances and I often found myself tripping over that. It did make me wish for a pure Cljs mode where data structures are assumed to be native instead of Clojure (e.g. [] is a native array instead of a vec). Mostly for performance/small compile reasons. Both #js [] and (clj->js []) are cumbersome for different reasons (the first is annoying to type in deeper structures and the second has the expensive round-trip), although j/lit is probably a good enough tradeoff.

borkdude 2022-08-05T20:22:26.716989Z

right. suppose we have cherryjs, which compiled [] and {} to native arrays and objects. what would the standard lib then look like? e.g. filter and map over a native array, are those destructive or not?

borkdude 2022-08-05T20:23:40.320479Z

it seems those copy:

> x = [1, 2]
[ 1, 2 ]
> y = x.map(x =>  x + 1)
[ 2, 3 ]
> x[0]
1
> y[0]
2

borkdude 2022-08-05T20:24:09.270999Z

but what about assoc?

Chris McCormick 2022-08-05T20:25:45.273259Z

If it was me designing the language i would try to make it feel as close to clojure as possible, so assoc would feel immutable (even if it was just cloning). I am way out of my depth here though! 😅
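A minimal sketch of what a cloning `assoc` over native arrays and objects could look like. The function is a hypothetical illustration of copy-on-write, not an existing cherry or wisp API. Like Clojure's `assoc` on vectors, it allows an index equal to the length (append).

```javascript
// Copy-on-write assoc over native data: never mutate the input,
// always return a fresh shallow copy with one slot changed.
function assoc(coll, k, v) {
  if (Array.isArray(coll)) {
    if (!Number.isInteger(k) || k < 0 || k > coll.length) {
      throw new RangeError(`Index ${k} out of bounds`);
    }
    const copy = coll.slice(); // O(n) copy of the whole array
    copy[k] = v;
    return copy;
  }
  return { ...coll, [k]: v }; // O(n) copy of all own enumerable properties
}

const v = [1, 2];
const v2 = assoc(v, 0, 99);
console.log(v, v2); // [ 1, 2 ] [ 99, 2 ]

const m = { foo: 1 };
const m2 = assoc(m, "bar", 2);
console.log(m, m2); // { foo: 1 } { foo: 1, bar: 2 }
```

Every update copies the whole collection, which is the cost that structural sharing avoids.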

borkdude 2022-08-05T20:45:26.701369Z

I think in that case it makes more sense to use the cljs.core stdlib or immutablejs/etc

borkdude 2022-08-05T20:45:40.615089Z

as they implement structural sharing which is way more efficient than copy-on-write
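A small sketch of the path-copying idea behind structural sharing, using nested plain objects and a hypothetical `assocIn` helper: only the objects on the path to the changed value are copied, and every untouched branch is shared between the old and new versions. Real persistent collections such as cljs.core's vectors and hash maps apply the same idea to trees whose nodes hold at most 32 entries, so an update copies only a few small nodes instead of the whole collection.

```javascript
// Path copying: update a nested value by copying only the objects on
// the path to it; sibling branches are reused as-is.
function assocIn(obj, [k, ...rest], v) {
  const child = rest.length === 0 ? v : assocIn(obj[k] ?? {}, rest, v);
  return { ...obj, [k]: child };
}

const state = {
  user: { name: "ada", prefs: { theme: "dark" } },
  cache: { big: new Array(10000).fill(0) },
};
const next = assocIn(state, ["user", "prefs", "theme"], "light");

console.log(next.user.prefs.theme);      // light
console.log(state.user.prefs.theme);     // dark (original untouched)
console.log(next.cache === state.cache); // true: sibling branch is shared
```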