Another experimental thought for another project. What if we forked cherry, didn't use cljs.core at all, and instead generated immutable JS data structures like this: https://twitter.com/robpalmer2/status/1555495764444545024 Or we could use some immutablejs thing behind the scenes
or we could use a proxy layer so we can switch implementations later
Trying this crazy idea in a branch:
$ ./node_cli.js -e '(prn (str (assoc {} :foo :bar)))'
Map { "foo": "bar" }
$ ./node_cli.js -e '(prn (let [[x y] [1 2]] (str [(inc x) (dec y)])))'
List [ 2, 1 ]
didn't push it yet, right?
and then keywords could be Symbols
Yes, but what about symbol? ;)
either prefix one, eg keywords with :
but I think just using it for keywords would already be a huge speed benefit, as they're much more common at runtime
I considered also just using a string for keywords
since that has benefits for JS interop
how would that be better for interop than Symbol?
returning an immutable-js map with string keys would convert into a mutable JS object more easily than one with Symbol keys
ah I see
immer js is also something to look at, it uses frozen objects, a trick @jackrusher also applied for doing a js triple store
https://medium.com/hackernoon/introducing-immer-immutability-the-easy-way-9d73d8f71cb3
Thanks for sharing
(pun intended!)
Could you expand on the difficulties imposed by Symbol.for("foo") as a key?
JS APIs and JSON, etc just assume string keys. Why not go with the flow?
because strings aren't keywords? How will you represent string keys then?
you don't: just string keys everywhere
converting from string keys to keywords is an uphill battle in CLJS programs that do lots of interop
Btw, I'm not settled on any approach, but just thinking out loud
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol/for should be heavily optimized by the runtime
No guarantees are made regarding the precise implementation in any given browser, but they should generally be unique lookups, like the JVM ones. They also have semantics that map well to the other keyword-manipulating functions in Clojure, like name. I can't really say whether it's worth it from a performance (most importantly: memory consumption) perspective without a bakeoff in the major browsers. 🤷🏻‍♂️
treating keywords as strings also sounds like an uphill battle for any Clojure/Script program
@mkvlr Explain?
@jackrusher When I generate Symbol.for("foo") - what do I gain? The string "foo" is allocated first anyway?
not sure if you were suggesting this: to me it sounded like you want to implement keywords as strings when used as keys for example in maps. Could I then not have a map with the same string and keyword keys with different values and clojure semantics would not be preserved?
@mkvlr That's currently what happens in the immutable-js-poc branch (as a quick hack) but I haven't fully thought through the consequences
Yes, that would be one limitation, but maybe a reasonable one in the name of performance and interop
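A minimal sketch of the limitation being discussed, using the built-in JS Map rather than immutable-js so it runs without dependencies (that substitution is an assumption, as are the example values). It only illustrates how "foo" and :foo collide when keywords are emitted as strings, and stay distinct when they are emitted as Symbols:

```javascript
// Keywords emitted as plain strings: "foo" and :foo end up as the same key.
const stringKeyed = new Map();
stringKeyed.set("foo", "raw request param");
stringKeyed.set("foo", "coerced param"); // stands in for :foo, overwrites the entry above

console.log(stringKeyed.size);       // 1
console.log(stringKeyed.get("foo")); // "coerced param"

// Keywords emitted as registry Symbols: the two keys stay distinct.
const symbolKeyed = new Map();
symbolKeyed.set("foo", "raw request param");
symbolKeyed.set(Symbol.for("foo"), "coerced param"); // stands in for :foo

console.log(symbolKeyed.size);                   // 2
console.log(symbolKeyed.get("foo"));             // "raw request param"
console.log(symbolKeyed.get(Symbol.for("foo"))); // "coerced param"
```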
It's fairly cheap to go from an immutable-js object to a mutable JS object. If you go with some non-standard key type then you'd have to convert that yourself (which might be worth it)
I have a feeling this would break a ton of stuff
So far as I know (which might be out of date), JS strings are immutable, but in many browsers not interned. So if I have a large application with loads of :whatever-keyword, I get one string allocation for each one, and value comparison between them with O(N) (N = length of the shorter of the two strings). In contrast, Symbols are interned and compared based on a unique ID with O(1). As I said, I'd need to do some testing to see how this works out in practice.
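A rough sketch of what keywords-as-Symbols could look like. The helper names keyword and keywordName are made up for illustration and are not part of cherry or cljs.core; only Symbol.for and Symbol.keyFor are real JS APIs:

```javascript
// Hypothetical helpers: model a Clojure keyword as a Symbol from the global registry.
const keyword = (name) => Symbol.for(name);
// Rough analogue of Clojure's `name` for such keywords.
const keywordName = (kw) => Symbol.keyFor(kw);

const a = keyword("whatever-keyword");
const b = keyword("whatever-keyword");

console.log(a === b);        // true: both refer to the same registry Symbol
console.log(keywordName(a)); // "whatever-keyword"
console.log(typeof a);       // "symbol"
```

Whether the identity comparison is actually cheaper than comparing short strings in a given engine is exactly the part that still needs measuring.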
It depends on how you look at what cherry is going to be. If it's a new way to work with CLJS(ish) in JS, with bootstrapping a new ecosystem around it, then breaking a ton of stuff isn't a concern.
@jackrusher I get that aspect, but if you generate Symbol("foo") you will still generate a "foo" string... right? But after that, they are cheap to compare, sure yes.
I would say that being able to compile whatever cljc files a project has would be the best case, but I'd also be sympathetic to breaking compatibility if the overall payoff were good enough. And, of course, it's ultimately your call 🙂
You would generate only one "foo" string no matter how many times the :foo keyword occurs.
Well, the way cherry currently works, it's already different from CLJS. Macros are run in Node.js and are compiled first to .mjs. It supports JS destructuring and async await...
(Or rather, all but one of them would be garbage collected immediately if the allocation wasn't optimized away by the JIT)
@jackrusher If you have this code:
{:foo :foo}
this would compile to:
Map([[Symbol("foo"), Symbol("foo")]])
The string is allocated twice. Or do you mean that the compiler should only emit Symbol("foo") once at the top of the file? This is also what we could do in cherry with the cljs.core backend btw, but it would only work within one and the same file (since it's not a whole-program optimizer like Closure). Still a nice optimization
lol:
> Symbol.for("foo") === Symbol.for("foo")
true
> Symbol("foo") === Symbol("foo")
false
I find that aspect of the API amusing as well 🙂
I can imagine quite a few use cases (one example: an http request map where the request param keys are strings and then coerced params are keywords) where the string/keyword key distinction is useful. I know there are differences in cherry but I'd love there to be none for pure functional code using only the core data structures.
I'm not sure how to phrase what I'm saying more clearly. Another attempt: If you have, say, 100 references to :foo, and strings are not interned, you end up with 100 "foo" strings in memory. If you have 100 Symbol.for("foo") references, you get one string in memory because all the others will get immediately garbage collected after the Symbol lookups have completed. (Or the allocation will be factored out by the JIT, if the code produces loads of the same keyword during runtime).
Yeah. JS runtimes might also have an intrinsic for the Symbol.for("foo") expression
but, as before, can't be sure if it's better without measuring 🙂
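For the measuring part, a very rough timing sketch. It compares looking the keyword up with Symbol.for at every use site against a Symbol hoisted once to the top of the file. The function names, the map contents and the iteration count are all made up, and a loop like this is not a serious benchmark (the JIT may optimize either variant), so treat the numbers as a starting point only:

```javascript
// A tiny lookup table keyed by a registry Symbol (stand-in for a map with a :foo key).
const table = new Map([[Symbol.for("foo"), 1]]);

// Variant A: hypothetical compiler output that calls Symbol.for at each use site.
function lookupEachTime(n) {
  let sum = 0;
  for (let i = 0; i < n; i++) sum += table.get(Symbol.for("foo"));
  return sum;
}

// Variant B: hypothetical compiler output that hoists the keyword to a constant.
const KW_FOO = Symbol.for("foo");
function lookupHoisted(n) {
  let sum = 0;
  for (let i = 0; i < n; i++) sum += table.get(KW_FOO);
  return sum;
}

// Time a single call of fn(n) in milliseconds.
function timeIt(fn, n) {
  const start = performance.now();
  const result = fn(n);
  return { result, ms: performance.now() - start };
}

const N = 1_000_000;
const eachTime = timeIt(lookupEachTime, N);
const hoisted = timeIt(lookupHoisted, N);
console.log("Symbol.for per use:", eachTime.ms.toFixed(2), "ms");
console.log("hoisted constant:  ", hoisted.ms.toFixed(2), "ms");
```

This only covers lookup time, not the memory consumption question, which would need heap snapshots in the browsers that matter.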
I'm sympathetic to @mkvlr’s argument. The immutable-js-poc branch is a 1-2 hour hack to see if it works (and so far it works quite nicely), but I think if you go further along with this, you'll end up re-implementing transducers, chunking, etc. too
So I think going with a JS-backed immutable data structure has benefits if you focus on output size, JS tooling, etc with the trade-off that it might be a bigger deviation from CLJS
cherry could also have multiple output modes where one of them is immutable-js
curious if you can use Symbol to get the best of both worlds, or can that not be put in an immutable-js map?
and in benchmarks. Are there any cljs data structure benchmarks we could reuse?
re implementing transducers: https://github.com/thi-ng/umbrella/tree/develop/packages/transducers
yes, you can use anything as a key I think, you just have to make your pass over the toJS transformation to convert those symbols to strings again
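A sketch of what that extra pass could look like. It uses the built-in Map as a stand-in for an immutable-js map so it runs without dependencies, and toPlain is a made-up helper, not immutable-js's toJS:

```javascript
// Turn a Symbol (keyword stand-in) back into its string name.
function symbolToString(sym) {
  // Symbol.keyFor only knows registry Symbols; fall back to the description otherwise.
  return Symbol.keyFor(sym) ?? String(sym.description);
}

// Hypothetical conversion pass: Maps become plain objects, Symbols become strings.
function toPlain(value) {
  if (value instanceof Map) {
    const out = {};
    for (const [k, v] of value) {
      const key = typeof k === "symbol" ? symbolToString(k) : String(k);
      out[key] = toPlain(v);
    }
    return out;
  }
  if (Array.isArray(value)) return value.map(toPlain);
  if (typeof value === "symbol") return symbolToString(value);
  return value;
}

const m = new Map([
  [Symbol.for("foo"), Symbol.for("bar")],
  [Symbol.for("nested"), new Map([[Symbol.for("n"), [1, 2]]])],
]);

console.log(JSON.stringify(toPlain(m))); // {"foo":"bar","nested":{"n":[1,2]}}
```

Note that a string key "foo" and a keyword key :foo in the same map would collapse into one property here, which is the trade-off discussed earlier.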
it's possible to implement transducers yourself on top of immutablejs or whatever, but my main point was: then you're creating more and more your own stuff which is likely to have bugs and deviations from what CLJS does
I'll keep pushing forward with the CLJS stuff and keep this branch as an interesting segue
The discussion on how they came to that decision is truly dispiriting and another reminder of why we will never have nice things.
do you have a link and/or summary of that?
btw, the new JS records will probably only allow string keys: https://rickbutton.github.io/record-tuple-playground/#eyJjb250ZW50IjoiaW1wb3J0IHsgUmVjb3JkLCBUdXBsZSB9IGZyb20gXCJAYmxvb21iZXJnL3JlY29yZC10dXBsZS1wb2x5ZmlsbFwiO1xuXG5jb25zdCBmb28gPSAje2hlbGxvOiBcInRoZXJlXCJ9XG5jb25zdCBiYXIgPSAjeyAuLi5mb28sIGFub3RoZXI6IFwia2V5XCJ9XG5cbmNvbnNvbGUubG9nKGJhcikiLCJzeW50YXgiOiJoYXNoIiwiZG9tTW9kZSI6ZmFsc2V9
ah so immutable.js, was wondering if the new built in immutable stuff was already useable in some envs
those things aren't persistent data structures I think, so probably not
Cool this also works now:
$ ./node_cli.js run corpus/doseq.cljs
[cherry] Running corpus/doseq.cljs
1 hello
1 bye
2 hello
2 bye
3 hello
3 bye
borkdude@m1 ~/dev/cherry (immutable-js-poc) $ cat corpus/doseq.cljs
(ns doseq)
(doseq [x [1 2 3]
        y [:hello :bye]]
  (prn x y))
$ npx esbuild corpus/doseq.mjs --bundle --minify --platform=node --outfile=dist/index.mjs --format=esm
dist/index.mjs 65.1kb
This is about the size of immutable JS itself, so I'm still not sure if there's any good treeshaking going on, but at least it's a lot smaller
The same example produces 97kb with shadow-cljs
🍒 🤛🏼
Records and tuples have different semantics than libs like immutablejs and clojure.core
They cannot contain mutable objects at all
At least, last time I read the proposal
@borkdude I assume you know of Wisp but I thought I'd check just in case. It could be an interesting reference point: https://github.com/wisp-lang/wisp/ Specifically the escodegen backend (which I don't know much about myself): https://github.com/wisp-lang/wisp/tree/master/src/backend/escodegen
I've seen you reference it before and it surely looks interesting. I've got something on par with this in cherry and it could be easily tweaked to a "pure" JS thing without CLJS stuff in it I think
I wonder how they do this: wisp vectors are plain JavaScript arrays, but nevertheless all standard library functions are non-destructive and pure functional as in Clojure.
Ah I see, just create new arrays all the time ;)
The thing I always found annoying about Wisp was the places where it varies from Clojure. It was "clojure-like" but not quite clojure-like enough in many instances and I often found myself tripping over that. It did make me wish for a pure Cljs mode where data structures are assumed to be native instead of Clojure (e.g. [] is a native array instead of a vec). Mostly for performance/small compile reasons. Both #js [] and (clj->js []) are cumbersome for different reasons (the first is annoying to type in deeper structures and the second has the expensive round-trip) although j/lit is probably a good enough tradeoff.
right. suppose we have cherryjs which did [] and {} as native objects. what would the standard lib then look like? e.g. filter and map over a native array, are those destructive or not?
it seems those copy:
> x = [1, 2]
[ 1, 2 ]
> y = x.map(x => x + 1)
[ 2, 3 ]
> x[0]
1
> y[0]
2
but what about assoc?
If it was me designing the language i would try to make it feel as close to clojure as possible, so assoc would feel immutable (even if it was just cloning). I am way out of my depth here though! 😅
I think in that case it makes more sense to use the cljs.core stdlib or immutablejs/etc
as they implement structural sharing which is way more efficient than copy-on-write
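For comparison, a sketch of the copy-on-write variant over native arrays and objects. This assoc is a made-up helper, not what Wisp or cherry actually do. It feels immutable to the caller, but every call copies the whole top-level collection:

```javascript
// Hypothetical copy-on-write assoc for plain objects and arrays.
function assoc(coll, key, value) {
  // Shallow copy: O(n) in the size of the collection on every call.
  const copy = Array.isArray(coll) ? coll.slice() : { ...coll };
  copy[key] = value;
  return Object.freeze(copy); // frozen, so later writes cannot change it
}

const nested = Object.freeze({ a: 1 });
const m1 = Object.freeze({ foo: "bar", nested });
const m2 = assoc(m1, "baz", 1);

console.log(m1.baz);                  // undefined: the original is untouched
console.log(m2.baz);                  // 1
console.log(m2.nested === m1.nested); // true: only the top level was copied

const v1 = Object.freeze([1, 2]);
const v2 = assoc(v1, 0, 10);
console.log(v1[0], v2[0]);            // 1 10
```

Nested values are shared by reference here, but the top-level copy is still linear in size, whereas persistent structures like the ones in cljs.core or immutable-js only copy a small path through a tree.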