cljs-dev

borkdude 2023-04-01T15:30:09.085159Z

I'm storing a collection of symbols in a set. The collection looks like this:

(def core-vars '#{_PLUS_, inc, ...})
etc. I always compared the munged name of the symbols. But in some projects, this collection gets printed as
#{+, inc, ..}
only under advanced compilation. I've had a similar bug once with shadow-cljs: https://github.com/thheller/shadow-cljs/issues/1063 which was fixed, but I'm getting this bug with vanilla CLJS now. Note that not all symbols display as the unmunged version, only + in this case... This bug causes (contains the-set '_PLUS_) to fail under advanced compilation. Does CLJS do comparable optimizations for symbols as shadow? Any pointers to where this might happen? @thheller suggested: I guess :optimize-constants works similarly to what shadow-cljs did before, so I guess its the same munging issue

thheller 2023-04-01T15:33:02.534729Z

the problem in shadow-cljs was that it creates var cljs$cst$munged_symbol = new cljs.core.Symbol(null, "munged-symbol"); references and sometimes that ended up using the same name

thheller 2023-04-01T15:33:51.458979Z

so var cljs$cst$_EQ_ = new cljs.core.Symbol(null, "="); and var cljs$cst$_EQ_ = new cljs.core.Symbol(null, "_EQ_");

thheller 2023-04-01T15:34:35.479629Z

I guess optimize constants just does the same still

borkdude 2023-04-01T15:49:52.446259Z

A repro:

(ns test.project)

(def x '+)
(def y '_PLUS_)

(defn -main [& _args]
  (prn [x y])
  (prn (= x y)))

(set! *main-cli-fn* -main)
$clj -M -m cljs.main --target nodejs -O simple -c test.project
$ node .cljs_node_repl/main.js
[+ _PLUS_]
false
$ clj -M -m cljs.main --target nodejs -O advanced -c test.project
$ node .cljs_node_repl/main.js
[+ +]
true

borkdude 2023-04-01T18:23:06.894319Z

@dnolen @thheller I've got a fix here which solves the problem in the above repro and I double-checked that the symbols are still properly shortened with advanced compilation: https://github.com/borkdude/clojurescript/commit/0ec3e6c9ead7930861f036fc24ccac9fd14be662

Re=new Qb(null,"_PLUS_","_PLUS_",-89880507,null)
Se=new Qb(null,"+","+",-740910886,null)

borkdude 2023-04-01T18:23:54.770569Z

I would appreciate your feedback before I make a proper patch. Also I wonder if I should add a test for this or not, and if so, where/how. I could move the "index table" into the env instead of making it global, not sure if that's important.

thheller 2023-04-01T18:25:05.500419Z

I think that'll fail with caching? can't remember how or even if there is any cache though

thheller 2023-04-01T18:26:22.714809Z

a simpler option may be just md5 hashing the source name and using that as a variable name. no munging conflicts that way and :advanced will turn it into something more gzip friendly anyways

borkdude 2023-04-01T18:27:21.649659Z

I remember proposing something like that to you too :) We could also do a hex64 encoding

borkdude 2023-04-01T18:30:41.500189Z

it seems the compiler already uses stuff like this: (symbol (str "cljs.user.source$form$" (util/content-sha (pr-str src) 7))) in various places

borkdude 2023-04-01T18:30:54.608859Z

(util/content-sha)

thheller 2023-04-01T18:32:36.551509Z

yeah only issue is that md5 or any hash is pretty hostile for gzipping, so less than ideal if :advanced is not used

thheller 2023-04-01T18:33:04.335549Z

dunno how likely conflicts are if you shorten to make the gzip impact smaller

borkdude 2023-04-01T18:39:55.321339Z

@thheller how likely is it that people would use :optimize-constants without :advanced?

thheller 2023-04-01T18:40:57.899799Z

pretty sure its also enabled for :simple?

thheller 2023-04-01T18:41:14.353609Z

otherwise I see no reason why you'd disable it, shadow-cljs doesn't even allow you to disable

borkdude 2023-04-01T18:42:56.999729Z

👍

borkdude 2023-04-01T18:49:58.401159Z

Created the issue: https://clojure.atlassian.net/browse/CLJS-3401

borkdude 2023-04-01T19:06:13.423779Z

I wonder if we could just use the original string length as the extra "index" since:

+ -> _PLUS_ ;;=> 1 (vs 6 munged) ==> cst$sym$1$_PLUS_ 
_PLUS_ ;;=> 6 (vs 6 munged) ==> cst$sym$6$_PLUS_
Since the munging always makes the original string longer it would avoid the conflict. But perhaps I'm missing some case where you could have the same munged string with the same original length for different symbols?

thheller 2023-04-01T19:07:26.227109Z

seems reasonable

borkdude 2023-04-01T19:09:44.961909Z

I guess then we can't differentiate between +_PLUS_ and _PLUS_+

borkdude 2023-04-01T19:44:36.402619Z

Hmm, maybe renaming _ to __ is enough:

'_PLUS_ => cst$sym$__PLUS__
'+ => cst$sym$_PLUS_

borkdude 2023-04-01T20:19:38.514089Z

and renaming $ to $$:

'foo.bar => cst$sym$foo$bar
'foo$bar => cst$sym$foo$$bar

borkdude 2023-04-01T20:20:06.064739Z

There aren't coming any other conflicts to my mind with that scheme so far

borkdude 2023-04-01T20:34:00.541249Z

OK, attached a patch with that solution + tests