Fork me on GitHub
#cljs-dev
<
2023-04-01
>
borkdude15:04:09

I'm storing a collection of symbols in a set. The collection looks like this:

(def core-vars '#{_PLUS_, inc, ...})
etc. I always compared the munged name of the symbols. But in some projects, this collection gets printed as
#{+, inc, ..}
only under advanced compilation. I've had a similar bug once with shadow-cljs: https://github.com/thheller/shadow-cljs/issues/1063 which was fixed, but I'm getting this bug with vanilla CLJS now. Note that not all symbols display as the unmunged version, only + in this case... This bug causes (contains the-set '_PLUS_) to fail under advanced compilation. Does CLJS do comparable optimizations for symbols as shadow? Any pointers to where this might happen? @thheller suggested: I guess :optimize-constants works similarly to what shadow-cljs did before, so I guess its the same munging issue

thheller15:04:02

the problem in shadow-cljs was that it creates var cljs$cst$munged_symbol = new cljs.core.Symbol(null, "munged-symbol"); references and sometimes that ended up using the same name

thheller15:04:51

so var cljs$cst$_EQ_ = new cljs.core.Symbol(null, "="); and var cljs$cst$_EQ_ = new cljs.core.Symbol(null, "_EQ_");

thheller15:04:35

I guess optimize constants just does the same still

borkdude15:04:52

A repro:

(ns test.project)

(def x '+)
(def y '_PLUS_)

(defn -main [& _args]
  (prn [x y])
  (prn (= x y)))

(set! *main-cli-fn* -main)
$clj -M -m cljs.main --target nodejs -O simple -c test.project
$ node .cljs_node_repl/main.js
[+ _PLUS_]
false
$ clj -M -m cljs.main --target nodejs -O advanced -c test.project
$ node .cljs_node_repl/main.js
[+ +]
true

borkdude18:04:06

@dnolen @thheller I've got a fix here which solves the problem in the above repro and I double-checked that the symbols are still properly shortened with advanced compilation: https://github.com/borkdude/clojurescript/commit/0ec3e6c9ead7930861f036fc24ccac9fd14be662

Re=new Qb(null,"_PLUS_","_PLUS_",-89880507,null)
Se=new Qb(null,"+","+",-740910886,null)

borkdude18:04:54

I would appreciate your feedback before I make a proper patch. Also I wonder if I should add a test for this or not, and if so, where/how. I could move the "index table" into the env instead of making it global, not sure if that's important.

thheller18:04:05

I think that'll fail with caching? can't remember how or even if there is any cache though

thheller18:04:22

a simpler option may be just md5 hashing the source name and using that as a variable name. no munging conflicts that way and :advanced will turn it into something more gzip friendly anyways

borkdude18:04:21

I remember proposing something like that to you too :) We could also do a hex64 encoding

borkdude18:04:41

it seems the compiler already uses stuff like this: (symbol (str "cljs.user.source$form$" (util/content-sha (pr-str src) 7))) in various places

borkdude18:04:54

(util/content-sha)

thheller18:04:36

yeah only issue is that md5 or any hash is pretty hostile for gzipping, so less than ideal if :advanced is not used

thheller18:04:04

dunno how likely conflicts are if you shorten to make the gzip impact smaller

borkdude18:04:55

@thheller how likely is it that people would use :optimize-constants without :advanced?

thheller18:04:57

pretty sure its also enabled for :simple?

thheller18:04:14

otherwise I see no reason why you'd disable it, shadow-cljs doesn't even allow you to disable

borkdude19:04:13

I wonder if we could just use the original string length as the extra "index" since:

+ -> _PLUS_ ;;=> 1 (vs 6 munged) ==> cst$sym$1$_PLUS_ 
_PLUS_ ;;=> 6 (vs 6 munged) ==> cst$sym$6$_PLUS_
Since the munging always makes the original string longer it would avoid the conflict. But perhaps I'm missing some case where you could have the same munged string with the same original length for different symbols?

thheller19:04:26

seems reasonable

borkdude19:04:44

I guess then we can't differentiate between +_PLUS_ and _PLUS_+

borkdude19:04:36

Hmm, maybe renaming _ to __ is enough:

'_PLUS_ => cst$sym$__PLUS__
'+ => cst$sym$_PLUS_

borkdude20:04:38

and renaming $ to $$:

'foo.bar => cst$sym$foo$bar
'foo$bar => cst$sym$foo$$bar

borkdude20:04:06

There aren't coming any other conflicts to my mind with that scheme so far

borkdude20:04:00

OK, attached a patch with that solution + tests