Fork me on GitHub
#cljs-dev
<
2023-02-14
>
niwinz09:02:49

Hello. I found a strange (probably a bug somewhere) compilation output when _ placeholder is used multiple times in a protocol function. The code sample is:

(defprotocol IState1
  (-extract1 [_] [_ _] "Extract the current state value."))

(defprotocol IState2
  (-extract2 [it] [it default] "Extract the current state value."))
You can observe they are the same method one using named arguments and other using just _ placeholder everywhere. And the compiler output extract for the both is:
// [...]
// the bug is here:
(bugrep.main1._extract1.cljs$core$IFn$_invoke$arity$1 = (function (_){
if((((!((_ == null)))) && ((!((_.bugrep$main1$IState1$_extract1$arity$1 == null)))))){
return _.bugrep$main1$IState1$_extract1$arity$1(_);
} else {
return bugrep$main1$IState1$_extract1$dyn_484.call(null,_);
}
}));

(bugrep.main1._extract1.cljs$core$IFn$_invoke$arity$2 = (function (_,___$1){
if((((!((___$1 == null)))) && ((!((___$1.bugrep$main1$IState1$_extract1$arity$2 == null)))))){
return ___$1.bugrep$main1$IState1$_extract1$arity$2(___$1,___$1);
} else {
return bugrep$main1$IState1$_extract1$dyn_484.call(null,___$1,___$1);
}
}));

(bugrep.main1._extract1.cljs$lang$maxFixedArity = 2);

// [...]
// The correct output

(bugrep.main1._extract2.cljs$core$IFn$_invoke$arity$1 = (function (it){
if((((!((it == null)))) && ((!((it.bugrep$main1$IState2$_extract2$arity$1 == null)))))){
return it.bugrep$main1$IState2$_extract2$arity$1(it);
} else {
return bugrep$main1$IState2$_extract2$dyn_489.call(null,it);
}
}));

(bugrep.main1._extract2.cljs$core$IFn$_invoke$arity$2 = (function (it,default$){
if((((!((it == null)))) && ((!((it.bugrep$main1$IState2$_extract2$arity$2 == null)))))){
return it.bugrep$main1$IState2$_extract2$arity$2(it,default$);
} else {
return bugrep$main1$IState2$_extract2$dyn_489.call(null,it,default$);
}
}));
Observe how on the first part the inner function call is called with ___$1 passing it down as first and second argument. I know that the protocol placeholders are used as documentation, but I found this on a first fast prototyping and constantly getting "no protocol IState implementation found" for the second argument value which of course don't implement it and don't need implement it on my case. The code is simple and can be reproduced with any protocol with multiple arities using _ placeholder, but if you need I can push the repro code to an repository.

dnolen15:02:37

@niwinz _ has no special meaning

dnolen15:02:49

using it again and again is like using the same parameter name over again

dnolen15:02:13

(defprotocol IFoo (-bar [a a]))

dnolen15:02:47

it maybe that what you found is a bug around _ and JS var names, but did you check that the same problem doesn’t occur w/ duplicated parameter names in general?

niwinz15:02:39

i will check it right now

niwinz15:02:29

(bugrep.main1._extract2.cljs$core$IFn$_invoke$arity$2 = (function (it,it__$1){
if((((!((it__$1 == null)))) && ((!((it__$1.bugrep$main1$IState2$_extract2$arity$2 == null)))))){
return it__$1.bugrep$main1$IState2$_extract2$arity$2(it__$1,it__$1);
} else {
return bugrep$main1$IState2$_extract2$dyn_489.call(null,it__$1,it__$1);
}

niwinz15:02:25

the protocol is defined as (-extract2 [it] [it it] "docstring")

niwinz16:02:09

i know that _ has no special meaning, and I checking this with bare cljs compiler (latest version 1.11.60) and compiling with

(require '[cljs.build.api :as b])

(b/build "src" {:main 'bugrep.main1
                :output-to "main.js"
                :output-dir "out"
                :target :nodejs
                :verbose true})

thheller16:02:46

I mean technically what code could it possibly generate that would make this "correct"?

niwinz16:02:20

(bugrep.main1._extract2.cljs$core$IFn$_invoke$arity$2 = (function (it,it__$1){
if((((!((it == null)))) && ((!((it.bugrep$main1$IState2$_extract2$arity$2 == null)))))){
return it.bugrep$main1$IState2$_extract2$arity$2(it,it__$1);
} else {
return bugrep$main1$IState2$_extract2$dyn_489.call(null,it, it__$1);
}

thheller16:02:17

forget the JS code for a second. (defn x [a a] a) (x 1 2)

thheller16:02:21

what do you expect? 😛

niwinz16:02:21

but this is not a code, is just a placeholder, and all the possible variables are still there (it,it__$1)

niwinz16:02:04

I mean, I define the variable names when I implement the protocol, not on the protocol definition.

p-himik16:02:41

Just in case, if you only want to get rid of the "unused variable" warning your IDE might be showing, usually you can get rid of it by adding _ in front of the proper name, like (-extract [_it] [_it _default] ...).

p-himik16:02:20

@thheller A closer analog would be (defn x [a a] "hello").

mikerod17:02:25

I actually wondered about this in clj/cljs.

mikerod17:02:41

I didn’t realize it should even be valid to have the same formal param name multiple times in an single arg list

mikerod17:02:53

Isn’t that invalid in eg java or jvm bytecode?

mikerod17:02:19

However, I’ve seen it come up with people ignoring multiple formal params with the _ convention, eg. (fn [_ _ x] x)

p-himik17:02:46

No clue about Java or bytecode. But it has nothing to do with Clojure anyway since it can rename the arguments as it pleases (potentially - I have no idea whether it actually does it).

mikerod18:02:59

Yeah, I know the compiler (clj or cljs) could rename if they wanted to to support this. I’m just surprised if they do. Of course that does raise the question to me if you can do this what does ((fn [a a] a) 1 2) mean as already mentioned

p-himik18:02:48

Oh but they do. :) You can see the CLJS example above in the main channel. And here's an example from CLJ:

user=> (require 'clj-java-decompiler.core)
nil
user=> (clj-java-decompiler.core/decompile (fn [a a] a))

// Decompiling class: user$fn_line_1__220
import clojure.lang.*;

public final class user$fn_line_1__220 extends AFunction
{
    public static Object invokeStatic(final Object a, final Object a) {
        return a;
    }
    
    @Override
    public Object invoke(final Object a, final Object a2) {
        return invokeStatic(a, a2);
    }
}

nil
user=> 

👍 2
p-himik18:02:32

Not sure about the final Object a, final Object a part, but note the final Object a, final Object a2 part. Also decompilers are not always necessarily correct, so take it all with a grain of salt.

mikerod15:02:34

Thanks for all those details. That makes it quite clear. Yeah, I think the decompiled static method is probably just incorrect, but the invoke() call is insightful.

p-himik15:02:49

In addition, while I'm not even remotely knowledgeable in the world of JVM bytecode, AFAIK parameter names might or might not be stored there at all. E.g. a https://docs.oracle.com/javase/tutorial/reflect/member/methodparameterreflection.html says: > If the parameter's name is present, then this method [`getName`] returns the name provided by the .class file. Otherwise, this method synthesizes a name of the form argN, where N is the index of the parameter in the descriptor of the method that declares the parameter.

niwinz16:02:25

in any case, I found this strange, the workaround is clear, just use different placeholder variables on protocol definitions, but still strange, because, on protocol definitions you really need to know how many arguments... and define a function that receves it. The impl should check if the first parameter implements it or run a dynamic lookup

thheller16:02:19

seems to be ok in clojure so I guess it can be considred a bug

niwinz16:02:26

And since the compiler already does this correctly setting different names when generating a function: ...e$arity$2 = (function (it,it__$1){

niwinz16:02:10

And yes, in Clojure it works without any issue.

dnolen16:02:46

@niwinz so it works w/ a and not _ or both are broken?

niwinz17:02:33

both are broken @dnolen

dnolen19:02:28

@niwinz I'm still confused about this case because we're probably talking a bit too much about details and not semantics

dnolen19:02:36

so I will rephrase the problem description

dnolen19:02:46

"if a protocol is implemented with duplicated parameter names, the later parameter masks the former"

dnolen19:02:08

yet somehow this isn't a problem in Clojure? but is this even semantic thing that is guaranteed that we need support?

niwinz19:02:17

@dnolen It happens when a protocol is defined (not implemented) with repeated placeholder (per example _) because you are prototyping or it is just an internal and you just want to indicate the number of parameters but not names. Per example:

(defprotocol IState
  (-extract [_] [_ _] "Extract the current state value."))

(extend-type SomeObject
  IState
  (-extract [this] nil)
  (-extract [this default] default))

;; on calling this will raise an exception that `:no-val` does not
;; implements the IState protocol, that as first, very confusing
;; because I don't call -extract over `:no-val` keyword.
(-extract (SomeObject.) :no-val)
Yes, in an ideal case, the protocol should be defined with proper parameter placeholder names, but sometimes that naming is just irrelevant and you just use _ for specify that there will go an argument that I don't care right now. The main problem is that, if you use a dummy placeholder repeatedly (but also happens when the same placeholder name is used repeatedly) it causes a very confusing error message. You can observe that in the implementation, I use clear names, and this is how it already works on Clojure side, I just noticed it when start passing cljs tests and surprised by very confusing error, that is caused because compiler generates partially incorrect output, it generates correct separated symbols for each argument but later uses only the last one repeatedly.

dnolen19:02:04

@niwinz the use case (while I understand is important to you) is not so relevant for whether this should get fixed or not

dnolen19:02:18

I just care about the semantics of protocol fns period

dnolen19:02:34

and the parameters, how it will be used - that's up to you

niwinz19:02:45

is not about use case, is about very strange error message, that will make spend some hours researching until reaching that is because compiller generates that code

dnolen19:02:06

yes, yes - but I don't care about that stuff right now

dnolen19:02:14

only getting at the semantics and the parameter problem only

niwinz19:02:33

ok, it is ok for me, just wanted notify about this

dnolen19:02:59

so one thing for us to look as is to determine why this works in Clojure, or how it avoids the duped parameter issue when dispatching

dnolen19:02:12

@U064X3EF3 any chance you have some insight?

niwinz19:02:25

the deduplication is already done

niwinz19:02:46

that part already generates a correct code, is just the body of the function uses incorrect parameters

niwinz19:02:33

(bugrep.main1._extract2.cljs$core$IFn$_invoke$arity$2 = (function (it,it__$1){
if((((!((it__$1 == null)))) && ((!((it__$1.bugrep$main1$IState2$_extract2$arity$2 == null)))))){
return it__$1.bugrep$main1$IState2$_extract2$arity$2(it__$1,it__$1);
} else {
return bugrep$main1$IState2$_extract2$dyn_489.call(null,it__$1,it__$1);
}
Look at this generated code, the function already receives: it and it__$1

niwinz19:02:57

but later, it just ignores it and only uses it__$1 repeatedly

dnolen19:02:14

you are focusing on the implementation

dnolen19:02:20

again let's just avoid talking about all of that stuff

dnolen19:02:29

the first thing is why it works in Clojure at all

dnolen19:02:32

and if it's even intentional

dnolen19:02:08

the renaming of the parameters is not for the reasons you think anyway

niwinz19:02:44

because in clojure they are just placeholders as far as I understand, and only indicates the number of variables. Once you add an impl, is there you really use the correct symbols for the arguments

dnolen19:02:49

for a general reason for JS fns

dnolen19:02:50

noted - but I think some feedback from the Clojure dev team is warranted before doing anything at all

niwinz19:02:19

yep, of course, i'm also will take some time to look on it and understand better how clojure is doing. in any case, thanks for your time for looking on it.

dnolen19:02:51

one thing that might be causing a problem is the dispatch mechanism, it may also be that the first parameter is treated in a special way in Clojure - I do not know

dnolen19:02:10

but the parameter renaming thing has nothing to w/ anything but JavaScript compilation

dnolen19:02:18

and it was introduced to fix other issues

dnolen19:02:36

but because the dispatcher is a fn, maybe that collides w/ other problems we solved that are more important

dnolen19:02:47

finally maybe there is a way around this

dnolen19:02:01

i.e. for protocol fns making the first parameter special etc.

dnolen19:02:15

I don't know but we need start at the beginning and go from there

niwinz19:02:28

yep, agree

2
Alex Miller (Clojure team)20:02:24

sorry, haven't read all that - is there a summary?

dnolen20:02:14

summary is whether there's an expected behavior around protocol fns of the form

dnolen20:02:29

(defprotocol IFoo (-foo [a a]))

dnolen20:02:48

note the duplication of the parameter names

Alex Miller (Clojure team)20:02:26

protocol definitions only really define arity right?

dnolen20:02:49

CLJS implementation details leads to the following behavior - the second name shadows the first inside the protocol fn - which breaks dispatch on first arg

dnolen20:02:03

it is the generated dispatch fn that breaks

Alex Miller (Clojure team)20:02:06

this is on protocol definition or protocol implementation?

dnolen20:02:21

the issue in CLJS is protocol definition

dnolen20:02:33

but I'm curious how it could work in Clojure since you generate protocol fn there as well

dnolen20:02:55

so param ambiguity must be surmounted somehow, unless at the point the names don't matter

Alex Miller (Clojure team)20:02:58

not at definition time?

dnolen20:02:41

ah k, so that's not how protocols were implemented at the beginning in ClojureScript as far as I can remember - Rich did that work originally

dnolen20:02:55

the generated fn(s) does the dispatch work

dnolen20:02:24

we could of course ignore the names and generate them

Alex Miller (Clojure team)20:02:13

user=> (defprotocol IFoo (-foo [a a]))
IFoo
user=> (defrecord A [] IFoo (-foo [a a] (println a)))
user.A
user=> (-foo (->A) 1)
1

Alex Miller (Clojure team)20:02:41

I know this has been asked in the general defn case before - something like (defn x [a a] a)

Alex Miller (Clojure team)20:02:14

I believe Rich's question then was something like, what does that even mean? :) in practice, I think you get the last named param bound

dnolen20:02:42

right, so I think the issue then is that the protocol dispatcher uses normal fn compilation

dnolen20:02:02

so that interacts oddly in this case w/ protocols semantics where it's assumed the declaration param names do not matter

Alex Miller (Clojure team)20:02:28

in Clojure, defprotocol is really just defining the interface and the param names in an interface method mean nothing for implementors (in Java)

Alex Miller (Clojure team)20:02:27

broadly, I think this falls into the category of "don't do that" :)

dnolen20:02:20

I agree and it's a minor issue, but I agree is surprising

Alex Miller (Clojure team)20:02:53

get out your sand paper for those rough edges

dnolen19:02:51

switching to thread