#clojure
2022-04-12
Joshua Suskalo00:04:05

So I'm trying to write a library that squeezes out as much performance as possible. One of the ways that I'm doing this is with liberal usage of :inline meta in order to help with the inliner. One issue I've run into, however, is that sometimes the inlined body is just a Java interop call on one of the arguments. In the actual function body I type hint the argument to ensure that it gets checked and I get no reflection. In the inline definition, however, I can't do that since I don't know if the argument is type-hintable. So my question is this: in this case, is it preferred to just avoid the :inline meta entirely, to introduce a let binding that has a type hint on it, or to use clojure.core/cast to explicitly cast the value to the correct type?

didibus03:04:32

From my understanding, when you type hint function parameters it only supports ^long, ^double, and arrays of primitives; any other hint is pretty much useless and will just compile to Object anyway. That said, if the argument is passed to a Java method in the body, and there are ambiguous arities for that method, the hint can also be used to avoid reflection, and that's the only scenario where hinting function parameters with anything other than ^long, ^double, or an array of primitives is useful. If you hint inside a let, you can hint to more primitive types: int, float, boolean, short, char, etc. let and loop I think support all the Java primitive types, so a hint of that type will compile to a local variable of that type.
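A minimal sketch of the difference (hypothetical functions):

;; only ^long / ^double (and primitive-array hints) change the compiled signature
(defn add ^long [^long a ^long b]
  (+ a b))

;; an Object hint like ^String on a parameter only serves interop calls in the body...
(defn shout [^String s]
  (.toUpperCase s))

;; ...and hinting a local binding in a let works the same way
(defn shout2 [s]
  (let [^String s s]
    (.toUpperCase s)))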

didibus03:04:34

In your case, if you wanted to hint the call to the java method to avoid reflection, I think you are better off wrapping in a let and hinting the bindings. But inline gets the form unevaluated, so technically you can get access to the meta on the arguments, and you can simply put it back when you expand. That lets the caller be in charge of type hinting as normal.

didibus04:04:23

It's a little tricky, you have to do this:

(defn foo
  {:inline (fn [a]
             (let [hinted-a (with-meta (gensym) {:tag `String})]
               `(let [~hinted-a ~a]
                  (.toString ~hinted-a))))}
  [^String a]
  (.toString a))

didibus04:04:38

When you use foo in a higher-order context, it calls the type-hinted function. And when you call it normally like (foo "a"), it gets inlined into:

(let [^String G__10285 "a"]
  (.toString G__10285))

didibus04:04:20

Anyways, so this is how you can support type-hinting for avoiding reflection in inlining. If you want to type-hint for primitive interop you don't have to do that, you can simply coerce:

(defn cos
  {:inline (fn [a] `(Math/cos (double ~a)))}
  ^double [^double a]
  (Math/cos a))

Joshua Suskalo04:04:45

Yeah, your understanding is correct. I was just wanting to know if any compiler magic happened with cast to make it preferable, since at the bytecode level let bindings introduce some unnecessary stack manipulation.

didibus04:04:15

Ah, fair enough, don't think so. The hinted let will be equivalent to if in Java you wrote:

Object a = "a";
((String) a).toString();
I believe

Joshua Suskalo04:04:01

Yes, but with some extra stack dance. It throws the item from the stack into a local, back on the stack, clears the local, does the cast, and proceeds. That's pretty easy for the jit to collapse though.

didibus04:04:04

Hum, I believe that's because it is actually the same as:

final Object a = "a";
((String) a).toString();
Which might be why it clears the local? Due to final?

Joshua Suskalo04:04:48

Well, Clojure compiles directly to bytecode (no Java IR), so there may not be any directly comparable code. Clojure does this because of a feature called locals clearing, which aggressively clears function arguments and local variables to prevent garbage-collection issues like holding onto seq heads

didibus04:04:03

This is what I get:

public static java.lang.Object invokeStatic();
        Flags: PUBLIC, STATIC
        Code:
                  linenumber      1
               0: ldc             "a"
               2: astore_0        /* a */
               3: aload_0         /* a */
               4: checkcast       Ljava/lang/String;
                  linenumber      1
               7: invokevirtual   java/lang/String.toString:()Ljava/lang/String;
              10: areturn
for:
(let [^String a "a"]
  (.toString a))

didibus04:04:20

I'm not familiar enough with bytecode to know if this is different than just normal variable assignment?

Joshua Suskalo04:04:33

that store/load is the redundancy I was hoping cast would fix, but cast is not magic. Instead of fiddling with the stack, it derefs a var, makes a function call, which defers to a method call on an instance of Class with invokevirtual, and only then does a checkcast. So no dice, I just have to rely on the JIT collapsing it
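For reference, clojure.core/cast is roughly just this, which is where the var lookup and the invokevirtual on Class come from:

(defn cast
  "Throws a ClassCastException if x is not a c, else returns x."
  [^Class c x]
  (. c (cast x)))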

didibus04:04:17

Right, but is that store/load not just variable assignment?

Joshua Suskalo04:04:05

well variable assignment doesn't exist at a bytecode level, there are no variables, only the stack and basically 'registers' which are numbered and can hold anything mutably

Joshua Suskalo04:04:20

but I guess you could think of it that way

didibus04:04:29

I understand that you only need ((String)"a").toString(); in theory, but just wondering if let is doing something weirder than a java local

Joshua Suskalo04:04:59

probably not except when locals clearing kicks in

didibus05:04:50

I believe the "registers" are local no?

didibus05:04:21

> When each method is executed including the initial main method a frame is created on the stack which has a set of local variables. The array of local variables contains all the variables used during the execution of the method, including a reference to this, all method parameters, and other locally defined variables.

Joshua Suskalo05:04:26

yeah, they are, they map to the idea, except that there's a limited number of them. In cases where you have a lot of local variables they no longer map 1:1 and some things have to spill onto the stack for storage

didibus05:04:55

I don't think so, I think then you need to use astore with an operand for the index

didibus05:04:07

But it still gets stored in the local variable array

didibus05:04:30

> It is one of a group of opcodes with the format istore_<n>; they all store an integer into local variables. The <n> refers to the location in the local variable array that is being stored and can only be 0, 1, 2 or 3. Another opcode is used for values higher than 3, called istore, which takes an operand for the location in the local variable array.

Joshua Suskalo05:04:50

hmm. that's news to me, i guess i need to reread that part of the vm spec

Joshua Suskalo05:04:17

thanks for pointing that out!

didibus05:04:42

I would really hope the JIT is smart enough to skip astore -> aload though. I mean, that's an unnecessary push/pop, and it should be pretty easy to see that the same thing is being pushed and popped right away. It just seems so common too, like every time someone decides to use a local to have a name

didibus05:04:23

Though I guess it has to check that nothing after would try to aload that same local again to know it can skip the astore.... But still, I feel that's also probably something the JIT does

Joshua Suskalo00:04:48

I have answered my own question with some usage of no.disassemble. Using a let is much preferred because it results in a little bit of stack manipulation which the JVM can easily optimize, while the cast function results in a get of the var root and a function invocation, as opposed to just a checkcast
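A sketch of that kind of comparison, assuming the no.disassemble agent is on the JVM command line (the forms are illustrative):

(require '[no.disassemble :refer [disassemble]])

;; print the bytecode of each variant and compare instruction by instruction
(println (disassemble (fn [a] (let [^String s a] (.toString s)))))
(println (disassemble (fn [a] (.toString ^String (cast String a)))))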

Joshua Suskalo00:04:09

So I will be preferring using inline with a let, since I was attempting to remove the need for var lookups in the first place

hiredman01:04:07

Cast is not something you ever need

hiredman01:04:09

It is very literally just asking the compiler to emit a checkcast instruction, it doesn't do anything for reflection, doesn't give the compiler any type information, etc

hiredman01:04:02

Your :inline definition should be able to use the same type hints as the non-inlined version

hiredman01:04:48

(ah, which I guess is what you mean about the let)

Alex Miller (Clojure team)01:04:31

as a general observation, most people don't understand what inline functions do and/or don't need them in the first place, and should not use them

Joshua Suskalo01:04:50

That is true; however, in this case the function call overhead is part of what I am trying to remove, especially w.r.t. keeping the number of layers down for the inliner, because I am interoperating with native code and need to keep serde overhead to a minimum.

Alex Miller (Clojure team)02:04:56

at some point (and you're close to that point), why not just write it in Java?

Joshua Suskalo02:04:24

Because I'm providing a Clojure interface and I want to use macros for plenty of the syntax, and I do want these functions to be usable in higher-order functions; there are legit use cases for these functions being passed to map

Joshua Suskalo02:04:01

My "interoperating with native code" isn't because I have a native wrapper I'm using, it's because I'm the author of https://github.com/IGJoshua/coffi and want to provide as performant a way for others to write native-wrapping code as possible. I don't want people to feel like coffi is an invalid option for their usecase because of performance problems if they like the interface.

Joshua Suskalo03:04:13

(also there are sections of this where I am dropping down to something lower-level than clojure, though in this case it's dropping straight down to JVM bytecode)

Alex Miller (Clojure team)03:04:15

Inlines won't be used for higher order functions

didibus03:04:54

The JIT can inline a lot of the function call away; often when you bench with something like criterium, the difference between :inline and not using it disappears, because a warm JIT will inline things automatically

didibus03:04:06

My understanding is that for FFI, the thing to be worried about is memory copying.

Ben Sless03:04:33

Regarding actually profiling JIT behavior, take a look at a tool like jitwatch

Joshua Suskalo04:04:40

I understand inlining isn't used in higher-order functions; my point was that providing a Java API, i.e. doing it in Java, would be bad for use with HOFs. Yes, the JIT can inline a lot, but it has a max inline depth of 9 levels (this can be changed with CLI flags, but that's almost never a good idea), and Clojure uses those up fast because of how closures etc. go through two stack frames with the instance and static forms of methods. The goal with inlining is to make sure that even with all that, most or all layers of FFI wrapper libraries will be inlined to aid performance.

didibus04:04:37

Ya, that's up to you. I think my point is that it's hard to know if all the trouble of inlining pays off or not. That's your call though.

Joshua Suskalo04:04:46

Right, and my call is to do everything I can to make performance simply not a concern when people are shopping for ffi api ergonomics. If people like the api I provide they should not have to walk away and use another lib just because I was inefficient or lazy with my implementation.

👍 1
Joshua Suskalo05:04:54

for reference, this inline function is the kind of lengths I'm willing to go to in order to avoid a little more function call overhead in this library: https://github.com/IGJoshua/coffi/blob/master/src/clj/coffi/ffi.clj#L243 which is the inline function definition for the next def, a relatively short function: https://github.com/IGJoshua/coffi/blob/master/src/clj/coffi/ffi.clj#L390

didibus05:04:33

It's fair, and in libs I think it makes more sense in general as well. Clojure core itself chooses to inline a lot as well.

Ben Sless05:04:37

Regarding type hints, you can wrap the code generation in with-meta to attach a tag to the emitted symbol. Why not use definline, btw?
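For completeness, definline writes the template once and derives both the regular fn and its :inline from it; a minimal sketch:

(definline square [x]
  `(let [x# (double ~x)]
     (* x# x#)))

(square 3)          ; expanded inline at the call site
(map square [1 2])  ; still usable as a first-class fn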

Joshua Suskalo14:04:18

Mostly because it's labeled as experimental. I'm mostly just doing what core does. There are a couple of instances where I'm using :inline as something closer to a Common Lisp compiler macro than a Clojure inline definition.

Ben Sless14:04:57

IIRC Alex warned me several times to treat :inline with almost the same degree of care I would definline. Not that I didn't abuse it, but if it blows up in my face in the future I can't complain now 🙂

Joshua Suskalo14:04:46

Well yeah, I asked him the same question when I started down this path and he said I can be pretty confident that it won't be changing soon, and since the library is relying on experimental features in the jdk anyway, I see no reason to avoid inlines altogether.

Ben Sless14:04:45

Yes. I was merely suggesting that while you're doing heavily experimental stuff anyway, you might as well use definline where it is convenient. Which is not always, having briefly browsed your code, but I see no reason to avoid it

Ben Sless14:04:54

I also have a multiple-arity definline lying around somewhere if you need it (haven't figured out & args yet)

Carlo02:04:06

When I explore a Clojure codebase, I often start with the evaluation of a high-level piece of code, and work my way down to what that code does, interactively. One tool at my disposal - when macros are involved - is the macroexpand family of functions, which I find quite convenient to use because I can just paste the result in my buffer and continue. When functions are involved, though, I don't have a similarly convenient tool. What I would want to do is substitute the invocation with the body of the function, in which my parameters have been replaced with the arguments of the invocation. Is there a tool that does this? If not, what's the library I should look at to write something like this myself?

Joshua Suskalo02:04:53

there is clojure.core/source which does get the source code of functions.

noisesmith19:04:01

minor correction: it's clojure.repl/source

noisesmith19:04:07

it can be brought into the current namespace scope via (apply require clojure.main/repl-requires) - clojure itself does this with your initial repl if you use the normal built-in repl
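For example, outside the default REPL namespace you can also pull it in directly:

(require '[clojure.repl :refer [source doc]])
(source juxt)   ; prints the defn form of clojure.core/juxt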

Joshua Suskalo19:04:25

ah, cool, thanks for the correction

noisesmith20:04:42

np - you don't really notice until you switch namespaces in your repl and suddenly all the clojure.repl and clojure.pprint stuff is missing (recoverable as described above)

Joshua Suskalo02:04:19

The challenge with this and any tools provided by IDEs etc however is that it's very easy for what's loaded in the VM to get out of sync with the files on disk.

Nom Nom Mousse07:04:39

If I have a map like

(def m {{:sample "A" :person "B"} 1
        {:sample "B" :person "B"} 2
        {:sample "A" :person "A"} 3
        {:sample "B" :person "A"} 4})
where the key-maps can contain arbitrary keywords and values, what is the best way to return a sorted vec of the values? The keys in the key-maps will always be the same, i.e. (set (map keys (keys m))) will always have length 1. Since p comes before s in the alphabet and A comes before B, I'd like the sort order to be:
[{:sample "A" :person "A"} 3
 {:sample "B" :person "A"} 4
 {:sample "A" :person "B"} 1
 {:sample "B" :person "B"} 2]

vanelsas08:04:26

There are people way smarter than me but something like this might work:

(map last (sort-by #(str (into (sorted-map) (first %))) (seq m)))

vanelsas08:04:30

It takes the map, turns it into a seq, then sorts the keys of the map in each element, and turns them into a string that is comparable

Nom Nom Mousse08:04:47

Thanks. I did not consider using sorted-map for the keys. That makes it a bit more readable 🙂

Nom Nom Mousse08:04:49

If someone has a solution they think is more efficient I'd love to hear it.

p-himik10:04:32

(defn sort-vals-by-map-keys [m]
  (when (seq m)
    (let [ks (sort (-> m first key keys))
          f (apply juxt ks)]
      ;; Don't use `first` and `second` with map entries - use `key` and `val` instead.
      (mapv val (sort-by #(f (key %)) m)))))

🙏 1
p-himik10:04:04

Given what you said in the OP, the code above assumes that all maps that are used as keys have the same set of keys among themselves.
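With the map m from the question, that should give the requested ordering:

(sort-vals-by-map-keys m)
;; => [3 4 1 2]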

vanelsas10:04:59

I was actually looking for a way to use juxt, but ended up going a different route. This looks pretty good.

Nom Nom Mousse10:04:09

Thanks. Also learnt about key and val 😄

Franklin10:04:58

Is there any notable advantage to upgrading from java 8 to java 17 for instance?

Franklin10:04:13

also, are there any foreseeable problems with using java 17? I have noticed some clojure libraries/packages don't support java 17 or java 11

p-himik10:04:37

> I have noticed some clojure libraries/packages don't support java 17 or java 11 Chances are, those are abandoned libraries that are relying on some JDK internals. I've stumbled upon 2 such libraries, but they were eventually fixed. Can you give an example where it hasn't been fixed? Regarding advantages - you should read changelogs and decide for yourself. One immediate thing worth mentioning is much better default garbage collector and, seemingly, memory management in general. At least on my apps, memory usage went down dramatically.

Franklin10:04:12

you are right about the 'abandoned libraries'; the one I'm having trouble with while trying to use java 17 is https://github.com/domkm/optimus-sass which doesn't seem to be currently maintained

Franklin10:04:06

enhanced memory management sounds great

p-himik10:04:48

FWIW, some time ago I was choosing a SASS library and after comparing my options decided to use https://github.com/Deraen/sass4clj Have been using it for 3+ years, I think - no problems so far.

flowthing10:04:33

Libsass is deprecated, though, FWIW. (And sass-java, which uses libsass.)

👀 1
p-himik10:04:51

Oh, that's a good point, I didn't realize that. Thanks!

kwladyka12:04:38

there are, but… I don't remember them 🙂 But upgrading to at least 11 is reasonable.

😄 1
👍 1
ghadi13:04:44

there are massive improvements to the garbage collectors and JIT

👍 2
didibus20:04:19

I think mostly it's performance and memory usage that are improved. There's some interop features as well if you care, like JDK 11 standard library includes an async http client. There can be security reasons as well, if your option is an unpatched end-of-life JDK 8 versus JDK 17. But if you get your JDK 8 from a vendor still maintaining it then I guess you'd get security patches for it and that's no longer an advantage.
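For example, the JDK 11+ HTTP client is usable via plain interop (a minimal sketch; the URL is a placeholder and the blocking send is used for brevity):

(import '(java.net URI)
        '(java.net.http HttpClient HttpRequest HttpResponse$BodyHandlers))

(let [client  (HttpClient/newHttpClient)
      request (.build (HttpRequest/newBuilder (URI/create "https://example.com")))]
  (.body (.send client request (HttpResponse$BodyHandlers/ofString))))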

didibus20:04:03

JDK 11 and up also let you create an application bundle, if that's something you care about. Basically, it can bundle the JDK with only the modules you depend on alongside your app. That's nice for say a Desktop application, you just give it to users and they can run it, they don't need to install a JDK, it's all bundled up and relatively lean due to having only the modules you need.

tomekw10:04:59

I'm trying to push a Clojure map to an Algolia index. It requires an objectID property or a Jackson annotation @JsonProperty("objectID") to not duplicate records. What's the best / idiomatic way to add that? I tried using protocols and records, but with no luck yet

tomekw10:04:02

(defprotocol WithObjectID
  (getObjectID [this]))

(defrecord Person [object-id name number]
  WithObjectID
  (getObjectID [this] (:object-id this)))

(.getObjectID (Person. "1" "John" 42))
this doesn't work 😞

dazld12:04:49

deftype sadly also doesn’t seem to satisfy it

eskos13:04:33

It’s been a few years since I last used deftype with annotations, might be you need aot with it? Not because of Clojure but because of the other side not being able to see the type properly if you don’t.

dazld13:04:25

the equivalent in javalandia.

dazld13:04:59

I’d prefer to stay away from annotations..

dazld13:04:34

private static Field getField(@Nonnull Class<?> clazz, @Nonnull String fieldName) {
    Class<?> tmpClass = clazz;
    do {
      try {
        Field f = tmpClass.getDeclaredField(fieldName);
        f.setAccessible(true);
        return f;
      } catch (NoSuchFieldException e) {
        tmpClass = tmpClass.getSuperclass();
      }
    } while (tmpClass != null);

    return null;
  }
seems to be their check. I’m not sure how to feed this.

Joshua Suskalo13:04:28

that means it will never check interfaces, so protocols are out

Joshua Suskalo13:04:40

I'd say using Java annotations on deftype fields is probably the way to do it, but the way I'd suggest doing it imo would be to just have a Java class you use from Clojure. If you really want, you could make the Java class with a :gen-class and AOT, or you could just make a class in Java and use something like https://github.com/IGJoshua/americano to compile it.

👍 2
thanks3 2
dazld15:04:08

Annotations did indeed make it all happy - for anyone searching in the future, here’s what it looked like:

(ns foo 
  (:import (com.fasterxml.jackson.annotation JsonProperty)))

(deftype ^{JsonProperty {}}
  Entry [^{:tag String
           JsonProperty "objectID"} objectID
         data])

tomekw18:04:41

Thank you all! 💙

👍 1
emccue14:04:02

I have an idea of what to tackle to 1.11.1.1111

Error building classpath. class java.util.HashMap$Node cannot be cast to class java.util.HashMap$TreeNode (java.util.HashMap$Node and java.util.HashMap$TreeNode are in module java.base of loader 'bootstrap')
java.lang.ClassCastException: class java.util.HashMap$Node cannot be cast to class java.util.HashMap$TreeNode (java.util.HashMap$Node and java.util.HashMap$TreeNode are in module java.base of loader 'bootstrap')
	at java.base/java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1986)
	at java.base/java.util.HashMap$TreeNode.treeify(HashMap.java:2102)
	at java.base/java.util.HashMap.treeifyBin(HashMap.java:770)
	at java.base/java.util.HashMap.putVal(HashMap.java:642)
	at java.base/java.util.HashMap.put(HashMap.java:610)
	at java.base/java.util.HashSet.add(HashSet.java:221)
	at org.apache.maven.model.validation.DefaultModelValidator.validateId(DefaultModelValidator.java:848)
	at org.apache.maven.model.validation.DefaultModelValidator.validateEffectiveDependency(DefaultModelValidator.java:660)

Alex Miller (Clojure team)14:04:06

this should be fixed in latest - what version of the CLI are you on?

emccue14:04:01

1.11.1.1105

Alex Miller (Clojure team)14:04:39

is it reproducible? can you share your deps.edn?

Alex Miller (Clojure team)14:04:53

and can I get the full stack trace

emccue14:04:37

yep, happens pretty consistently. I can send you a bunch of circleci logs

plexus14:04:58

I've seen many mentions of Clojure following "the garbage-in garbage-out philosophy". I know what that means and can see how it applies to a lot of clojure core APIs, but I was wondering... is this an official position of the core team? Is there a good write-up about that (from the core team or elsewhere)? In particular I'd like to see how they interpret this, and what the main reasons are for adopting it.

Joshua Suskalo14:04:44

This is obviously not a direct answer to your question about sources, but I believe one of the main reasons is about performance. In order to keep Clojure code within the 1-4x runtime performance of Java, having lots of extraneous argument checks etc would cause slowdown that can't be optimized out by the JIT in most cases, and so no checks are made. This is also why spec instrumentation is not recommended to be turned on in production builds, but only to track down issues in development, and during testing. I don't have a direct source on this, but I believe that this is the stance that Rich and Alex have mentioned before in the google groups, though I don't have time to dig it up for now, and Alex, being active on here, can probably confirm/deny this.
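For context, instrumentation is opt-in per var, which is why it can be limited to development and testing; a minimal sketch with a made-up function:

(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

(defn add-ints [a b] (+ a b))
(s/fdef add-ints :args (s/cat :a int? :b int?))

(stest/instrument `add-ints)  ; wraps the var with argument checking
(add-ints 1 "2")              ; now throws a spec error instead of a ClassCastException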

plexus14:04:53

That sounds about right, let's see if I can get it from the horse's mouth :)

andy.fingerhut15:04:24

This is also not from the horse’s mouth, but I am pretty sure it was someone in the Clojure community rather than the core team that popularized the use of GIGO when applied to Clojure APIs. I know GIGO was a common phrase before Clojure existed. My only point is that the Clojure core team probably didn’t originate using GIGO for Clojure APIs.

Alex Miller (Clojure team)15:04:24

There are a couple of things here. One, as mentioned, is that some kinds of validation have performance implications which creates tradeoffs. In general, we prefer to optimize for correct programs rather than incorrect ones - in other words, we will not penalize all correct programs with the cost of validation. (Importantly, this does not rule out validation approaches that occur at compile time, or that can be turned off with minimal penalty, or that are external.) And two is about undefined behavior being a place where evolution can occur. (We strongly reject "garbage in garbage out" as a framing in this regard.) In general, Clojure core functions define behavior for certain input sets and leave undefined what happens for other inputs. There have been many cases where those undefined input sets later became defined and behavior has been extended - either by adding arities or by new behavior on new inputs. Undefined behavior is room to grow. Prohibited behavior creates expectations that may not be true in the future - you should not rely on the rejection of input when that input may actually have meaning later. This is an important philosophical approach to design that incorporates growth.

👍 4
Alex Miller (Clojure team)15:04:41

I realize that this is at odds with the zeitgeist of static typing and checking and automated validation, but I will note that afaict programs written with those tools do not seem to be any less full of bugs than the Clojure programs that I write without them, and my own experience has been that the Clojure programs are easier to write and easier to modify to keep pace with changing requirements over time. If you agree with that observation, I think the natural question is - is all that static typing and validation actually worth the effort people put into it?

p-himik15:04:07

To corroborate the above - there has been a literature review on static vs dynamic typing done by Dan Luu: https://danluu.com/empirical-pl/ tl;dr: not enough data of acceptable quality to draw any conclusions

plexus15:04:18

Thanks a lot Alex for the input, that's very helpful. Note that if static checking is the zeitgeist it is certainly not my zeitgeist 🙂 never been a fan of throwing static analyzers at dynamic languages.

ghadi15:04:46

caring about what you can do rather than saying what you can't do

ghadi15:04:10

undefined behavior gets a bad rap because it's associated with segfaults in C/C++

ghadi15:04:34

but it's a very useful technique in the arsenal

Alex Miller (Clojure team)15:04:09

Rich intentionally focused on describing defined behavior (and not talking about undefined behavior) in docstrings but I think he wishes now that had been a bit more explicit.

plexus15:04:41

The reason I'm asking is related to error messages, we've gotten a lot of positive feedback on our recent error message blog post (https://lambdaisland.com/blog/2022-04-07-Clojure-Error-Messages) but also some salty comments in the vein of "why doesn't it just tell me what's actually wrong"

plexus15:04:07

Good example: derefing anything that's not an IDeref or a java future will give you "cannot be cast to java.util.concurrent.Future", instead of "Can't deref <type>, expected an IDeref (like Atom, ...) or Future"

👍 1
Alex Miller (Clojure team)15:04:41

and that would be a fine thing to ask on ask.clojure, and write a patch for

👍 2
Alex Miller (Clojure team)15:04:57

the main constraint for stuff like this is that we try to be careful to keep method bodies small for hotspot inlining, so it is often better to move error message logic that involves diagnosis and message construction into a secondary function (not sure that's needed here - this usage has just expanded over time while the message has not)

Alex Miller (Clojure team)15:04:34

I have some other feedback on this article but I have not had the time to write it anywhere, not sure where you would want that feedback. also don't want to be a pedantic asshat about every fine detail of it. :)

Alex Miller (Clojure team)16:04:37

if you (or anyone) wanted to sit down and walk through these and talk about them and what might be done, I would be very happy to spend that time and empower that work to go in a direction we could incorporate

plexus16:04:56

Thank you Alex, I think I may take you up on that offer together with Ariel. I wasn't sure if addressing these things was even something that would really be considered, I'm very happy to hear it is. Feedback on the article would be welcome too!

1
didibus21:04:04

It's worth noting that macros are now fully validated at expansion time against their spec. I wonder if inlined functions could be similarly specced and validated at inlining? Even though I know it wouldn't be able to do much for variables. Anyways, my point is just that at least when it comes to macros, Clojure has chosen not to go the GIGO route. The compiler will validate your use of the macro for what is valid usage. So as others have said, I don't think it's a weird ideological thing. It's more about the practicalities of it: can it affect performance, could it over-validate and preclude some valid usage, could it create assumptions about behavior that, if it later evolves to behave differently, can break users, has anyone put in the effort to implement the zero-cost validation for it, etc.
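For example, the core macro specs reject malformed forms at macroexpansion (the exact error wording varies by Clojure version):

(let [x] x)
;; Syntax error macroexpanding clojure.core/let
;; [x] - failed: even-number-of-forms? spec: :clojure.core.specs.alpha/bindings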

didibus21:04:14

My impression is this is also true of static typing. Like Alex said, he's taking a very pragmatic stance: my programs don't seem to have more bugs without them. So the question is, why put effort in the compiler for it? Why restrict some correct programs from being valid to the static checker? Why force a refactor of a function from (:name user) to (:name user-without-password) when clearly both work? Why force people to type annotate a bunch of things? If it doesn't reduce bugs, what's the point? I think some people say it helps with refactoring, and that's fair, but others say that's not a good enough benefit to outweigh all that effort and the other cons that come with it.

Joshua Suskalo22:04:21

right, well it makes more sense to spec macros and not inline functions, because speccing an inline function would be vastly different from speccing the function that it inlines, and those two are designed to share semantics. You have to accept basically anything as arguments to inlines.

Alex Miller (Clojure team)22:04:58

inlined functions are designed to remove function invocation overhead; it makes no sense to add validation overhead back into it. we have specs, and public libs exist with specs for core functions, and you can instrument and use them if you like (but it's really slow)

Alex Miller (Clojure team)22:04:41

https://github.com/borkdude/speculative - but note that these specs have the problems mentioned above - they are "locked down" and may thus fail with inputs made valid in the future

sheluchin22:04:52

> Prohibited behavior creates expectations that may not be true in the future - you should not rely on the rejection of input when that input may actually have meaning later. This is an important philosophical approach to design that incorporates growth. I recall Rich talking about this in one of the videos. Something along the lines of "what, you're sure it will NEVER be allowed?!". Anyone know which talk it is?

didibus23:04:52

I meant to spec the expansion of inline functions, same as for macros, not the invocation of the inlined code. It wouldn't be that useful, I admit; it would only catch issues when called with values directly, like: (min "a") could fail at expansion, because the inline expansion could see that "a" is neither a symbol nor a number.

didibus00:04:34

And to be fair to speculative, it only has this problem because it's not updated alongside core (not saying it doesn't have other problems). I actually feel allowing undefined behavior has more future breakage issues. If (+ "a" 2) returned nil, for example, and someone started to depend on that, and later it was evolved so that it returns "a2", you'd break that person's use of it. That's where you'd tell them, well, this was always undefined, you shouldn't have depended on it. But if (+ "a" 2), because it is undefined now, failed to validate, you'd have known not to depend on it. Speculative actually made that very apparent. One of the issues it had was that it realized people in the wild had come to depend on all kinds of undefined behavior, and actually finding a spec that was broad enough to include all of those was a challenge.

andy.fingerhut00:04:02

I didn't follow the design/implementation of speculative in any detail, but isn't it based upon the best guess of someone not on the core team as to what is considered defined behavior vs. what is not? I know there were many questions asked of folks like Alex Miller in many cases, but I'm guessing that if the core team went over speculative with a fine-toothed comb there would likely be differences between what they consider defined vs. what speculative checks for. (This is not a request for the core team to take the probably huge time that would require -- just a speculation on speculative 🙂)

andy.fingerhut00:04:49

If/when future Clojure versions define new behavior for additional kinds of inputs, speculative could be updated to match, i.e. it would be "locked down" for the new version of Clojure at that time.

andy.fingerhut00:04:16

If one is using a particular version of Clojure, it seems you would want a version of speculative that was "locked down" for that version of Clojure. Who actually wants to use behavior that the core team considers undefined and that might change in the future? (OK, maybe some people do, if it happens to be useful to them right now, but they might also like to know they are potentially on thin ice.)

👍 1
Alex Miller (Clojure team)00:04:57

I provided a lot of input to the speculative effort

didibus00:04:08

Here's a feature request I logged a long time back about speccing functions as macros: https://ask.clojure.org/index.php/9876/conform-function-specs-at-read-time-similar-to-macros maybe that'll make it more clear.

Alex Miller (Clojure team)00:04:28

I don't think that makes any sense tbh

☝️ 1
didibus00:04:18

The feature or the explanation of it? 😝

Alex Miller (Clojure team)00:04:19

You have to go much farther to do anything useful, and at that point you have clj-kondo effectively

1
didibus00:04:28

I'm thinking say a regex function:

(re-find #"\d+" "abc12345def")
It's very often called where the regex is provided as a literal. The compiler could check that the unevaluated read forms conform to the :args spec.

didibus00:04:44

I can be convinced by the "not worth the effort" argument for something that only catches errors in literal usage. It's true it might not be of much use.

didibus00:04:12

But I feel that same argument could have applied to macros as well no?

didibus00:04:02

Or you have functions like run-jetty that can take a million options. It's easy to typo or think you're passing the correct key for the option but you're not.

didibus00:04:56

Or when I call update-in, I always forget if the vector takes keywords or symbols, and that's always called literally.

didibus00:04:25

I admit, not that useful haha. It has zero upvotes, so probably no one cares, and clj-kondo can do all that and more now too. It was just on the same train of thought about what kind of validation could be done that's zero-cost.

didibus00:04:54

Might be more useful in 1.11: if more people make use of named arguments, it's easy to typo a keyword, and it would be nice if the compiler could tell you that's not a valid named parameter at compile time (though that could probably be implemented some other way without even needing spec).
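For example (hypothetical function), a typo'd key today is just silently ignored:

(defn connect [& {:keys [host port] :or {port 5432}}]
  {:host host :port port})

(connect :host "db.local" :prot 9999)
;; => {:host "db.local", :port 5432}   ; the :prot typo is dropped without any warning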

Joshua Suskalo01:04:45

The difference is that function arguments are evaluated. Any form has to be accepted in any argument position for inline functions because any form is accepted in any argument position by the non-inline version.

Joshua Suskalo01:04:10

Speccing that is useless because the spec would just be like any? with a fixed count, so you'd only catch arity errors early.

Alex Miller (Clojure team)01:04:53

It's different than macros because macros often define custom syntax (positional meaning) and that structure can be verified at compile time. We don't spec all the macros - we spec the ones that effectively define new language - ns, defn, etc

didibus01:04:22

I personally find some functions also tend to define DSL like syntax, even if they don't need to break the rules of evaluation, but I admit there are a lot fewer of those.

agigao17:04:22

Hello Clojurians, I’m having some Java interop adventures and wonder - how to properly type hint the output of the function when it returns an array of bytes. Example for illustration:

(defn ^bytes f [^String x] (.getBytes x))

Alex Miller (Clojure team)17:04:06

(defn f ^bytes [^String x] (.getBytes x))

agigao17:04:27

Oh, thank you Alex! Now the docstring complains - is this the right form to provide a docstring after a hint?

(defn f ^bytes ^{:doc "f does x"} [^String x] (.getBytes x))

Joshua Suskalo18:04:05

no, the docstring must be provided either as the defn arglists suggest, or as meta on the var.

(defn f
  "f does x"
  ^bytes [^String x]
  (.getBytes x))

👍 1
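The meta-on-the-var variant mentioned above would look something like this:

(defn ^{:doc "f does x"} f
  ^bytes [^String x]
  (.getBytes x))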
agigao19:04:35

Thanks :the_horns:

Joshua Suskalo18:04:55

so what happened with io!? Is it just that the community forgot it existed and so now huge swaths of effectful code just doesn't have it wrapping it? Or was there a deliberate decision in the past to stop spreading its use?

phronmophobic18:04:31

I thought io! was only useful in the context of refs/stm. Since refs are rare, there's generally not much benefit to sprinkling io! s.

Joshua Suskalo18:04:51

that's the one place that currently uses the fact that code is wrapped in io!, but refs are useful in some cases, and if they were used more then io! could be used by linters in order to discover effectful functions and do something with that information.

Alex Miller (Clojure team)19:04:52

it was not a deliberate decision, it just never really caught on (partly due to how infrequently refs are used, I think)

phronmophobic19:04:18

> if they were used more then io! could be used by linters in order to discover effectful functions and do something with that information. Potentially, but I'm not sure it would be straightforward. While the name io! implies that it's marking code as doing some sort of IO, it's actually marking code as "should not be run within an STM transaction". While those two ideas are similar, they're not necessarily exactly the same. It might be useful for statically catching errors in other contexts that care about IO, but it's probably too coarse. Different contexts will care about different properties: • idempotent • network IO • file IO • stateful • no long running computations • memory consumption • etc.
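For reference, io! only has an effect when a transaction is running; a minimal sketch:

(def r (ref 0))

(io! (println "fine outside a transaction"))  ; just runs the body

(dosync
  (alter r inc)
  (io! (println "side effect")))
;; => throws IllegalStateException: I/O in transaction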

didibus21:04:33

I feel that's also why no one uses io!. When the framing switched from "marks code as doing IO" to "can be used inside a ref transaction", people lost interest. Personally, though, I would say having a "pure" marker would be more useful. Unless you want an effect system where you can inject the type of IO needed, but that's just out of scope I think. But it'd be cool for some things to warn you, like every time I use something inside a seq that isn't marked pure, it could warn me. But even that, haha, would probably just get annoying.

noisesmith18:04:45

@U7RJTCH6J I'd argue that in all cases where I care about side effects, I care about that same list of things (eg. I don't want them inside a swap!, I don't want to do them inside lazy code (due to bugs caused by indeterminacy of if / when they are calculated), and I don't want to do them inside a core.async go block (because they would clog the go thread pool))

noisesmith18:04:00

none of those concerns are specific to refs

phronmophobic18:04:25

and not all of those things are IO

phronmophobic18:04:35

I can also think of exceptions where you might find idempotent file I/O to be acceptable within a swap! or core.async block

noisesmith18:04:22

file I/O can take minutes to complete, it depends on the state of the system, I don't consider it safe

phronmophobic18:04:44

It's not always safe, but I think there may be some use cases where it's a reasonable exception

phronmophobic18:04:13

for async go blocks, there isn't directly a problem with doing IO. The problem for go blocks is taking too long and consuming go threads. If you could guarantee that IO was consistently "quick", then IO isn't a problem. There's also the converse, where you have code that takes a long time, but doesn't do IO and would be unacceptable in an STM transaction.

phronmophobic19:04:33

I believe go blocks won't rerun parts of your code, right? This is a different problem from STM transactions and swap! where the same code might be rerun many times.

noisesmith19:04:48

> if you could guarantee that IO was consistently "quick" this is impossible on any OS I've deployed to

noisesmith19:04:25

depending on your definition of "consistent" I guess

phronmophobic19:04:12

I don't think it's a rare use case where doing a small amount of IO in a go thread would make it worth managing your own threads.

phronmophobic19:04:10

I'm not saying it would be useless to mark functions with any amount of "unsafeyness", but I'm not sure that putting all types of unsafeyness in one bucket is that useful.

noisesmith19:04:53

backing up for a moment, I could see a use for an expensive! macro, like io! but explicitly about using some resource (time, memory, metered API access ...)

phronmophobic19:04:59

That sounds interesting. The worry I have is that you basically start going down this rabbit hole of categorizing types of resources and end up at haskell 😱

noisesmith19:04:36

no, I wouldn't split it - I'd keep it generic (and you'd probably want a "I know what I'm doing" escape hatch)

noisesmith19:04:13

but you could annotate code as expensive, and then annotate contexts as "cheap", and cheap contexts could compile error if they contain expensive code

noisesmith19:04:29

regarding io in a go block - (go (<! (thread (write-to file data))) ...) is pretty easy

noisesmith19:04:21

where thread is core.async/thread of course, using an expanding thread pool and returning a channel

phronmophobic19:04:50

that's true, but is it worth it for (go (println "hello"))?

phronmophobic19:04:41

I would put println in the bucket of "acceptable IO within a go block" for many use cases.

phronmophobic19:04:47

but I would avoid using println inside a transaction or swap function

didibus19:04:08

Seems like maybe the better properties are idempotent and heavy?

👍 1
didibus19:04:43

swap! and ref and all need idempotency. Go needs to avoid heavy things.

didibus19:04:45

But I think this conversation is showing why none of this took off, maybe? It seems to be too contextual: too many details, too many edge cases where maybe you do want to do it, too many nuances. Adding all these guard rails would just get in the way, and you'd always be having to step over them.

👆 1
phronmophobic23:04:11

That's basically my reasoning, but I wouldn't rule out someone smart and motivated being able to do something interesting in this space.