beginners 2023-07-10 | Slack Archive

Daniel Shriki09:07:28

Hi guys! I'm trying to make a spec for an input, but the spec can change based on the input. If :message/source is "email", text should allow up to 5K characters, whereas for "sms" it should allow only 160. I made the following, but it's not quite working as expected:

(s/def :email/text
  (s/spec (s/and string? #(<= 1 (count %) 5000))
          :gen #(gen/fmap str (gen/choose 1 5000))))

(s/def :sms/text
  (s/spec (s/and string? #(<= 1 (count %) 160))
          :gen #(gen/fmap str (gen/choose 1 160))))

(defmulti message-text :message/source)

(defmethod message-text "email" [_]
  (s/keys :req-un [:email/text]))

(defmethod message-text "sms" [_]
  (s/keys :req-un [:sms/text]))

(s/def :message/text (s/multi-spec message-text :message/source))

(s/def ::messages-post
  (s/keys :req-un [:customer/id
                   :message/source
                   :message/text]
          :opt-un [:message/subject]))

I'm obviously missing something, but I'm not sure what, or whether this is the right approach to achieve what I need. Thanks for the help 🙏

rolt12:07:30

The way you're describing the spec, it looks like you want data that looks like this:

{:id 1 :source "email" :text {:text "abc" :message/source "email"}}

I guess you're looking for something like:

(defmulti message-post :source)  ;; if you're using non namespaced keywords for the dispatch function
(defmethod message-post "email" [_] (s/keys :req-un [:customer/id :message/source :email/text] :opt-un ...))
(defmethod message-post "sms" [_] (s/keys :req-un [:customer/id :message/source :sms/text] :opt-un ...))

(s/def ::messages-post (s/multi-spec message-post :source))

{:id 1 :source "email" :text "abc"}

(I have not tested that code.) You could re-use the "common" part of the spec (id, source, subject) and do something like s/merge, probably. But I'm not sure that plays well with generators; I haven't used spec in a while.
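A minimal sketch of the s/merge idea mentioned above, with the elided parts filled in by assumption. The thread never shows the specs for :customer/id, :message/source or :message/subject, so the definitions below are placeholders, and ::common is a hypothetical name for the shared part. Generators are not covered here.

```clojure
(require '[clojure.spec.alpha :as s])

;; Placeholder specs: assumptions, not taken from the thread.
(s/def :customer/id pos-int?)
(s/def :message/source #{"email" "sms"})
(s/def :message/subject string?)

;; Length limits as described in the question.
(s/def :email/text (s/and string? #(<= 1 (count %) 5000)))
(s/def :sms/text (s/and string? #(<= 1 (count %) 160)))

;; The part shared by every message (hypothetical name ::common).
(s/def ::common
  (s/keys :req-un [:customer/id :message/source]
          :opt-un [:message/subject]))

;; Dispatch on the un-namespaced :source key of the whole message.
(defmulti message-post :source)

(defmethod message-post "email" [_]
  (s/merge ::common (s/keys :req-un [:email/text])))

(defmethod message-post "sms" [_]
  (s/merge ::common (s/keys :req-un [:sms/text])))

(s/def ::messages-post (s/multi-spec message-post :source))

(def long-text (apply str (repeat 200 \a)))

(s/valid? ::messages-post {:id 1 :source "email" :text long-text}) ; true
(s/valid? ::messages-post {:id 1 :source "sms" :text long-text})   ; false, over 160
```

The key difference from the original attempt is that the multimethod dispatches on the whole message map, so :text stays a plain string rather than a nested map.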

Benjamin18:07:19

A friend asked me what the repl is. I described how we load the pieces of the code into the program 1 by 1. He asked me why this is not an issue in a big project. Now I am sending him the NASA repo and I am thinking about what benchmark I can make. I make 2 million functions that return their own name and it ~~doesn't seem to matter much.~~ I suppose I can tell him how namespaces ~~are clojure maps~~ which have great performance characteristics?

(map
   (fn [s] (intern *ns* s (fn [] s)))
   (map
    (fn [_] (symbol (str (random-uuid))))
    (range (* 2 100 1000))))

How can I steelman his question? I never had any issue with a big repl, except that the initial startup is longish, which just doesn't matter so much. Actually, I was just missing a doall, and def-ing 2 million symbols into the current namespace like this actually makes a noticeable difference in the cider eldoc as well. Lol.
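The missing doall matters because map is lazy: until the sequence is consumed, no intern call runs. A small sketch of that effect, assuming a throwaway namespace (scratch.bench is an arbitrary name), gensym instead of random-uuid, and a much smaller count than in the snippet above:

```clojure
;; Scratch namespace so the experiment does not pollute the current one.
(def scratch (create-ns 'scratch.bench))

(defn intern-fns
  "Lazily interns n no-argument functions, each returning its own symbol."
  [n]
  (map (fn [s] (intern scratch s (fn [] s)))
       (repeatedly n #(gensym "bench-fn-"))))

(def pending (intern-fns 1000))  ; lazy: nothing has been interned yet
(count (ns-interns scratch))     ; 0

(dorun pending)                  ; force the side effects
(count (ns-interns scratch))     ; 1000
```

dorun is used instead of doall because only the side effects are wanted, not the realized sequence.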

hiredman18:07:57

namespaces are not clojure maps

Benjamin18:07:43

I'm not sure if I was right about the eldoc. I try again

hiredman18:07:17

namespaces hold a number of different mappings, and are ConcurrentHashMaps under the hood

hiredman18:07:24

the main mapping is from symbols to vars

hiredman18:07:38

when clojure code is compiled it looks up the var, and that looked up var is what is used in the compiled code

hiredman18:07:01

at runtime the code doesn't look things up in the map
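The symbol-to-var mapping can be inspected from the REPL. A small sketch using only clojure.core functions; looked-up is a hypothetical name:

```clojure
;; A namespace's mapping table: symbols to vars (and to imported classes).
(get (ns-map 'clojure.core) 'map)     ; the var #'clojure.core/map
(get (ns-map 'clojure.core) 'String)  ; the class java.lang.String

;; ns-resolve does a lookup similar to what happens when a symbol is compiled.
(def looked-up (ns-resolve 'clojure.core 'map))

(var? looked-up)             ; true
(identical? @looked-up map)  ; true: dereferencing the var yields the function
```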

jpmonettas18:07:41

imho trying to explain a repl confuses people because they tend to compare it with other languages' shells and consoles. I think the important thing to explain is a language and a system designed to build a program interactively while you run it, by adding to it and removing from it without the need to restart it

😃 2

☝️ 2

hiredman18:07:18

for

(defn f [x] (+ x x))

the compiler emits bytecode that is sort of like

(def f (let [g (resolve '+)] (fn [x] ((deref g) x x))))

hiredman18:07:04

the var resolution happens once, and every time f is invoked it just rederefs the known var

hiredman18:07:12

the performance of looking things up in namespaces then only matters when compiling and when loading code, but not when running code
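The "resolve once, deref on every call" model can be tried directly. A minimal sketch, assuming a hypothetical function plus and using #'plus so the var is looked up when the form is compiled:

```clojure
(defn plus [a b] (+ a b))

;; Roughly the shape of the sketch above: the var is captured once...
(def f
  (let [v #'plus]               ; var lookup happens here, a single time
    (fn [x] ((deref v) x x))))  ; ...and is only dereferenced per call

(f 3)  ; 6

;; Changing what the var points to is visible through f,
;; with no namespace lookup at call time.
(alter-var-root #'plus (constantly (fn [a b] (* a b))))

(f 3)  ; 9
```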

Benjamin18:07:03

🤯

Benjamin18:07:27

what about all the code that is loaded in a big project? Will it not bloat the repl?

jpmonettas18:07:04

functions are compiled to classes, and it is just the JVM loading classes
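This is easy to see from the REPL. A small sketch with a hypothetical function square; the exact class name depends on the namespace it is defined in:

```clojure
(defn square [x] (* x x))

(.getName (class square))                  ; a generated name ending in "$square"
(instance? clojure.lang.AFunction square)  ; true
```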

hiredman18:07:04

that question is not really answerable

hiredman18:07:33

code takes space, all code does, for any language or runtime

hiredman18:07:52

so loading more code takes more space

daveliepmann18:07:53

the repl is, in one sense, the ability to interact with your running program. if you don't worry about loading the project itself, why would it be a problem "in the repl" (which is the same program but you interact with it directly)

hiredman18:07:58

loading less code takes less space

hiredman18:07:18

the repl has nothing to do with code taking space

Benjamin18:07:01

yea the program is loading code. The repl doesn't have anything to do with that. It is just an interaction layer on top of the program.

👍 2

daveliepmann18:07:29

https://jackrusher.com/strange-loop-2022/ especially at the various points when he mentions "interactive" programming or runtime. That might be a more helpful style of answer to "what is a REPL?".

Benjamin18:07:37

what would you say when someone says: "This dotnet project takes minutes to compile, it must be so much code!" "Doesn't Clojure break when it is holding all that code in memory?"

hiredman18:07:19

why would it?

Benjamin18:07:38

because it is so much! I cannot imagine it working

😂 2

hiredman18:07:58

when machine code runs, it is loaded in memory, do you ever ask yourself if it is too much?

☝️ 2

hiredman18:07:17

the clojure runtime doesn't actually keep a reference to the clojure source

jpmonettas18:07:19

it is the "same" amount of byte code, doesn't matter if you compiled java or clojure

👍 2

hiredman18:07:34

it compiles it to jvm bytecode, loads it, and throws the source away

👍 2

hiredman18:07:25

you can aot compile clojure, which does the same process but saves the bytecode to disk, and then you can just load the bytecode and not use the source at all

jpmonettas18:07:37

even if you were to maintain a reference to the source code strings loaded in memory, it should be cheap anyway, source code doesn't take a lot of memory

hiredman18:07:54

the primary thing that impacts is start up speed, either way it is being compiled to bytecode and the jvm is executing the bytecode

Benjamin19:07:58

so we start the jvm, we load clojure core and then we load our clojure sources as needed. And they byte-compile when we load, same as jvm or dotnet. And then we have the byte-compiled classes in a lookup?

hiredman19:07:40

user=> (defn f [x] x)
#'user/f
user=> (println (nope/disassemble (fn [] (f 10))))
// Compiled from NO_SOURCE_FILE (version unknown : 52.0, super bit)
public final class user$eval366$fn__367 extends clojure.lang.AFunction {
  
  // Field descriptor #13 Lclojure/lang/Var;
  public static final clojure.lang.Var const__0;
  
  // Field descriptor #24 Ljava/lang/Object;
  public static final java.lang.Object const__1;
  
  // Method descriptor #7 ()V
  // Stack: 1, Locals: 1
  public user$eval366$fn__367();
    0  aload_0 [this]
    1  invokespecial clojure.lang.AFunction() [9]
    4  return
      Line numbers:
        [pc: 0, line: 1]
  
  // Method descriptor #11 ()Ljava/lang/Object;
  // Stack: 3, Locals: 1
  public java.lang.Object invoke();
     0  getstatic user$eval366$fn__367.const__0 : clojure.lang.Var [15]
     3  invokevirtual clojure.lang.Var.getRawRoot() : java.lang.Object [20]
     6  checkcast clojure.lang.IFn [22]
     9  getstatic user$eval366$fn__367.const__1 : java.lang.Object [26]
    12  aconst_null
    13  astore_0 [this]
    14  invokeinterface clojure.lang.IFn.invoke(java.lang.Object) : java.lang.Object [29] [nargs: 2]
    19  areturn
      Line numbers:
        [pc: 0, line: 1]
        [pc: 6, line: 1]
        [pc: 12, line: 1]
      Local variable table:
        [pc: 0, pc: 19] local: this index: 0 type: java.lang.Object
  
  // Method descriptor #7 ()V
  // Stack: 2, Locals: 0
  public static {};
     0  ldc <String "user"> [33]
     2  ldc <String "f"> [35]
     4  invokestatic clojure.lang.RT.var(java.lang.String, java.lang.String) : clojure.lang.Var [41]
     7  checkcast clojure.lang.Var [17]
    10  putstatic user$eval366$fn__367.const__0 : clojure.lang.Var [15]
    13  ldc2_w <Long 10> [42]
    16  invokestatic java.lang.Long.valueOf(long) : java.lang.Long [49]
    19  putstatic user$eval366$fn__367.const__1 : java.lang.Object [26]
    22  return
      Line numbers:
        [pc: 0, line: 1]

}
nil
user=>

👀 2

jpmonettas19:07:53

I think this could be also a little easier to read :

(defn foo [a b]
  (+ a b))

(defn bar []
  (foo 4 5))

compiling those functions generates these two classes:

// Decompiling class: dev$foo

public final class dev$foo extends AFunction
{
    public static Object invokeStatic(final Object a, final Object b) {
        final Number add = Numbers.add(a, b);
        return add;
    }
    
    @Override
    public Object invoke(final Object a, final Object b) {
        return invokeStatic(a, b);
    }
}


// Decompiling class: dev$bar

public final class dev$bar extends AFunction
{
    public static final Var const__0;
    public static final Object const__1;
    public static final Object const__2;
    
    public static Object invokeStatic() {
        final Object invoke = ((IFn)dev$bar.const__0.getRawRoot()).invoke(dev$bar.const__1, dev$bar.const__2);
        return invoke;
    }
    
    @Override
    public Object invoke() {
        return invokeStatic();
    }
    
    static {
        const__0 = RT.var("dev", "foo");
        const__1 = 4L;
        const__2 = 5L;
    }
}

Benjamin19:07:10

he says the NASA project is not big enough. Only 13600 lines clj files

jpmonettas19:07:02

the important parts related to vars are:

• when the class gets initialized (when the dev/bar fn instance is created), it will load a reference to the dev/foo var

const__0 = RT.var("dev", "foo");

• and every time dev/bar is invoked, it will deref that var to get the fn the var is pointing to, and then call it

final Object invoke = ((IFn)dev$bar.const__0.getRawRoot()).invoke(dev$bar.const__1, dev$bar.const__2);
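The two steps above can be reproduced by hand through Java interop. A sketch that uses clojure.core/inc as a stand-in for dev/foo; v is a hypothetical name:

```clojure
;; Same lookup the static initializer performs: RT.var(ns, name).
(def v (clojure.lang.RT/var "clojure.core" "inc"))

(identical? v #'clojure.core/inc)  ; true: it is the one var object

;; What invokeStatic does on each call: get the var's root value, then invoke it.
((.getRawRoot v) 41)               ; 42
```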

Benjamin19:07:07

thanks, will check. Still hard for me to convincingly make my friend relaxed. I think he just fundamentally assumes that big projects bloat .... something. Maybe because of the suffering he had from csharp tooling and a big project... I remember rider trying to analyze code 😅

daveliepmann19:07:07

sounds like the central concept has not been communicated

daveliepmann19:07:29

everything has already been compiled because the repl is the loaded, running program

Benjamin19:07:22

Ah do we compile clojure sources on the classpath? Say I have src/foo.clj I thought we only load - and thereby compile - when I require or load ? Unless I am requiring foo of course.
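One way to check this at a REPL is to compare (loaded-libs) before and after a require. A sketch using clojure.zip, which ships with Clojure; whether it is already loaded in a given REPL depends on the tooling:

```clojure
;; Being on the classpath does not load a namespace; require does.
(contains? (loaded-libs) 'clojure.zip)  ; often false in a fresh REPL

(require 'clojure.zip)                  ; loads it (from source or AOT-compiled classes)

(contains? (loaded-libs) 'clojure.zip)  ; true
```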

Bob B19:07:53

another thing to keep in mind is that projects typically don't have to get as big (in line count) in clojure as in e.g. C# to accomplish a similar thing. it's also easy to just decide "this can't possibly work" and then dismiss every example of it working as "eventually it won't work", and sure, eventually you'll run out of memory to hold a running system; C# and Java can also throw OOMs

didibus06:07:30

It's a funny question your friend asked you. But that's basically why we have a lot of RAM in our computers nowadays.

didibus06:07:19

When you start the REPL, Clojure will compile the code, load it, and initialize it; this is why REPL start takes a while. AOT compile can be used to pre-compile it. A Java app works the same way (but is always AOT). But when you start a Java app, it similarly loads the code and initializes it. That code will take up some space in memory. It's not that much though, even for a very large code base.

didibus06:07:07

The REPL doesn't really change anything. It just allows you to load additional code into the app at runtime.

👍 2

didibus06:07:08

> "This dotnet project takes minutes to compile, it must be so much code!" "Doesn't Clojure break when it is holding all that code in memory?"

Clojure will compile code in a streaming fashion, as needed. Though it can also do it all ahead of time using AOT. It doesn't hold the source code in memory. So when a ns is loaded, the file is taken from the JAR, and each form is compiled and loaded, one by one. The source file is not maintained in memory. The difference is between AOT compilation and streaming. AOT means the compiler loops over all files, compiles them one by one, and saves the compiled output back into files. Then when you start the app, those compiled files are loaded into memory by the JVM. Clojure, when not AOT, will do this compile -> load as needed, in a streaming fashion.

didibus06:07:39

This also allows the REPL, because you can stream source code to the REPL, and it will compile and load it on-demand.
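A rough illustration of that idea without any REPL tooling: load-string takes source text at runtime, compiles it, and loads it into the running program. The names added and added-later are hypothetical:

```clojure
;; Source arrives as text at runtime, is compiled, and is usable immediately.
(def added (load-string "(defn added-later [x] (inc x))"))

added       ; the var for added-later, in whatever namespace was current
(added 41)  ; 42: vars are invokable and delegate to their current value
```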

didibus06:07:32

It gets a bit more complicated underneath. Because here it compiles to bytecode, which is an IR (intermediate representation). The JVM will then interpret this bytecode to execute it, using an interpreter. And it will also start to profile it as it gets executed. As it gathers heuristics on it, it will take the bytecode and JIT compile it into optimized machine code (Just in time compilation). But it will not forget the bytecode once it does that, because if the code paths change dramatically, it will deoptimize, throw away the machine code and go back to interpreting, or recompile it into a different machine code.

didibus06:07:11

This is kind of why in general, JIT runtimes like the JVM eat up more memory. Not only do you maintain bytecode + machine code in memory, but also a bunch of profiling information, and all of the code needed for the JIT and GC themselves.

didibus06:07:00

And all that is why you'll struggle getting an app in Java/Clojure that can run with less than 100MB, because all this stuff takes up a minimum amount of memory on its own. But if you use Graal to do a native image, that stuff won't be there; it's all pre-compiled directly into static machine code, and that's all that gets loaded. Now, there's still going to be more than a small C program, because it will bring along a whole GC with it, but it will have no unneeded code, because it's all tree-shaken, statically, and pre-compiled to machine code. So it takes considerably less memory.

Abhi K20:07:33

Hi Clojurians, I have been struggling with this for some time now. Any pointers on how I should approach this? I have the list of maps below, and I need to filter it with the criteria below so that it produces the given output:

INPUT:

[{:a 44, :b 2041, :sponsorid 7, :peer 1, :customer 0, :monthnumber 2, :yearnumber 2022, :c 0.0}
 {:a 44, :b 2042, :sponsorid 7, :peer 1, :customer 0, :monthnumber 3, :yearnumber 2022, :c 0.0}
 {:a 44, :b 2041, :sponsorid 5, :peer 1, :customer 0, :monthnumber 2, :yearnumber 2022, :c 0.0}
 {:a 44, :b 2041, :sponsorid 5, :peer 1, :customer 0, :monthnumber 3, :yearnumber 2022, :c 0.0}
 {:a 44, :b 2041, :sponsorid 4, :peer 1, :customer 0, :monthnumber 2, :yearnumber 2022, :c 0.0}
 {:a 44, :b 2041, :sponsorid 3, :peer 1, :customer 0, :monthnumber 2, :yearnumber 2022, :c 0.0}]

CRITERIA: COUNT(DISTINCT sponsorid) >= 2 AND COUNT(DISTINCT CONCAT(a, b)) >= 2

OUTPUT:

[{:a 44, :b 2041, :sponsorid 7, :peer 1, :customer 0, :monthnumber 2, :yearnumber 2022, :c 0.0}
 {:a 44, :b 2042, :sponsorid 7, :peer 1, :customer 0, :monthnumber 3, :yearnumber 2022, :c 0.0}]

Bob B21:07:00

those two maps share one sponsorid, so I think the count of distinct sponsorid is maybe a bit misleading - it seems like you want any groups by sponsorid that have more than one unique value of concatting a and b (and I assume the desire is to get all the members of that group). so, I think you'd have to first group by sponsorid, and then filter each group
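The group-then-filter suggestion can be sketched as follows. The function name groups-with-varied-ab is hypothetical, and the [a b] pair is compared with (juxt :a :b) rather than a concatenated string, which is an assumption about what the CONCAT was meant to express:

```clojure
(def rows
  [{:a 44, :b 2041, :sponsorid 7, :peer 1, :customer 0, :monthnumber 2, :yearnumber 2022, :c 0.0}
   {:a 44, :b 2042, :sponsorid 7, :peer 1, :customer 0, :monthnumber 3, :yearnumber 2022, :c 0.0}
   {:a 44, :b 2041, :sponsorid 5, :peer 1, :customer 0, :monthnumber 2, :yearnumber 2022, :c 0.0}
   {:a 44, :b 2041, :sponsorid 5, :peer 1, :customer 0, :monthnumber 3, :yearnumber 2022, :c 0.0}
   {:a 44, :b 2041, :sponsorid 4, :peer 1, :customer 0, :monthnumber 2, :yearnumber 2022, :c 0.0}
   {:a 44, :b 2041, :sponsorid 3, :peer 1, :customer 0, :monthnumber 2, :yearnumber 2022, :c 0.0}])

(defn groups-with-varied-ab
  "Keeps every row whose sponsorid group contains at least two distinct [a b] pairs."
  [rows]
  (->> rows
       (group-by :sponsorid)                ; sponsorid -> rows, input order kept per group
       vals
       (filter (fn [group]
                 (>= (count (distinct (map (juxt :a :b) group))) 2)))
       (apply concat)                       ; flatten the surviving groups
       vec))

(groups-with-varied-ab rows)
;; returns the two :sponsorid 7 rows, matching the OUTPUT in the question
```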

Abhi K21:07:25

that worked, thanks @U013JFLRFS8
