Fork me on GitHub
#beginners
<
2023-02-25
>
slk50011:02:53

is there refactoring in emacs to convert a (fn [] ...) to #(...) ?

daveliepmann11:02:56

I'm not aware of one. If it existed I'd expect to see it in clj-refactor. Perhaps https://github.com/clojure-emacs/clj-refactor.el/wiki/cljr-promote-function could be extended or mimicked to provide a "demote" operation.

elken12:02:31

clojure-lsp does

daveliepmann12:02:55

@U043RSZ25HQ any hint to how? The docs I see either point to clj-refactor or don't mention this refactoring, as far as I can tell. I can't find anything with m-x or apropos with my current version.

elken12:02:16

Wrong person 😛 Follow https://clojure-lsp.io/installation/ then https://clojure-lsp.io/clients/ and when you lsp-execute-code-action over a function you should get the option

👍 4
❤️ 2
slk50013:02:38

that's weird that those refactoring are under list in one command, and others like rename are separate commands, @U043RSZ25HQ thx again for hint. I think I finally started to feel comfortable with my setup 😄

🙌 2
elken13:02:36

Things like rename are defined by the protocol, code actions are server-by-server 😄

slk50013:02:28

hmm thx for clarification 😄 It have sense from software stand point but for user interface they are similar actions -> refactorings

elken13:02:12

Code actions aren't always available, so it's not (easily) possible to have them able to be bound anywhere. You should always be able to rename something or find its references 🙂 It's similar across all LSP-speaking servers

👍 2
slk50013:02:53

btw. have tried with tree-sitter ?

Markus19:02:59

I have a question regarding optimization. The following function calculating the f1-score is painfully slow sometimes, reaching values of 280000ms when evaluating with time, but I am unable to pinpoint the problem.

(defn f1-score
  [word-map func]
  (let [{:keys [valid-words invalid-words]} word-map
        valid-func                          (partial func :true-pos :false-neg)
        {true-pos  :true-pos
         false-neg :false-neg}              (reduce valid-func
                                                    {:true-pos 0 :false-neg 0}
                                                    valid-words)
        invalid-func                        (partial func :false-pos :true-neg)
        {false-pos :false-pos}              (reduce invalid-func
                                                    {:false-pos 0 :true-neg 0}
                                                    invalid-words)
        ;; it is important to catch a division by 0
        precision                           (if (> (+ true-pos false-pos) 0)
                                              (/ true-pos
                                                 (+ true-pos false-pos))
                                              0)
        recall                              (if (> (+ true-pos false-neg) 0)
                                              (/ true-pos
                                                 (+ true-pos false-neg))
                                              0)]
    (if (> (+ precision recall) 0)
      (* 2 (/ (* precision recall)
              (+ precision recall)))
      0)))
In my case func is given like this:
(defn regex-reducer
  "Updates the counter of a map corresponding to value."
  [k1 k2 map value]
  (if value
    (assoc map k1 (inc (map k1)))
    (assoc map k2 (inc (map k2)))))
and word-map looks like this for valid-words and invalid-words:
(nil nil nil "aa" "abab" nil nil "abaaa" ...)
The word-map initially contains randomly generated words, which are then matched with different regex strings. The code above calculates the f1-score for each regex. Has someone got an idea what might be the issue? Im open to adding more details and/or code if needed.

Ben Sless20:02:05

This is probably because you're working with boxed math

Ben Sless20:02:13

The fastest way to calculate two primitive rolling values would be using a loop

Ben Sless20:02:03

if you want to profile it, you can use the excellent https://github.com/clojure-goes-fast/clj-async-profiler

Markus20:02:25

Thanks! I will read into it and update this thread if I get stuck again.

Markus21:02:03

Well, I tried to profile my code and it now kinda shows me, that not the code I provided earlier is the issue, but rather matching all words in word-map… I am currently using this function to match on all strings with a given regex like (a*)|b:

(defn find-matches
  "Returns a list of the full match or nil for every word in string-list matched by regex."
  [regex-str string-list]
  (map #(-> regex-str re-pattern (re-matches %) (nth 0 nil)) string-list))
What would be the most efficient way to do this?

Ben Sless21:02:45

precompile the pattern, call re-pattern once

Ben Sless22:02:07

then use a non-capturing group

Markus23:02:30

Thanks! I now rewrote it like this and I added non-capturing groups instead of groups:

(defn find-matches
  "Returns a list of the full match or nil for every word in string-list matched by regex."
  [regex-str string-list]
  (let [pattern (re-pattern regex-str)]
    (pmap #(-> pattern (re-matches %) (nth 0 nil)) string-list)))
I also introduced pmap which increased the speed as well. I generate my regex strings via genetic programming, so it is a bit harder to control the structure. Most of the time its fast, but when its slow a lot of time is taken up by java.util.regex.Pattern$GroupTail.match, java.util.regex.Pattern$GroupHead.match and java.util.regex.Pattern$BmpCharPropertyGreedy.match in roughly this order with java.util.regex.Pattern$GroupTail.match being the most common. What could be the cause of this and how can I improve my regular expressions at this point?

Ben Sless05:02:33

Let's not introduce pmap before we finish optimizing the regex Can you share the full regex?

Ben Sless05:02:23

Are you using lein or deps?

Ben Sless06:02:05

Since the regex is generated, can you share a pathological case?

Markus11:02:21

First of all, thanks a lot for spending so much time on answering my questions, @UK0810AQ2! :saluting_face: I’m using deps. My regex is only using following patterns: |*+?. For now I always automatically group each pattern to prevent invalid regular expressions. To prevent something like a++ from happening, it will be automatically grouped to ((a+)+). This, for now, is the same for all of these patterns. Since you advised the usage of non-capturing groups, the example regex from above now looks like this (?:(?:a+)+). One improvement I already have planned is removing unnecessary repeated operations like the double + above and simplifying things like ((a+)?) to (a*), but this hasn’t been implemented yet. It could prove to be a bit difficult as well when working with the tree structure of genetic programming, so any advice besides minimizing the regex is very welcome as well!

Markus11:02:37

One thing I stumbled upon as well are atomic groups (?>) in regex. I know that since they don’t allow for any backtracking, one would have to be careful when using them, but maybe they could be helpful as well when working with generated regexes?

Markus11:02:55

And one of the ‘worst’ regular expressions my gp algorithm has to mimic would be this one: b*(((aa)+b*)|(a(aa)*(bb)+))*a?. It would become this abomination when automatically grouped while generating it from the tree: (b*)(((((aa)+)b*)|((a((aa)*))((bb)+)))*)(a?)

Ben Sless11:02:06

Interesting. how are you generating the regexes?

Ben Sless11:02:32

one of the problems with | is it's backtracking

Ben Sless11:02:55

and if you can share the entire code (public repo?) I could also benchmark it and poke at it

Ben Sless11:02:06

you could also try to feed the expressions through a simplifier

Markus12:02:41

I am using a modified Clojure implementation of John R. Koza’s Common Lisp algorithm for gp from his book ‘Genetic Programming: Complex Adaptive Systems’. His implementation can be found on page 737 onwards. I can make my code publicly available, but it is part of my bachelor thesis, so I would have to look into it. This is why I will try my best to explain the general process here first and then open up the repo if more code for clarification is needed. I have defined following things beforehand:

(def functions-regex
  [:* :+ :? :& :|])

(def arities-regex
  {:* 1
   :+ 1
   :? 1
   :& 2
   :| 2})
and a terminal set like ["a" "b"] is created as well. I’m using the & internally to create patterns like aa. I’m solely working with tomita grammars, so the only terminals ever in use for now are a and b, but the implementation is so generic, that any number of terminals should work. The algorithm to generate an individual of a generation looks like this:
(declare create-arguments-for-function)

(defn create-individual-program
  "Creates an individual using the given function-set and terminal-set.
  Aritiy-map is used to determine the number of arguments for each function. Remaining-depth
  is the remaining depth of the tree we can create. When remaining-depth reaches
  0, only terminals are selected. Top-node? is only true, when we are the top node of the tree.
  Full? indicates if this individual should be maximum bushy or not."
  [function-set arity-map terminal-set
   remaining-depth top-node? full?]
  (cond (<= remaining-depth 0)
        ;; Maximum depth reached, only terminals at this point!
        (h/choose-from-terminal-set terminal-set)
        (or top-node? full?)
        ;; We only select functions if the current node is the top node or the full method
        ;; is used.
        (let [function (rand-nth function-set)
              arity (arity-map function)]
          (into []
                (cons
                 function
                 (create-arguments-for-function
                  arity
                  function-set arity-map terminal-set
                  (- remaining-depth 1) full?))))
        :else
        (let [choice (rand-int (+ (count function-set)
                                  (count terminal-set)))]
          (if (< choice (count function-set))
            ;; Select a function from the function set
            (let [function (nth function-set choice)
                  arity (arity-map function)]
              (into []
                    (cons
                     function
                     (create-arguments-for-function
                      arity
                      function-set arity-map terminal-set
                      (- remaining-depth 1) full?))))
             ;; Select a terminal from the terminal set
            (h/choose-from-terminal-set terminal-set)))))

(defn create-arguments-for-function
  "Creates the argument list for a node.
  Number-of-args is the number of arguments still remaining to be created.
  Each argument is created by calling create-individual-program."
  [number-of-arguments
   function-set arity-map terminal-set
   remaining-depth full?]
  (when (> number-of-arguments 0)
    (into []
          (cons
           (create-individual-program
            function-set arity-map terminal-set
            remaining-depth false full?)
           (create-arguments-for-function
            (- number-of-arguments 1)
            function-set arity-map terminal-set
            remaining-depth full?)))))
I hope this gives you some insight into how this works. If this isn’t enough, I will look into making my repository publicly available. Using a simplifier sounds interesting. Are there any existing clojure implementations or general rules on simplifying regular expressions?

Markus12:02:55

Also, since this is part of my thesis and you already did help me a lot with improving my regular expressions, I would love to give you some credit for the time you are spending on this. If you want to be mentioned in the final project, please tell me how I should credit you in a pm. Thanks again for everything @UK0810AQ2!

Markus12:02:22

Oh, and what I forgot to share is the code that generates the regular expressions from a tree:

(defn group-node
  "Inserts round braces as the leftmost and rightmost sibling.
  Return position is the sibling after the opening brace!"
  [node]
  (when (not (zip/branch? node))
    (-> node
        zip/rightmost (zip/insert-right ")")
        zip/leftmost (zip/insert-left "(?:"))))

(defn tree->regex-str
  [tree]
  (loop [node (zip/vector-zip tree)]
    (if (zip/end? node)
      (-> node zip/root flatten str/join)
      (if (zip/branch? node)
        (recur (zip/next node))
        (let [value (zip/node node)]
          (if (string? value)
            (recur (zip/next node))
            (case value
              :& (-> node group-node zip/remove recur)
              :| (-> node group-node zip/remove zip/right (zip/insert-right "|") zip/leftmost recur)
              (-> node group-node zip/remove zip/rightmost (zip/insert-left (name value)) zip/leftmost recur))))))))

Ben Sless12:02:37

For simplification you can use an algebraic simplifier if your rewrite alternation as + and concatenation as multiplication

Markus13:02:49

Could you elaborate this a bit with an example? I don’t quite get it right now

Ben Sless13:02:44

(aa|ab) -> (a*a) + (a*b) -> a(a*b)
This sort of thing

Ben Sless13:02:27

So if you translate a regular expression to algebra you can use algebraic simplifiers

Markus13:02:36

Ah, thats actually great! I will look into that! Is there anything else I can do with my existing code to improve performance a bit?

Ben Sless13:02:37

I should profile it first, or you could share the latest flame graph I've been looking at it on my phone until now 🙂

Markus13:02:54

This is the flame graph of a single generation run of the program with 40 individuals and 200 words evaluated for each individual

Michael W21:02:18

What's the difference between a binding and a normal let?

Alex Miller (Clojure team)21:02:01

Most importantly, bindings are global and lets are local

Michael W21:02:32

If I bind a value I can rebind it later?

hiredman21:02:31

Binding is a thing you can do to vars that are marked as dynamic. Vars are (almost always) interned in namespaces and hold top level definitions

hiredman21:02:34

Let binds a name to a value in local static scope, and does not create a var or intern anything

kennytilton21:02:09

You can bind a dynamic var anywhere, and within the binding's dynamic scope -- the body, and any fuunctions called from the body -- the bound value will be seen. If within -- within, not later -- the binding scope, we bind the same var again, we have a "stack" thing going on: within this second scope, again any called function will see the new value, but as soon as the nested binding form exits, the original value pops back in as the var's value. This is really great for not having to forever pass around a parameter such as a DB connection. In mutating languages, btw, we have to:

(let [save-db *DB*]
   (set! *DB* (the-db-to-use))
   ...do our thing...
   (set! *DB* save-db))

kennytilton21:02:10

A "later" second binding is fine.

Michael W22:02:54

I understand now, thanks.

kennytilton22:02:37

I was too slow! Anyway...

(def ^:dynamic *depth* 0)
(defn logger [tag]
  (prn tag *depth*))
(comment
  (do
    (logger :pre-binding)
    (binding [*depth* (inc *depth*)]
      (logger :in-first-binding)
      (binding [*depth* 42]
        (logger :in-nested-42-binding))
      (logger :after-nested-42))
    (binding [*depth* :later]
      (logger :in-later-binding))
    (logger :exiting)))

:pre-binding 0
:in-first-binding 1
:in-nested-42-binding 42
:after-nested-42 1
:in-later-binding :later
:exiting 0

Markus21:02:03
replied to a thread:I have a question regarding optimization. The following function calculating the f1-score is painfully slow sometimes, reaching values of `280000ms` when evaluating with `time`, but I am unable to pinpoint the problem. (defn f1-score [word-map func] (let [{:keys [valid-words invalid-words]} word-map valid-func (partial func :true-pos :false-neg) {true-pos :true-pos false-neg :false-neg} (reduce valid-func {:true-pos 0 :false-neg 0} valid-words) invalid-func (partial func :false-pos :true-neg) {false-pos :false-pos} (reduce invalid-func {:false-pos 0 :true-neg 0} invalid-words) ;; it is important to catch a division by 0 precision (if (&gt; (+ true-pos false-pos) 0) (/ true-pos (+ true-pos false-pos)) 0) recall (if (&gt; (+ true-pos false-neg) 0) (/ true-pos (+ true-pos false-neg)) 0)] (if (&gt; (+ precision recall) 0) (* 2 (/ (* precision recall) (+ precision recall))) 0))) In my case `func` is given like this: (defn regex-reducer "Updates the counter of a map corresponding to value." [k1 k2 map value] (if value (assoc map k1 (inc (map k1))) (assoc map k2 (inc (map k2))))) and `word-map` looks like this for `valid-words` and `invalid-words`: (nil nil nil "aa" "abab" nil nil "abaaa" ...) The word-map initially contains randomly generated words, which are then matched with different regex strings. The code above calculates the f1-score for each regex. Has someone got an idea what might be the issue? Im open to adding more details and/or code if needed.

Well, I tried to profile my code and it now kinda shows me, that not the code I provided earlier is the issue, but rather matching all words in word-map… I am currently using this function to match on all strings with a given regex like (a*)|b:

(defn find-matches
  "Returns a list of the full match or nil for every word in string-list matched by regex."
  [regex-str string-list]
  (map #(-> regex-str re-pattern (re-matches %) (nth 0 nil)) string-list))
What would be the most efficient way to do this?