#clojure
2024-04-23
pez06:04:52

The Calva Clojure “parser” is struggling with ignore markers, because I am struggling to understand them. The structural editor in Calva treats ignore markers as forms, which is not ideal at all, especially since it is not consistently applied. In trying to fix this, I want help calibrating my understanding. Let me start with how I understand reader tags in general:
• Reader tags apply to the next form. #rt form1 form2 makes #rt form1 and form2 different “units”. Calva’s move-by-sexp commands will see these units.
• Reader tags “stack” such that several adjacent tags will apply to the next form. #rt #rt #rt2 form1 form2 will make the units be #rt #rt #rt2 form1 and form2.
Ignore markers/tags:
• Ignore tags are a bit like reader tags in that they apply to the next form. #_ form1 form2 creates the “units” #_ form1 and form2.
• Ignore tags “stack” such that each adjacent tag will apply to the next “unit”. #_ #_ form1 form2 then forms a unit, with some kind of “sub-unit” of #_ form1.
I hope this makes sense. From here I have questions, but I’d first like to calibrate whether this understanding is adequate and complete enough. Am I missing something? Is there a different “language” I should use to help my understanding?

chucklehead07:04:26

not sure where it falls in your classification, but one thing I learned/noticed today is that #_ thing/a is fine and #_ thing is fine, but #_ thing/ is not fine.

pez07:04:45

Is it because thing/ is not fine? From a Calva structural editing perspective all would be fine, I’m pretty sure.

chucklehead07:04:38

yeah, it makes sense that the / terminates the symbol so the ignore tag doesn't apply to it unless it is a proper namespaced symbol, but there is a lingering part of my mental model that just thinks of commenting something out when I use the tag, and that part was surprised.

pez08:04:53

I think the intention by the user most often will be to ignore out the thing/, so that’s why I think a structural editor should treat it like that. Then the compiler will tell the truth about the code. And possibly the linter can help with that too. Haven’t checked what clj-kondo has to say about it.

phill10:04:21

Ignore is such fun. I sometimes wish the ctrl -> key treated #_ form1 as one unit and hopped over the whole thing. I suppose ctrl <- should exactly undo ctrl ->. But how could that be? If I ctrl -> from |#_ form1 then I pass form1, but if I ctrl -> from #_|form1 I also wind up at the same spot; so where should ctrl <- take me? At times like this we should look to Emacs, because everything in Emacs is more or less right. Plain -> and <- should move by one-character-at-a-time. I suppose I should press Del twice to remove #_ because it's two characters and Del is not characteristically form-oriented.

pez10:04:39

@U0HG4EHMH now you are asking the questions I am about to ask. 😃 This is where the Calva parser and structural editor struggle. Looking to Emacs is fine, but first I need to understand why Emacs would do it one way and not the other.

Alex Miller (Clojure team)12:04:44

Why is this different than other dispatch characters that require more reads to complete? Like #{ signals the opening of a set, which requires reading n more forms until a } - #_ is like that but for exactly one more token

Alex Miller (Clojure team)12:04:27

These may nest, just like collections or any thing else

pez12:04:46

Calva doesn’t treat #{ as a dispatch. (In fact there is no concept of dispatch in Calva’s parser.) It treats #{ as the opening of a list which closes with a }. There’s this notion of token types and for this example #{ is an open token and } is a close token. This makes it different in how #_ is handled. But let me see if I can use your input to understand how I could think about the #_. “they may nest”… So it is rather nesting than stacking, I take it? I think that alone will help me.

Alex Miller (Clojure team)13:04:46

Yes, nesting is the right model (which forms a stack of course based on your nesting level)

pez13:04:12

Oh, yes, it forms a stack. I used “stack” above to note that it doesn’t exactly match a stack for me for reader tags. I think these work in Calva today, but I’m interested in how I could think about these too, since it is some early understanding of mine about them that is encoded in Calva. Does it seem correct how I think about them the way I describe it in TS?

Alex Miller (Clojure team)13:04:50

reader tags are the same thing - a reader tag is a tag and a nested form (which may also be a reader tag etc)

Alex Miller (Clojure team)13:04:53

(#rt (#rt (#rt2 form1))) (#_ (#_ form1) form2) if those help visualize - the difference here is #_ discards / throws away the thing it reads so if it happens to be in a nested context, the read needs to keep going

pez13:04:52

Funny, I was about to visualize it similarly; #__( #__( form1 ) form2 ) vs #rt( #rt( #rt2( form1 ))) form2

pez13:04:44

> the difference here is #_ discards / throws away the thing it reads so if it happens to be in a nested context, the read needs to keep going
🤔

Alex Miller (Clojure team)13:04:01

this is literally how it's implemented
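
To make the "read one form, throw it away, keep going" idea concrete, here is a toy model over a flat sequence of string tokens. It is only a sketch: `read-form` and `read-all` are hypothetical names, and this is not the actual Clojure reader code, which works on characters and handles collections, metadata and much more.

```clojure
(defn read-form
  "Toy reader: returns [form remaining-tokens], or [::eof nil] when the
   tokens run out. A \"#_\" token reads one nested form, throws it away,
   and then keeps reading to find the form to return."
  [tokens]
  (let [[tok & more] tokens]
    (cond
      (nil? tok)   [::eof nil]
      (= tok "#_") (let [[_discarded after] (read-form more)]
                     (read-form after))
      :else        [tok more])))

(defn read-all
  "Read every form that survives discarding."
  [tokens]
  (loop [ts tokens, out []]
    (let [[form more] (read-form ts)]
      (if (= form ::eof)
        out
        (recur more (conj out form))))))

(read-all ["#_" "a" "b" "c"])       ;; => ["b" "c"]
(read-all ["#_" "#_" "a" "b" "c"])  ;; => ["c"]
```

The nested call is where the "stacking" comes from: the inner `#_` consumes `a`, the read keeps going and hands `b` to the outer `#_`, which discards it in turn.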

Alex Miller (Clojure team)13:04:58

and then you can really break your noodle with stuff like #_^{:a 1} [2]

Alex Miller (Clojure team)13:04:30

(hint: metadata is also a nesting reader)

pez13:04:35

A sample of my noodles: 😃

// open parens
toplevel.terminal(
  'open',
  /((?<=(^|[()[\]{}\s,]))['`~@?^]\s*)*(['`~#@?^]*[({"]|['`~@?^]*[[])/,
  (l, m) => ({
    type: 'open',
  })
);

Alex Miller (Clojure team)13:04:41

I guess you could just always think of the discard reader as something that reads two nested forms and then it's not really different

yuhan13:04:56

ooh, thinking of it as 'nesting' does clear up some of my confusion over how these work

#_#_#_#_ a b c d  ~ ((((a) b) c) d)

#_#_#_ a b #_ c d ~ (((a) b) (c) d)

#_#_ a #_#_ b c d ~ ((a) ((b) c) d)
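
These groupings can be checked at a REPL with `read-string`. The sketch below wraps each example in a vector and adds a trailing `e` (an addition for illustration, not part of the examples above) so there is something left to see after the discards.

```clojure
;; One discard removes one form.
(read-string "[#_ a b c]")                 ;; => [b c]

;; Two adjacent discards remove the next two forms.
(read-string "[#_#_ a b c]")               ;; => [c]

;; The three groupings above: a, b, c and d are discarded in every case.
(read-string "[#_#_#_#_ a b c d e]")       ;; => [e]
(read-string "[#_#_#_ a b #_ c d e]")      ;; => [e]
(read-string "[#_#_ a #_#_ b c d e]")      ;; => [e]

;; Metadata nests too: ^{:a 1} [2] is read as one form, then discarded.
(read-string "[#_^{:a 1} [2] 3]")          ;; => [3]
```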

pez13:04:49

Thanks @U064X3EF3, I’ll update my mental model and hopefully will be in a better place once it has simmered a bit.

souenzzo13:04:14

How to work with XML in clojure? One step beyond https://github.com/clojure/data.xml. Receive an XML entity and parse it into clojure entity like {:id ... :name ...}, then transform it back into xml.

Alex Miller (Clojure team)13:04:44

that seems well within data.xml

souenzzo13:04:33

I expected something that went from <a>b</a> to {:a "b"}. data.xml only "parses" it into {:tag :a :content ...}... Yeah, I can dig through the data. But is this how XML is actually used? Maybe my answer is "XML is not actually used" 👀

p-himik13:04:48

How would you handle <a x="1"><b>2</b><b>3</b></a>? XML is used, a lot.
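
A small sketch of why that question matters, using the `clojure.xml` namespace that ships with Clojure rather than data.xml. `parse-str` and `naive-map` are hypothetical helpers written for this illustration; `naive-map` is the kind of <a>b</a> to {:a "b"} conversion asked about above.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

(defn parse-str
  "Parse an XML string into clojure.xml's {:tag :attrs :content} maps."
  [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn naive-map
  "Hypothetical helper: collapse an element into {tag content}.
   Attributes are dropped and repeated child tags overwrite each other."
  [{:keys [tag content]}]
  {tag (if (every? string? content)
         (apply str content)
         (into {} (map naive-map) content))})

(naive-map (parse-str "<a>b</a>"))
;; => {:a "b"}

(naive-map (parse-str "<a x=\"1\"><b>2</b><b>3</b></a>"))
;; => {:a {:b "3"}}
```

In the second call the attribute x and the first b element are silently lost, which is one reason the {:tag ... :attrs ... :content ...} shape is kept as the general representation.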

souenzzo13:04:25

> How would you handle
IDK, I'm searching and trying to understand the XML realworld

p-himik13:04:53

But why? What is your actual goal? XML is much more than what meets the eye when you see some plain documents. There's usually a schema, the order of items matters, tags have namespaces, and so on.

jussi13:04:42

Very simple intro to xml-handling

(ns xml-demo
  (:require 
   [clojure.data.zip.xml :as zip.xml]
   [clojure.xml]
   [clojure.zip]))

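;; `input-stream` is assumed to be an open java.io.InputStream over the document;
;; clojure.data.zip.xml comes from the org.clojure/data.zip library.
(let [root-node (-> input-stream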
                    (clojure.xml/parse)
                    (clojure.zip/xml-zip))]
  (prn (zip.xml/xml-> root-node
                      :html
                      :body
                      :div
                      zip.xml/text)))
You can thus navigate XML elements using keywords.

cch115:04:36

If your xml is SOAP, I've had some success with https://github.com/xapix-io/paos - but beware that it is not actively maintained as best I can tell.

👀 1
souenzzo15:04:44

@U0698L2BU that is the kind of thing I was looking for.

cch115:04:01

I'm using it in production, but it's a bit scary seeing the age of some of the dependencies.

souenzzo15:04:50

You should be scared of a library released two days ago. Using a library that is not receiving updates is not an issue by itself 😉

cch115:04:53

The scary parts are (a) when the dependency tree conflicts with "up-to-date" dependencies I already have (so far I have worked around the one time that happened) and (b) when the obscure maven repo hosting the underlying java dep goes offline (happened yesterday).

cch115:04:27

I sincerely hope you find it to your liking because the more that use it, the more likely we can help each other when such problems occur.

Thierry11:04:06

The project I maintain has an API for parsing XML inspired by https://www.juxt.pro/blog/xpath-in-transducers/ Our old API was something like:
(defn elements-by-tag [element tag]
  (filter #(and (map? %) (= (:tag %) tag)) (:content element)))
(defn element-by-tag [element tag]
  (first (elements-by-tag element tag)))
(defn content-of-tag [element tag]
  (-> (element-by-tag element tag) :content first))
We use the latest clojure.data.xml.

👀 1
cch115:04:26

I like that ^ idea for reading/parsing complex XML. Unfortunately, my use case requires writing XML as well as reading it. I would probably roll a zipper-based writer if I weren't using SOAP and https://github.com/xapix-io/paos did not exist. The paos lib has some nice bonuses as well (derived from the tighter SOAP specs) including reading in the SOAP schema and validating against it. It's definitely got some warts and while it does hide away the Java very nicely, I seem to recall being disappointed in the Clojure code itself -at least by modern standards.

Thierry15:04:27

Why? We write as well. Do you need to write the same data back as XML after manipulation? Or do you write new structures as XML? We do both.

cch115:04:44

I'm generating an XML doc from scratch (well, from the SOAP spec). The validation is pretty reassuring (even though my particular spec is too loose to my taste).

Thierry15:04:44

Here's a simple example:

(let [soap-envelope
      (fn [body headers ns-map]
        [:soapenv:Envelope (merge {"xmlns:soapenv" ""} ns-map)
         (when (seq headers)
           [:soapenv:Header (seq headers)])
         [:soapenv:Body body]])]
  (->
   [:someTag
    [:anotherTag {:attribute "att-value"}]
    [:multipleTag
     [:multiple
      [:tag-one {:att 1} [:valueType "value"]]
      [:tag-two {:att 2} [:valueType true]]]]]
   (soap-envelope [[:my:emptykey]
                   [:my:otherkey "value"]]
                  {"xmlns:my" ""})
   clojure.data.xml.prxml/sexp-as-element
   clojure.data.xml/emit-str))
Where the XML output would be a one line string with escaped double-quotes that would look like this when formatted:
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv=""
  xmlns:my="">
  <soapenv:Header>
    <my:emptykey/>
    <my:otherkey>value</my:otherkey>
  </soapenv:Header>
  <soapenv:Body>
    <someTag>
      <anotherTag attribute="att-value"/>
      <multipleTag>
        <multiple>
          <tag-one att="1">
            <valueType>value</valueType>
          </tag-one>
          <tag-two att="2">
            <valueType>true</valueType>
          </tag-two>
        </multiple>
      </multipleTag>
    </someTag>
  </soapenv:Body>
</soapenv:Envelope>

Thierry15:04:06

That's one way we emit XML

Thierry15:04:14

There are others too

cch115:04:59

I'm going to file this conversation away for the rainy day when the paos lib is no longer tenable due to lack of updates.

Thierry15:04:05

If you want to output pretty printed (formatted/ indented) XML into a file you can use this:

(defn ppxml
  "Function to format an XML string representation to an indented XML string
   credits: "
  [xml]
  (let [in (javax.xml.transform.stream.StreamSource.
            (java.io.StringReader. xml))
        writer (java.io.StringWriter.)
        out (javax.xml.transform.stream.StreamResult. writer)
        transformer (.newTransformer
                     (javax.xml.transform.TransformerFactory/newInstance))]
    (.setOutputProperty transformer
                        javax.xml.transform.OutputKeys/INDENT "yes")
    (.setOutputProperty transformer
                        javax.xml.transform.OutputKeys/STANDALONE "yes")  ;; Added to fix non linebreak between xml tag and first content tag
    (.setOutputProperty transformer
                        javax.xml.transform.OutputKeys/OMIT_XML_DECLARATION "no")  ;; Added to fix non linebreak between xml tag and first content tag
    (.setOutputProperty transformer
                        "{" "2")
    (.setOutputProperty transformer
                        javax.xml.transform.OutputKeys/METHOD "xml")
    (.transform transformer in out)
    (-> out .getWriter .toString)))

cch115:04:19

The cheapest way to emit XML that I have used is simple templating. My current use case is too complex for that and your approach seems like a nice middle ground.

Thierry15:04:15

We handle communication with different apis and standards and everything is Clojure

Thierry15:04:30

Haven't run into any limitations

✔️ 1
cch115:04:12

Is anybody familiar with a library accessible from Clojure that allows one to deal with an algebra of time spans? I'm looking for testing relationships like spans, overlaps, starts-with, contains, etc. I vaguely recall reading about such a thing in the Clojure world, but it has been a while.

cch115:04:07

I did review tick, and it's pretty good. But the lib I'm thinking of went much further.

jjttjj15:04:49

There's also these for interval trees: https://github.com/dco-dev/interval-tree https://github.com/helins/interval.cljc But they're a little different, and your description seems to more closely match what tick implements (Allen's interval algebra)
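
For readers unfamiliar with Allen's interval algebra: it names the thirteen ways two intervals can relate. The sketch below is hand-rolled for illustration and is not tick's API; the function name `relation`, the [start end] vector representation and the keyword names are all assumptions.

```clojure
(defn relation
  "Classify how interval a relates to interval b. Each interval is a
   [start end] pair of numbers with start < end. Returns one of the
   thirteen Allen relations as a keyword."
  [[a1 a2] [b1 b2]]
  (cond
    (< a2 b1)                 :precedes
    (= a2 b1)                 :meets
    (> a1 b2)                 :preceded-by
    (= a1 b2)                 :met-by
    (and (= a1 b1) (= a2 b2)) :equals
    (and (= a1 b1) (< a2 b2)) :starts
    (and (= a1 b1) (> a2 b2)) :started-by
    (and (= a2 b2) (> a1 b1)) :finishes
    (and (= a2 b2) (< a1 b1)) :finished-by
    (and (> a1 b1) (< a2 b2)) :during
    (and (< a1 b1) (> a2 b2)) :contains
    ;; Only partial overlaps remain; the order of the starts decides which.
    (< a1 b1)                 :overlaps
    :else                     :overlapped-by))

(relation [0 10] [10 20]) ;; => :meets
(relation [0 10] [5 15])  ;; => :overlaps
(relation [2 4] [0 10])   ;; => :during
```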

cch115:04:39

That's what I'm looking for! Thanks!

jjttjj15:04:12

No problem 🙂

Max17:04:21

Be careful with helins/interval, it has some fairly nasty bugs. It’s possible to work around them if you’re careful, but definitely read the issues first so you know what they are.

cch117:04:06

Funny enough, I was just reading through the issues on that repo. And indeed, there were some scary bugs working with sets. Which, as it so happens, is my case.

Max03:04:54

There’s also https://github.com/dco-dev/interval-tree, which supports fewer operations (eg no subseq). Guava also has a few range set/tree set classes (some are even immutable!), and someone posted in #announcements recently about a soft fork that lets you only pull in the bits you need. There are other problems with the guava impls though (I forget the details). It would be really nice if someone could put out a decent Clojure interval set/map library, but at the moment the options aren’t great.

emccue03:04:35

ey thats me!

emccue04:04:09

let me make a demo

emccue04:04:36

(ns main
  (:import (dev.mccue.guava.collect Range)))


(def r1 (Range/closed 0 10))

(println (.contains r1 4))
(println (.contains r1 11))

(println (.encloses r1 (Range/lessThan 10)))
(println (.encloses r1 (Range/closed 1 5)))

(println (.intersection r1 (Range/greaterThan 4)))
(println (-> r1 
             (.span (Range/greaterThan 4))
             (.span (Range/lessThan 1))))
deps.edn:
{:deps {dev.mccue/guava-collect {:mvn/version "33.1.0"}}}

emccue04:04:58

i don't know what to look for exactly or if it has all the methods you want, but it does work pretty well from clojure

emccue04:04:22

and it is an immutable type

Max04:04:06

Iirc the issue was with the immutable range set: it makes you go through a separate builder class to get a new modified set, and if you try to add overlapping ranges it throws an exception

jpalharini17:04:02

I'd like some input on the possible performance impact of using macros. I'm not that well-versed in them and usually prefer not to use them, but they've been proving quite useful in recent projects involving Java Interop. Out of curiosity I decided to compare the performance of a function that calls a macro vs. a function that implements the macro code. I first had a few issues making sure type-hints were functioning properly, which would definitely impact performance due to use of reflection. However, (set! *warn-on-reflection* true) no longer complains and I still see a 5-6% difference against the macro-calling function.
• Is it expected for macros to have a performance penalty?
• Am I setting type-hints incorrectly at some point?
• Or perhaps, am I measuring things wrong?
The code in question interacts with Redis, sending the same command of adding a string key-value pair to a database in my local network.

kennytilton17:04:51

We might need some code. 🙂

Alex Miller (Clojure team)17:04:00

you will probably need to be more specific on the code to really answer. macros are invoked (once) during compilation so there is really not a performance impact from macros themselves other than during compilation. the real question is in what the function does vs what the expansion does

jpalharini17:04:38

Perhaps I should work on a simpler repro, and maybe I will. But what I have is the following:
• A component (defrecord) that implements a custom protocol.
• This protocol has two functions (set) and (set-nx), but they were written in a way that they do the same thing just for this test.
This is the macro:

(defmacro with-conn
  [pool redis-command & args]
  `(let [conn# (.getResource ~(with-meta pool {:tag `JedisPool}))]
     (try
       (. ^Jedis conn# ~(symbol redis-command) ~@args)
       (finally
         (.close ^Jedis conn#)))))
This is the function calling it:
(set [_ k v]
    (with-conn pool "set" ^String k ^String v))
And this is the function that does the same thing without the macro:
(set-nx [_ k v]
   (let [conn (.getResource ^JedisPool pool)]
     (try
       (.set ^Jedis conn ^String k ^String v)
       (finally
         (.close ^Jedis conn)))))

Alex Miller (Clojure team)17:04:39

type hints in macros can be tricky - you generally need to emit the code that applies the meta, not rely on ^ to apply metadata
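
A minimal sketch of what "emit the code that applies the meta" can look like, using java.lang.String so it runs without Jedis. The macro name `str-len` is made up for this example. As far as I understand the reader, writing `^String ~s` inside the syntax quote would attach the hint to the unquote form itself, which is then thrown away, so the tag is built explicitly instead.

```clojure
(defmacro str-len
  "Expands to (.length s) with s hinted as String. `s` must be a symbol,
   because metadata cannot be attached to literals such as strings."
  [s]
  `(.length ~(vary-meta s assoc :tag `String)))

;; Inspect the expansion with metadata printed.
(binding [*print-meta* true]
  (prn (macroexpand-1 '(str-len x))))
;; prints something like: (.length ^java.lang.String x)

(let [x "abc"]
  (str-len x))
;; => 3
```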

jpalharini17:04:23

There is the symbol resolution of the Java method on the macro which could complicate things.

kennytilton17:04:36

Macros are gone at run time. This can give them a performance advantage, if they do clever things. Edi Weitz outdid Perl regex with his CL implementation.

Alex Miller (Clojure team)17:04:47

something like this:

(binding [*print-meta* true]
  (pprint
    (macroexpand
      '(with-conn pool "set" ^String k ^String v))))
may be closer to what the post-compile is doing

jpalharini17:04:49

Also, tests were done in the REPL, so maybe this will change after compilation.

Alex Miller (Clojure team)17:04:08

the REPL is still compiling (in most cases)

Alex Miller (Clojure team)17:04:23

I don't think those type hints in the with-conn call are doing anything there

jpalharini17:04:40

(let*
 [conn__4051__auto__
  (.getResource ^redis.clients.jedis.JedisPool pool)]
 (try
  (.
   ^redis.clients.jedis.Jedis conn__4051__auto__
   set
   ^String k
   ^String v)
  (finally (.close ^redis.clients.jedis.Jedis conn__4051__auto__))))
This is what I get from that command, @U064X3EF3. I think they may be getting passed on?

Alex Miller (Clojure team)17:04:18

yeah, seeing them there looks good

jpalharini17:04:25

Also, this is the test harness I put together:

(loop [times-macro []
       times-plain []]
  (letfn [(done? [] (> (count times-macro) 10000))]
    (if (done?)
      [(/ (apply + times-macro)
          (double (count times-macro)))
       (/ (apply + times-plain)
          (double (count times-plain)))]
      (let [[k1 k2] [(rand-str) (rand-str)]
            [v1 v2] [(rand-str) (rand-str)]
            start-macro (System/nanoTime)
            _           (proto/set redis k1 v1)
            end-macro   (System/nanoTime)
            start-plain (System/nanoTime)
            _           (proto/set-nx redis k2 v2)
            end-plain   (System/nanoTime)]
        (recur (conj times-macro (- end-macro start-macro))
               (conj times-plain (- end-plain start-plain)))))))

Alex Miller (Clojure team)17:04:00

as a secondary check, you may also consider using something like https://github.com/clojure-goes-fast/clj-java-decompiler to decompile or disassemble the call and ensure you are seeing the expected invocation (and not a call to Reflector)

jpalharini17:04:47

I'll check that out. I saw that tool in your talk about Java Interop but never had the chance to play around with it.

Alex Miller (Clojure team)17:04:00

measuring individual calls is rarely going to tell you an answer you should believe. would be much better to run the full loop for each independently and measure the whole loop timing (and then run that in an outer loop say 10-20 times). you should see timings come down as the C2 compiler kicks in and then stabilize. and if you don't see that, you should not believe what you're seeing.

Alex Miller (Clojure team)17:04:35

I usually aim for the inner loop to be big enough that it's measured in milliseconds or seconds. I have a high level of trust that clock timings in that region are well above the clock granularity and things like jvm safepoints are unlikely confounding. and short enough that gc is an obvious ignorable outlier when it occurs.
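
A rough sketch of that measurement shape: time a whole inner loop, then repeat it in an outer loop and look at how the numbers settle. `time-loop-ms` and `bench` are hypothetical helper names, and this is no substitute for a proper benchmarking library.

```clojure
(defn time-loop-ms
  "Call (f) n times and return the elapsed wall-clock time of the
   whole loop in milliseconds."
  [f n]
  (let [start (System/nanoTime)]
    (dotimes [_ n] (f))
    (/ (- (System/nanoTime) start) 1e6)))

(defn bench
  "Return a vector of `rounds` whole-loop timings for f. The first rounds
   usually include JIT warm-up, so compare the later, stable ones."
  [f n rounds]
  (vec (repeatedly rounds #(time-loop-ms f n))))

;; Stand-in workload; pick n so that each round takes milliseconds or more.
(bench #(reduce + (range 1000)) 10000 10)

;; With the names from this thread, each variant would be measured separately:
;; (bench #(proto/set redis (rand-str) (rand-str)) 10000 20)
;; (bench #(proto/set-nx redis (rand-str) (rand-str)) 10000 20)
```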

jpalharini17:04:37

Indeed, it stabilized

Alex Miller (Clojure team)17:04:01

and are timings same or different?

jpalharini17:04:07

Negligible difference between macro and "plain call"

👍 1
jpalharini17:04:38

So I guess my measurement was the thing at fault

jpalharini17:04:08

Thanks for all the input, folks! 💚

Alex Miller (Clojure team)17:04:12

when you move to 1.12, you may want to consider building a param-tagged qualified symbol for invocation rather than using the dot form in the macro as those will enforce resolution to a non-reflective method (and error otherwise, which you would want here I think). and depending how you conveyed that info, you won't care about the arg type hints anymore. what you will want to emit is something like: (^[String String] redis.clients.jedis.Jedis/.set ...) you probably want a macro api that took a symbol then to have a place to convey the param-tags

jpalharini17:04:49

I'll look into that. I actually tried passing a fully-qualified method and learned it wasn't possible. Good to know it will be in 1.12.

Alex Miller (Clojure team)18:04:20

Some of this is not yet available in the alphas so can’t look into it yet, but real soon