#clojure
2022-07-16
vlaaad05:07:25

I don't know if it's useful, but yesterday I came up with an idea of how to stick tagged literals into json: treat all maps with a single key that starts with "#" as a tagged literal. E.g. {"#date": "2022-07-16"}
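A minimal sketch of the idea above, assuming the JSON has already been parsed into Clojure maps with string keys. The names `tag-readers`, `tagged-literal-map?` and `read-tagged` are hypothetical, and the `"date"` reader is only an illustration of what a tag table might hold:

```clojure
(require '[clojure.walk :as walk]
         '[clojure.string :as str])

;; Hypothetical tag table: tag name (without the leading "#") -> reader fn.
(def tag-readers
  {"date" #(java.time.LocalDate/parse %)})

(defn tagged-literal-map?
  "True for a map with exactly one entry whose key is a string starting with \"#\"."
  [x]
  (and (map? x)
       (= 1 (count x))
       (let [k (key (first x))]
         (and (string? k) (str/starts-with? k "#")))))

(defn read-tagged
  "Walks parsed JSON data bottom-up and replaces tagged-literal maps.
  Tags without a registered reader become clojure.core/tagged-literal values."
  [data]
  (walk/postwalk
    (fn [x]
      (if (tagged-literal-map? x)
        (let [[k v] (first x)
              tag   (subs k 1)]
          (if-let [f (get tag-readers tag)]
            (f v)
            (tagged-literal (symbol tag) v)))
        x))
    data))
```

With this sketch, `(read-tagged {"when" {"#date" "2022-07-16"}})` yields a map whose `"when"` value is a `java.time.LocalDate`, while ordinary maps pass through unchanged.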

Ben Sless06:07:14

Parsing structurally encoded json is a pain. Not impossible but very annoying 😞

wevrem18:07:15

Idiomatic way to transform an empty string to nil, but leave a non-empty string as is? Right now I have: (when (not-empty s) s) But something feels off about that.

p-himik18:07:58

Just use (not-empty s).

wevrem18:07:54

I love it when the answer is simple but I also learn something from it.

wevrem18:07:03

I answered my own question. (not-empty s) I sort of knew about not-empty without knowing the details and was using it as if it were a predicate returning true or false. But in looking it up I learned it is not a predicate but returns a transformed nil or the original collection. That probably explains why the name doesn’t end with ? and why my original use felt somehow off.

👍 2
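For reference, the behavior described above can be checked at the REPL. The helper name `empty->nil` is hypothetical and only gives the idiom a name:

```clojure
;; not-empty returns nil for an empty collection (or nil),
;; and returns its argument unchanged otherwise.
(defn empty->nil
  [s]
  (not-empty s))

(empty->nil "")     ;=> nil
(empty->nil "abc")  ;=> "abc"
(empty->nil nil)    ;=> nil
```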
borkdude18:07:01

Alex once recommended to me to use str/blank? instead to not coerce the string into a seq

p-himik18:07:02

not-empty does not coerce its argument.

p-himik18:07:38

Or rather, it doesn't coerce it to produce the result - only to check the emptiness. And blank? has different semantics.

borkdude18:07:08

yes, but not-empty does transform the string into a seq for the seq check

p-himik18:07:23

Right, that's what I meant in my second message. But it's just a construction of a wrapper object for a non-empty string - not a big deal.

borkdude18:07:14

Just sharing that Alex once recommended that to me, that's all ;)

p-himik18:07:46

Of course. It's just that one has to be really careful with the semantics here - much more careful than with a construction of a throwaway object (that on modern JVM seems to be a non-issue at all).

user=> (str/blank? "      ")
true
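The difference in semantics can be seen side by side. This is a small sketch; `blank->nil` is a hypothetical helper, not something proposed in the conversation:

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: treats nil, "" and whitespace-only strings as "no value".
(defn blank->nil
  [s]
  (when-not (str/blank? s) s))

(not-empty "   ")    ;=> "   "  (whitespace-only is still non-empty)
(blank->nil "   ")   ;=> nil    (whitespace-only counts as blank)
(blank->nil "abc")   ;=> "abc"
(blank->nil nil)     ;=> nil
```

Which of the two is right depends on whether whitespace-only input should count as a value.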

borkdude18:07:01

Sure, I wasn't saying that this is the answer to a question, just a comment that you might not want to create seq garbage from strings if you don't have to. May or may not be important.

👍 1
borkdude18:07:40

But now that I'm reading back to the question, you're right that this is not what wevrem was after - I guess it was a knee-jerk reaction to: oh someone's using seq on a string again ;)

p-himik18:07:52

One man's garbage is another man's treasure. :) Clojure calls seq everywhere - I wouldn't even blink at someone calling it over a string, to be honest. It doesn't actually transform the string, and it doesn't traverse it. It just inspects its length, exactly once. If something is slow - one should profile it. I'm willing to bet that, unless there's a tight loop doing barely anything other than checking string non-emptiness, seq will not come even close to the top of the "self time" on the profiler report.
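A short illustration of what `seq` does with a string, plus a variant that avoids the wrapper object entirely. The name `empty-str->nil` is hypothetical, and the variant assumes its argument is a `String` or nil. Whether it is worth using is a question for a profiler, as noted above:

```clojure
;; seq on an empty string returns nil; on a non-empty string it
;; returns a lightweight seq view over the characters.
(seq "")             ;=> nil
(seq "abc")          ;=> (\a \b \c)
(class (seq "abc"))  ;=> clojure.lang.StringSeq

;; Hypothetical alternative that checks emptiness via String.isEmpty
;; and so never creates a seq. Only valid for strings and nil.
(defn empty-str->nil
  [^String s]
  (when (and s (not (.isEmpty s))) s))
```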

borkdude18:07:14

Yeah, I agree with that, but since Alex said this to me, I can't unsee it ;)

😄 1
Cora (she/her)22:07:45

has anyone tried putting together a formal spec for which characters are allowed where in the language? for example, which are valid in keywords, which are valid in symbols, etc

Bob B23:07:58

I'm not sure if this is sufficient, but the LispReader class in clojure.lang has a symbolPat regex that looks like it might be used to read a symbol or keyword - <https://github.com/clojure/clojure/blob/5ffe3833508495ca7c635d47ad7a1c8b820eab76/src/jvm/clojure/lang/LispReader.java#L66>
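A sketch of trying that pattern directly from Clojure. The regex below is an assumption: it is transcribed from `symbolPat` and should be checked against the linked source. `symbol-pat` and `matches-symbol-pat?` are hypothetical names:

```clojure
;; Assumption: a transcription of LispReader's symbolPat; verify against the source.
(def symbol-pat #"[:]?([\D&&[^/]].*/)?(/|[\D&&[^/]][^/]*)")

(defn matches-symbol-pat?
  "True when the whole string matches the pattern."
  [s]
  (some? (re-matches symbol-pat s)))

(matches-symbol-pat? "foo/bar")  ;=> true
(matches-symbol-pat? ":keys")    ;=> true
(matches-symbol-pat? "1abc")     ;=> false
(matches-symbol-pat? "foo bar")  ;=> true
```

The last result shows why the pattern alone is probably not sufficient: the reader splits input into tokens at whitespace and macro characters before the pattern is applied, so the pattern itself does not reject spaces.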

Alex Miller (Clojure team)23:07:57

This is intentionally not formalized. There are characters that are explicitly allowed and a few things explicitly disallowed, and a large intentionally ambiguous area for future expansion

Alex Miller (Clojure team)23:07:29

Which is not to say you can't tokenize it (obviously the reader does)

Cora (she/her)23:07:36

the thing is that I'm going to be tokenizing documentation that may have clojure symbols sprinkled within it and so tokenizing it as part of the full grammar isn't really possible

Cora (she/her)23:07:18

and so having an idea of what is allowed would let me guess if a stretch of text is a valid symbol or not

Alex Miller (Clojure team)23:07:38

Symbols are intended to allow a pretty wide set of allowable things

Cora (she/her)23:07:08

sure seems like it from that pattern

Cora (she/her)23:07:49

doesn't a large ambiguous area mean that future expansion will likely break backwards compat for code in the wild?

Cora (she/her)23:07:31

just trying to understand things. this is super helpful 🙂

Bob B23:07:31

It would depend on what the expansion entails, wouldn't it? If it's expansion of disallowed, yeah, but if it's expansion of explicit allows, then it's "requiring less".

Cora (she/her)23:07:08

well from the sounds of it the ambiguous area is where things will be expanded, giving certain characters new meaning, which in turn may change the meaning of your code

Cora (she/her)23:07:40

if that's what you mean

Cora (she/her)23:07:50

sorry, I'm not 100% sure what you meant there exactly

Alex Miller (Clojure team)23:07:43

There are probably not a lot of those kinds of things, but things like | for delimiting (similar to Common Lisp) is one thing we've looked at a couple times

Alex Miller (Clojure team)23:07:48

If you want to do what Clojure does, then certainly follow LispReader (which has rarely changed)
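One way to follow the reader without reimplementing it is to hand each candidate token to `read-string` and see what comes back. This is only a sketch under stated assumptions: `token-kind` is a hypothetical name, the input is assumed to be a string already split out of the surrounding prose, and auto-resolved keywords such as `::foo` are deliberately not recognized because they print differently from how they are written:

```clojure
(defn token-kind
  "Returns :symbol or :keyword when the reader consumes the whole of s as
  exactly one symbol or keyword, and nil otherwise."
  [s]
  (try
    ;; *read-eval* is bound to false so that #=(...) forms are never evaluated.
    (let [form (binding [*read-eval* false] (read-string s))]
      ;; Comparing the printed form with the input rejects partial reads
      ;; such as "foo bar" (the reader stops after foo).
      (when (= s (str form))
        (cond
          (symbol? form)  :symbol
          (keyword? form) :keyword)))
    (catch Exception _ nil)))

(token-kind "clojure.string/blank?")  ;=> :symbol
(token-kind ":keys")                  ;=> :keyword
(token-kind "foo bar")                ;=> nil
(token-kind "42")                     ;=> nil
```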

Cora (she/her)23:07:16

cool, thanks alex 🙂

Bob B23:07:12

fwiw, I just meant that if, for example, the pipe symbol went from ambiguous/unspecified to explicitly disallowed, that'd be a breaking change, but if it went to explicitly allowed, that'd be a non-breaking change

Alex Miller (Clojure team)23:07:51

Yes, which is why any such change would only be made with a lot of thinking and early notice

Cora (she/her)22:07:22

I ask because I want to change tokenizing on cljdoc's docset search so that it can find valid symbols and keywords and such