beginners

Nim Sadeh 2025-05-11T03:29:18.512559Z

I have a philosophical question about the Clojure collections API. It's often used as an example of polymorphism in Clojure: lots of data structures respond to functions such as map, conj, etc. In practice, I find it to be brittle and leaky. Given that it is held in such high regard, I figured I must be using it wrong. I'll use conj as an example, but I have many: conj behaves differently if you pass a vector vs. a lazy seq. The former adds to the end of the collection, and the latter to the beginning. Different transforms can take one type of collection and transform it into the other, so you have to really know what you're doing at every step or you'll get unexpected behavior. You have to know not only what the behavior of the function is, but what collection subtype it will return. I call this leaky because it requires you to know the implementation details, such as: does map preserve the collection type or return a different one? Will conj prepend or append to this collection? In practice I use concat to avoid having to guess. It's almost too unpredictable to be called truly polymorphic. What use is it that it doesn't throw an undefined operation exception when the behavior is unexpected?

Mario C. 2025-05-14T20:25:17.045729Z

I actually think it makes sense, because when you are adding an element to a list (unordered) the fastest way would be to prepend it, since it would just be a "swapping of pointers" so to speak (thinking about linked lists). And if you add an element to a vector (ordered/indexed), the fastest method would be to do it at the end, because prepending would require an update of all indices. At least that's how I think about it.
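
A minimal sketch of the behavior being discussed, using only clojure.core. The names v, l and s are made up for illustration:

```clojure
;; conj adds the element wherever it is cheap for the concrete type.
(def v (conj [1 2 3] 0))            ; vector: added at the end
(def l (conj '(1 2 3) 0))           ; list: added at the front
(def s (conj (map inc [0 1 2]) 0))  ; lazy seq returned by map: added at the front

(println v) ; prints [1 2 3 0]
(println l) ; prints (0 1 2 3)
(println s) ; prints (0 1 2 3)
```

The third case is the one that tends to surprise: the input was a vector, but map returned a seq, so conj prepends.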

2025-05-14T20:27:19.195119Z

It is "leaky" in that sense. But I think only if you have the expectation that conj inserts in a predictable position. If you assume the position is based on the type, then I think it's not as "leaky". It's a matter of semantics a bit.

Mario C. 2025-05-14T20:29:32.760279Z

I always just assume there is no guaranteed order; if order matters then it's usually explicit

👍 1
2025-05-14T20:49:59.370069Z

I think for me, it would have been nice to have a tail pointer on List, and then conj could also be O(1) and consistent. But I think then sequence would get weird, since a seq is possibly infinite, you can really only conj on the front, and also you can't have a tail pointer or it would force realize

2025-05-14T21:07:21.160859Z

you can't do that for a persistent linked list

2025-05-14T21:08:39.692539Z

linked lists are the easiest data structure to make persistent, because you get structural sharing of the tail, once you have tail pointers like that it just doesn't work

2025-05-14T21:08:51.408809Z

you have to switch to some other list representation

Nim Sadeh 2025-05-14T21:16:52.307519Z

IMO "adds an element to a list in the fastest way possible" is an unusual contract. If you're working with a list I assume order matters. 90% of the time when I am looking to add an element to a collection, the order matters, and I usually cannot guarantee the concrete type so I assume I can't use conj

2025-05-14T21:18:01.590569Z

the problem you are having is not really with conj it is that you are confusing seqs with collections

➕ 2
2025-05-15T00:23:54.836519Z

> and I usually cannot guarantee the concrete type so I assume I can't use conj
I'd say this is an issue from Rich's perspective. He places a high value on choosing the proper data structure for each use case. That is what drove his design decisions, I believe. In fact, I think I saw some place where he even mentioned that he purposely doesn't offer certain conveniences, to force you to refactor your data structure to be more efficient. So I guess he would say: if you are using a list to append, you need to refactor to use a vector. And he made it annoying for you to keep using a list on purpose.

➕ 1
James Amberger 2025-05-15T00:28:25.995909Z

I mean it would be better to have an example to work on but just in general if one can’t know the concrete type of x then one can’t rely on the semantics of (seq x), iow the meaning of the “order” of the things in x; you have to reconsider what you are actually doing. That’s why all this is not the price of clojure, but the payoff. Right?

2025-05-15T00:29:10.027849Z

I will admit that it's a trade-off, and sometimes I don't care about it being O(n), I still want to do it, and it'd be nice to have functions like prepend and append that you can use on all collections and that just do that (including seqs, force-realizing them if needed). I'd also like something that inserts at an arbitrary location and also works even when it's O(n). That said, I also understand not wanting to, and kind of frowning upon it, so that it is not used accidentally and then people say Clojure is so slow 😛

James Amberger 2025-05-15T00:31:10.566389Z

Here is a thing that surprised me that I’m putting on this thread because it inspired me to eval it:

(let [A {1 :a 2 :b}
        B {2 :b 1 :a}
        A' (hash-map 1 :a 2 :b)
        B' (hash-map 2 :b 1 :a)]
    (vector
     (= A B)
     (= (seq A) (seq B))
     (= A' B')
     (= (seq A') (seq B'))))
=> [true false true true]

2025-05-15T00:37:52.157489Z

hash maps don't guarantee their order

James Amberger 2025-05-15T00:46:21.920939Z

no, I know

James Amberger 2025-05-15T00:46:43.910489Z

i found it surprising that the order in which one types the map literal has any observable effect at all

James Amberger 2025-05-15T00:48:07.836569Z

and that hash-map acts differently is further surprising. But this kind of hits on what we are talking about: this surprise wouldn’t hurt me because I would never rely on the order of (seq coll) if coll could be a plain hash map

2025-05-15T00:59:53.103419Z

My guess, and @nnnsadeh can clarify, is in the case where your function might either be at the start of a chain of transforms, or in the middle. In which case, if you had say:

(->> [1 2 3]
     transform-1
     transform-2)
The transform function might get a vector if it starts the chain, or a seq if it's in the middle or at the end. When it gets a vector, if it wants to "append", it could conj, but when it gets a seq, it cannot. So how does one go about implementing a function like that which is "generic" to the position inside the seq transformation pipeline?

2025-05-15T01:01:32.227179Z

I think, and I admit it's not a practice I've seen mentioned or that I have ever thought of doing myself, but logically, if you are going to implement a new "sequence" function, you should also call seq on your input.

2025-05-15T01:02:03.588449Z

you can't, because seqs are built to be interacted with as a whole. you're thinking about the seq as a "collection", which it is not. thread-last functions are for operating on the entirety.

➕ 1
2025-05-15T01:03:48.328039Z

@nbtheduke hum, I feel this is a common thing. You have a series of seq transforms that you use everywhere, you extract it into its own. It's all lazy, they'll combine properly and return a lazy-seq, with nothing evaluated until it is pulled later.

2025-05-15T01:06:17.110169Z

oh sure, i just mean that's at cross purposes from "i care about this piece being at a specific index"

2025-05-15T01:09:00.145709Z

Say you want to prepend to the seq. You can use conj, and it remains a seq? After doing so, it wraps it in a cons cell, which is a seq? So it works, but one day someone starts a sequence pipeline with your function, and now it breaks, because it appended.

2025-05-15T01:10:36.576809Z

I think the issue is what @hiredman said. You need to think in terms of seq -> seq. And if you want your function to work on seqable? then you have to call seq on the input first, same as all the clojure.core sequence functions do.

2025-05-15T01:14:36.002289Z

It would be a rare edge-case I guess, where if inside your function you first use any sequence function, they will call seq on the input for you. But if you happen to use conj first, it would not. Because conj is not a sequence function, but is actually polymorphic over sequences. So it does not call seq on the input, but instead has an implementation for various colls and for seq. This is why this isn't normally an issue, and you don't normally have to explicitly call seq on your input I guess. Is that "leaky"? I don't know, definitely something that can trip you up.

2025-05-15T01:17:31.518999Z

I think what @nnnsadeh says, of just using cons and concat are probably better when working with sequences. Though, the Clojure cheatsheet shows conj inside the sequences section, maybe conj is a weird one, being it shows up in other places.

daveliepmann 2025-05-15T05:58:45.556609Z

@nnnsadeh
> IMO "adds an element to a list in the fastest way possible" is an unusual contract.
Rich addresses this directly:
> It is an important aspect of Clojure that, in general, performance guarantees are part of the semantics of functions. In particular, functions are not supported on data structures where they are not performant.

2025-05-11T04:09:39.071989Z

map is not a collection operation

2025-05-11T04:10:10.085969Z

It operates on seqs, which are like a functional version of an iterator

2025-05-11T04:10:58.468399Z

It just happens to call seq on whatever you pass on, and collections produce seqs that are a view over their contents when you call seq on them

2025-05-11T04:12:43.439719Z

https://insideclojure.org/2015/01/02/sequences/ may be a useful resource

Bob B 2025-05-11T04:32:14.745289Z

• behaving differently based on the type of the input is more or less the definition of polymorphism
• map returns a seq
• I'd argue that knowing the return type of a function doesn't make it leaky... it seems like the alternative is "this function can return a value of any type", which would seem to imply that there's basically nothing you can do with it at that point... if map could return a number sometimes, then you can't reliably use conj (or basically any other function) on the returned value.

Nim Sadeh 2025-05-11T05:24:59.144869Z

I guess I would expect it to return the same type that was fed into it? If I run a piece of code like (conj (map f coll) item) I'd want the resulting sequence to be the same regardless of the origin of coll , no? Otherwise the caller has to know what type of sequence it can pass to the function to get a certain result, meaning it would always have to know how it's implemented.

2025-05-11T05:34:25.772129Z

No

2025-05-11T05:36:22.582059Z

Like I said, map is not a collection function. It is a function of, in the language of the article I linked above, seqable to lazy seq. Same for filter, mapcat, take, partition, etc, etc

2025-05-11T05:36:43.929309Z

Those always return lazy seqs

2025-05-11T05:37:22.932939Z

Seq here is like a java Iterator, it is a view that can be walked over to consume elements in order

2025-05-11T05:37:45.140259Z

Seqable is like java Iterable

2025-05-11T05:38:30.449329Z

So map is sort of like a function that takes an Iterable and returns an Iterator

Nim Sadeh 2025-05-11T05:41:39.422349Z

I know what it does, I'm pondering the why. Say I write a function:

(defn inc-prepend-zero
 "Increments the items and prepends a zero"
  [items]
  (->> (conj coll 0)
       (map inc)))
My docstring is incorrect. I either have to specify that this only works on a subset of sequences, as it behaves differently for a vector, and hope that the caller reads the docstring (as the function won't throw unless I force it to). Even worse, if I switch the order of the computations, I get a different result

2025-05-11T05:43:54.095529Z

cons is for prepending to a seq

2025-05-11T05:44:40.209389Z

And you already have plenty of other bugs in that code to worry about, it actually prepends and then increments, and is missing passing in the collection

2025-05-11T05:57:46.025469Z

If you insert your own call to seq instead of waiting for map to do it, then you will get uniform behavior from conj
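
A sketch of that suggestion, assuming the intent of the function above was "prepend a zero, then increment everything". Both function names here are hypothetical:

```clojure
(defn prepend-zero-then-inc
  "Prepends 0 to items, then increments every element.
  Calling seq first means conj always sees a seq, so it always prepends."
  [items]
  (->> (conj (seq items) 0)
       (map inc)))

(defn prepend-zero-then-inc-cons
  "Same result using cons, which prepends to anything seqable."
  [items]
  (map inc (cons 0 items)))

(println (prepend-zero-then-inc [1 2 3]))        ; prints (1 2 3 4)
(println (prepend-zero-then-inc '(1 2 3)))       ; prints (1 2 3 4)
(println (prepend-zero-then-inc-cons [1 2 3]))   ; prints (1 2 3 4)
```

With an empty input, (seq items) is nil and (conj nil 0) yields (0), so the first version still returns (1).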

yuhan 2025-05-11T06:06:46.799199Z

You may be interested in the operation conventionally called fmap - not in Clojure core but you can find implementations in external libs like funcool.cats or algo.generic.

yuhan 2025-05-11T06:07:21.429899Z

But note the tradeoffs as mentioned above - by being polymorphic across arbitrary user-extensible collection types you'd be widening the API contract and losing the ability to reason about performance characteristics

yuhan 2025-05-11T06:11:47.456199Z

For many practical applications it turns out that restricting yourself to predictable constructs like 'map' over seqs / coercing to vec before a conj is more useful - at least from my view that's the philosophy being exemplified by the design of clojure's core lib

daveliepmann 2025-05-11T06:20:23.992699Z

> I use concat to avoid having to guess.
concat, like map, returns a lazy seq instead of preserving collection type:

(type (concat [:foo] [:bar])) 
;; => clojure.lang.LazySeq

(type (map identity [:bar]))
;; => clojure.lang.LazySeq

daveliepmann 2025-05-11T06:37:54.007619Z

Suffice to say you're not alone in finding the seq API and conj behavior odd at first, though they are well regarded and IMO quite nice once you're used to them.

daveliepmann 2025-05-11T06:48:07.090539Z

Oh, and about "What use is it that it doesn't throw an undefined operation exception when the behavior is unexpected?": Rich on `nil`: https://gist.github.com/reborg/dc8b0c96c397a56668905e2767fd697f#shouldnt-nth-nil-1-throw-outofbound-exception

2025-05-11T12:46:11.151429Z

i think a pertinent point here is that vectors are not considered seqs. they're collections (like maps and sets). i've never heard anyone object to conj having different behavior on maps than on seqs, but of course they do.

2025-05-11T13:01:45.585799Z

because of their similarity to seqs and the necessity to operate on them as a seq without losing their concrete type, there's mapv etc, but i think of those how i think of clojure.set functions

James Amberger 2025-05-11T17:30:30.792219Z

I have less than two years with clojure and here are my 2¢. One source of confusion is that the seq functions call seq on their argument for you. One could imagine a clojure where, e.g., (map f coll) throws where (map f (seq coll)) succeeds. This would be enlightening for the new and inconvenient for the not-new. These functions take the seq as a second argument, whereas conj, into, etc. take the coll first, which is usually the, uh, “polymorphic position,” i.e. dispatch happens on the type of that first argument. In terms of @nnnsadeh’s OP, I would say map is not itself polymorphic: it calls seq, which is. If a beginner were interested in my advice, I would say: use into with transducers when you have a coll and want a coll back, and be explicit, either in your head or even right there in the code, about where you are calling seq on something.

Ludger Solbach 2025-05-11T20:57:45.075959Z

As James said, If I want a concrete collection type to work with after calling functions working on seqs (and returning lazy seqs), I use into with the concrete collection type. If I want a vector as a result of map or filter, I use mapv or filterv instead. That way I know how to handle the result and proceed in my code.
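
A small sketch of the options mentioned above (into with a transducer, mapv, filterv), next to plain map for contrast. The var names are made up for illustration:

```clojure
;; Plain map always hands back a lazy seq, whatever went in.
(def lazy-result (map inc [1 2 3]))

;; into + transducer: you pick the concrete result type up front.
(def as-vector (into [] (map inc) [1 2 3]))
(def as-set    (into #{} (filter even?) [1 2 3 4]))

;; mapv / filterv: eager variants that return vectors.
(def mapped   (mapv inc [1 2 3]))
(def filtered (filterv even? [1 2 3 4]))

(println (type lazy-result)) ; prints clojure.lang.LazySeq
(println as-vector)          ; prints [2 3 4]
(println as-set)             ; prints #{4 2} or #{2 4}; set order is not guaranteed
(println mapped)             ; prints [2 3 4]
(println filtered)           ; prints [2 4]
```

Since the result type is known, a later conj on as-vector or mapped appends, and there is nothing to guess.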

2025-05-11T23:08:44.102369Z

You're clubbing different functions together here. Some of them are polymorphic, like conj, others are not, like map and concat. The latter two will coerce the input to a sequence and return one.

2025-05-11T23:16:51.724299Z

I just want to make that clear, because map, filter, and so on are NOT part of the collections API. They are part of the sequence API. This distinction is very very important to understand. Collections are polymorphic, meaning the same function has specialized implementations for more than one type of collection. assoc works on PersistentHashMap, PersistentArrayMap, PersistentTreeMap, all records, TransientHashMap, PersistentVector, and all custom types from libraries or your own implementing Associative.

2025-05-11T23:19:39.875019Z

The sequence API is NOT polymorphic!! Everything works only on sequences. But the functions will attempt to coerce the input to a sequence if it is possible to do so.

2025-05-11T23:24:50.388959Z

As for conj , I think your issue is that you think of it as append or prepend, but it's simply called conj for conjoin. Which means join or combine. All it does is combine the element into the collection. There is no mention of where it goes. It simply says, add an element to a collection in ~O(1) time (effectively O(1), it's not true O(1))

2025-05-11T23:39:58.939009Z

For example, you can conj on a set or a map:

(conj {} {:a 1})
(conj #{} 1)
where the idea of putting in somewhere like the start or the end doesn't even make sense.

2025-05-11T10:19:23.518749Z

I'm trying out clojure again after a long time and wrote a basic json parser. Would love some feedback on code style and idioms and any other feedback you folks have: https://github.com/mkp7/clojure-json-parser Cheers ✌️

2025-05-12T16:31:30.483249Z

the first thing that stands out is the usage of deeply nested if instead of cond or case. on a more stylistic note, I see a tendency to put things on one line (eg. the entire defn of parse-null on one line, or if branches that return nil on the same line as the condition). IMHO that makes the code harder to read, and I don't see a benefit of using fewer lines.

2025-05-12T16:34:55.141499Z

(if (some-condition?) nil
 (run-some-code))
can be re-written as
(when-not (some-condition?)
  (run-some-code))
every form in clojure has a return value

✅ 1
2025-05-12T16:39:00.786289Z

(if (nil? key-value-match) nil
              (recur (into data (get key-value-match 0)) (get key-value-match 1)))
can be written as
(when key-value-match
   (recur (into data (key-value-match 0)) (key-value-match 1)))
hash-maps, vectors, and sets can be used as functions
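
A quick sketch of collections in function position. The var match is a hypothetical stand-in for key-value-match, not taken from the parser being reviewed:

```clojure
(def match [{:a 1} "rest"])  ; hypothetical [parsed-value remaining-input] pair

(println (match 0))      ; vector as a function of an index, prints {:a 1}
(println (match 1))      ; prints rest
(println ({:a 1} :a))    ; map as a function of a key, prints 1
(println (#{1 2} 2))     ; set as a function of a member, prints 2
(println (#{1 2} 3))     ; not a member, prints nil

;; One difference from get: an out-of-range index throws when the vector
;; is called as a function, while get returns nil.
(println (get match 5))  ; prints nil
```

Calling nil as a function also throws, which is why the rewritten version wraps the calls in (when key-value-match ...).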

2025-05-12T16:42:49.092719Z

also I would reformat the recur as

(recur (into data ...)
       (key-value-match ...))
whitespace is there for human readers, and for human readers newlines are cheap and long lines are expensive (by which I mean, reading difficulty increases with long lines, requiring more effort from the reader, and increasing the likelihood of errors)

✅ 1
2025-05-14T17:04:29.892799Z

it's not idiomatic to use ->> to thread the body into a conditional or let form - clearly it works but I think it obfuscates control flow and binding scope

👀 1
2025-05-15T01:45:38.515679Z

That could be. I personally don't mind reading the ->> form bottom-up. What do you think about this macro:

(defmacro --> [& form]
  (conj (reverse form) '->>))

;; example using -->
(-->
  (let [user-name "John"])
  (when user-name)
  (let [user-items [1 2 3]])
  (when user-items)
  {user-name user-name
   user-items user-items})
Compare to this:
;; example using ->>
(->>
  {user-name user-name,
   user-items user-items}
  (when user-items)
  (let [user-items [1 2 3]])
  (when user-name)
  (let [user-name "John"]))
clj-kondo also supports core and other macros so it still lints scope correctly. I was also able to configure clj-kondo for this custom macro which also works perfectly.

2025-05-15T02:02:50.221319Z

Yeah, don't

2025-05-15T02:39:04.341569Z

Why not?
• It's quite readable (I believe).
• Avoids deep nesting.
• clj-kondo lints the scopes correctly.

2025-05-15T02:42:20.117959Z

No one else will want to read it

➕ 2
2025-05-15T02:47:54.108629Z

Fair enough. Without it, it's readable enough for small enough code:

(let [user-name "John"]
  (when user-name
    (let [user-items [1 2 3]]
      (when user-items
        {user-name user-name
         user-items user-items}))))
But for larger code blocks, macros like this may make code easier to follow. Just a thought. Anyways, it's an interesting experiment.

2025-05-15T02:53:13.160889Z

The correct fix in that example case is to just remove the when's and collapse the lets

2025-05-15T03:16:49.753369Z

You mean like this?

(let [user-name "John"
      user-items (when user-name [1 2 3])
      user-data (when user-items
                  {user-name user-name
                   user-items user-items})]
  user-data)
I'm also considering for possible nil values. This also looks pretty neat. Thanks!

2025-05-15T03:25:35.997089Z

It really depends, and is hard to say with the given example, because the obvious way to improve it is to remove the when's, since in the given example it is immediately obvious that they don't do anything

phronmophobic 2025-05-15T03:26:48.042979Z

Depending on the use case, using cond-> or merge are common for these types of things:

;; using cond->
(let [user-name "John"
      user-items [1 2 3]]
  (cond-> {}
    user-name (assoc user-name user-name)
    user-items (assoc user-items user-items)))
;; using merge
(merge
 {}
 (when-let [user-name "John"]
   {user-name user-name})
 (when-let [user-items [1 2 3]]
   {user-items user-items}))

✅ 1
2025-05-15T03:27:50.703769Z

Those are better

2025-05-15T03:28:33.557869Z

The real problem with the original code isn't nesting, it is interleaving binding and condition checking

2025-05-15T03:29:03.139699Z

And the --> version still does that, just in a whacky way

😂 1
2025-05-15T03:54:59.335209Z

My bad with the previous example. This is a very rudimentary example to emulate possible nil values:

(defn get-user [user-name]
  (when (= user-name "John")
    {:user-name "John" :other-details {}}))

(defn get-user-items [user-name]
  (when (= user-name "John")
    {:user-items [1 2 3] :other-details {}}))

(let [user-record (get-user "John")
      user-items-record (get-user-items (get user-record :user-name))
      user-data (when user-items-record
                  {:user-name (get user-record :user-name)
                   :user-items (get user-items-record :user-items)})]
  user-data)
;; {:user-name "John", :user-items [1 2 3]}

(let [user-record (get-user "JohnD")
      user-items-record (get-user-items (get user-record :user-name))
      user-data (when user-items-record
                  {:user-name (get user-record :user-name)
                   :user-items (get user-items-record :user-items)})]
  user-data)
;; nil
using cond-> or merge would still give me empty {} in case of nil values.
;; using cond->
(let [user-name (get-user "JohnD")
      user-items (get-user-items "JohnD")]
  (cond-> {}
    user-name (assoc :user-name user-name)
    user-items (assoc :user-items user-items)))
;; {}

;; using merge
(merge
  {}
  (when-let [user-name (get-user "JohnD")]
    {:user-name user-name})
  (when-let [user-items (get-user-items "JohnD")]
    {:user-items user-items}))
;; {}

2025-05-15T04:15:31.084619Z

Start with nil in the cond->

🙌 1
2025-05-15T04:19:59.667619Z

Thanks! That also works.
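
A minimal sketch of the "start with nil" suggestion, reusing the get-user and get-user-items stubs from above. The wrapper name user-data is hypothetical:

```clojure
(defn get-user [user-name]
  (when (= user-name "John")
    {:user-name "John" :other-details {}}))

(defn get-user-items [user-name]
  (when (= user-name "John")
    {:user-items [1 2 3] :other-details {}}))

(defn user-data
  "Builds the result map, or returns nil when nothing was found.
  Works because (assoc nil k v) returns a one-entry map."
  [user-name]
  (let [user  (get-user user-name)
        items (get-user-items user-name)]
    (cond-> nil
      user  (assoc :user-name (:user-name user))
      items (assoc :user-items (:user-items items)))))

(println (user-data "John"))   ; prints {:user-name John, :user-items [1 2 3]}
(println (user-data "JohnD"))  ; prints nil
```

If neither condition holds, cond-> returns its initial value unchanged, which here is nil rather than {}.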

phronmophobic 2025-05-15T05:35:08.402089Z

As a side note, you usually want to avoid caring about whether you have a nil or an empty map.

2025-05-13T18:47:22.474609Z

Thanks a lot, refactored the code following your suggestions. Looks much more neat and readable.

2025-05-13T21:15:29.664559Z

Also it's quite refreshing to refactor the code using the -> and ->> macros; it removed a fair bit of deep nesting, improving the readability.