I have a philosophical question about the Clojure collections API.
It's often used as an example of polymorphism in Clojure: lots of data structures respond to functions such as map, conj, etc.
In practice, I find it to be brittle and leaky. Given that it is held in such high regard, I figured I must be using it wrong.
I'll use conj as an example but I have many:
conj behaves differently if you pass a vector vs a lazy seq. The former adds to the end of the collection, and the latter to the beginning. Different transforms can take one type of collection and transform it into the other, so you have to really know what you're doing at every step or you'll get unexpected behavior. You have to know not only what the behavior of the function is, but what collection subtype it will return. I call this leaky because it requires you to know implementation details, such as: does map preserve the collection type or return a different one? Will conj prepend or append to this collection? In practice I use concat to avoid having to guess. It's almost too unpredictable to be called truly polymorphic. What use is it that it doesn't throw an undefined operation exception when the behavior is unexpected?
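For readers following along, the behavior being described can be reproduced at a REPL. This is only a minimal sketch using core functions; the specific values are made up for illustration.

```clojure
;; conj adds wherever it is cheap for the concrete type.
(conj [1 2 3] 0)            ;; => [1 2 3 0]  vector: added at the end
(conj '(1 2 3) 0)           ;; => (0 1 2 3)  list: added at the front
(conj (map inc [0 1 2]) 0)  ;; => (0 1 2 3)  lazy seq: added at the front

;; Coercing back to a vector before conj restores the append behavior.
(conj (vec (map inc [0 1 2])) 0) ;; => [1 2 3 0]
```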
I actually think it makes sense because when you are adding an element to a list (unordered) the fastest way would be to prepend it because it would just be a "swapping of pointers" so to speak (thinking about linked lists), and if you add an element to a vector (ordered/indexed) the fastest method would be to do it at the end, because prepending it would require an update of all indices. At least that's how I think about it
It is "leaky" in that sense. But I think only if you have the expectation that conj insert in a predictable position. If you assume the position is based on the type, then I think it's not as "leaky". It's a matter of semantics a bit.
I always just assume there is no guaranteed order; if order matters then it's usually explicit
I think for me, it would have been nice to have a tail pointer on List, and then conj could also be O(1) and consistent. But I think then sequence would get weird, since a seq is possibly infinite, you can really only conj on the front, and also you can't have a tail pointer or it would force realize
you can't do that for a persistent linked list
linked lists are the easiest data structure to make persistent, because you get structural sharing of the tail, once you have tail pointers like that it just doesn't work
you have to switch to some other list representation
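A small sketch of the structural sharing mentioned above, assuming plain persistent lists built with `list` and `conj` (the names `base`, `a`, and `b` are just for illustration). Both derived lists point at the very same tail, which is why a shared tail pointer that could be extended in place would not fit this representation.

```clojure
;; Two lists built by conj-ing onto the same base list.
(def base (list 2 3))
(def a (conj base 1))   ;; (1 2 3)
(def b (conj base 0))   ;; (0 2 3)

;; Both new lists reuse base as their tail; nothing was copied.
(identical? (rest a) base) ;; => true
(identical? (rest b) base) ;; => true
```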
IMO "adds an element to a list in the fastest way possible" is an unusual contract. If you're working with a list I assume order matters. 90% of the time when I am looking to add an element to a collection, the order matters, and I usually cannot guarantee the concrete type so I assume I can't use conj
the problem you are having is not really with conj it is that you are confusing seqs with collections
> and I usually cannot guarantee the concrete type so I assume I can't use conj I'd say, this is an issue from Rich's perspective. He values highly choosing the proper data-structure for various use-case. That is what drove his design decisions I believe. In fact, I think I saw some place where he mentioned even that he purposely doesn't offer certain conveniences to force you to refactor your data-structure to be more efficient. So I guess he would say, if you are using a list to append, you need to refactor to use a vector. And he made it annoying for you to keep using a list on purpose.
I mean it would be better to have an example to work on, but just in general: if one can’t know the concrete type of x, then one can’t rely on the semantics of (seq x), in other words the meaning of the “order” of the things in x; you have to reconsider what you are actually doing. That’s why all this is not the price of Clojure, but the payoff. Right?
I will admit that it's a trade-off, and sometimes I don't care about it being O(n), I still want to do it, and it would be nice to have functions like prepend and append that you can use on all collections and that just do that (including seqs, force-realizing them if needed). I'd also like something that inserts at an arbitrary location and that also works even when it's O(n). That said, I also understand not wanting to, and kind of frowning upon it, so that it is not used accidentally and then people say Clojure is so slow 😛
Here is a thing that surprised me that I’m putting on this thread because it inspired me to eval it:
(let [A {1 :a 2 :b}
      B {2 :b 1 :a}
      A' (hash-map 1 :a 2 :b)
      B' (hash-map 2 :b 1 :a)]
  (vector
   (= A B)
   (= (seq A) (seq B))
   (= A' B')
   (= (seq A') (seq B'))))
=> [true false true true]
hash maps don't guarantee their order
no, I know
i found it surprising that the order in which one types the map literal has any observable effect at all
and that hash-map acts differently is further surprising. But this kind of hits on what we are talking about: this surprise wouldn’t hurt me because I would never rely on the order of (seq coll) if coll could be a plain hash map
My guess, and @nnnsadeh can clarify, is in the case where your function might either be at the start of a chain of transforms, or in the middle. In which case, if you had say:
(->> [1 2 3]
     transform-1
     transform-2)
The transform function might get a vector if it starts the chain, or a seq if it's in the middle or at the end.
When it gets a vector, if it wants to "append", it could conj, but when it gets a seq, it cannot.
So how does one go about implementing a function like that which is "generic" to the position inside the seq transformation pipeline?
I think, and I admit it's not a practice I've seen mentioned or that I have ever thought of doing myself, but logically, if you are going to implement a new "sequence" function, you should also call seq on your input.
you can't, because seqs are built to be interacted with as a whole. you're thinking about the seq as a "collection", which it is not. thread-last functions are for operating on the entirety.
@nbtheduke hum, I feel this is a common thing. You have a series of seq transforms that you use everywhere, you extract it into its own. It's all lazy, they'll combine properly and return a lazy-seq, with nothing evaluated until it is pulled later.
oh sure, i just mean that's at cross purposes from "i care about this piece being at a specific index"
Say you want to prepend to the seq, you can use conj, and it remains a seq? after doing so, it wraps in a cons cell which is a seq?
So it works, but one day someone starts a sequence pipeline with your function, and now it breaks, because it appended.
I think the issue is what @hiredman said. You need to think in terms of seq -> seq. And if you want your function to work on seqable? then you have to call seq on the input first, same as all the clojure.core sequence functions do.
It would be a rare edge-case I guess, where if inside your function you first use any sequence function, they will call seq on the input for you. But if you happen to use conj first, it would not. Because conj is not a sequence function, but is actually polymorphic over sequences. So it does not call seq on the input, but instead has an implementation for various colls and for seq.
This is why this isn't normally an issue, and you don't normally have to explicitly call seq on your input I guess.
Is that "leaky"? I don't know, definitely something that can trip you up.
I think what @nnnsadeh says, of just using cons and concat are probably better when working with sequences.
Though, the Clojure cheatsheet shows conj inside the sequences section, maybe conj is a weird one, being it shows up in other places.
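To make the cons/concat suggestion concrete, here is a rough sketch. The helper names `prepend` and `append` are hypothetical (they are not in clojure.core); they only wrap `cons` and `concat`, which coerce their collection argument to a seq, so the position no longer depends on the input type. The result is always a seq, never the original collection type, and `append` has to walk the whole input to reach the added element.

```clojure
;; Hypothetical helpers, named only for this illustration.
(defn prepend [x coll] (cons x coll))
(defn append  [coll x] (concat coll [x]))

(prepend 0 [1 2 3])   ;; => (0 1 2 3)
(prepend 0 '(1 2 3))  ;; => (0 1 2 3)
(append [1 2 3] 4)    ;; => (1 2 3 4)
(append '(1 2 3) 4)   ;; => (1 2 3 4)
```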
@nnnsadeh
> IMO "adds an element to a list in the fastest way possible" is an unusual contract.
Rich addresses this directly:
> It is an important aspect of Clojure that, in general, performance guarantees are part of the semantics of functions. In particular, functions are not supported on data structures where they are not performant.
that's from https://gist.github.com/reborg/dc8b0c96c397a56668905e2767fd697f#why-clojure-doesnt-have-a-generic-insert-lookup-append-that-works-the-same-on-all-collections (why Clojure doesn't have a generic `insert`, `lookup`, `append` that works the same on all collections). However I think your concerns are more directly addressed by https://gist.github.com/reborg/dc8b0c96c397a56668905e2767fd697f#why-cannot-last-be-fast-on-vector because he was making these decisions in the context of Common Lisp, which by having a less principled approach highlighted its downsides.
map is not a collection operation
It operates on seqs, which are like a functional version of an iterator
It just happens to call seq on whatever you pass on, and collections produce seqs that are a view over their contents when you call seq on them
https://insideclojure.org/2015/01/02/sequences/ may be a useful resource
• behaving differently based on the type of the input is more or less the definition of polymorphism
• map returns a seq
• I'd argue that knowing the return type of a function doesn't make it leaky... it seems like the alternative is "this function can return a value of any type", which would seem to imply that there's basically nothing you can do with it at that point... if map could return a number sometimes, then you can't reliably use conj (or basically any other function) on the returned value.
I guess I would expect it to return the same type that was fed into it?
If I run a piece of code like (conj (map f coll) item) I'd want the resulting sequence to be the same regardless of the origin of coll , no?
Otherwise the caller has to know what type of sequence it can pass to the function to get a certain result, meaning it would always have to know how it's implemented.
No
Like I said, map is not a collection function, it is a function of, in the language of the article I linked above, seqable to lazy seq; same for filter, mapcat, take, partition, etc, etc
Those always return lazy seqs
Seq here is like a java Iterator, it is a view that can be walked over to consume elements in order
Seqable is like java Iterable
So map is sort of like a function that takes an Iterable and returns an Iterator
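The Iterable/Iterator analogy can be checked at a REPL. A minimal sketch, assuming Clojure 1.9 or later for `seqable?`:

```clojure
(seq? [1 2 3])             ;; => false, a vector is not itself a seq
(seqable? [1 2 3])         ;; => true, but seq can produce one from it
(seq? (seq [1 2 3]))       ;; => true

;; map accepts any seqable and returns a lazy seq either way.
(type (map inc [1 2 3]))   ;; => clojure.lang.LazySeq
(type (map inc '(1 2 3)))  ;; => clojure.lang.LazySeq
(type (map inc #{1 2 3}))  ;; => clojure.lang.LazySeq
```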
I know what it does, I'm pondering the why. Say I write a function:
(defn inc-prepend-zero
  "Increments the items and prepends a zero"
  [items]
  (->> (conj coll 0)
       (map inc)))
My docstring is incorrect. I have to specify that this only works on a subset of sequences, as it behaves differently for a vector, and hope that the caller reads the docstring (the function won't throw unless I force it to).
Even worse, if I switch the order of the computations, I get a different result.
cons is for prepending to a seq
And you already have plenty of other bugs in that code to worry about, it actually prepends and then increments, and is missing passing in the collection
If you insert your own call to seq instead of waiting for map to do it, then you will get uniform behavior from conj
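A sketch of what that advice could look like in practice. Both function names below are invented for this example. The first calls `seq` up front so `conj` always sees a seq; the second uses `cons`, which is probably closer to what the docstring in the question describes (increment first, then put a zero in front).

```clojure
;; Variant 1: explicit seq, then conj. conj on a seq always prepends.
(defn prepend-zero-then-inc [items]
  (->> (conj (seq items) 0)
       (map inc)))

(prepend-zero-then-inc [1 2 3])   ;; => (1 2 3 4)
(prepend-zero-then-inc '(1 2 3))  ;; => (1 2 3 4)

;; Variant 2: increment the items, then prepend a zero with cons.
(defn inc-then-prepend-zero [items]
  (cons 0 (map inc items)))

(inc-then-prepend-zero [1 2 3])   ;; => (0 2 3 4)
(inc-then-prepend-zero '(1 2 3))  ;; => (0 2 3 4)
```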
You may be interested in the operation conventionally called fmap - not in Clojure core but you can find implementations in external libs like funcool.cats or algo.generic.
But note the tradeoffs as mentioned above - by being polymorphic across arbitrary user-extensible collection types you'd be widening the API contract and losing the ability to reason about performance characteristics
For many practical applications it turns out that restricting yourself to predictable constructs like 'map' over seqs / coercing to vec before a conj is more useful - at least from my view that's the philosophy being exemplified by the design of clojure's core lib
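For a feel of what a type-preserving map might look like, here is a deliberately small sketch. `fmap-sketch` is a made-up name and is not the API of funcool.cats or algo.generic; it only covers vectors, sets, and plain maps (mapping over map values is one possible convention), falls back to an ordinary lazy `map` for everything else, and does not handle records.

```clojure
(defn fmap-sketch
  "Hypothetical: applies f to each element (or each map value),
  keeping the input's collection type for vectors, sets, and maps."
  [f coll]
  (cond
    (vector? coll) (mapv f coll)
    (set? coll)    (into (empty coll) (map f) coll)
    (map? coll)    (into (empty coll) (map (fn [[k v]] [k (f v)])) coll)
    :else          (map f coll)))

(fmap-sketch inc [1 2 3])      ;; => [2 3 4]
(fmap-sketch inc #{1 2 3})     ;; a set containing 2, 3 and 4
(fmap-sketch inc {:a 1 :b 2})  ;; a map with :a 2 and :b 3
(fmap-sketch inc '(1 2 3))     ;; => (2 3 4), a lazy seq
```

Note how every extra branch widens the contract, which is the trade-off described above.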
> I use concat to avoid having to guess.
concat, like map, returns a lazy seq instead of preserving collection type:
(type (concat [:foo] [:bar]))
;; => clojure.lang.LazySeq
(type (map identity [:bar]))
;; => clojure.lang.LazySeq
A few links I like on this topic: https://clojure.org/guides/faq#_collections_sequences_and_transducers, mailing list thread https://groups.google.com/g/clojure/c/znPyDzGkBgA/m/bSGxmchGCwAJ, reddit post/comment: https://www.reddit.com/r/Clojure/comments/4ve288/conj_i_just_dont_get_it_can_someone_help_me/d5xq2k4/
Suffice to say you're not alone in finding the seq API and conj behavior odd at first, though they are well regarded and IMO quite nice once you're used to them.
Oh, and about "What use is it that it doesn't throw an undefined operation exception when the behavior is unexpected?": Rich on `nil` https://gist.github.com/reborg/dc8b0c96c397a56668905e2767fd697f#shouldnt-nth-nil-1-throw-outofbound-exception
i think a pertinent point here is that vectors are not considered seqs. they're collections (like maps and sets). i've never heard anyone object to conj having different behavior on maps than on seqs, but of course they do.
because of their similarity to seqs and the necessity to operate on them as a seq without losing their concrete type, there's mapv etc, but i think of those how i think of clojure.set functions
I have less than two years with clojure and here are my 2¢.
One source of confusion is that the seq functions call seq on their argument for you. One could imagine a clojure where, e.g., (map f coll) throws where (map f (seq coll)) succeeds. This would be enlightening for the new and inconvenient for the not-new. These functions take the seq as a second argument, whereas conj, into, etc. take the coll first, which is usually the, uh, “polymorphic position,” i.e. dispatch happens on the type of that first argument.
In terms of @nnnsadeh’s OP, I would say map is not itself polymorphic: it calls seq, which is.
If a beginner were interested in my advice I would say use into with transducers when I have a coll and want a coll back, and be explicit, either in my head or even right there in the code, about where I am calling seq on something.
As James said, if I want a concrete collection type to work with after calling functions working on seqs (and returning lazy seqs), I use into with the concrete collection type. If I want a vector as a result of map or filter, I use mapv or filterv instead. That way I know how to handle the result and proceed in my code.
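A few illustrative calls for the into/mapv/filterv advice, as a sketch with made-up inputs. The last line is a reminder that `into` is built on `conj`, so a list target comes out reversed.

```clojure
;; into with a transducer: choose the output type explicitly.
(into [] (map inc) '(1 2 3))                     ;; => [2 3 4]
(into #{} (filter even?) [1 2 2 4])              ;; a set containing 2 and 4
(into {} (map (fn [[k v]] [k (inc v)])) {:a 1})  ;; => {:a 2}

;; Vector-returning variants of map and filter.
(mapv inc '(1 2 3))        ;; => [2 3 4]
(filterv even? (range 6))  ;; => [0 2 4]

;; into conj-es each element, and conj prepends on lists.
(into '() (map inc) [1 2 3])  ;; => (4 3 2)
```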
You're clubbing different functions together here. Some of them are polymorphic, like conj, others are not, like map and concat. The latter two will coerce the input to a sequence and return one.
I just want to make that clear, because map, filter, and so on are NOT part of the collections API. They are part of the sequence API.
This distinction is very very important to understand.
Collections are polymorphic, meaning the same function has specialized implementations for more than one type of collection.
assoc works on PersistentHashMap, PersistentArrayMap, PersistentTreeMap, all records, TransientHashMap, PersistentVector, and all custom types from libraries or your own implementing Associative.
The sequence API is NOT polymorphic!!
Everything works only on sequences. But the functions will attempt to coerce the input to a sequence if it is possible to do so.
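The contrast can be sketched like this: `assoc` dispatches on the concrete collection and returns the same kind of thing, while sequence functions such as `first` and `rest` first coerce to a seq. The `Point` record is a made-up type for the example.

```clojure
;; Collection function: specialized per type, type is preserved.
(assoc {:a 1} :b 2)             ;; => {:a 1, :b 2}
(assoc (sorted-map :b 2) :a 1)  ;; => {:a 1, :b 2}, still sorted
(assoc [:x :y :z] 1 :Y)         ;; => [:x :Y :z]

(defrecord Point [x y])         ;; hypothetical record for illustration
(:x (assoc (->Point 1 2) :x 10)) ;; => 10, and the result is still a Point

;; Sequence functions: input is coerced to a seq, output is a seq.
(first {:a 1})   ;; => [:a 1]
(rest [1 2 3])   ;; => (2 3)
```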
As for conj , I think your issue is that you think of it as append or prepend, but it's simply called conj for conjoin. Which means join or combine. All it does is combine the element into the collection. There is no mention of where it goes.
It simply says, add an element to a collection in ~O(1) time (effectively O(1), it's not true O(1))
For example, you can conj on a set or a map:
(conj {} {:a 1})
(conj #{} 1)
where the idea of putting it somewhere like the start or the end doesn't even make sense.
I'm trying out clojure again after a long time and wrote a basic json parser. Would love some feedback on code style and idioms and any other feedback you folks have: https://github.com/mkp7/clojure-json-parser Cheers ✌️
the first thing that stands out is the usage of deeply nested if instead of cond or case.
on a more stylistic note, I see a tendency to put things on one line (eg. the entire defn of parse-null on one line, or if branches that return nil on the same line as the condition). IMHO that makes the code harder to read, and I don't see a benefit of using fewer lines.
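To illustrate the nested-if point, here is a sketch that is not taken from the linked repository: the `classify-*` functions are hypothetical and just dispatch on the first character of a JSON value. All three return the same results; `cond` and `case` keep the branches flat.

```clojure
;; Deeply nested if: each new branch adds a level of indentation.
(defn classify-nested [c]
  (if (= c \{)
    :object
    (if (= c \[)
      :array
      (if (= c \")
        :string
        :other))))

;; cond: one test/result pair per line.
(defn classify-cond [c]
  (cond
    (= c \{) :object
    (= c \[) :array
    (= c \") :string
    :else    :other))

;; case: constant-time dispatch on literal values, with a default.
(defn classify-case [c]
  (case c
    \{ :object
    \[ :array
    \" :string
    :other))
```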
(if (some-condition?) nil
  (run-some-code))
can be re-written as
(when-not (some-condition?)
  (run-some-code))
every form in clojure has a return value
(if (nil? key-value-match) nil
  (recur (into data (get key-value-match 0)) (get key-value-match 1)))
can be written as
(when key-value-match
  (recur (into data (key-value-match 0)) (key-value-match 1)))
hash-maps, vectors, and sets can be used as functions
also I would reformat the recur as
(recur (into data ...)
       (key-value-match ...))
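On the point that maps, vectors, and sets can be called as functions, a short sketch with made-up values. One difference from `get` worth knowing: calling a vector with an out-of-range index throws, and nil itself cannot be called, so the function-call form is safest behind a guard such as the `when` above.

```clojure
({:a 1} :a)        ;; => 1
({:a 1} :b)        ;; => nil
({:a 1} :b 0)      ;; => 0, second argument is a default
([10 20 30] 1)     ;; => 20
(#{:x :y} :x)      ;; => :x
(#{:x :y} :z)      ;; => nil

;; get is more forgiving than calling the collection directly.
(get [10 20 30] 5) ;; => nil
(get nil 0)        ;; => nil
;; ([10 20 30] 5) throws IndexOutOfBoundsException
```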
whitespace is there for human readers, and for human readers newlines are cheap and long lines are expensive (by which I mean, reading difficulty increases with long lines, requiring more effort from the reader, and increasing the likelihood of errors)
it's not idiomatic to use ->> to thread the body into a conditional or let form - clearly it works but I think it obfuscates control flow and binding scope
That could be. I personally don't mind reading the ->> form bottom-up. What do you think about this macro:
(defmacro --> [& form]
  (conj (reverse form) '->>))
;; example using -->
(-->
 (let [user-name "John"])
 (when user-name)
 (let [user-items [1 2 3]])
 (when user-items)
 {user-name user-name
  user-items user-items})
Compare to this:
;; example using ->>
(->>
 {user-name user-name,
  user-items user-items}
 (when user-items)
 (let [user-items [1 2 3]])
 (when user-name)
 (let [user-name "John"]))
clj-kondo also supports core and other macros so it still lints scope correctly. I was also able to configure clj-kondo for this custom macro which also works perfectly.
Yeah, don't
Why not?
• It's quite readable (I believe).
• Avoids deep nesting.
• clj-kondo lints the scopes correctly.
No one else will want to read it
Fair enough. Without it, it's readable enough for small enough code:
(let [user-name "John"]
  (when user-name
    (let [user-items [1 2 3]]
      (when user-items
        {user-name user-name
         user-items user-items}))))
But for larger code blocks macros like this may make code easier to follow. Just a thought. Anyways, it's an interesting experiment.
The correct fix in that example case is to just remove the when's and collapse the lets
You mean like this?
(let [user-name "John"
      user-items (when user-name [1 2 3])
      user-data (when user-items
                  {user-name user-name
                   user-items user-items})]
  user-data)
I'm also accounting for possible nil values. This also looks pretty neat. Thanks!
It really depends, and it is hard to say with the given example, because the obvious way to improve it is to remove the when's, since in the given example it is immediately obvious that they don't do anything
Depending on the use case, using cond-> or merge are common for these types of things:
;; using cond->
(let [user-name "John"
      user-items [1 2 3]]
  (cond-> {}
    user-name (assoc user-name user-name)
    user-items (assoc user-items user-items)))
;; using merge
(merge
 {}
 (when-let [user-name "John"]
   {user-name user-name})
 (when-let [user-items [1 2 3]]
   {user-items user-items}))
Those are better
The real problem with the original code isn't nesting, it is interleaving binding and condition checking
And the --> version still does that, just in a whacky way
My bad with the previous example. This is a very rudimentary example to emulate possible nil values:
(defn get-user [user-name]
  (when (= user-name "John")
    {:user-name "John" :other-details {}}))
(defn get-user-items [user-name]
  (when (= user-name "John")
    {:user-items [1 2 3] :other-details {}}))
(let [user-record (get-user "John")
      user-items-record (get-user-items (get user-record :user-name))
      user-data (when user-items-record
                  {:user-name (get user-record :user-name)
                   :user-items (get user-items-record :user-items)})]
  user-data)
;; {:user-name "John", :user-items [1 2 3]}
(let [user-record (get-user "JohnD")
      user-items-record (get-user-items (get user-record :user-name))
      user-data (when user-items-record
                  {:user-name (get user-record :user-name)
                   :user-items (get user-items-record :user-items)})]
  user-data)
;; nil
using cond-> or merge would still give me empty {} in case of nil values.
;; using cond->
(let [user-name (get-user "JohnD")
      user-items (get-user-items "JohnD")]
  (cond-> {}
    user-name (assoc :user-name user-name)
    user-items (assoc :user-items user-items)))
;; {}
;; using merge
(merge
 {}
 (when-let [user-name (get-user "JohnD")]
   {:user-name user-name})
 (when-let [user-items (get-user-items "JohnD")]
   {:user-items user-items}))
;; {}
Start with nil in the cond->
Thanks! That also works.
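For anyone reading later, the "start with nil" suggestion might look roughly like this. The function name `user-data` and its arguments are invented for the sketch. It relies on `assoc` treating nil as an empty map, so the result stays nil when no branch fires.

```clojure
;; Hypothetical helper: builds a map only from the non-nil inputs.
(defn user-data [user items]
  (cond-> nil
    user  (assoc :user-name user)
    items (assoc :user-items items)))

(user-data nil nil)         ;; => nil
(user-data "John" nil)      ;; => {:user-name "John"}
(user-data "John" [1 2 3])  ;; => {:user-name "John", :user-items [1 2 3]}
```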
As a side note, you usually want to avoid caring about whether you have a nil or an empty map.
Thanks a lot, refactored the code following your suggestions. Looks much more neat and readable.
Also it's quite refreshing to refactor the code using the -> and ->> macros; it removed a fair bit of deep nesting, improving the readability.