This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2023-02-09
Channels
- # announcements (26)
- # babashka (4)
- # beginners (17)
- # calva (21)
- # cider (13)
- # clerk (17)
- # clj-commons (23)
- # clj-kondo (3)
- # cljdoc (47)
- # cljsrn (10)
- # clojure (123)
- # clojure-belgium (2)
- # clojure-dev (25)
- # clojure-europe (34)
- # clojure-gamedev (2)
- # clojure-italy (1)
- # clojure-nl (3)
- # clojure-norway (4)
- # clojure-uk (4)
- # clojurescript (86)
- # cursive (12)
- # datahike (2)
- # datomic (2)
- # emacs (4)
- # fulcro (6)
- # funcool (15)
- # instaparse (1)
- # integrant (11)
- # jobs (1)
- # joyride (9)
- # kaocha (3)
- # membrane (8)
- # off-topic (1)
- # pathom (4)
- # practicalli (2)
- # quil (1)
- # rdf (1)
- # reagent (9)
- # remote-jobs (1)
- # shadow-cljs (27)
- # spacemacs (4)
- # specter (1)
- # sql (11)
- # tools-deps (55)
- # vim (1)
just an exclamation: I've been clojuring off and on since 2009. Today I rediscovered defprotocol and defrecord (I had used them in prod at Amazon 10 years ago, but forgot how good they are). Removed hundreds of lines of defmultis from my code base today. Bravo, again, Clojure!
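The defmulti-to-protocol move above can be sketched like this (a minimal, hypothetical example; the shapes and names are illustrative, not from the original code base):

```clojure
;; Hypothetical example: type dispatch that might otherwise be a defmulti
(defprotocol Area
  (area [this]))

(defrecord Circle [r]
  Area
  (area [_] (* Math/PI r r)))

(defrecord Square [s]
  Area
  (area [_] (* s s)))

(area (->Square 2.0)) ;; => 4.0, dispatched on the record's type
```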
Your https://www.youtube.com/watch?v=wASCH_gPnDw was one of the things that motivated me (and I'm sure a lot of others) to learn Clojure. Thanks for that, and https://www.youtube.com/watch?v=ZhuHCtR3xq8.
Quick question about clojure.java.io/reader: why is it that when I do the following, the reader gets messed up and the csv file changes order?
(let [reader (io/reader csvfile)
      separator (re-find #"(?:\,|\;|\|)" (first (line-seq reader)))
      parsed (csv/read-csv reader :separator (.charAt separator 0))]
  (prn (first parsed))) ; print the first line of the csv file, typically the column headers
;=> ["value" "value" "value"]
But when I do it with a second reader it doesn't (which is expected, as it creates a second reader object).
(let [reader (io/reader csvfile)
      separator (re-find #"(?:\,|\;|\|)" (first (line-seq (io/reader csvfile))))
      parsed (csv/read-csv reader :separator (.charAt separator 0))]
  (prn (first parsed))) ; print the first line of the csv file, typically the column headers
;=> ["col1" "col2" "col3"]
Is that because the line-seq does something to the reader object and messes up the order?
Readers have internal positions; they are stateful objects. If you start reading from a reader, all further calls to read will continue reading from the reached position, not from the very start.
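A minimal illustration of that statefulness, using a StringReader in place of a file (the function name is illustrative):

```clojure
(require '[clojure.java.io :as io])

(defn first-two-lines [rdr]
  ;; each .readLine call advances the reader's internal position
  (let [r (io/reader rdr)]
    [(.readLine r) (.readLine r)]))

(first-two-lines (java.io.StringReader. "line1\nline2\nline3"))
;; => ["line1" "line2"] - the second read continues where the first stopped
```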
Is there a way to keep the internal position to overcome having to use 2 reader objects?
The reader should have a .reset method, but I've never used it myself and have no clue whether there are instances when it cannot help (assuming it can help at all).
BTW note that your way to detect the separator won't work if e.g. the separator is ; but some value has , inside. Or if the separator is \tab.
That's correct. The csv file I made by hand though, so it will work for now, but it indeed needs a better way to detect the separator.
Ideally, there should be no need for detection at all. :) IMO if you control the format, it should be fixed.
.reset behaviour depends on the reader implementation. In some cases, even if you use .mark, the reader's internal state might be incomplete. To avoid confusion when a change of the csvfile value makes .reset unusable, you could rely on interop:
(let [reader (io/reader csvfile)
      separator (re-find #"(?:\,|\;|\|)" (.readLine reader)) ; read first line
      parsed (csv/read-csv reader :separator (.charAt separator 0))]
  (prn (first parsed)))
That has the exact same problem though - the call to prn will receive the second line, not the first one.
line-seq does not chunk, if that's what you meant. So using interop won't change anything.
I fixed it like so:
(let [reader (io/reader csvfile)
      _ (.mark reader 100)
      separator (re-find #"(?:\,|\;|\|)" (first (line-seq reader)))
      _ (.reset reader)
      parsed (clojure.data.csv/read-csv reader :separator (.charAt separator 0))]
  (prn (first parsed)))
In the final version I just removed the sep finding.
Another, more predictable, alternative is to wrap the reader with PushbackReader and then push the first read line back into it.
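A sketch of that PushbackReader idea, using only the standard library (the function name and the 1024 buffer size are assumptions; the buffer must be at least the first line's length plus one):

```clojure
(require '[clojure.java.io :as io])

(defn sniff-first-line
  "Detects the separator from the first line, then pushes the line back
   so the returned reader still yields the entire input."
  [rdr]
  (let [pb (java.io.PushbackReader. (io/reader rdr) 1024)
        sb (StringBuilder.)]
    ;; consume the first line character by character
    (loop []
      (let [c (.read pb)]
        (when-not (or (neg? c) (= c (int \newline)))
          (.append sb (char c))
          (recur))))
    (let [line (str sb)
          sep (re-find #"[,;|]" line)]
      ;; push the line (and its newline) back into the reader
      (.unread pb (char-array (str line "\n")))
      ;; pb could now be handed to csv/read-csv with :separator (.charAt sep 0)
      [sep pb])))
```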
Yet another alternative is to call read-csv twice - once for the header and the second time on the "remaining" reader, then combine the results.
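The two-calls idea might look like this (a sketch; assumes clojure.data.csv is on the classpath and uses the hypothetical name read-with-detected-sep):

```clojure
(require '[clojure.data.csv :as csv]
         '[clojure.java.io :as io])

(defn read-with-detected-sep [csvfile]
  (with-open [r (io/reader csvfile)]
    (let [header-line (.readLine r)                       ; consume the header line
          sep (.charAt (re-find #"[,;|]" header-line) 0)
          header (first (csv/read-csv header-line :separator sep)) ; read-csv also accepts a String
          rows (doall (csv/read-csv r :separator sep))]   ; parse the rest of the reader
      (cons header rows))))
```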
(let [reader (io/reader csvfile)
      first-line (.readLine reader)
      separator (subs (re-find #"(?:\,|\;|\|)" first-line) 0 1)
      parsed (cons (string/split first-line separator) (csv/read-csv reader :separator (first separator)))]
  (prn (first parsed)))
I posted an example to illustrate the idea. Here is a full solution.
That's not correct.
string/split will fail because separator is not a regex.
Using (re-pattern separator) will fail when the separator is |.
string/split will return an incorrect result when the column names are quoted.
Just... don't use string/split for CSV. Use csv/read-csv, even if it's 2 calls instead of 1.
Also, a bit beside the point - it makes more sense to store the separator as a character from the get-go, instead of using subs only to then use first on it.
Thanks for the feedback, I didn't have a REPL open to test it. Still, those are manageable problems compared with creating two readers or using an implementation-dependent reset method.
I would prefer creating 2 readers. They're pretty lightweight and lazily read the data. Also, you need to close the reader when you're done with it.
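That two-readers approach might be sketched as follows (hypothetical names; assumes clojure.data.csv is on the classpath):

```clojure
(require '[clojure.data.csv :as csv]
         '[clojure.java.io :as io])

(defn read-csv-two-readers [csvfile]
  ;; first reader: only sniffs the separator, closed right away
  (let [sep (with-open [r (io/reader csvfile)]
              (re-find #"[,;|]" (.readLine r)))]
    ;; second reader: parses the whole file; doall realizes it before close
    (with-open [r (io/reader csvfile)]
      (doall (csv/read-csv r :separator (.charAt sep 0))))))
```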
I've been trying to wrap my head around the with-open part so the reader is closed afterwards. For some reason Stream closed is thrown at the end of my function, and this is after the file is parsed and handled. I tried with #dbg to see what happens and I can see it doing the things I want it to, but at the end it throws Stream closed. Why is that?
The function:
(defn read-matrix-file
  [csvfile]
  (with-open [reader (io/reader csvfile)] ;; create a buffered reader from the path
    (let [parsed (csv/read-csv reader :separator \;) ;; read the csv file with separator
          header (first parsed) ;; get the header
          rows (rest parsed) ;; get the rows
          keymap (fn [map] ;; accepts a map with strings as k/v
                   (reduce ;; reduce the provided map
                    (fn [newmap [k v]] ;; run over the new map and key value pairs
                      (if (.contains k "groep") ;; if a key contains "groep"
                        (update newmap :groups conj v) ;; combine them in a new key/value pair
                        (assoc newmap (keyword k) v))) ;; otherwise return the key/value pair
                    {} ;; the new map
                    map))] ;; the provided map
      (keep
       (fn [row]
         (->> (zipmap header row)
              (keymap)))
       rows))))
Without the with-open and the reader inside the let it works just fine, but the reader doesn't get closed, which is expected but unwanted.
But it does work when I put #dbg in front of the keep - I can see it doing what it must do.
Okay, huh? So it works with mapv but not with map or keep. It works with filterv (but of course does not output what it should), and does not work with filter.
And if I change the keep into a reduce it works as well:
(reduce
 (fn [v row]
   (conj v (->> (zipmap header row)
                (keymap))))
 []
 rows)
Yeah. Mixing side effects and laziness is not recommended ;) ... This is one of the great things about transducers: laziness is decided at time of application rather than construction.
Like all the occurrences of the word do in Clojure, doall should be considered a warning of "here be side effects" ... It marks a boundary between your functional world where laziness is fine and your side-effecty world where laziness breaks.
Transducers don't have this problem because the way you get laziness is by applying the transducer in a lazy context with sequence.
(defn group-keys [m]
  (reduce-kv (fn [newmap k v]
               (if (.contains k "groep")
                 (update newmap :groups conj v)
                 (assoc newmap (keyword k) v)))
             {} m))
(defn process-csv-as-maps [xform rdr]
  (with-open [reader (io/reader rdr)]
    (let [[header & rows] (read-csv reader)]
      (->> rows
           (into [] (comp (map (partial zipmap header))
                          xform))))))
(process-csv-as-maps (map group-keys) (java.io.StringReader. "1,2,3\na,b,c\nd,e,f"))
For example, you can separate out the side-effecty processing of turning a reader into a vector of processed records, passing in something to filter/map every record and perform some transformations ... if that makes sense?
And if you were still trying to detect the separator, I would probably just create that as a separate function.
Unexpected destructuring behaviour:
(let [{foo :foo, :or {foo 1}, :as bar} {}]
[foo bar])
;; => [1 {}], not [1 {:foo 1}]
Is this a bug? bar is bound to the entire map, which is {}; only foo is bound to 1 if :foo is absent, so it is the correct behavior
Hmm, right. Still seems 'wrong' though! At least from the standpoint of, '`:or` defines default values'.
:or works per-key, not on the whole value. For example:
(let [{foo :foo bar :bar, :or {foo 1}} {}]
  [foo bar])
;; => [1 nil]
If your mental model is ":or means it replaces the input if the key is missing" then it's wrong. The mental model should be "for each key that is missing, take the value from the :or part".
I still think there's an argument for the behaviour I expected. But I suppose it would complect :or and :as, heh.
I think it is a normal thing to expect, but also the other way around, so :man-shrugging:
The map argument is immutable; you should not expect any instruction to mutate it, including the combination of :as and :or.
> I still think there's an argument for the behaviour I expected...
E.g. think of the case where the RHS is a variable supplied by argument - or the destructuring is happening in a function's arg vector. In that case it would be especially nice to think of :or as (really) supplying default values to bar, as opposed to having to explicitly use foo everywhere it's needed.
It is useful in some cases for sure, but then you don't have a way of having a reference to the original value when using :or, which will break the expectation of people wanting to always have :as pointing to the original input, which is also useful. So, choices.
kind of late
it is helpful to keep in mind that destructuring is at heart about binding locals, not a machine to apply transformation or integration of data (you have lots of Clojure functions for that in your code). :or supplies defaults when you bind
If you want to develop a better intuition about this, destructuring is backed by an undocumented function, clojure.core/destructure, and you can apply it yourself to see what it's being translated to. As this is a function called by a macro, it takes the quoted binding vector as input (manually reformatted this to make it look like normal code):
user=> (pprint (destructure '[{foo :foo, :or {foo 1}, :as bar} {}]))
[map__6 {}
 map__6 (if (seq? map__6)
          (if (next map__6)
            (PersistentArrayMap/createAsIfByAssoc (to-array map__6))
            (if (seq map__6)
              (first map__6)
              PersistentArrayMap/EMPTY))
          map__6)
 bar map__6
 foo (get map__6 :foo 1)]
> destructuring is at heart about binding locals, not a machine to apply transformation or integration of data
💯, however I too have been baffled by this in the past! I wonder why so many Clojure beginners expect :or to "pollute" :as (myself included, I can't really explain it).
note that the default 1 just shows up at the end as the default when foo is bound
clearly this is a good topic to add to the clojure puzzlers book I'm working on! :)
> I wonder why so many Clojure beginners expect :or to "pollute" :as (myself included, I can't really explain it)
I think because it is not a crazy expectation that the :or will shadow the map values, even in the :as binding, since when you use it in a function parameter most of the time you just want that.
Right, it's not crazy but if it worked like that you'd lose the original map, which is worse because you can't get it back.
I'd much rather get rid of :or than have an :as that maybe contains the original map (if I had to choose). Performance aside, (merge {:foo 1} bar) is quite clean if you need that behaviour imho.
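Side by side, the merge variant behaves as described (a small sketch):

```clojure
(let [{:keys [foo] :or {foo 1} :as bar} {}]
  [foo bar (merge {:foo 1} bar)])
;; => [1 {} {:foo 1}] - bar keeps the original map, merge supplies the default
```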
> clearly this is a good topic to add to the clojure puzzlers book I'm working on! 🙂
Glad I asked @U064X3EF3! :))
hey fambly - mapped my shift keys to parens and getting back into clojure, in part because I started using logseq (https://github.com/logseq/logseq/). there have been a bunch of posts about it in various channels, but no dedicated channel to talk about it so I made one at #logseq
ah didn't know that, thanks! I imagine the channel here will be low traffic, but in case others dislike discord as much as I do maybe it will be a good space to chat about the clojure-specific elements of it
Thanks for kicking this off! I'd been meaning to but also didn't know if there was interest. There aren't too many clj(s) hackers in discord so worth a shot
Can someone tell me what's going on here?
user=> (= (edn/read-string "\"123\\n456\"") (edn/read-string "\"123\n456\""))
true
the first string has a literal "\n" and the second has a \n newline character
and they both read as a string with a literal \n
I'm just talking through it. did that actually help?
I think so. The first, literal newline, is read as a newline, but the non-literal newline is really more like:
(edn/read-string "123
456")
right?
Might help to println those strings.
the first is a 10 character string with the characters \ and n in it, the second is a 9 character string with a \n char in it
but the reader is turning \ n into the \n char?
yeah, it does that
seems so yes:
user=> (count (edn/read-string "\"123\\n456\"") )
7
user=> (count (edn/read-string "\"123\n456\"") )
7
user=> (println (edn/read-string "\"123\n456\""))
123
456
nil
user=> (println (edn/read-string "\"123\\n456\""))
123
456
nil
I mean, pretty obvious in the code
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/LispReader.java#L564
maybe too good :)
well then you're in whitespace, not in Clojure strings
EdnReader is a copy of LispReader, so they are in general in sync (but with some features removed)
I wonder at what size of application it becomes appropriate to "take in" the program config from a global var. Currently I would pass a config from the main entry point down everywhere as an argument.
Ah you mean in a small project you get away with implicit input and in a big project it should be explicit always?
organize your config like your modules and merge them https://github.com/juxt/aero#using-aero-with-components
Another thing I found works well with that is more components - smaller components. It makes sense in the end.
I was watching a talk by Rich Hickey about Reducers (from QCon NY 2012): https://www.youtube.com/watch?v=IjB-IOwGrGE Back then he called this "Reducers library", is it the same thing he later called "Transducers"?
no, they have some similarities; in some ways the reducer stuff can be seen as a precursor to transducers, but the implementation is different, and the reducers library has some stuff that hasn't been re-implemented for transducers (the parallel fold stuff)
the reducer api usage is very familiar: clojure.core/map takes a seq and returns a seq; clojure.core.reducers/map takes a reducible collection and returns a reducible collection
the transducer api is kind of a radical departure from that: instead, you build up a transformation that is applied to the function you use to reduce the reducible collection. Reducers sort of do this internally, incrementally: clojure.core.reducers/map returns a reducible collection that, when reduced, transforms the reducing function by combining it with the function passed to map, then uses the transformed function to reduce the collection passed to map.
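The contrast can be seen in a few lines (both namespaces ship with Clojure):

```clojure
(require '[clojure.core.reducers :as r])

;; reducers: r/map returns a reducible collection, realized by reduce/fold/into
(into [] (r/map inc [1 2 3]))     ;; => [2 3 4]
(r/fold + (r/map inc [1 2 3]))    ;; => 9, with parallel fold on a vector

;; transducers: the transformation is built separately from any collection
(into [] (map inc) [1 2 3])       ;; => [2 3 4]
(transduce (map inc) + 0 [1 2 3]) ;; => 9
```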