#beginners
2020-08-05
lsolbach00:08:11

:jvm-opts ["-Dglass.gtk.uiScale=200%"]

lsolbach00:08:24

@cb.lists That was real gold, now my REBL is scaled and usable on Ubuntu HighDPI

lsolbach00:08:04

made it in my deps.edn file 🙂

lsolbach00:08:43

@seancorfield BTW, thanks a million for sharing your deps.edn file on GitHub. That's a really good template to start with. :-D

ami06:08:47

Hello folks, new Clojurian here with a (probably simple) question: given two maps with identical keys, I want to update the first map with the other map, by iterating all the keys. So something like this: m1 {:a "A" :b "B"} m2 {:a "c" :b "d"} -> m1 {:a "Ab" :b "Bc"} . What's the approach here? for ?

ilari.tuominen06:08:02

I’m still much a beginner but I’d use (merge-with str m1 m2) , though I’m a bit confused by your output since m2 has different values than your expected output. My suggestion would output {:a "Ac", :b "Bd"}
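In full, that suggestion checks out at the REPL:

```clojure
;; merge-with calls the given fn on the two values whenever a key collides
(merge-with str {:a "A" :b "B"} {:a "c" :b "d"})
;; => {:a "Ac", :b "Bd"}
```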

ami06:08:42

thx! I know of merge-with and indeed for the example I posted it's likely the best approach. I come from Python, and iterating over keys is something I'm used to doing. Is there a similar approach in Clojure?

valtteri11:08:27

merge-with str is perfect here. The more general way would be to use reduce or reduce-kv , for example

(reduce-kv (fn [m k v] (update m k str v)) m1 m2)

ami12:08:08

thank you!

noisesmith13:08:24

another common idiom is into, consuming pairs from a map:

(into {}
      (map (fn [[k v]]
             [(string/reverse k)
              (* v v)]))
      {"dog" 0
       "fish" 1
       "cat" 2
       "cow" 3})
{"god" 0, "hsif" 1, "tac" 4, "woc" 9}

noisesmith13:08:54

where a two element vector will be turned into a key value, and nil / empty will be skipped
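A small sketch of the nil-skipping part: returning nil from the mapping fn drops the entry entirely, because conj-ing nil onto a map is a no-op.

```clojure
;; entries whose fn result is nil are simply skipped by into
(into {}
      (map (fn [[k v]] (when (odd? v) [k (* v v)])))
      {:a 1 :b 2 :c 3})
;; => {:a 1, :c 9}
```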

jimka.issy07:08:23

I have a record which I've defined with (defrecord Dfa ...). In my program I create instances of this record two different ways. First, from scratch using (map->Dfa {:key value :key value ...}) and also using an existing object such as (map->Dfa (assoc old-dfa :key value :key value ...)) . Now, I'd like to enforce some consistency within the object to prevent me from creating objects which don't make semantic sense. I can easily write a verify function which checks the interrelation of all the key/value pairs. But how can I ensure that that verification function is run every time a Dfa object gets allocated? One thought is to write a factory function. How can I write a function which does either of the constructions listed above depending on whether an initial object is given? Given such a factory function I can of course call any consistency checking code I want, spec or otherwise.

nbtheduke13:08:11

This sounds like a job for spec or for another schema-like library.

jimka.issy21:08:07

yea, but the question is: can I define a spec which is validated whenever map->Dfa is called? It is defrecord that writes map->Dfa, not me.

nbtheduke21:08:16

Oh. I don’t know about that. You could make your own map->Dfa and call that instead.

sritchie0919:08:33

@jimka.issy yes, Clojure I think is pretty explicit that you should feel comfy writing your own constructors

sritchie0919:08:41

the generated ones are nice
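A hypothetical sketch of such a hand-written factory, assuming a `valid-dfa?` consistency check; the field names here are made up for illustration, not taken from the actual Dfa record:

```clojure
;; Both construction paths (from scratch, or from an existing Dfa plus
;; overrides) are funnelled through one validating entry point.
(defrecord Dfa [states initial])

(defn valid-dfa?
  "Placeholder invariant: the initial state must be one of the states."
  [dfa]
  (contains? (set (:states dfa)) (:initial dfa)))

(defn make-dfa
  "Build a Dfa from a field map, or from an existing Dfa plus overrides.
  Throws if the resulting record fails the consistency check."
  ([field-map]
   (let [dfa (map->Dfa field-map)]
     (when-not (valid-dfa? dfa)
       (throw (ex-info "inconsistent Dfa" {:dfa dfa})))
     dfa))
  ([old-dfa overrides]
   (make-dfa (merge (into {} old-dfa) overrides))))
```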

michael.e.loughlin07:08:51

Does anyone know of an overview of the performance characteristics of clojure's persistent data structures?

michael.e.loughlin08:08:47

I'm messing around converting interview-style whiteboard problems from JavaScript to Clojure, and there's a whole lot of guff in the JS versions I suspect isn't required at all in Clojure

jakob.durstberger08:08:31

Could this be of interest to you? https://hypirion.com/musings/understanding-persistent-vector-pt-1 It also has a link to the paper by Bagwell.

michael.e.loughlin08:08:58

thanks, I'll take a look

jimka.issy09:08:20

what does lein test do? I mean does it assume I'm using a particular Clojure testing framework? or does it try to figure it out? Or is my testing framework registered somewhere to enable lein to configure itself?

tzafrirben09:08:26

It is not very well documented, but you can “mark” a failing test by adding a metadata key to deftest (or to the ns) and then use test selectors to specify which tests to skip -- see lein help test. For example, once you define a deftest with metadata:

(deftest ^:skip my-failure-test ...)
Add selectors to your project.clj file
:test-selectors {:default (complement :skip)
                 :errors :skip}
And execute lein test :default

jimka.issy11:08:42

can I use ^:known-failure rather than ^:skip or is skip a word which has some meaning to the system ?

tzafrirben12:08:40

you can use which ever name you want for the metadata tag - in fact, the example used in lein test help is ^:integration

noisesmith13:08:59

it doesn't use a testing framework, it uses the built-in clojure.test

noisesmith13:08:55

the test selectors are a DSL that lein uses, which primarily uses metadata on tests to select which ones to run; there is no built-in mechanism for marking known failures

noisesmith13:08:28

the keywords are arbitrary, you can use anything

noisesmith13:08:14

so yes, :known-failure works just fine

noisesmith13:08:46

if it's not clear, the selectors are functions called on the test's metadata -> :foo being a function that looks up :foo in a hash map, so it returns truthy if the key is mapped to something truthy

noisesmith13:08:09

so you can use arbitrary functions, which test for arbitrary combinations or even values of keys on the metadata of the test
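Putting those pieces together, a selector can be any function of the metadata map. A hypothetical project.clj fragment (the :known-failure and :slow keys are made up for illustration):

```clojure
;; selectors are ordinary functions called on each test var's metadata
:test-selectors {:default    (complement :known-failure)
                 :known      :known-failure
                 :slow-known (fn [m] (and (:known-failure m) (:slow m)))}
```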

jimka.issy09:08:41

I have some test cases which are known failures. Is there a way to tell the testing framework that they are expected failures? I.e., they are documented loopholes in my implementation, which need to be fixed someday, but fixing them is difficult and of dubious importance.

ben.sless10:08:04

Do they throw exceptions or just fail?

jimka.issy11:08:00

They just fail

jimka.issy11:08:12

The idea being that if a system has been shipping for a long time, and someone discovers a bug which has been there for a long time, that doesn't mean you have to stop shipping. That ship has sailed. But it is good to document the known failure.

jimka.issy09:08:02

When I use (for ...) to specify nested iteration loops, I can decorate each level of the loop with :when, :while, and :let , but for, as I understand it, always computes some accumulated return value. On the other hand, if I want to iterate nested loops simply for side effect, such as test assertions, I can use doseq but then I lose the :when, :while, and :let capability. Is there some way to tell for not to accumulate any values, but rather iterate for side effect only?

ben.sless10:08:15

doseq has the same API as for, you don't lose all of those
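A tiny sketch to confirm: doseq takes exactly the same binding modifiers, it just runs eagerly and returns nil.

```clojure
;; :when and :let work in doseq just as they do in for
(doseq [x (range 10)
        :when (even? x)
        :let [sq (* x x)]]
  (println x "squared is" sq))
```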

meepu11:08:50

Can someone help me understand collection types here? Expected: #{{:order_number 1, :id "1234"} {:order_number 2, :id "435"} {:order_number 3, :id "34894"}}      got: #{[[{:order_number 2, :id "435"}]] [[{:order_number 1, :id "1234"}]] [[{:order_number 3, :id "34894"}]]} (using = ). What collections do #{{ and #{[[{ represent?

tobyclemson11:08:21

The expected is a set containing maps, the actual is a set containing vectors of vectors of maps.

tobyclemson11:08:40

They aren't different types of collections, they are nested collections

meepu11:08:26

Thanks! I think I figured out where I went wrong.

jimka.issy11:08:20

does run-tests suppress the stdout of the tests? Is there a way to run the tests and look at the stdout, i.e., the output of all my calls to println in the testing functions? It's curious, because sometimes the print messages seem to appear, and sometimes they don't. I haven't figured out what's happening. Maybe I'm just confused?

jimka.issy11:08:22

Ahhh, perhaps I understand. The following prints nothing, but if I change for to doseq it prints. I think that means with for it didn't make any assertions, related to the fact that for creates a lazy sequence. right?

(deftest t-acceptance
  (testing "acceptance"
    (for [exit-value [42 true -1]
            rte '((:* Long)
                  (:* Short)
                  (:or (:* (:cat String Long))
                       (:not (:+ (:cat String Short)))))
            :let [dfa (rte-to-dfa rte exit-value)
                  dfa-trim (trim dfa)
                  dfa-min (minimize dfa)
                  dfa-min-trim (trim dfa-min)]
            seq-root '([]
                       [1]
                       ["hello" 42 "world"])
            reps (range 5)
            :let [seq-long (reduce concat (repeat reps seq-root))
                  match? (boolean (rte-match dfa seq-long))]
            ]
      (do (println [:match seq-long match?])
          (is (= match?
                 (rte-match dfa-trim seq-long)))
          (is (= match?
                 (rte-match dfa-min seq-long)))
          (is (= match?
                 (rte-match dfa-min-trim seq-long)))))))
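Yes, that is exactly the laziness at play. A minimal illustration:

```clojure
;; for builds a lazy sequence: nothing in the body runs until it is realized
(let [s (for [x [1 2 3]] (println "side effect" x))]
  ;; nothing has printed yet at this point
  (dorun s))  ; forcing the seq runs the side effects
```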

rob70313:08:14

Hi guys, I have a fairly big question. I hope it's alright to ask this here. So some days ago I found myself in a new project which seems to scratch all kinds of AI, Logic and Optimization problems. Now, I'm not an expert in any of these domains, but I strive to be someday. Also, this is my first real clojure project. My immediate problem is something like this: > "For a room of the type 4 person office, I need a minimum of X m² as determined by a list of factors Y. Also, if a particular factor Y is chosen, it may have further constraints on the possible values of Z". > > Y could be some room/building structure, where if a particular structure is chosen I can further only place a subset of Z in it. I will have a bunch of (semi-)structured data, that I would need to store and query somewhere. This data forms a graph with probably thousands of edges. A simple blob of data could look like this:

{ :id   1
  :type :roof
  ;;...
  :requires 2}

{ :id   2
  :type :heater
  ;;...
  :requires [3 4 5 ...] }
Just thinking about doing this in SQL terrifies me and I hope Datomic/Datalog & core.logic can help me build a solution. But I haven't used either yet, so I'm not sure if I'm on the right track. So I guess my question is: can Datalog, Datomic and core.logic help me model and query an arbitrary graph of data, which is used as a set of inputs into a generative application? I know there's also Datahike and Crux, which may suit my needs, but I'm not sure about the pros/cons there either.

taylor.jeremydavid14:08:19

Hi :) I work on Crux (your mention brought me here). My initial impression is that Datalog will be an excellent fit for such a project, and unless you have extremely complicated constraints that Datalog alone is unable to solve efficiently, I don't imagine you will also need core.logic. It's worth wrapping your head around the ability to define rules and custom predicates, as I think the combination will be needed.

rob70314:08:25

Awesome, I'm glad to hear this! Thank you 🙂 Yeah, I'm currently going through Datomic and Datalog tutorials. The reusable rules sound really good so far. So if I understand this correctly, core.logic is complementary to Datalog?

rob70314:08:44

To be more specific, as far as I understand, core.logic is a full-fledged logic programming library which implements miniKanren, but Datalog is a subset of Prolog, so they actually don't have anything in common at first. So I assume you would write some custom code in core.logic on a set of data in memory, that I might have already pulled from somewhere else via a Datalog query, to restrict my result set further, correct?

taylor.jeremydavid14:08:12

core.logic can certainly be used like that, yep. I've never tried such a thing myself though. Some of the best advice I ever heard was to model your problem small-scale and in-memory first, and worry about larger-scale persisted solutions later. Based on that I would recommend playing with combinations of DataScript+`core.logic` ahead of thinking about Datomic/Datahike/Crux. By the time you're familiar enough with DataScript to understand how Datalog solves your challenges I think you'll find that translating your working model to any of those dbs is easy enough 🙂

rob70314:08:37

Alright, that sounds like a good plan actually. Thanks! All this stuff blows my mind by the minute right now haha

jimka.issy13:08:57

as an anecdote, a common problem which happens to me which I discover when debugging Clojure programs, is that there are multiple false values. In Common Lisp there is only one false value. Thus in CL any two values which are false, are also equal. In Clojure two false values may not be equal because one is nil and the other is false. This causes subtle bugs in my programs. not complaining, just noting...
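A quick illustration of that pitfall:

```clojure
;; nil and false are both logically false, yet not equal to each other
(= nil false)                      ;; => false
;; coercing with boolean gives a single canonical false value
(= (boolean nil) (boolean false))  ;; => true
```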

jimka.issy13:08:49

case in point: how can I best change the following expression (map (group-by :accepting (states-as-seq dfa)) [true false]) so that all false-ish values are mapped to false? I'd like to write something like: (map (group-by (compose :accepting boolean) (states-as-seq dfa)) [true false]) i.e. (fn [x] (boolean (:accepting x)))

nbtheduke13:08:09

(comp boolean :accepting) should work. comp applies the functions from right to left. https://clojuredocs.org/clojure.core/comp
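A quick sketch of that suggestion, reusing the :accepting field from the question with made-up sample data:

```clojure
;; comp applies right to left: ((comp f g) x) is (f (g x))
(def accepting? (comp boolean :accepting))

(accepting? {:accepting nil})   ;; => false
(accepting? {:accepting :yes})  ;; => true

;; nil and false now land in the same group
(group-by accepting? [{:accepting true} {:accepting nil} {:accepting false}])
;; => {true [{:accepting true}], false [{:accepting nil} {:accepting false}]}
```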

jimka.issy13:08:15

ahh so the arguments are in the opposite order of the order they are applied. that makes sense and is confusing at the same time. 😉

noisesmith13:08:11

@jimka.issy my a-ha moment was realizing that comp just deletes parens (fn [x] (f (g (h x)))) is the same as (comp f g h)

ben.sless13:08:31

But then you get to the mind-bend of transducers applying from left to right after you got used to it 🙂

noisesmith13:08:13

right - but the transducers aren't called on the transduced data, they are called on each other (each one is the callback that the one after it uses)

noisesmith13:08:41

it's a weird little knot to tie, but the result is so useful

ben.sless13:08:19

very. With transducers I just sat down once and did the substitution evaluation myself and they finally clicked

noisesmith13:08:44

yeah, working the stuff out on paper is underappreciated

ben.sless14:08:21

Well, I'm a philistine and just write it in an emacs scratch buffer, but agreed

noisesmith14:08:52

haha, I like paper because it lets me draw circles and arrows :D

ben.sless14:08:07

Allow me to introduce artist-mode

ben.sless14:08:20

you can later export them to PNG using ditaa 😊

ben.sless14:08:58

but yeah, I sometimes draw diagrams on paper. Later I may source control them with dot language

noisesmith14:08:21

yeah, for me the point of the diagram is to activate my shape and line neural coprocessor wetware, and I really need the finger tactile element for that interface

noisesmith14:08:57

which is a silly way of saying "there's a part of my brain that's good at shapes and spacial relationships and helps me think and doesn't work in a text UI"

noisesmith14:08:26

it's like a GPU but it comes from evolution instead of nvidia 😆

ben.sless14:08:45

completely right. There's some literature on various techniques for activating both hemispheres when solving problems

noisesmith14:08:08

but maybe there's something I can do with the touch screen on my xps-13....

ben.sless14:08:37

Now you're thinking with portals...

ben.sless14:08:53

Everything which engages several senses will work. Barely audible music, periodically shifting lighting. It has to be subtle enough not to distract, but if you nail it it ought to work. Even with tactile engagement, you can keep a pile of items with different textures. you can play around with them while thinking. Should have similar effects.

noisesmith13:08:22

same thing, just less nesting

noisesmith13:08:04

once I realized that, comp became my favorite thing

jimka.issy14:08:27

it's a bit surprising that the function is called comp rather than compose , as if it is to be considered a function you should use frequently. constantly is not called cons, and I use that function pretty frequently.

noisesmith14:08:30

I think partial, comp, apply etc. would be even more useful if they had terser shorthands / symbols

ben.sless14:08:19

got any symbol / shorthand suggestions? `P`, `o` for partial and comp, respectively, sound good

noisesmith14:08:36

yeah, with unicode there's · (raised dot, like in math), I like P

noisesmith14:08:58

I guess o is kind of like the dot

ben.sless14:08:05

Emacs clojure-mode has a few fancy symbol replacements, I just ripped it off. Any suggested shorthand for apply?

noisesmith14:08:22

hmm - the thing that gets me about that is that it creates a facade between the contents of the file and the UI - a layer of indirection I'm not used to. at least it's opt-in - I don't know what would work for apply...

noisesmith14:08:55

* and @ would make sense, if they weren't already taken

ben.sless14:08:14

Sadly we don't have unicode keyboards and source files are still universally ASCII unless you're fooling around in plan9

ben.sless14:08:06

$ is the application operator in haskell :man-shrugging:

noisesmith14:08:27

aha, yeah, makes sense

noisesmith14:08:04

I have often thought we should find a use for | - it would have been a decent alternate name for ->

ben.sless14:08:03

maximizing keyboard utilization. We need a use for | and !

noisesmith14:08:00

and % can find use outside #()

ben.sless14:08:22

Feels dangerous overloading meaning

noisesmith14:08:12

which reminds me of a great little "do you know clojure as well as you think you do?" example:

user=> (#(%2%1%):a{})
:a

noisesmith14:08:33

% inside #() doesn't follow the parsing rules the rest of the language uses (but this should only come up in pathological code, so it's not something we need to know or fix)

ben.sless14:08:46

> pathological code
Is that a general statement regarding #() and %, or particular cases?

noisesmith14:08:13

the weirdness only comes up if you remove spaces between tokens

noisesmith14:08:25

only pathological code does that

ben.sless14:08:17

yes. Pardon the traumatic response, been exposed to some pathological code

noisesmith14:08:23

that #(%2%1%) is only a valid multi element form because #() has a weird parser

ben.sless14:08:44

Never seen anyone abuse that, yet

noisesmith14:08:05

right, the only real reason to do so is perversity, there's no pragmatic use for it, but as a counter example it shows us how regular the language is 😄

noisesmith14:08:46

and IMHO if you aren't using comp constantly, you're missing out

jimka.issy14:08:06

I had a bad experience with compose several years back. A coworker wrote a relational algebra engine using extreme-scheme-ish programming full of invert/compose. It was really difficult to understand and debug. Of course I understand the temptation, as it is mathematically elegant.

stebokas14:08:30

Hello, please remind me how to stringify such an [object Object] printout in ClojureScript. I use prn .

lsolbach14:08:36

(str object)?

noisesmith14:08:06

in many cases (.stringify js/JSON o) is the best result

dpsutton14:08:38

if in a browser (.log js/console thing) can give the best results as well

cancandan16:08:52

Fun fact: I tried some of the editors, and it seems the only thing that works as advertised is emacs.

noisesmith16:08:59

emacs, vim, atom, intellij/idea, vscode all work

noisesmith16:08:30

emacs might have a better pre-packaged, working-instantly-out-of-the-box experience, that wouldn't surprise me

dpsutton16:08:34

it would surprise me if it's the easiest to set up for a beginner to emacs

noisesmith16:08:58

there's zero friction stuff like spacemacs

dpsutton16:08:52

sorta. even those require some manual reading to learn how to visit a file, what M-x means, etc.

cattabanks17:08:28

A friend recommended spacemacs, but the learning curve for helm and M-x was too steep to climb. Spent a weekend trying to set things up and went back to GNU Emacs on Monday just to boost productivity.

noisesmith16:08:58

but I agree with what I think is the mainstream take: don't learn a new editor in order to use clojure

noisesmith16:08:31

for most people clojure has enough new ideas, a new editor too is a bit much

noisesmith16:08:40

especially one of the weird old ones

seancorfield16:08:59

My advice: if your current editor has a Clojure integration, use that while you're learning the basics of the language and then explore other editors when you want to look at changing/improving your workflow.

seancorfield16:08:26

The key thing is to find a REPL integration for your editor that allows you to just eval the current form (or top-level form) with a hot key so it becomes second nature to evaluate changes in a file as you make them, rather than making a bunch of changes, then shifting to some separate process for evaluating code or running tests.

seancorfield16:08:07

Corollary: avoid typing directly into the REPL -- put exploratory code in a file, inside a (comment ..) form for example, and evaluate from the file, "always".

cattabanks17:08:58

I see what you mean, but doesn't this break the REPL development loop since you have to toggle between file and REPL?

seancorfield17:08:34

Not sure what you mean. Why would you need to toggle to the REPL?

cattabanks17:08:18

Because you would be typing in a file, but evaluating the forms in the REPL - or am I misunderstanding something?

seancorfield17:08:23

You evaluate from the file. You only need a results view open/visible but you should almost never need to actually switch to it?

seancorfield17:08:05

(I'm used to results appearing inline in Atom/Chlorine, as they did in Atom/ProtoREPL -- and I think VS Code/Calva and IntelliJ/Cursive can also do this?)

seancorfield18:08:01

I also work with Cognitect's REBL and Vlaaad's Reveal data browsers, which are where I usually view my results in expanded form (e.g., as tables or graphs).

lasse.maatta18:08:55

The Cursive user guide has a neat animation showing what sean is describing, https://cursive-ide.com/userguide/repl.html#interaction-with-the-editor

cattabanks18:08:54

Thanks @ I use stock emacs

cb.lists02:08:50

Cursive doesn't display the results inline (an oft requested feature that the IntelliJ platform apparently doesn't easily afford). But it has great in-editor evaluation facilities, which works very nicely alongside a visible REPL window. Having a bit of a procrastinatory tool-fiddling habit, I've given each of the big 4 (Emacs/Cider, IntelliJ/Cursive, Atom/Chlorine & VSCode/Calva) a decent try, and all of them are very capable. Clojure editor/IDE support seems solid enough across the board to me that beginners reading up on editors shouldn't agonise too much. There are plenty of viable options.

seancorfield03:08:18

And there also some interesting data visualization tools available now to supplement the editor/REPL: Cognitect's REBL, Reveal, Portal...

cancandan16:08:20

E.g. you follow the advice at http://clojurescript.com and go ahead and install Proto REPL, only to find that it only works with shadow

seancorfield17:08:12

ProtoREPL has been abandoned for ages. If you're using Atom, use #chlorine which is very actively maintained (although I can't speak to its ClojureScript story since I only ever use Clojure).

seancorfield16:08:19

Stu Halloway's talks on REPL-Driven Development and Running With Scissors are good examples of REPL workflows. Eric Normand's REPL-Driven Development is an excellent course if you're willing to spend some money.

cancandan17:08:47

Well i just followed the official docs, and it says proto repl there lol

seancorfield17:08:25

@cancandan Can you provide a link? I was just looking at the ClojureScript getting started and I don't see it mentioned.

seancorfield17:08:42

If I know which page it is on, I can submit a PR to fix it.

seancorfield17:08:31

Thanks. I'll open an issue on the clojurescript-site and tag the maintainer of Chlorine and see if we can get this updated.

seancorfield17:08:25

@cancandan If you have any additional details of what didn't work for you, feel free to add them there. Also, if you feel inclined to try Chlorine with Atom, the #chlorine channel is very helpful and active.

cancandan17:08:49

Excellent, thanks a lot

cancandan17:08:15

Yeah i think i confused it with chlorine, which is something that i also checked out, it says it only works with shadow. https://github.com/mauricioszabo/atom-chlorine

cancandan17:08:53

A lot of stuff to make work together unless you are using emacs lol.

seancorfield17:08:24

I used Emacs for about twenty years before I switched to Atom back in 2015. It was ProtoREPL that caused me to switch originally so I was glad to jump to Chlorine when that appeared, since ProtoREPL had been abandoned. I like that I only need a Socket REPL and zero dev dependencies so I can have the exact same workflow with REPLs running locally as well as REPLs running inside production processes (yes, I connect Atom to live production servers from time to time).

seancorfield17:08:04

I have used Shadow-cljs with Atom/Chlorine just to hack on Chlorine itself -- it's a pretty slick workflow. What are you using for ClojureScript, if not Shadow-cljs?

cancandan17:08:30

Just following some tutorial with the thing called figwheel. I am not sure why there are multiple tools doing the same thing. Like lein, CLI, Boot. Or figwheel, shadow

seancorfield17:08:33

Choice is good 🙂

seancorfield17:08:27

But, yes, it can be a bit overwhelming as a beginner. I started with Clojure and Leiningen about a decade ago -- Leiningen was the only "build" tool available. We switched to Boot in 2015 at work because we needed more customization/programmability than Leiningen really offered. We switched to the new Clojure CLI / deps.edn in late 2018 because we were running into bugs / quirks in Boot due to our codebase being so large (it's over 100k lines today).

noisesmith17:08:29

Every tool that can be created in one weekend of free time has alternatives. It turns out with clojure you can do a lot in a weekend.

seancorfield17:08:47

Back when we tried ClojureScript at work -- 2014/2015 -- there really weren't good tools at all around cljs. It was all very fragile and quite frustrating. We built a proof of concept in Om, then rebuilt it in Reagent, then just gave up on cljs. I think we'd consider it now for some new projects, and use Shadow-cljs as the tooling -- with Reagent / re-frame / etc. But we went with JS for our front end (React.js, Redux, Immutable.js, etc).

cancandan17:08:55

Hmm, so lein is unfixable?

cancandan17:08:10

I dunno, one selling point of cljs for me is shielding me from web development bs a bit.

noisesmith17:08:30

lein works, it's also slower to start, more complex, and monolithic than many people desire

noisesmith17:08:02

clojure doesn't have nearly as much fashion / marketing driven churn as js does

noisesmith17:08:19

cljs is a bit less stable, but it's really not even in the same league

noisesmith17:08:03

lein -> deps.edn migrations can be done, and are not painful or messy

seancorfield17:08:46

Leiningen is still the most popular tooling today but there's a shift toward using the new (and "official", i.e., from Cognitect) CLI / deps.edn tooling, especially for new projects.

seancorfield17:08:54

See https://clojure.org/news/2020/02/20/state-of-clojure-2020 for analysis of various trends (that survey is run every year so it's interesting to look back a few years too, and see what has changed).

valtteri17:08:14

I learned emacs with Clojure. I agree it was a steep hill to climb but I think it eventually paid off.

micah68817:08:50

Anyone using Sublime Text? That’s my current (non-Clojure) editor, and I think I’m gonna have to switch.

flowthing10:08:59

I’ve been working on Clojure package for Sublime Text for the past couple of months: https://github.com/eerohele/tutkain

flowthing10:08:09

It remains very much under development, but I’m currently using it as my daily driver at my day job. If you can live with some breaking changes and very little documentation, you could give it a try. If you come across any issues, feel free to message me.

flowthing10:08:32

Or even if you just need help getting started with it.

flowthing10:08:55

I’d love to have someone besides me trying it out.

noisesmith17:08:49

@micah688 I can't vouch for it personally, but coworkers have used sublime-repl

noisesmith17:08:26

I have a hunch that sublime might be behind the state of the art with tooling, but as a beginner and if you stick to lein at first, it should work as described

jcronk05217:08:50

I’ve got a newbie question about holding onto your head - here’s the situation: I have a 900MB CSV file with a little over 3 million records. This file isn’t valid CSV because it’s coming from a mainframe report that’s sending a header and footer, a la ************ START ************ and so on. Also, the CSV file has headers, but they repeat in the file at the places it needs to be split. I managed to get it to work with data.csv, but I had to set my max heap size to 10GB, which makes me wonder if I’m holding onto a reference. This is the most recent thing I’ve tried:

(def comm-count
  (atom 0))

(defn group-id
  [row]
  (second row))

(defn commission-count
  [row]
  (if (= (group-id row) "GROUP") ; this is a header row, so it starts a new commissions report
    (swap! comm-count inc)
    @comm-count))

(defn by-agent
  [coll]
  (partition-by commission-count coll))

(defn write-file
  "Using the group ID as the filename, write out a commissions report as CSV"
  [coll]
  (let [[header & data :as table] coll
        group (-> data first group-id)]
    (with-open [w (io/writer (str group ".csv"))]
      (csv/write-csv w table))))

(defn process-file
  "Take a file, remove the first and last lines, group by agent, then write out a file for each agent"
  [file]
  (with-open [r (io/reader file)]
    (let [agents (->> r csv/read-csv butlast rest by-agent)]
      (dorun (map write-file agents)))))
I don’t like having to use an atom, but I can’t figure out another way to divide the file by header rows. I’m not sure whether the memory usage indicates that I’m holding onto a reference, or if it’s just the fact that the largest partition is about 19,000 records. Is there a better way to do this? I’m completely stuck.

smith.adriane18:08:51

I think butlast realizes the whole sequence

smith.adriane18:08:25

(def 
 ^{:arglists '([coll])
   :doc "Return a seq of all but the last item in coll, in linear time"
   :added "1.0"
   :static true}
 butlast (fn ^:static butlast [s]
           (loop [ret [] s s]
             (if (next s)
               (recur (conj ret (first s)) (next s))
               (seq ret)))))

noisesmith18:08:32

wow, great catch, with the existence of drop-last, I wonder why you'd even use butlast

noisesmith18:08:22

honestly I find it really surprising that clojure has a core sequence operation that could be lazy but just... isn't
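For contrast, a quick check that drop-last stays lazy:

```clojure
;; drop-last never walks the whole input up front, so it composes
;; safely with very large sequences
(take 3 (drop-last (range 1000000)))  ;; => (0 1 2)
;; butlast would eagerly build a vector of all million-minus-one elements first
```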

jcronk05218:08:58

Wow, I replaced butlast with drop-last and it immediately sped up by a LOT. 😮 Thanks!

noisesmith17:08:18

small thing, you can replace (dorun (map write-file agents)) with (run! write-file agents) - doesn't address your issue though

noisesmith18:08:59

I'm not seeing any obvious places where you are holding onto the head - btw. this might be a good thing to move over to #code-reviews

jcronk05218:08:22

Thanks! I didn’t know there was a code reviews channel - I’ll put that in my channels list.

mario.cordova.86220:08:23

When should you use partial as opposed to a function that takes in arg and returns a new function using that arg?

mario.cordova.86220:08:37

For example:

(defn parse-term [index term-name value]
  (str term-name value index))

;; Using function above
(map-indexed (fn [index coll]
               (map (partial parse-term index) (keys coll) (vals coll))) coll)


(defn parse-term [index]
  (fn [term-name value]
    (str term-name value index)))

;; using function above
(map-indexed (fn [index coll]
               (map (parse-term index) (keys coll) (vals coll))) coll)

noisesmith20:08:49

I have seen the second version, but never seen evidence it's useful

noisesmith20:08:26

the exception to this is if you can pre-calculate something the fn would have to calculate twice - clojure doesn't do that partial evaluation for you, so it makes sense there

noisesmith20:08:44

eg.

(defn parse-terms [index]
  (let [offset (expensive-calculation index)]
     (fn [term-name value]
       (str offset term-name value index))))

mario.cordova.86220:08:04

hmm so generally speaking using partial is the way to go?

smith.adriane20:08:10

using the anonymous function syntax is also common:

(map-indexed (fn [index coll]
               (map #(parse-term index %1 %2) (keys coll) (vals coll))) coll)

mario.cordova.86220:08:11

That's what I had originally, but I wanted to avoid the anonymous fn and its % args, so I figured I'd use partial. But I believe partial returns an anonymous function as well, so no real difference. Just style I guess

noisesmith20:08:58

right - I always prefer partial over #(f x %1 %2) because that's what partial is for

noisesmith20:08:12

it's the weaker tool, therefore better

noisesmith20:08:31

weaker / more specialized
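To make the trade-off concrete (a sketch, not from the thread): partial can only fix arguments from the left, while #() can place them anywhere, which is what makes partial the "weaker", more specialized tool.

```clojure
;; partial fixes leading arguments
(def add5 (partial + 5))
(add5 10)               ; => 15

;; #() is needed when the fixed argument isn't first
(def subtract-from-100 #(- 100 %))
(subtract-from-100 30)  ; => 70
```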

smith.adriane20:08:39

I'm inferring that the outer coll is a collection of maps. if that's the case, I would probably write something like:

(for [[i m] (map-indexed vector ms)]
  (for [[k v] m]
    (parse-term i k v)))

noisesmith20:08:51

the difference is that instead of one coll of terms per m, you get a continuous lazy-seq from all m's - so it's not identical, and there is a usage for each (I really did mean my question as a question haha)

smith.adriane20:08:03

so I think the two for loop version is most similar to original example?

noisesmith20:08:12

yes (though I like to avoid the term "for loop" in the #beginners channel)

noisesmith20:08:59

we all know for isn't C / Java etc. for, a beginner often doesn't

smith.adriane20:08:36

oh, good point

noisesmith20:08:29

I wouldn't even mention it, but the frequency with which "why doesn't my for loop do anything" comes up gives me pause 😄

alexmiller20:08:57

I always prefer (fn ...) or #() over partial :)

noisesmith20:08:59

couldn't that be a single for?

smith.adriane20:08:19

actually, I don't know. I think I've confused myself

mario.cordova.86220:08:24

I tried that and it worked too

alexmiller20:08:32

it should :)
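For reference, a sketch of the single-`for` version (with assumed sample data `ms` and a `parse-term` shaped like the one above): `for` accepts multiple binding forms, so the nested loop collapses into one flat lazy seq, like `mapcat`.

```clojure
;; assumed sample data and a parse-term like the one in the thread
(def ms [{:a 1} {:b 2 :c 3}])
(defn parse-term [i k v] (str k v i))

;; nested for: one seq of terms per map
(for [[i m] (map-indexed vector ms)]
  (for [[k v] m]
    (parse-term i k v)))
;; => ((":a10") (":b21" ":c31"))

;; single for with two binding forms: one flat lazy seq
(for [[i m] (map-indexed vector ms)
      [k v] m]
  (parse-term i k v))
;; => (":a10" ":b21" ":c31")
```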

mario.cordova.86220:08:58

I think the second iteration of the for loop actually works better in my case since it avoids a call to flatten

mario.cordova.86220:08:09

Thanks guys! 👍

noisesmith20:08:25

@mario.cordova.862 a good rule of thumb is to always replace flatten with apply concat or use mapcat or for in the first place, I've never used flatten in a decade of using Clojure professionally

noisesmith20:08:10

it's overly aggressive and leads to weird bugs if you change what's in your collection (which happens a lot)
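A sketch of the difference: flatten recurses into every nesting level (and silently returns an empty seq for non-sequential input), while apply concat and mapcat flatten exactly one level.

```clojure
;; flatten recurses all the way down
(flatten [[1 [2]] [3]])         ; => (1 2 3)

;; apply concat / mapcat flatten exactly one level
(apply concat [[1 [2]] [3]])    ; => (1 [2] 3)
(mapcat identity [[1 [2]] [3]]) ; => (1 [2] 3)

;; flatten quietly returns () on non-sequential input
(flatten {:a 1})                ; => ()
```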

nfedyashev20:08:26

What would be a good way to rewrite this code? The goal is to minimize compute (reuse data instead of computing it again) and to keep the code as simple as possible, ideally without building complex logic around the proper ordering of computations and passing lots of arguments around. I was thinking of something like a per-request core.cache (with some weird cache-key names?), or memoization (which doesn't seem possible here, because memoized results are cached indefinitely).

(defn compute-metric1 []
  (Thread/sleep 1000)
  42)

(defn compute-metric2 [metric1]
  (Thread/sleep 1000)
  (+ metric1 24))

(defn compute-metric3 [metric1 metric2]
  (+ metric1 metric2 1))

(defn page [request]
  (let [metric1 (compute-metric1)
        metric2 (compute-metric2 metric1)]
    {:status 200
     :body {:metric1 metric1
            :metric2 metric2
            :metric3 (compute-metric3 metric1 metric2)
            ;; + a couple dozen more metrics like these
            }}))
Or perhaps core.async would be a better solution?

noisesmith20:08:03

what decides the metric needs recalculation - is it a question of time?

nfedyashev20:08:18

yes, I can't cache it for a long time, only per request

dpsutton20:08:40

as far as i can tell you're only computing them once per request?

dpsutton20:08:01

and if you only want to cache it per request, and you're only computing it once, there's nothing to do

nfedyashev20:08:54

sorry, what I meant is that it needs to be computed only once but this function might be called 12 times in other metric functions

noisesmith20:08:35

you can attach any key you like to a request

noisesmith20:08:13

make a middleware that calculates the metrics (or better yet creates delays that are forcible but will cache), attach to the request object, and use them from the other functions

noisesmith20:08:38

there's other middleware that does similar with eg. db connections

noisesmith20:08:42

common pattern

nfedyashev20:08:38

that's interesting. Thank you. I'll try it

noisesmith21:08:12

(defn wrap-metrics
  [handler]
  (fn [request]
    (let [metric1 (delay (compute-metric1))
          metric2 (delay (compute-metric2 @metric1))]
      (handler (assoc request :metrics {:metric1 metric1
                                        :metric2 metric2
                                        ...})))))
(defn wrap-log-metric1
   [handler]
   (fn [request]
     (log :something @(get-in request [:metrics :metric1]))
     (handler request)))
etc.
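A minimal sketch of the delay behaviour this middleware relies on: the body runs once, on the first force, and the result is cached for every later deref.

```clojure
;; the body of a delay runs only on the first deref; the result is cached
(def metric (delay (do (println "computing...") 42)))

(realized? metric) ; => false (nothing computed yet)
@metric            ; prints "computing..." and returns 42
@metric            ; => 42, straight from the cache, no printing
(realized? metric) ; => true
```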

smith.adriane21:08:37

if each request is only processed within a single thread, you could try something like:

(def ^:dynamic *compute-cache* nil)
(defn cached-fn
  [f]
  (fn [& args]
    (if-let [e (find *compute-cache* args)]
      (val e)
      (let [ret (apply f args)]
        (set! *compute-cache* (assoc *compute-cache* args ret))
        ret))))

(defmacro with-cache [& body]
  `(binding [*compute-cache* {}]
     ~@body))

(defn compute-metric1 []
  (Thread/sleep 1000)
  42)
(defn compute-metric2 [metric1]
  (Thread/sleep 1000)
  (+ metric1 24))
(defn compute-metric3 [metric1 metric2]
  (+ metric1 metric2 1))

(def compute-metric1-cached (cached-fn compute-metric1))
(def compute-metric2-cached (cached-fn compute-metric2))
(def compute-metric3-cached (cached-fn compute-metric3))

(defn page [request]
  (with-cache
    (let [metric1 (compute-metric1-cached)
          metric2 (compute-metric2-cached metric1)]
      {:status 200
       :body {:metric1 metric1
              :metric2 metric2
              :metric3 (compute-metric3-cached metric1 metric2)
              ;; + a couple dozens more metrics like these
              }})))

noisesmith21:08:01

with a middleware you could just use binding instead of set! which I think is more functional

noisesmith21:08:12

but also, you already have the request object, which is idiomatically used to attach data that other middleware or the handler would use, so I find dynamic bindings to be a less desirable alternative

smith.adriane21:08:52

I don't think binding would work for some call trees

smith.adriane21:08:03

since it undoes the changes when the binding goes out of scope

smith.adriane21:08:06

given call trees f1 -> f3 and f2 -> f3, f3 would be computed twice if you use binding rather than set!

smith.adriane21:08:15

I do agree that if you want a per request cache, attaching the cache to the request makes a lot of sense

smith.adriane21:08:57

the other issue with my implementation is that the compute functions have to call the cached versions of the other compute functions

noisesmith21:08:28

oh - right, I missed the detail that you put all the values inside one dynamic var

smith.adriane21:08:12

this could probably be turned into some kind of middleware