#clojure
2019-05-28
todo00:05:18

Is there any way to get Scheme-style hygienic macros for Clojure?

hlolli01:05:33

I'm hitting a very strange bug atm. I'm using node-jre to run an (uber)jar, but somehow the resource paths are weird: I can only io/resource files from the jar if they are a file, not a folder, and the file is not nested. It's not the newest JRE running the jar; could that be the reason, or something else? (Think I solved it: there was a pom.xml from a totally unrelated Java project lurking around.)

restenb02:05:54

anybody good at compojure routing? I want to have different wrappers for my login routes vs. other routes, then combine it all into one ring handler

hlolli02:05:38

@restenb it's a bit of a hassle if you're using cemerick's friend. I had a terrible time getting the auth sent the right way. But it's solvable: just combine the protected routes with the public routes in the handler.

hlolli02:05:43

and make sure the session cookie isn't dissoced anywhere in some middleware

seancorfield02:05:21

@restenb You can use (routes ...) to combine multiple sets of routes (since they're just handlers).

(routes
  (wrap-login (routes ... your login routes ...))
  (wrap-other (routes ... your other routes ...)))
something like that I think...

restenb02:05:05

wait, you can nest routes? doh

seancorfield02:05:03

routes is just a function that takes handlers as arguments and combines them. Compojure goes through the list of routes/handlers until one of them returns non-`nil`; a route like (GET "/foo" [] something) is itself a handler, and it returns nil if it doesn't match...

seancorfield02:05:51

user=> ((routes (routes (GET "/foo" [] "Get Foo") (GET "/bar" [] "Get Bar"))
                (routes (GET "/quux" [] "Get Quux") (POST "/quux" [] "Post Quux"))) {:uri "/quux" :request-method :get})
{:status 200, :headers {"Content-Type" "text/html; charset=utf-8"}, :body "Get Quux"}
user=> ((routes (routes (GET "/foo" [] "Get Foo") (GET "/bar" [] "Get Bar")) (routes (GET "/quux" [] "Get Quux") (POST "/quux" [] "Post Quux"))) {:uri "/bar" :request-method :get})
{:status 200, :headers {"Content-Type" "text/html; charset=utf-8"}, :body "Get Bar"}
user=> ((routes (routes (GET "/foo" [] "Get Foo") (GET "/bar" [] "Get Bar")) (routes (GET "/quux" [] "Get Quux") (POST "/quux" [] "Post Quux"))) {:uri "/bar" :request-method :post})
nil

seancorfield02:05:41

I didn't add any middleware into that, but hope it gives the idea? @restenb
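The fall-through behaviour can be sketched in plain Clojure (a simplified model, not compojure's actual implementation; routes* and wrap-upper are hypothetical names standing in for routes and real middleware like wrap-login):

```clojure
(require '[clojure.string :as str])

;; Simplified model of what compojure's `routes` does: try each
;; handler in order and return the first truthy response.
(defn routes* [& handlers]
  (fn [req]
    (some #(% req) handlers)))

;; Hypothetical middleware, standing in for e.g. wrap-login.
(defn wrap-upper [handler]
  (fn [req]
    (some-> (handler req) str/upper-case)))

(def app
  (routes*
    (wrap-upper (fn [req] (when (= "/login" (:uri req)) "login")))
    (fn [req] (when (= "/public" (:uri req)) "public"))))

(app {:uri "/login"})  ;; => "LOGIN"  (middleware applied)
(app {:uri "/public"}) ;; => "public" (fell through to the next handler)
(app {:uri "/nope"})   ;; => nil     (no route matched)
```

Because a non-matching handler returns nil, wrapping each group separately means the middleware only runs for the group that matched.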

restenb02:05:49

yeah I got the idea at your first post @seancorfield, thanks. somehow I was not seeing that (compojure/routes) just takes handlers & assembles a handler

Ahmed Hassan06:05:10

How do I set Content-Security-Policy headers in Pedestal? The application is giving the following errors.

markx08:05:21

How do I profile the overhead of using lazy seq? Assuming my app is slow, how do I find out if it’s because I’m using too many lazy operations?

val_waeselynck09:05:11

You can also use VisualVM

rickmoynihan09:05:04

be careful though… laziness is hard to benchmark because it also includes the cost of doing the work.

markx11:05:10

Yeah I assume it’s hard, because it’s impossible to compare it to the non-lazy version, unless we have it implemented, which is hard again, and is the very reason why we implement the lazy version in the first place.

rickmoynihan11:05:45

Yeah, you really need two implementations… but a profiler can guide you to the right decision. Lazy sequences are usually fast enough, though; it depends on how much data you have going through them, how many transformations you've stacked on top, and of course how fast you need it to be 🙂 A sign that laziness is a problem is lots of GC activity, but that can also be caused by other things. Also, accidentally holding onto the head of a seq can cause a lot of memory pressure, OOMs, etc.

ivana08:05:50

I add (time ...) around some forms when trying to find a bottleneck
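One caveat with that approach: with lazy seqs, time can mislead, because unless the seq is forced you only measure construction. A minimal sketch:

```clojure
;; Without forcing, `time` measures only building the lazy seq,
;; not the work of realizing it:
(time (map inc (range 1000000)))         ; near-zero elapsed time

;; Force realization with doall to measure the actual work:
(time (doall (map inc (range 1000000))))
```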

Saikyun08:05:08

how do I eval a fn declaration and keep type hints? I can't get it to work. the expected result is that (eval generated-fn2) won't give me reflection warnings.

Saikyun08:05:28

here's the problem on http://repl.it, if someone wants to give it a go: https://repl.it/repls/ExcitedDetailedRedundancy
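One technique that may address this (an assumption on my part, not confirmed in the thread): when building the fn form programmatically, attach :tag metadata to the argument symbol, so the hint is still there when the compiler sees the eval'd form:

```clojure
(set! *warn-on-reflection* true)

;; Build the fn form by hand, putting :tag metadata on the arg symbol;
;; this is the programmatic equivalent of writing (fn [^String s] ...).
(def generated-fn
  (list 'fn [(with-meta 's {:tag 'String})]
        (list '.length 's)))

(def hinted-length (eval generated-fn))

(hinted-length "hello") ;; => 5, compiled without a reflection warning
```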

misha09:05:39

is this an implementation detail, or can one rely on it? (map f coll) returns a "chunked lazy seq", which gets realized 32 elements at a time; (map f coll1 coll2 ...) returns a "proper lazy seq", which gets realized 1 element at a time

bronsa09:05:15

chunking is always an impl detail

bronsa09:05:50

(both its presence, lack of, or rate of)

misha09:05:52

(I mean "rely on the 1 at a time for 2nd arity")

bronsa09:05:01

no (you can't rely on that either)

misha09:05:38

so if I'd want the laziest seq, I should do proper lazy-seq implementation myself?

bronsa09:05:24

(defn unchunk [s]
  (when (seq s)
    (lazy-seq
      (cons (first s)
            (unchunk (next s))))))
is usually how it's done

misha09:05:56

(->> coll (map f) unchunk)?

misha09:05:36

(question came from reading lots of duplicated code effort here https://github.com/dakrone/clojure-opennlp/blob/master/src/opennlp/tools/lazy.clj )

bronsa09:05:40

(->> coll unchunk (map f))

bronsa09:05:47

what controls the chunking is the input collection

misha09:05:12

so in my example it is range who initiates chunks of 32?

bronsa09:05:00

it's map who decides to iterate on chunks rather than els, based on the type of the input coll

bronsa09:05:41

if you pass an "unchunkable" coll to map it will traverse it one element at a time

bronsa09:05:46

that's what unchunk does
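A side-effect counter makes the difference visible (a sketch; the chunk size of 32 is itself an implementation detail):

```clojure
(defn unchunk [s]
  (when (seq s)
    (lazy-seq
      (cons (first s)
            (unchunk (next s))))))

(def chunked-realized (atom 0))
(def unchunked-realized (atom 0))

;; Asking for one element of a chunked source realizes a whole chunk:
(first (map (fn [x] (swap! chunked-realized inc) x) (range 100)))

;; After unchunk, only the element actually asked for is realized:
(first (map (fn [x] (swap! unchunked-realized inc) x) (unchunk (range 100))))

@chunked-realized   ;; => 32 (one full chunk)
@unchunked-realized ;; => 1
```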

misha09:05:00

(defn map
  ([f coll]
   (lazy-seq
    (when-let [s (seq coll)]
      (if (chunked-seq? s)
        (let [c (chunk-first s)
              size (int (count c))
              b (chunk-buffer size)]
          (dotimes [i size]
              (chunk-append b (f (.nth c i))))
          (chunk-cons (chunk b) (map f (chunk-rest s))))
        (cons (f (first s)) (map f (rest s)))))))

misha09:05:27

so map just checks, but not enforces

bronsa09:05:44

(chunked-seq? (range 30)) ;;=> true
(chunked-seq? (unchunk (range 30))) ;;=> false

bronsa09:05:20

the functions that "understand chunking" work by preserving the chunkiness

bronsa09:05:28

so if chunked-seq comes in, chunked-seq comes out

bronsa09:05:34

if normal seq comes in, normal seq comes out
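This preservation is observable with chunked-seq? (keeping in mind that all of it is an implementation detail):

```clojure
;; Chunked input in, chunked output out:
(chunked-seq? (seq (map inc (range 100))))  ;; => true

;; Non-chunked input (a list's seq) stays non-chunked:
(chunked-seq? (seq (map inc (list 1 2 3)))) ;; => false
```

Note the seq calls: map itself returns an unrealized LazySeq; it's the realized seq underneath that is (or isn't) chunked.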

misha09:05:26

is there a way to know which (core?) fns return chunked seqs?

misha09:05:45

"rule of thumb" or something

misha09:05:18

or rather "when would I want to return chunked seq?"

misha09:05:55

why isn't unchunk - a core fn? is it because chunks are an implementation detail?

bronsa09:05:59

there's really no good documentation on this, because you "shouldn't be thinking about it, it's an implementation detail" is the core position I think

misha09:05:17

on the other hand things might get expensive

bronsa09:05:15

¯\_(ツ)_/¯

bronsa09:05:41

I can tell you that the current impl produces chunked seqs off of:
- vectors or their seqs
- bounded integer ranges
- Iterables that are not otherwise Seqable
- the return values of sequence used with a transducer
- the return values of iterator-seq

bronsa09:05:03

and most of the seq functions (`map`, filter, keep, for, etc.) are "chunkiness preserving", while reduce/`transduce` "understand" the chunkiness of their input coll

henrik12:05:04

I've used Criterium to measure performance of running some data imports on my MacBook vs. my PC (PowerShell vs. WSL), and got these results:

WSL: 9.075001 seconds
PowerShell: 5.753443 seconds
MacBook: 12.337244 seconds
Things are better on the PC, but not as good as I expected (16GB 2017 MacBook vs. 32GB AMD Ryzen 2700X PC). Upon inspecting the PC, I notice that only four cores out of eight in total are utilised. The function I'm testing is using a simple pmap to process things in parallel. Is there a limitation to pmap in only using four cores, or are there things I might need to tweak with the JVM? (Oh, and BTW, the tax paid by using WSL seems very high indeed)

henrik12:05:52

Right. Well, that is 3 threads for the one processor then. I guess I have to look into other means of parallelization.

jumar12:05:42

3 threads for one processor?

henrik12:05:07

Yes, there is one physical processor, albeit 8 cores.

henrik12:05:46

Or did I misunderstand what "processor" means in this context?

jumar12:05:48

pmap should then use up to 10 parallel threads (or 18 if you're talking about physical cores and have hyperthreading)

jumar12:05:57

it's a logical core as seen by operating system

jumar12:05:26

typically on machines with HT, that means 2 * #physical cores + 2

henrik12:05:51

Ah, gotcha. Yes, 16 logical cores. Alright, so for some reason it only reached 4 physical cores. Something else is going on then.

jumar12:05:38

it may depend a lot on the type of operations you're doing. You might not get the benefits you hope for just by running things in parallel; in extreme cases, it may even be slower

henrik12:05:32

In this case it's reading fairly large XML documents, extracting data, and doing transformations. pmaping should be a pretty straightforward way of optimizing in my mind. Each document is fairly long-running.

henrik12:05:17

Maybe it's strangled by IO and I need to buffer some documents ahead of time.

jumar12:05:33

Maybe, but you may get a lot of IO or memory contention. As always, it's best to measure 🙂

👍 4
jumar12:05:27

Btw. for pmap this is the interesting line related to "semi-lazy" -> it tries to stay ahead "enough" (factor n): https://github.com/clojure/clojure/blob/clojure-1.9.0/src/clj/clojure/core.clj#L6948

jumar12:05:41

in general, Claypoole is interesting alternative to consider: https://github.com/TheClimateCorporation/claypoole

👍 4
tavistock12:05:01

that may be it. also, I never end up using pmap because it doesn't work with transducers, and transducers are typically faster for me

henrik12:05:33

The entire thing is lazy from start to end, with no transducers mixed in. It should be realized only on the accumulation of results outside of pmap. Yep, defing the pmap expression returns immediately.

andy.fingerhut13:05:39

If you look at the source code for pmap you can see the expression it uses to determine the maximum parallelism: (+ 2 (.. Runtime getRuntime availableProcessors)) You can run that expression in a REPL on your system to see what the JVM considers that number to be.

4
andy.fingerhut13:05:27

Even if it is (+ 2 8) or 10, pmap can only give good performance increases if a couple of things are true: (a) the work done for each element of the sequence is large, compared to the work required for the JVM to create a new thread, which is what pmap does for each element. (b) Each element should take about the same amount of time to process. If they take wildly different amounts of time to complete, then pmap limits the parallelism because it does not work more than a certain amount ahead of the last unfinished element.

andy.fingerhut14:05:24

There are other libraries, e.g. using Java's ExecutorService, that try to keep N threads busy at all times, which avoids issue (b) that pmap's implementation has.
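A minimal sketch of that approach using Java's ExecutorService directly (pool-map is a hypothetical helper, not pmap's implementation or any library's API):

```clojure
(import '(java.util.concurrent Executors ExecutorService Callable Future))

(defn pool-map
  "Map f over coll using a fixed pool of n threads. Unlike pmap,
  the pool keeps all n threads busy even when task durations vary
  wildly, since each freed thread immediately picks up new work."
  [n f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n)
        ;; Submit every task up front; each returns a Future.
        futures (doall (map (fn [x]
                              (.submit pool ^Callable (fn [] (f x))))
                            coll))
        ;; Block on the futures in order, preserving input order.
        results (mapv (fn [^Future fut] (.get fut)) futures)]
    (.shutdown pool)
    results))

(pool-map 4 inc [1 2 3 4 5]) ;; => [2 3 4 5 6]
```

Submitting everything up front trades memory for throughput; a production version would likely bound the in-flight work.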

andy.fingerhut14:05:15

I have not used it myself, but I believe this library offers some bit of Clojure API around Java's capabilities, but I have heard that several people go straight to Java interop for this, too: https://github.com/TheClimateCorporation/claypoole

henrik14:05:38

Cool, thanks @andy.fingerhut (and @U06BE1L6T, @U0C7D2H1U). That might be the bottleneck in that case. The documents I process are wildly different in size. I think the minimum time I've spotted for a single document is ~40ms for an outlier. Is this small compared to the cost of setting up a thread? If so, I guess I might need to chunk them into linear bits. Even then, it wouldn't load every core evenly. I guess I could try to organise them into chunks of roughly the same size.

henrik14:05:16

Or go with a threadpool solution.

andy.fingerhut14:05:30

I suspect one could do a lot of fiddling to try to make pmap use its max parallelism as often as possible, whereas the thread pool solution would be likely to get you there with less fiddling.

andy.fingerhut14:05:44

If max parallelism was a relatively important goal for you.

henrik14:05:30

Yeah, max parallelism is by far the easiest optimization I can do, as fiddly as it is.

henrik14:05:53

I'll look into the less fiddly versions though, that seems sensible.

henrik16:05:53

claypoole and some minor refactoring sees a good improvement: 3.416247 seconds in PowerShell.

henrik16:05:30

It's now keeping all cores nearly maxed out.

👍 4
manutter5115:05:03

We were just discussing keyword parameters in #beginners , and I think I remember a conversation somewhere back a ways where people were saying there was a reason to prefer plain maps over keyword params. Can’t remember what the reason was, though. Sound familiar to anyone?

lilactown15:05:57

the reason I don't use keyword params is because you can't pass them around as data

👍 4
Alex15:05:35

Yeah, as soon as you want to compose a fn with keyword params, its caller is responsible for splatting out the options instead of just passing in some map it might have gotten from its own caller

lilactown15:05:03

(defn bar [& {:keys [a b c]}]
  (println a b c))

(defn foo [big-ball-of-params]
  ;; I have to peel each arg one by one off of the map
  (bar :a (:a big-ball-of-params)
       :b (:b big-ball-of-params)
       :c (:c big-ball-of-params)))

manutter5115:05:11

That’s what it was, composition. Tks folks.

Alex15:05:16

To be fair you can still get there with apply, but as soon as your caller is drawing the keyword args from n>1 maps your caller has a lot more to juggle.
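The apply route looks like this (bar here is a hypothetical kwargs fn, not from the thread above):

```clojure
(defn bar [& {:keys [a b c]}]
  [a b c])

;; Splat a single map's entries back out into keyword arguments:
(apply bar (mapcat identity {:a 1 :b 2 :c 3}))
;; => [1 2 3]
```

This works cleanly for one options map; merging keyword args drawn from several maps is where the caller-side juggling starts.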

borkdude20:05:52

isn’t destructuring a little bit too forgiving in the :or here?

(let [{:keys [:a] :or {x 1}} {}] #_[a x]) ;; this is fine
(let [{:keys [:a] :or {x 1}} {}] [a b x]) ;;=> error about x
(let [{:keys [:a :x] :or {x 1}} {}] [a x]) ;; works

seancorfield20:05:13

Macroexpand it -- see what it expands to...

borkdude20:05:25

yeah, the x in :or is ignored when it’s not part of the previous symbols
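Concretely, the :or default only takes effect for symbols that are actually bound, so the stray entry is silently dropped rather than flagged (a minimal check):

```clojure
;; x is not in :keys, so its :or default is ignored entirely:
(let [{:keys [a] :or {x 1}} {}] a)       ;; => nil, no error

;; Once x appears in :keys, the default kicks in:
(let [{:keys [a x] :or {x 1}} {}] [a x]) ;; => [nil 1]
```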

andy.fingerhut20:05:32

It is the kind of thing a linter could warn about. Eastwood has not implemented it, although I created an issue to remind developers about it a while back: https://github.com/jonase/eastwood/issues/225

andy.fingerhut20:05:19

Maybe this one is related, too: https://github.com/jonase/eastwood/issues/157

borkdude21:05:58

(@ghadi I saw you typing for a moment, if you have feedback on this, I would be happy to hear)

ghadi21:05:06

might want to lint on (let [{:keys [patient/id order/id]} value]...)

ghadi21:05:31

GIGO (seen in #beginners)

ghadi21:05:46

similarly for :syms

borkdude21:05:39

ah I see, multiple keys with the same “name” part

ghadi21:05:57

i mean, it would never get past simple REPL testing

ghadi21:05:13

(ugh MBP keyboards)

ghadi21:05:22

unless one of the symbols was unused

borkdude21:05:53

yes, the idea of clj-kondo is that it catches errors before your REPL sees them; there is this interval in time where thoughts are transformed into sexprs in your buffer, but have not been evaluated yet

ghadi21:05:46

@borkdude unused binding?

borkdude21:05:24

that particular binding is unused, because it’s shadowed by the next one?

borkdude21:05:27

the message could maybe be clearer, I’m all ears. this is the general message you get when you don’t use a binding, e.g. in (let [x 1])

andy.fingerhut21:05:14

Maybe "duplicate binding id", if you can have a different detection and/or message for this case vs. an unused binding?

andy.fingerhut21:05:25

I don't know if you maybe already have this for clj-kondo, but a short description or message, and then a link to a place you can get more details / examples / workarounds / etc., can be useful in explaining to users what is going on.

borkdude21:05:26

the detection mechanism is the same for all “unused bindings” but I could make a few tweaks here and there. that link idea is nice, it’s the same that shellcheck does, I might implement something like that

borkdude21:05:02

(btw if you’re not using shellcheck and write bash sometimes, it’s lovely)

andy.fingerhut21:05:25

I use such a small subset of bash's capabilities, that it is self-linting 🙂

borkdude21:05:00

one thing that it should have linted but didn’t: if [ false ]; then ...

Drew Verlee22:05:07

if i reduce over a lazy seq, will the list get realized one at a time? e.g. (reduce ... (for [x (range 2)] x))

noisesmith23:05:14

I think that document answers a slightly different question — actually, the document answers the question; I misread

jumpnbrownweasel23:05:07

But not one at a time, they're realized in chunks. I thought this was the question.

noisesmith23:05:44

right, chunking means there's no guarantee of one at a time processing
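Even an early-terminating reduce realizes whole chunks, which a side-effect counter shows (a sketch; the exact chunk size is an implementation detail):

```clojure
(def realized (atom 0))

;; Stop reducing once the running sum reaches 5; only a handful of
;; elements are actually consumed:
(reduce (fn [acc x] (if (< acc 5) (+ acc x) (reduced acc)))
        0
        (map (fn [x] (swap! realized inc) x) (range 100)))

;; ...yet more elements were realized than consumed, because the
;; underlying chunk (typically 32 elements) is filled eagerly:
@realized ;; => noticeably more than the 5 elements reduce consumed
```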

henrik05:05:01

Thanks to some helpful people here, I started using Claypoole yesterday (https://github.com/TheClimateCorporation/claypoole). I was delighted to notice that when I reduce over the results of upmap, reduce is handed a result immediately when upmap is done with it. It doesn't wait for the entire upmap expression to be done. Feels like getting streaming for free.