This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2018-06-20
Channels
- # beginners (94)
- # boot (8)
- # cider (21)
- # cljs-dev (3)
- # cljsjs (5)
- # cljsrn (10)
- # clojure (167)
- # clojure-italy (4)
- # clojure-norway (1)
- # clojure-russia (9)
- # clojure-spec (25)
- # clojure-uk (29)
- # clojurescript (20)
- # cursive (12)
- # datomic (55)
- # emacs (10)
- # fulcro (16)
- # graphql (1)
- # hoplon (18)
- # lein-figwheel (30)
- # off-topic (259)
- # onyx (8)
- # other-languages (13)
- # re-frame (1)
- # reagent (62)
- # ring (8)
- # ring-swagger (28)
- # shadow-cljs (187)
- # spacemacs (15)
- # specter (2)
- # testing (12)
- # tools-deps (38)
@alexmiller Hi Alex, great work with tools.deps! It's a game changer. I just wanted to say hi and thanks! 🙏
I love the authentication via settings.xml for private repos. This is the Maven way. Sometimes it seems that all roads lead to Maven 🙂
does anyone know of tooling/libraries for live editor validation of edn against a schema/spec? Looking for something to guide others (non-developers) writing pretty extensive configuration files. Basically looking for something akin to xsd / xml autocomplete and error highlighting
Hello. I heard I should stop using `pmap` for parallel mapping of "large" data streams. I'd like to perform database inserts on each partition of my data (CSV, streamed over the network), then aggregate the results.
Is `pmap` unable to throttle before running out of resources?
`pmap` has a fixed degree of parallelism, by default based on the number of cores. Only that many calls will be in flight at once.
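A quick way to see that bound in action (a sketch; `slow-inc` is a made-up stand-in for real per-element work):

```clojure
;; pmap keeps only a bounded number of calls in flight, so on a
;; multi-core machine 8 slow tasks finish in roughly one "step",
;; not eight sequential ones
(defn slow-inc [x]
  (Thread/sleep 100)   ; simulate slow work
  (inc x))

(time (doall (pmap slow-inc (range 8))))
;; compare with (time (doall (map slow-inc (range 8)))),
;; which takes roughly 8x as long on a multi-core box
```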
@andrew.sinclair let me take this opportunity to promote a library a coworker of mine wrote for problems like this: https://github.com/purefnorg/sentenza
Interesting, thank you! I'll take a look
If I understand... Since `pmap` only has a number of threads in flight related to your number of cores, then that sounds like it scales at any size of data, no?
I can't remember what the default factor is... might be number of cores * 2 or + 2 maybe. Anyway N threads. That's the number of threads that `pmap` will use
That aligns with what I thought initially. I heard cores + 2
Then someone was trying to convince me otherwise. Probably a misunderstanding. Thanks for your help
`(+ 2 (.. Runtime getRuntime availableProcessors))`
`core.reducers` or `core.async` will let you easily tune parallelism without needing to manage threads yourself.
I think `core.reducers` looks right for my scenario. I didn't see how to do a parallel map followed by a reduce with `core.async`. Is there a simple way?
another library worth checking out: https://github.com/reborg/parallel
Perhaps you could try folding and see if work-stealing makes a difference (in case your partitions are highly variable in timing):
(require '[clojure.core.reducers :as r])
(r/fold reduce-f (r/map process-f vector-input))
where `reduce-f` is the way you aggregate, `process-f` is pulling and storing data, and `vector-input` contains your partitions
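To make that concrete, a self-contained sketch (with `+` standing in for the real `reduce-f` and a squaring function for `process-f`):

```clojure
(require '[clojure.core.reducers :as r])

;; parallel map + work-stealing fold over a vector; when r/fold is
;; given a single function, it uses it both to reduce each partition
;; and to combine the partition results
(defn process-f [x] (* x x))   ; stand-in for real per-element work

(r/fold + (r/map process-f (vec (range 1000))))
;; => 332833500 (the sum of squares 0..999)
```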
I would use `core.async/thread` for each parallel operation, then a regular `clojure.core/reduce` over the channels returned by each `core.async/thread` call
Yes, I think that would work, thank you
also, if using core.async/thread, you can start N threads listening to a shared input channel and writing to a shared output to have parallelism N
just don't do IO in go blocks themselves; see #core-async if you want to discuss that possibility in more depth
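A minimal sketch of that shared-channel pattern, using core.async's `pipeline-blocking` to run N blocking workers off one input channel into one output channel (the `inc`/sum here is a made-up stand-in for real IO; `a/to-chan!` is the spelling in recent core.async, older versions have `a/to-chan`):

```clojure
(require '[clojure.core.async :as a])

;; 4 blocking workers consume a shared input channel and write to a
;; shared output channel; an ordinary reduce aggregates the results
(let [in  (a/to-chan! (range 100))
      out (a/chan 16)]
  (a/pipeline-blocking 4            ; parallelism N
                       out
                       (map inc)    ; stand-in for blocking IO work
                       in)
  (a/<!! (a/reduce + 0 out)))
;; sums 1..100 => 5050
```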
I don't think core.async has a parallel equivalent of reduce, but core.reducers definitely does
@andrew.sinclair `pmap` concurrency is bound to the chunk size (usually 32 for vectors, ranges, etc.). True concurrency is then bound to the number of cores for which those 32 threads compete. Note that if any of those threads takes much longer, it blocks processing of the current chunk until done.
it's more complicated than that - you get chunk-size + cores + 2 in an empirical test - though of course for most uses 32 as the common chunk size well outsizes the number of cores available
with the `pmap` parallelism you know you're going to get up to N max parallelism, where N depends on chunk size and cores. For most problems where I use it, that's all I need to know
right, all I meant was that the total isn't just chunk-size, it's chunk-size + cores + 2, then hard-limited by actual cores at the lowest level. Not a super profound point, smothered by miscommunication.
So I'm wondering whether a Clojure on Go would actually solve a real problem, one being memory footprint. My Clojure microservices' memory usage starts out really high. Is there a reason to believe clj-on-go would consume less memory?
one example: an aleph (netty) server doing mongodb queries or http calls to other services, taking 400MB
Doesn't sound like much but I'd like to replicate some of them and have many more microservices like it
this may be helpful: https://github.com/clojure-goes-fast/clj-memory-meter
I’ve never been sensitive to memory usage below 2-3gb because memory is cheap enough, but as you said, if you’re trying to get that number down profiling is a good place to start.
I think a clojure implementation on go only because of memory footprint is kind of overkill
Even more so now that there's graalvm which can compile clojure code to native
@yonatanel if low memory footprint is what you're after, I would go for running it on Node - you'll get both low footprint and advanced GC
fennel lang is also a good option depending on what you want to do (it can run on micro-controllers), though it might not be the best option for microservices
Anyone who wants to submit a talk proposal for Clojure eXchange 2018, the link is here - just scroll up a bit to find the Google Form embedded in the page: https://skillsmatter.com/conferences/10459-clojure-exchange-2018#get_involved
@alexmiller I don't understand how `n` is the degree of parallelism for `pmap` here: https://stackoverflow.com/a/5022838/864684
AFAICT it's only used for the last `n` elements of `coll`.
`n` ends up controlling the lazy consumption, forcing the futures to spawn
I recall a week ago we concluded that wasn't the case and that chunking was the thing controlling the degree of parallelism. Or was that not the case?
you end up with the chunk size (if any) plus `n`
Would it be possible to deliver a zip file over HTTP without persisting the zip to disk? i.e. make a connection over ring and write the contents of the zip file by file?
if you mean sort of streaming it without even keeping it entirely in memory, that is possible but kind of annoying with normal ring
You'll want to use https://docs.oracle.com/javase/8/docs/api/index.html?java/util/zip/ZipOutputStream.html
if you mean, generate the zip data in memory completely without ever writing it to disk and then serve that up, that is very straightforward
so you compare that to the size of the heap on jvm * how many you will be generating concurrently
so you will want to stream the results of creating the zip out, which is more complicated, but basically you use a piped stream, I think ring has a utility function for helping with this sort of thing
a piped stream gives you an inputstream connected to an output stream, so while you write zip data to the outputstream, ring reads it from the inputstream and serves it
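A rough sketch of that piped-stream approach using only JDK classes (the ring utility alluded to is `ring.util.io/piped-input-stream`, which wraps the same idea; `zip-input-stream` and the `[entry-name content]` shape here are made up for illustration):

```clojure
(require '[clojure.java.io :as io])
(import '(java.io PipedInputStream PipedOutputStream)
        '(java.util.zip ZipOutputStream ZipEntry))

;; build the zip on a helper thread, writing into a PipedOutputStream;
;; the connected PipedInputStream can be served as a ring response :body
(defn zip-input-stream [files]   ; files: seq of [entry-name content]
  (let [in  (PipedInputStream.)
        out (PipedOutputStream. in)]
    (future
      (with-open [zip (ZipOutputStream. out)]
        (doseq [[entry-name content] files]
          (.putNextEntry zip (ZipEntry. ^String entry-name))
          (io/copy content zip)  ; copy entry bytes, leave zip stream open
          (.closeEntry zip))))
    in))

;; e.g. {:status  200
;;       :headers {"Content-Type" "application/zip"}
;;       :body    (zip-input-stream [["hello.txt" "hello world"]])}
```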
I hate that most functions in clojure.* are totally fine with nil but some others aren't
`str/blank?`, `str` itself and `str/join` are OK with nil, but `str/split` and `str/trimr` aren't
using `first` and `second` with nil is OK, but using `key` and `val` with nil (instead of a MapEntry) is not
at least we have `some->` I guess...
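For reference, `some->` short-circuits as soon as any step returns nil, which makes the nil-hostile functions safe to chain (a tiny sketch):

```clojure
(require '[clojure.string :as str])

;; str/trim would throw a NullPointerException on nil input,
;; but some-> stops threading at the first nil
(some-> nil str/trim)          ;; => nil (no NPE)
(some-> "  hi  " str/trim)     ;; => "hi"
```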
@joelsanchez The `clojure.string` functions are pretty explicit about generally not allowing nil -- see the namespace docstring.
I built a version of `clojure.string` that accepted nil and treated it as `""` everywhere -- it was noticeably slower than the `clojure.string` original. So there's definitely a performance trade-off.
yes, now I realize that in `clojure.string` most fns throw an NPE
what about MapEntry? guess people don't use key and val that much? this isn't that important since one can simply use first and second
I use key and val a lot but never in a context where my map entry could be nil - only while iterating a map
makes total sense...
And, again, you trade off performance for "safety" with `key`/`val` c.f. `first`/`second`.
If you know you have a map (and therefore valid `MapEntry` items), you can gain speed by using `key`/`val` -- precisely because they don't do any nil handling.
In general, Clojure takes an approach that some folks categorize as "garbage in, garbage out", where nil has no valid semantics, in order to perform better.
Coming from a C/C++ background, this is natural to me -- we called it "Undefined Behavior": when the semantics were only specified for a subset of all possible inputs.
I was under the impression that `key`/`val` were faster, thanks for confirming my suspicions
the latest GitHub commit on Clojure is from Feb 9. Is the development of Clojure over? Or are people working on a "secret" repo and then pushing it to GitHub?
@the2bears it’s DEAD 😫 No pulse since Feb 9 😄
isn't it 22 days ago? https://github.com/clojure/clojure
or is this some kind of meme I don't know about?
clojure.core dev doesn't happen publicly, it never has -- see comments below
you can see the latest activity in jira, latest patch added 1h ago https://dev.clojure.org/jira/browse/CLJ
@denisgrebennicov clojure.core isn't really developed in an open source style - they don't accept PRs, they don't want people to go implement features and present them to their team
They do sometime accept patches (not PRs, but patches) for bug fixes and performance enhancements, but the rate of such acceptances has slowed down in the last 2 years or so. It has been about 8 years since new features developed by others were added to the core of Clojure. Of course, with macros many features can be added using those, or normal functions.
clojure dev does happen publicly - every ticket is in jira, every patch is in jira, all commits happen on a public git repo
occasionally a larger feature is developed independently and pulled in en masse
right, I was trying to allude to that last element
thanks for the more accurate representation
that’s more usually done b/c we expect it to be a lib (like spec)
Clojure, as a 10 year old language, develops in a shall we say “measured” pace
compared to your typical javascript github framework, that can be perceived as glacial
but have no fear, I have spent all week working on Clojure patches :)
and I expect these will become commits soonish
https://dev.clojure.org/display/community/Contributing has some pointers to jira reports and the workflow that we use.
Screened tickets are “ready for Rich” and are perhaps closest to the tip (although most of these are actually tickets for spec.alpha or core.specs.alpha atm) http://dev.clojure.org/jira/secure/IssueNavigator.jspa?mode=hide&requestId=10383
Screenable tickets are a step farther back http://dev.clojure.org/jira/secure/IssueNavigator.jspa?mode=hide&requestId=10374 - mostly waiting for Stu b/c I wrote a lot of those patches
back in my day (I’m 1000 years old), there was more than one way to do open source, rather than you kids and your centralized Microsoft version control software
I think the invitation is going to be to mow Alex's or Richs's lawn, but close enough. :rolling_on_the_floor_laughing:
ha, I’m just kidding y’all
@alexmiller Btw. any plans on upgrading Jira? I find it has improved quite a bit in 7 years 🙂
you would not believe the weeks of pain I have spent trying
I actually have a cloud jira instance that I spent a long time trying to migrate to, with Atlassian’s help but I was not successful
and at some point ran out of time to work on it
@alexmiller thanks for the extended answer 😉
jira migrations are famously nightmarish
That's a shame (and I can believe quite a bit, I've maintained some modded phpBB instances)
and ours is old enough that it actually requires multiple migrations across various traumatic versions
Sounds painful, but then I've heard stories of possibly even more painful experiences Cisco customers have had trying to upgrade software on switches in running networks, so I guess I shouldn't throw stones too quickly at Atlassian.
Hmm, “Code/concurrency are painful” -> Clojure. “Databases are painful” -> Datomic. “Jira is painful” -> …?
The new centralized Microsoft version control software on the block --> Github
(or perhaps Github is what Alex was referring to in his earlier use of that phrase?)
that's how I read the joke yes
@alexmiller but what is the benefit of sharing the patches all around instead of having feature branches and letting people check out those branches? You can delete branches anytime (e.g. after merge). Or am I missing something?
https://gist.github.com/reborg/dc8b0c96c397a56668905e2767fd697f#why-you-dont-accept-pull-requests
pull requests are github only, patches are portable
It's been discussed to death on the mailing list for years. It usually comes up once a year, sometimes more, sometimes less.
https://gist.github.com/reborg/dc8b0c96c397a56668905e2767fd697f#why-you-dont-accept-pull-requests - this is a good resource
Unfortunately, we can't turn off PRs on GitHub, only Issues. Which means a lot of people who don't bother to read the CONTRIBUTING.md doc in every Contrib repo go ahead and submit PRs anyway 😞
linked directly to the answer about PRs
@noisesmith thanks 😅
Last question on the contributing FAQ here, with a link to an older Google groups discussion thread on the topic if you feel inclined to dig: https://dev.clojure.org/display/community/Contributing+FAQ
https://clojure.org/community/contributing is the updated version
(of the contributing page -- which has moved from /contributing, which a lot of places still link to)
e.g., the CONTRIBUTING.md docs in the repos.
I don't think Rich is trying to prove to anyone that patches are superior and everyone should use them -- it is his preference for Clojure core development.
in principle I think using a feature of an open source tool (patch files), over a feature of a closed source service (prs in github) is always valid
As a maintainer of several Contrib libraries, I've begun to find the workflow with patches to be more convenient for review and testing than GitHub's PR system, to be honest. And that actually surprised me.
there's a whole ecosystem of things you can use with patches (you can attach them to emails, you can save them in a folder, you can grep them etc.) and none of this functionality applies to a github PR
you could use links to a PR url or whatever, but the PR itself is not an object you can do something useful with
You can just add ".patch" to a PR url and you get a regular patch file. So there's that.
but the patch is not the PR - the PR is a thing inside github's closed source product
it's cool that it's that easy to get a patch though
I think the github CI integration for PRs is nice, and the jira integration as well
I personally wouldn’t mind having that and using PRs
but all of that is really not that important to me either way
I can imagine the constant barrage of "hey, why not do it this other way" to be incredibly tedious
I've found PRs to be a good teaching tool, but I can see how you just want things out of your way once a team is really going on something
I find that the Jira-and-patch workflow takes such a small part of the time spent solving an issue that it doesn't really matter, though I guess there are smaller issues where it could matter
tickets and prs can act as a denial of service attack on your attention and prioritization
we try to drive from intention instead
we’re not always successful at that and sometimes we’re too successful :)
it’s hard to find the balance
the reality is that most stuff in Clojure works and very few things need to be added “in the box”
Hey, does anyone have suggestions for determining what does / does not work in cljc files? For example `clojure.core/format` doesn't appear to work in ClojureScript. Is there any way to know this without experience?
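One common workaround for `format` specifically is a reader conditional in a .cljc file that dispatches to the host's formatter (a sketch; the `fmt` name and `example.fmt` namespace are made up, and the cljs branch assumes Closure's `goog.string.format` is available):

```clojure
(ns example.fmt
  ;; the :require is only needed on the cljs side, for goog.string/format
  #?(:cljs (:require [goog.string :as gstring]
                     [goog.string.format])))

(defn fmt
  "Cross-platform sketch of clojure.core/format."
  [s & args]
  #?(:clj  (apply clojure.core/format s args)
     :cljs (apply gstring/format s args)))

;; (fmt "x=%d" 42) => "x=42" on both hosts
```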
I don't do much with clojurescript, but in general, having spent some time with the clojure and clojurescript jiras, I would say you have to be very careful
there are many things that work in clojure, that work differently or break in clojurescript
`case`, for example, behaves differently in clojurescript if the cases aren't all keywords or numbers
core.async's clojurescript version of the go macro sometimes has trouble analyzing code, because clojurescript macros can expand to just a chunk of javascript code, which core.async doesn't know how to pull apart and turn into a state machine
I would say the vast majority of things work exactly the same
I don't think there is any objective way to quantify the amount and degree to which clojure and clojurescript are different, but I have formed the impression that every place I have cared to look into it, I have found at least edge cases where the behavior doesn't match. I understand that some people are comfortable with the argument that the differences lie in corner cases which don't matter, but I would quote Perlis: "Programmers are not to be measured by their ingenuity and their logic but by the completeness of their case analysis."