This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2018-06-20
Channels
- # beginners (94)
- # boot (8)
- # cider (21)
- # cljs-dev (3)
- # cljsjs (5)
- # cljsrn (10)
- # clojure (167)
- # clojure-italy (4)
- # clojure-norway (1)
- # clojure-russia (9)
- # clojure-spec (25)
- # clojure-uk (29)
- # clojurescript (20)
- # cursive (12)
- # datomic (55)
- # emacs (10)
- # fulcro (16)
- # graphql (1)
- # hoplon (18)
- # lein-figwheel (30)
- # off-topic (259)
- # onyx (8)
- # other-languages (13)
- # re-frame (1)
- # reagent (62)
- # ring (8)
- # ring-swagger (28)
- # shadow-cljs (187)
- # spacemacs (15)
- # specter (2)
- # testing (12)
- # tools-deps (38)
@alexmiller Hi Alex, great work with tools.deps! It's a game changer. I just wanted to say hi and thanks! 🙏
I love the authentication via settings.xml for private repos. This is the Maven way. Sometimes it seems that all roads lead to Maven 🙂
does anyone know of tooling/libraries for live editor validation of edn against a schema/spec? Looking for something to guide others (non-developers) writing pretty extensive configuration files. Basically looking for something akin to xsd / xml autocomplete and error highlighting
Hello. I heard I should stop using `pmap` for parallel mapping of "large" data streams. I'd like to perform database inserts on each partition of my data (CSV, streamed over the network), then aggregate the results.
Is `pmap` unable to throttle before running out of resources?
`pmap` has a fixed degree of parallelism, by default based on the number of cores. Only that many calls will be in flight at once.
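A quick way to see that bound in action (a sketch; `slow-inc` is a made-up stand-in for real per-element work):

```clojure
;; pmap keeps only a bounded number of calls in flight, so on a
;; multi-core machine 8 slow tasks finish in roughly one "step",
;; not eight sequential ones
(defn slow-inc [x]
  (Thread/sleep 100)   ; simulate slow work
  (inc x))

(time (doall (pmap slow-inc (range 8))))
;; compare with (time (doall (map slow-inc (range 8)))),
;; which takes roughly 8x as long on a multi-core box
```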
@andrew.sinclair let me take this opportunity to promote a library a coworker of mine wrote for problems like this: https://github.com/purefnorg/sentenza
Interesting, thank you! I'll take a look
If I understand... Since `pmap` only has a number of threads in flight related to your number of cores, then that sounds like it scales at any size of data, no?
I can't remember what the default factor is... might be number of cores * 2 or + 2 maybe. Anyway N threads. That's the number of threads that `pmap` will use
That aligns with what I thought initially. I heard cores + 2
Then someone was trying to convince me otherwise. Probably a misunderstanding. Thanks for your help
`(+ 2 (.. Runtime getRuntime availableProcessors))`
`core.reducers` or `core.async` will let you easily tune parallelism without needing to manage threads yourself.
I think `core.reducers` looks right for my scenario. I didn't see how to do a parallel map followed by a reduce with `core.async`. Is there a simple way?
another library worth checking out: https://github.com/reborg/parallel
Perhaps you could try folding and see if work-stealing makes a difference (in case your partitions are highly variable in timing):
(require '[clojure.core.reducers :as r])
(r/fold reduce-f (r/map process-f vector-input))
where `reduce-f` is the way you aggregate, `process-f` is pulling and storing data, and `vector-input` contains your partitions
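To make that concrete, a self-contained sketch (with `+` standing in for the real `reduce-f` and a squaring function for `process-f`):

```clojure
(require '[clojure.core.reducers :as r])

;; parallel map + work-stealing fold over a vector; when r/fold is
;; given a single function, it uses it both to reduce each partition
;; and to combine the partition results
(defn process-f [x] (* x x))   ; stand-in for real per-element work

(r/fold + (r/map process-f (vec (range 1000))))
;; => 332833500 (the sum of squares 0..999)
```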
I would use `core.async/thread` for each parallel operation, then a regular `clojure.core/reduce` over the channels returned by each `core.async/thread` call
Yes, I think that would work, thank you
also, if using core.async/thread, you can start N threads listening to a shared input channel and writing to a shared output to have parallelism N
just don't do IO in go blocks themselves; see #core-async if you want to discuss that possibility in more depth
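A minimal sketch of that shared-channel pattern, using core.async's `pipeline-blocking` to run N blocking workers off one input channel into one output channel (the `inc`/sum here is a made-up stand-in for real IO; `a/to-chan!` is the spelling in recent core.async, older versions have `a/to-chan`):

```clojure
(require '[clojure.core.async :as a])

;; 4 blocking workers consume a shared input channel and write to a
;; shared output channel; an ordinary reduce aggregates the results
(let [in  (a/to-chan! (range 100))
      out (a/chan 16)]
  (a/pipeline-blocking 4            ; parallelism N
                       out
                       (map inc)    ; stand-in for blocking IO work
                       in)
  (a/<!! (a/reduce + 0 out)))
;; sums 1..100 => 5050
```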
I don't think core.async has a parallel equivalent of reduce, but core.reducers definitely does
@andrew.sinclair `pmap` concurrency is bound to the chunk size (usually 32 for vectors, ranges, etc.). True concurrency is then bound to the number of cores for which those 32 threads compete. Note that if any of those threads takes much longer, it blocks processing of the current chunk until done.
it's more complicated than that - you get chunk-size + cores + 2 in an empirical test - though of course for most uses 32 as the common chunk size well outsizes the number of cores available
with the `pmap` parallelism you know you're going to get up to N max parallelism, where N depends on chunk size and cores. For most problems where I use it, that's all I need to know
right, all I meant was that the total isn't just chunk-size, it's chunk-size + cores + 2, then hard-limited by actual cores at the lowest level. Not a super profound point, smothered by miscommunication.
So I'm wondering whether a Clojure on Go would actually solve a real problem, one being memory footprint. My Clojure microservices' memory usage starts out really high. Is there a reason to believe clj-on-go would consume less memory?
one example: an aleph (netty) server doing mongodb queries or http calls to other services, taking 400MB
Doesn't sound like much but I'd like to replicate some of them and have many more microservices like it
this may be helpful: https://github.com/clojure-goes-fast/clj-memory-meter
I’ve never been sensitive to memory usage below 2-3gb because memory is cheap enough, but as you said, if you’re trying to get that number down profiling is a good place to start.
I think a clojure implementation on go only because of memory footprint is kind of overkill
Even more so now that there's graalvm which can compile clojure code to native
@yonatanel if low memory footprint is what you're after, I would go for running it on Node - you'll get both low footprint and advanced GC
fennel lang is also a good option depending on what you want to do (it can run on micro-controllers), though it might not be the best option for microservices
Anyone who wants to submit a talk proposal for Clojure eXchange 2018, the link is here - just scroll up a bit to find the Google Form embedded in the page: https://skillsmatter.com/conferences/10459-clojure-exchange-2018#get_involved
@alexmiller I don't understand how `n` is the degree of parallelism for `pmap` here: https://stackoverflow.com/a/5022838/864684
AFAICT it's only used for the last `n` elements of `coll`.
`n` ends up controlling the lazy consumption, forcing the futures to spawn
I recall a week ago we concluded that wasn't the case and that chunking was the thing controlling the degree of parallelism. Or was that not the case?
you end up with the chunk size (if any) plus `n`
Would it be possible to deliver a zip file over HTTP without persisting the zip to disk? i.e. make a connection over ring and write the contents of the zip file by file?
if you mean sort of streaming it without even keeping it entirely in memory, that is possible but kind of annoying with normal ring
You'll want to use https://docs.oracle.com/javase/8/docs/api/index.html?java/util/zip/ZipOutputStream.html
if you mean, generate the zip data in memory completely without ever writing it to disk and then serve that up, that is very straightforward
so you compare that to the size of the heap on jvm * how many you will be generating concurrently
so you will want to stream the results of creating the zip out, which is more complicated, but basically you use a piped stream, I think ring has a utility function for helping with this sort of thing
a piped stream gives you an inputstream connected to an output stream, so while you write zip data to the outputstream, ring reads it from the inputstream and serves it
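A rough sketch of that piped-stream approach using only JDK classes (the ring utility alluded to is `ring.util.io/piped-input-stream`, which wraps the same idea; `zip-input-stream` and the `[entry-name content]` shape here are made up for illustration):

```clojure
(require '[clojure.java.io :as io])
(import '(java.io PipedInputStream PipedOutputStream)
        '(java.util.zip ZipOutputStream ZipEntry))

;; build the zip on a helper thread, writing into a PipedOutputStream;
;; the connected PipedInputStream can be served as a ring response :body
(defn zip-input-stream [files]   ; files: seq of [entry-name content]
  (let [in  (PipedInputStream.)
        out (PipedOutputStream. in)]
    (future
      (with-open [zip (ZipOutputStream. out)]
        (doseq [[entry-name content] files]
          (.putNextEntry zip (ZipEntry. ^String entry-name))
          (io/copy content zip)  ; copy entry bytes, leave zip stream open
          (.closeEntry zip))))
    in))

;; e.g. {:status  200
;;       :headers {"Content-Type" "application/zip"}
;;       :body    (zip-input-stream [["hello.txt" "hello world"]])}
```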
I hate that most functions in clojure.* are totally fine with nil but some others aren't
`str/blank?`, `str` itself and `str/join` are OK with nil, but `str/split` and `str/trimr` aren't
using `first` and `second` with nil is OK, but using `key` and `val` with nil (instead of a MapEntry) is not
at least we have `some->` I guess...
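For reference, `some->` short-circuits as soon as any step returns nil, which makes the nil-hostile functions safe to chain (a tiny sketch):

```clojure
(require '[clojure.string :as str])

;; str/trim would throw a NullPointerException on nil input,
;; but some-> stops threading at the first nil
(some-> nil str/trim)          ;; => nil (no NPE)
(some-> "  hi  " str/trim)     ;; => "hi"
```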
@joelsanchez The `clojure.string` functions are pretty explicit about generally not allowing nil -- see the namespace docstring.
I built a version of `clojure.string` that accepted nil and treated it as `""` everywhere -- it was noticeably slower than the `clojure.string` original. So there's definitely a performance trade-off.
yes, now I realize that in `clojure.string` most fns throw an NPE
what about MapEntry? guess people don't use key and val that much? this isn't that important since one can simply use first and second
I use key and val a lot but never in a context where my map entry could be nil - only while iterating a map
makes total sense...
And, again, you trade off performance for "safety" with `key`/`val` c.f. `first`/`second`.
If you know you have a map (and therefore valid `MapEntry` items), you can gain speed by using `key`/`val` -- precisely because they don't do any nil handling.
In general, Clojure takes an approach that some folks categorize as "garbage in, garbage out", where nil has no valid semantics, in order to perform better.
Coming from a C/C++ background, this is natural to me -- we called it "Undefined Behavior": when the semantics were only specified for a subset of all possible inputs.
I was under the impression that `key`/`val` were faster, thanks for confirming my suspicions
the latest GitHub commit on Clojure is from Feb 9. Is the development of Clojure over? Or are people working on a "secret" repo and then pushing it to GitHub?
@the2bears it’s DEAD 😫 No pulse since Feb 9 😄
isn't it 22 days ago? https://github.com/clojure/clojure
or is this some kind of meme I don't know about?
clojure.core dev doesn't happen publicly, it never has -- see comments below
you can see the latest activity in jira, latest patch added 1h ago https://dev.clojure.org/jira/browse/CLJ
@denisgrebennicov clojure.core isn't really developed in an open source style - they don't accept PRs, they don't want people to go implement features and present them to their team
They do sometime accept patches (not PRs, but patches) for bug fixes and performance enhancements, but the rate of such acceptances has slowed down in the last 2 years or so. It has been about 8 years since new features developed by others were added to the core of Clojure. Of course, with macros many features can be added using those, or normal functions.
clojure dev does happen publicly - every ticket is in jira, every patch is in jira, all commits happen on a public git repo
occasionally a larger feature is developed independently and pulled in en masse
right, I was trying to allude to that last element
thanks for the more accurate representation
that’s more usually done b/c we expect it to be a lib (like spec)
Clojure, as a 10 year old language, develops in a shall we say “measured” pace
compared to your typical javascript github framework, that can be perceived as glacial
but have no fear, I have spent all week working on Clojure patches :)
and I expect these will become commits soonish
https://dev.clojure.org/display/community/Contributing has some pointers to jira reports and the workflow that we use.
Screened tickets are “ready for Rich” and are perhaps closest to the tip (although most of these are actually tickets for spec.alpha or core.specs.alpha atm) http://dev.clojure.org/jira/secure/IssueNavigator.jspa?mode=hide&requestId=10383
Screenable tickets are a step farther back http://dev.clojure.org/jira/secure/IssueNavigator.jspa?mode=hide&requestId=10374 - mostly waiting for Stu b/c I wrote a lot of those patches
back in my day (I’m 1000 years old), there was more than one way to do open source, rather than you kids and your centralized Microsoft version control software
I think the invitation is going to be to mow Alex's or Richs's lawn, but close enough. :rolling_on_the_floor_laughing:
ha, I’m just kidding y’all
@alexmiller Btw. any plans on upgrading Jira? I find it has improved quite a bit in 7 years 🙂
you would not believe the weeks of pain I have spent trying
I actually have a cloud jira instance that I spent a long time trying to migrate to, with Atlassian’s help but I was not successful
and at some point ran out of time to work on it
@alexmiller thanks for the extended answer 😉
jira migrations are famously nightmarish
That's a shame (and I can believe quite a bit, I've maintained some modded phpBB instances)
and ours is old enough that it actually requires multiple migrations across various traumatic versions
Sounds painful, but then I've heard stories of possibly even more painful experiences Cisco customers have had trying to upgrade software on switches in running networks, so I guess I shouldn't throw stones too quickly at Atlassian.
Hmm, “Code/concurrency are painful” -> Clojure. “Databases are painful” -> Datomic. “Jira is painful” -> …?
The new centralized Microsoft version control software on the block --> Github
(or perhaps Github is what Alex was referring to in his earlier use of that phrase?)
that's how I read the joke yes
@alexmiller but what is the benefit of sharing the patches all around instead of having feature branches and letting people check out those branches? You can delete branches anytime (e.g. after merge). Or am I missing something?
https://gist.github.com/reborg/dc8b0c96c397a56668905e2767fd697f#why-you-dont-accept-pull-requests
pull requests are github only, patches are portable
It's been discussed to death on the mailing list for years. It usually comes up once a year, sometimes more, sometimes less.
https://gist.github.com/reborg/dc8b0c96c397a56668905e2767fd697f#why-you-dont-accept-pull-requests - this is a good resource
Unfortunately, we can't turn off PRs on GitHub, only Issues. Which means a lot of people who don't bother to read the CONTRIBUTING.md doc in every Contrib repo go ahead and submit PRs anyway 😞
linked directly to the answer about PRs
@noisesmith thanks 😅
Last question on the contributing FAQ here, with a link to an older Google groups discussion thread on the topic if you feel inclined to dig: https://dev.clojure.org/display/community/Contributing+FAQ
https://clojure.org/community/contributing is the updated version
(of the contributing page -- which has moved from /contributing, which a lot of places still link to)
e.g., the CONTRIBUTING.md docs in the repos.
I don't think Rich is trying to prove to anyone that patches are superior and everyone should use them -- it is his preference for Clojure core development.
in principle I think using a feature of an open source tool (patch files), over a feature of a closed source service (prs in github) is always valid
As a maintainer of several Contrib libraries, I've begun to find the workflow with patches to be more convenient for review and testing than GitHub's PR system, to be honest. And that actually surprised me.
there's a whole ecosystem of things you can use with patches (you can attach them to emails, you can save them in a folder, you can grep them etc.) and none of this functionality applies to a github PR
you could use links to a PR url or whatever, but the PR itself is not an object you can do something useful with
You can just add ".patch" to a PR url and you get a regular patch file. So there's that.
but the patch is not the PR - the PR is a thing inside github's closed source product
it's cool that it's that easy to get a patch though
I think the github CI integration for PRs is nice, and the jira integration as well
I personally wouldn’t mind having that and using PRs
but all of that is really not that important to me either way
I can imagine the constant barrage of "hey, why not do it this other way" to be incredibly tedious
I've found PRs to be a good teaching tool, but I can see how you just want things out of your way once a team is really going on something
I find that the Jira-and-patch workflow takes such a small part of the time spent solving an issue that it doesn't really matter, though I guess there are smaller issues where it could matter
tickets and prs can act as a denial of service attack on your attention and prioritization
we try to drive from intention instead
we’re not always successful at that and sometimes we’re too successful :)
it’s hard to find the balance
the reality is that most stuff in Clojure works and very few things need to be added “in the box”
Hey, does anyone have suggestions for determining what does / does not work in cljc files? For example `clojure.core/format` doesn't appear to work in ClojureScript. Is there any way to know this without experience?
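One common workaround for `format` specifically is a reader conditional in a .cljc file that dispatches to the host's formatter (a sketch; the `fmt` name and `example.fmt` namespace are made up, and the cljs branch assumes Closure's `goog.string.format` is available):

```clojure
(ns example.fmt
  ;; the :require is only needed on the cljs side, for goog.string/format
  #?(:cljs (:require [goog.string :as gstring]
                     [goog.string.format])))

(defn fmt
  "Cross-platform sketch of clojure.core/format."
  [s & args]
  #?(:clj  (apply clojure.core/format s args)
     :cljs (apply gstring/format s args)))

;; (fmt "x=%d" 42) => "x=42" on both hosts
```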
I don't do much with clojurescript, but in general, having spent some time with the clojure and clojurescript jiras, I would say you have to be very careful
there are many things that work in clojure, that work differently or break in clojurescript
`case`, for example, behaves differently in clojurescript if the cases aren't all keywords or numbers
core.async's clojurescript version of the go macro sometimes has trouble analyzing code, because clojurescript macros can expand to just a chunk of javascript code, which core.async doesn't know how to pull apart and turn into a state machine
I would say the vast majority of things work exactly the same
I don't think there is any objective way to quantify the amount and degree to which clojure and clojurescript are different, but I have formed the impression that every place I have cared to look into it, I have found at least edge cases where the behavior doesn't match. I understand that some people are comfortable with the argument that the differences lie in corner cases which don't matter, but I would quote Perlis: "Programmers are not to be measured by their ingenuity and their logic but by the completeness of their case analysis."