This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-04-04
Hey y'all, I would like to define a spec with a corresponding custom generator to generate random probability vectors (vector of positive floats that sum to 1 where each element is between 0 and 1).
(s/def ::probability (s/and float? #(< 0 % 1)))
(s/def ::probabilities (s/with-gen (s/and (s/coll-of ::probability)
#(== 1 (reduce + %)))
GEN-FUNCTION))
What custom generating function would make this a possibility? Thanks!
@nicholas.charchut IEEE floating point is going to make that a near impossible bar to achieve surely?
Exactly the need for a custom generator. I was thinking to generate a vector of positive integers in a reasonable range and then divide each of them by the sum (to normalize). I can't seem to find a way to preemptively "access" the generated values to find that normalizing constant, though. Any ideas?
Hmm, good point. We have fmap for generated sequences but I think in this case you'd need to generate specific random sequences and then turn that into the generated result somehow and I'm not sure how to do that with generators.
You could probably design a function f that, when given a small integer, creates a sequence that long of fractions that add up to 1, and then fmap that function over a small integer generator...
One way would be to generate a random sequence of positive values, e.g. perhaps all integers, then have a function that divided every element by the total of the elements. The result is guaranteed to sum to 1 (within roundoff error, for IEEE floats)
Doing it that way seems to me to give you more control of the relative size of the different elements, versus methods that might generate a bunch of values in the range [0, 1], then the first time you generate a value that would make the total go over 1, add (1-total of all earlier elements) as the last element.
The "divide by total" approach also gives you straightforward control over the number of elements in the sequence, if that is important to you.
@U0CMVHBL2 That's a great idea, any way to get that total of all generated elements, so I could fmap to divide each by said total?
(reduce + my-sequence) ?
Whatever function you give to fmap can calculate the total and divide all elements by the total. No need for those operations to be in separate places that I can see.
(gen/sample (gen/let [unnorm (gen/vector (s/gen (s/and float?
#(< 0 % 1))))
Z (reduce + unnorm)]
Z))
Sorry for the egregious formatting ^ but it complains about the value of Z. Namely, I get the following error in the vim-iced REPL: (generator? generator)
A function like this defined separately from the generator code might help?
user=> (defn normalize-sequence [coll]
(let [sum (reduce + coll)]
(map #(/ % sum) coll)))
#'user/normalize-sequence
user=> (normalize-sequence [1 2 3])
(1/6 1/3 1/2)
I totally agree. The only reason I'm hell-bent on getting it in generating code is so I can randomly generate a more complex spec.
I mean, define that function somewhere, then use it in your generator. I don't have fmap in my recent mental cache today, so can't quickly fire off a code snippet using a call to the normalize-sequence function above.
My understanding is that fmap applies a function to each element, is that incorrect? If it applies to the whole sequence, we're golden
I thought it was entire generated value, whether that value is a number or an arbitrary collection, just as most things in Clojure allow that range for 'values'
Take a look at this example, which I believe is calling sort on a generated collection to produce a sorted collection: https://github.com/clojure/test.check/blob/master/doc/generator-examples.md#sorted-seq-of-integers
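A minimal sketch of the "divide by total" idea, assuming `clojure.test.check.generators` is available as `gen`. `gen/fmap` hands the whole generated vector to the function, so `normalize-sequence` (the helper from earlier in the thread, here returning a vector) can compute the sum and divide in one place. The name `ratio-probabilities-gen` and the ranges 1 to 100 and 1 to 10 are illustrative choices.

```clojure
(require '[clojure.test.check.generators :as gen])

(defn normalize-sequence
  "Divides every element of coll by the sum of coll."
  [coll]
  (let [sum (reduce + coll)]
    (mapv #(/ % sum) coll)))

;; Positive integers, so the sum is never zero. Integer division in Clojure
;; yields exact ratios, so the result sums to exactly 1.
(def ratio-probabilities-gen
  (gen/fmap normalize-sequence
            (gen/vector (gen/choose 1 100) 1 10)))

(gen/sample ratio-probabilities-gen 3)
```

Because the elements are ratios rather than floats, there is no roundoff here; converting to doubles afterwards reintroduces it.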
@nicholas.charchut I think you're missing s/gen.
(gen/sample (s/gen ::probability-distribution))
Of the many errors @U89MMNQRY, that was certainly one of them. Proud to say my understanding of generators is through the roof at this point.
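Putting the pieces together, a sketch of how the spec from the original question might be wired up with `s/with-gen`. Two things here are assumptions rather than anything settled in the thread: the exact `#(== 1 (reduce + %))` check is replaced by a tolerance (`sums-to-one?`, with an arbitrary 1e-9) because of the floating-point concern raised above, and vectors have at least two elements so that each normalized value stays strictly between 0 and 1. It needs test.check on the classpath.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as sgen])

(s/def ::probability (s/and float? #(< 0 % 1)))

(defn sums-to-one?
  "True when xs sums to 1 within a small tolerance (assumed: 1e-9)."
  [xs]
  (< (Math/abs (- 1.0 (reduce + xs))) 1e-9))

(defn normalize
  "Divides each element by the total, producing doubles."
  [xs]
  (let [total (double (reduce + xs))]
    (mapv #(/ % total) xs)))

(s/def ::probabilities
  (s/with-gen
    (s/and (s/coll-of ::probability :kind vector? :min-count 2)
           sums-to-one?)
    ;; with-gen takes a no-argument function that returns a generator.
    #(sgen/fmap normalize (sgen/vector (sgen/choose 1 1000) 2 10))))

(sgen/sample (s/gen ::probabilities) 3)
```

`s/gen` still checks generated values against the spec, so a generator that produced invalid vectors would fail loudly rather than silently.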
As for the generator, I think you could generate rand repeatedly and take the sequences that added up to no more than 1.0 and just substitute the difference for the last value?
usually the way I end up thinking about writing complicated generators is create some kind of abstract machine that can be used to build whatever complicated value, then generate programs for that abstract machine, and fmap an interpreter over that
so for example, a program for generating probability vectors might start with [:vector-size 10] then be followed by some sequence of something like [:halve-and-add 1 5]
and then you just fmap over that something which creates a vector of 10 elements, the first 1 and the rest 0, and then proceeds to mixing that 1 around into the rest of the elements
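The "abstract machine" approach could be sketched roughly as follows, assuming `clojure.test.check.generators` as `gen`. The program shape (`{:size n :moves [[from to] ...]}`) and the names `run-program` and `program-gen` are made up for illustration; each move plays the role of the `[:halve-and-add from to]` instruction described above.

```clojure
(require '[clojure.test.check.generators :as gen])

(defn run-program
  "Interpreter: start with all mass at index 0, then for every [from to]
  pair move half of the mass at `from` over to `to`."
  [{:keys [size moves]}]
  (reduce (fn [v [from to]]
            (let [half (/ (nth v from) 2)]
              (-> v
                  (update from - half)
                  (update to + half))))
          (into [1] (repeat (dec size) 0))
          moves))

;; Generate programs: a vector size, then moves whose indices fit that size.
(def program-gen
  (gen/let [size  (gen/choose 1 10)
            moves (gen/vector (gen/tuple (gen/choose 0 (dec size))
                                         (gen/choose 0 (dec size))))]
    {:size size :moves moves}))

(def probability-vector-gen
  (gen/fmap run-program program-gen))

(gen/sample probability-vector-gen 3)
```

Every move conserves the total, so the sum stays exactly 1 (the values are ratios). Note that elements can be 0 in this version, which a strict `(< 0 % 1)` spec would reject.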
hi, just a general wondering - is there any mature lib for grpc? grpc seems to get more and more prevalent and was just thinking why clojure community is not jumping the hype, any reasons to remain in REST world? (outside of the obvious things like browser support and slightly more relaxed protocol)
Googled around for an example tutorial, this one appears to focus on using a gRPC java lib from Clojure: https://blog.jmibanez.com/2018/07/22/grpc-with-clojure-and-leiningen.html There’s also this new tool to ease working with protobufs: https://github.com/bufbuild/buf
@UCHV4JZ7A thanks, i have seen this guide, i was just wondering why there is no established way to deal with grpc in clojure... thanks anyway
speaking for me / the companies I've worked for, we explicitly didn't want an RPC approach and opted instead for event sourcing (specifically using kafka as it is data oriented, ordered, and persistent by default)
@roguas actually Clojure community doesn't just remain in REST world, they actively take the GraphQL approach - in fact they even created a Clojure-native version of GraphQL called EQL (EDN Query Language)
@myguidingstar sure, but GraphQL has a very different use case: grpc is for internal low-latency service mesh comms, whereas GraphQL seems oriented towards abstracting over query/command, offering a single facade endpoint to fetch data
sure, sure I know of that option, was just wondering why it's not getting any buzz in clojure community... and i don't mean it in a bad way - just would like to hear opinions 🙂
Not every Java lib needs a Clojure wrapper, or reimplementation. I do not know if grpc use in Clojure could be made significantly better than using existing Java libs, but if not, or no one has, Java interop is there by design, for that reason
There are lots of other data transfer mechanisms than grpc - some may not like that you have to define all the message fields up front, for example, and there are multiple similar approaches that do not require that
I’ve had some trouble managing my exceptions with Slingshot. According to Slingshot documentation, if you throw an object instead of throwable it is simply wrapped in ex-info. But when trying to use a matcher, it doesn’t match on ex-info exceptions, but it does on thrown objects. Here’s an example:
(try+
(throw+ {:status 401})
(catch [:status 401] _ "Caught!"))
This will print “Caught!”
(try+
(throw+ (ex-info "" {:status 401}))
(catch [:status 401] _ "Caught!"))
This will not print caught. Does anyone know why not? I’ve been debugging this for a while now and it doesn’t seem to make any sense.
@mruzekw Clojure for the Brave and True has lots of examples and is available freely online.
Well I’ve never had this problem before. A dependency of a dependency in leiningen includes [slingshot 0.10.3] but at top level I specify [slingshot 0.12.2]. The lein deps :tree tells me I’m using 0.12.2, runtime classpath tells me I am using 0.12.2, the debugger tells me I am using 0.10.3….
IntelliJ
Guessing it’s lying. Go delete the slingshot versions from your m2 cache and then debug again.
So in my stacktrace I have slingshot.support/unwrap line 65, but in actual source the unwrap function is around line 30, but in 0.10.3 source it is at line 65
also the actual function works like the old version
so debugger isn’t lying, everything else is
Have you checked full class path being used by JVM process, e.g. using something like ps axguwww | grep java or whatever is similar on your system?
I checked "java.class.path" system property in REPL
supposedly it’s using the version 0.12.2, but the behaviour is like in 0.10.3
(source slingshot.support/unwrap)
"Returns a Throwable given a context: the object in context if it's
a Throwable, else a Throwable context wrapper"
=> nil
Also did you try removing the artifacts from the maven cache and see which one ends up there when you restart?
yeah I did
only the correct one is there but behavior isn’t correct
It’ll take a bit
Given this code in 0.12.2: https://github.com/scgilardi/slingshot/blob/5125a79e2bd6b25384f41e2751608d9e2ee1580b/src/slingshot/support.clj#L45
(slingshot.support/unwrap (ex-info "" {:status 401})) should return a non-null value, as ex-data is not null
but actually it works like this 0.10.3 version: https://github.com/scgilardi/slingshot/blob/5fd2b1f330dc3bf9a4164f1f2b0de0b162f5ef2e/src/slingshot/support.clj#L66
that is, because of missing meta tag, that where evaluates to false and it returns nil
sling> (source support/unwrap)
(defn unwrap
"If t is a context wrapper or other IExceptionInfo, returns the
corresponding context with t assoc'd as the value for :wrapper, else
returns nil"
[^Throwable t]
(if-let [data (ex-data t)]
(assoc (make-context t)
:object (if (::wrapper? (meta data)) (:object data) data)
:wrapper t)))
I’ll try running lein repl
same thing in lein repl
I’ll try to run it with IntelliJ itself
and from lein repl you did
user> (require '[slingshot.support :as support])
nil
user> (support/unwrap (ex-info "" {:status 401}))
I’ll look at plugins next
calling lein repl introduces old version of the library
i've tried this with clj and lein with simple projects and it works fine. so it's not a dep of lein or anything. you seem to have a plugin or perhaps your project does something a bit strange
I’ll try without any plugins
yeah I tried, I also tried with no plugins at all
and the source function still doesn’t return source and the doc is all wrong
I don’t know at this point
Another possibility, perhaps already mentioned earlier but I have not followed full thread: some JAR earlier in your class path includes source and/or .class files for the older version of the lib.
how do I find that?
how would the class file be named for slingshot.support/unwrap
There may be better ways, but one way would be to do jar tvf path/to/myjar.jar on each JAR file before it in the class path, and grep filenames for something you expect to find in the older version of the lib
Best way I know to find out what a class file would be named for a Clojure source file is to compile it locally to class files, and look through file system for what files were added from before that compilation run, to after
e.g. find some/dir > before.txt before compiling to class files, and again after redirected to after.txt, then diff before.txt after.txt
I would not recommend doing jar tvf path/to/myjar.jar | grep class-name with the full class name -- just enough of a name that it looks like it might be distinctive to that library.
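The manual `jar tvf ... | grep` search over each class path entry can also be approximated from the REPL. The following is a rough sketch using only the JDK's `java.util.jar.JarFile`; the function names `classpath-jars` and `jars-containing` are made up here, and it only looks at JARs listed in the `java.class.path` system property.

```clojure
(require '[clojure.string :as str])
(import '(java.io File)
        '(java.util.jar JarFile JarEntry)
        '(java.util.regex Pattern))

(defn classpath-jars
  "Paths of the JAR files on java.class.path, in class path order."
  []
  (->> (str/split (System/getProperty "java.class.path")
                  (re-pattern (Pattern/quote File/pathSeparator)))
       (filter #(str/ends-with? % ".jar"))
       (filter #(.isFile (File. %)))))

(defn jars-containing
  "Returns [jar-path matching-entry-names] pairs for every class path JAR
  that has an entry whose name contains fragment."
  [fragment]
  (for [path (classpath-jars)
        :let [names (with-open [jar (JarFile. path)]
                      (->> (enumeration-seq (.entries jar))
                           (map (fn [^JarEntry e] (.getName e)))
                           (filterv #(str/includes? % fragment))))]
        :when (seq names)]
    [path names]))

;; A short, distinctive fragment rather than a full class name:
(jars-containing "slingshot/support")
```

More than one JAR in the result for a single library is the kind of conflict being hunted for here.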
I don't know full rules for class name munging of things that look like decimal numbers, e.g. in class names, and how repeatable those are.
I think it must be some such thing, I did search the classpath for clj file with the right name, and since source command fails I am sure it’s a compiled class file somewhere
The reason why I make this suggestion is that there have been occasions I have heard about (not dealt with personally, but other Clojurians have) where some JAR file is distributed that uses AOT compilation, and it includes class files for libraries it depends upon.
Identifying which JAR file is the cause is step 1 in replacing that JAR file with something that doesn't do it.
there might even be more than 1 such JAR file
found it
jesus christ this was annoying
You find another JAR earlier in the class path that had class files for the older slingshot lib? Or source?
found a jar that was obviously an uberjar with this class in it
cool. Makes sense.
(-> slingshot.support/unwrap class .getClassLoader (.getResource "slingshot/support$unwrap.class"))
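The one-liner above reports the copy that was actually loaded. A small complementary sketch, using the standard `ClassLoader.getResources` method, lists every location on the class path that provides a given resource, which makes a duplicate stand out. `resource-locations` is a hypothetical helper name.

```clojure
(defn resource-locations
  "Returns, as strings, the URLs of all class path locations that provide
  resource-name, in the order the class loader reports them."
  [resource-name]
  (let [loader (.getContextClassLoader (Thread/currentThread))]
    (mapv str (enumeration-seq (.getResources loader resource-name)))))

;; Compiled class and source file for the function under suspicion:
(resource-locations "slingshot/support$unwrap.class")
(resource-locations "slingshot/support.clj")
```

A `.class` hit inside a JAR that is not the slingshot artifact itself would point at an AOT-compiled uberjar like the one found here.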
It is always nice after such an endeavor when the world seems sane again.
yeah, and I lost 4 hours trying to solve this instead of working on the project 😕
Having such "war stories" and experiences is the difference between a developer with 3-4 years of experience, and someone fresh to the dev tools.
You get to the point that you can make better guesses of probable causes and experiments to try, so eventually the 4 hours goes down to 30 mins or less. At least for things you guess well on 🙂
From experienced Clojure devs, all of that you just experienced is the raw data and pain behind advice that is summarized as "don't AOT when publishing libraries"
or in a more nuanced detailed form "be very very aware and careful of what you include in AOT compiled libraries"
Thank you guys for helping me solve this
@roklenarcic I'm curious as to which JAR you found it in? Was it an uberjar uploaded to Maven or Clojars?
it was a library by my client, uploaded to a private maven repo, so it’s only a part of the project I’m working on, not a concern for the general public
I’ll have to talk to them about not releasing uberjars as maven artifacts
Is there a way I can return an input stream from running a python script within clojure ring? So basically, I'm performing image processing in open cv using python and then calling that script in clojure ring like so: (shell/sh "python3" "segmentation.py") Currently this python script saves a static file, and then I send the static file using (io/input-stream img.png). But I don't want to save the static file and directly send the input stream. How can I do that?
what’s wrong with saving the static file?
if you really want to avoid saving the static file, you can try having the python script write its output to stdout and use that as the inputstream response
it's redundant. If a hundred thousand requests come in at the same time, I don't know how the server will handle them. What if it creates hundred thousand files? What happens if two requests come at the same time? Also reading and writing from hard disk is several orders of magnitude slower than reading from the ram
are you actually expecting a 100k requests?
I don’t know what you’re building
optimizing performance almost always depends on context so it’s hard to give any suggestions without more context.
basically I have an image tag [:img {:src "/my-route"}], and the point is that this img tag shows the processed image that was uploaded half a second ago. at /my-route, I run this python script right?, and I don't want to waste time saving the file, and send the input-stream directly.
how likely is it for the image to be downloaded more than once?
there's got to be a way to return some kind of a buffer from the python script right?
did you see the suggestion about using stdout ?
if it’s going to be downloaded more than once, then you’ll probably want to save the result somewhere anyway
> if you really want to avoid saving the static file, you can try having the python script write its output to stdout and use that as the inputstream response
i’m happy to give advice and explain other options, but I feel like you’re asking rudely and not being very appreciative of the free advice
Another option: Create a "ram-disk" on the server, i.e. a file system backed by RAM, not disk or SSD, and save and serve the file from there.
@U0CMVHBL2, do you know if this can be done on heroku?
I have not tried.
And if you are truly expecting 100K requests in a short period of time, at some point you need multiple servers, no matter what method you use.
I agree that using a method that requires 10 servers is cheaper in machine rental costs than a method that requires 100 servers, so not saying your questions are unwarranted.
is there a way to abstract to multiple servers in clojure ring? I don't really understand how running multiple servers fits into how heroku manages running my backend over multiple machines
servers=machines. I did not mean anything different than that by the term servers.
and say if requests come in concurrently, and I have to save and serve these files, but the processed files are different depending on the source of the request. Can that create some kind of blockage because the server has to wait before saving and serving each file for each of those concurrent requests?
You want to invoke a Python program, or in general some other Linux process, that will create the file you want to serve, I think?
If you cannot invoke that program before the HTTP request comes in, because you do not yet know what command line args or input to give to that program, then obviously it cannot start before the HTTP request arrives, and presumably you cannot send the response back until the file is ready.
So that is a lower bound on the latency of responding.
what if I'm willing to save img1.png, img2.png.... img100.png and cycle through them? Does that reduce the bottleneck?
If you want a web server that runs concurrent threads on a multi-CPU-core machine to increase throughput, that can certainly be done. You would need to ensure that the different files you save (if they are written to a file system) have distinct file names, if the file contents should be different. There are well-known ways to generate unique file names from the command line, e.g. the mktemp Linux command, but those techniques can be implemented in Clojure or any other programming language, too.
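On the JVM, one readily available counterpart to `mktemp` is `java.io.File/createTempFile`, which creates a new empty file with a unique name. A minimal sketch follows; the `"segmented-"` prefix and the helper name `new-image-file` are arbitrary choices for illustration.

```clojure
(import '(java.io File))

(defn new-image-file
  "Creates an empty, uniquely named .png file in the default temp directory."
  ^File []
  (File/createTempFile "segmented-" ".png"))

(let [f (new-image-file)]
  (println (.getPath f))
  ;; Remove the file once it is no longer needed.
  (.delete f))
```

Concurrent requests each get their own file this way, so they do not overwrite one another; cleanup policy is still up to the application.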
If a single machine is serving many requests at a high rate, I would guess that 100 is too low a number of distinct file names to use, but thankfully it is easy to make that number nearly arbitrarily large, rather than only 100.
You would need some way of deciding when a file was no longer needed, and to remove it, of course. Deciding when it is no longer needed, or that you are at least willing to regenerate it if you need it again, depend upon your application requirements, probably.
in this case, the application requirement is to not need the file after it has been served
This collection of files is effectively a cache. Whether it will ever happen that you can take advantage of using the same file to serve more than one request also depends upon your application and what these files are for. Whether you can reuse files for multiple requests, or not, you will need to remove them eventually.
But if you are referring to these files using src="filename.jpg" in an HTML response, then I believe the filename can be requested many milliseconds after you send the HTML response, yes?
Or the user can hit reload on your HTML page (again, depending upon your web app whether that is expected to work), and request that file again, long after the HTML was generated.
Others who actually have more experience than I do writing web apps should chime in if I'm talking nonsense, hopefully.
It is definitely not something I've done for a day job, ever.
Actually, marking the file destroyable right after it's been served is probably not a good idea in my case I don't want to regenerate the file every time the user refreshes the page (which is critical to the application.) Seems like I would need to save the file anyway, and frankly I don't know much about the pros and cons of caching in web servers and how useful it is in practice for performance.
I don't know best practices in this area of web server software dev. Eventually dynamically generated files do need to be deleted, so you can't support reload 478 days later and expect the file to still be sitting in the file system.
Check this out: https://github.com/ptaoussanis/carmine
A redis data-store with logic to destroy-this-file-after-x-seconds and a cap on max-memory.
It doesn't surprise me at all if people have implemented such caching logic N times in N different ways, where N is at least 100
And have had detailed and perhaps even vehement discussion about the advantages and disadvantages of several of them, compared to each other.
Redis is a popular open source framework, and I've heard of it before. Not so much any others really.
I mean, assuming you want to load the whole image in memory anyways, you can use (:out (shell/sh …)). That’s the stdout of the shell process.
(type (:out (shell/sh "ls"))) returns java.lang.String. So it would have to be a b64 output right? That seems hacky to me!
The reason for discussing file system stuff is, if an HTTP server wants to reference an image in a src="filename.jpg" reference, it needs to logically be in a file system on some server somewhere in the world, right? Or is there a way in HTTP to send back HTML with actual in-line image data?
Uh, you just gotta send a bytestream of a jpg back? How could anyone possibly know if it was sitting on the disk or not?
@U010Z4Y1J4Q use :out-enc :bytes
@U0CMVHBL2 Pretty sure you set the response headers properly (e.g. Content-Type) and you can put whatever in the body.
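Combining the two suggestions (`:out-enc :bytes` and setting `Content-Type`), a handler could look roughly like the sketch below. It assumes the Python script is changed to write already-encoded PNG bytes to stdout instead of saving a file; the raw bytes need to be an encoded image, not an in-memory OpenCV matrix. `image-response` and `my-route-handler` are hypothetical names, and the response is an ordinary Ring response map with an InputStream body.

```clojure
(require '[clojure.java.shell :as shell])
(import '(java.io ByteArrayInputStream))

(defn image-response
  "Runs cmd (a vector of strings) and returns a Ring response map whose body
  streams the process's stdout. Assumes the process writes PNG bytes there."
  [cmd]
  (let [{:keys [exit out err]} (apply shell/sh (concat cmd [:out-enc :bytes]))]
    (if (zero? exit)
      {:status  200
       :headers {"Content-Type" "image/png"}
       :body    (ByteArrayInputStream. out)}   ; out is a byte array here
      {:status  500
       :headers {"Content-Type" "text/plain"}
       :body    (str "image script failed: " err)})))

;; Hypothetical handler for the /my-route endpoint mentioned above:
(defn my-route-handler [_request]
  (image-response ["python3" "segmentation.py"]))
```

Nothing touches the file system in this version, at the cost of re-running the script on every request, including page reloads.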
I haven't tried it but does that mean that I return an opencv mat from the python script, set the header to image/png in the response and I'm good to go?!
I didn't say disk, I said "file system", but even that is too specific. If you have an element img src="string" link in an HTML file, then the browser is likely to later make a request for the contents of whatever "string" refers to, whether it is on disk or generated on demand, yes?
I suppose that data could be generated on the fly at that time, rather than when the HTML containing img src="string" was sent back.
I guess I need that second coffee today, after all 🙂
@U010Z4Y1J4Q I’m pretty sure, yes.
@U0CMVHBL2 At the end of the day: browser requests an image, server returns some image bytes. Don’t matter where they come from.
Right, understood. In the context of the discussion @U010Z4Y1J4Q has been asking about, it sounds like that means if he is willing to generate the images only when they are actually requested by the clients, then the important thing is to maintain a set of image names, but no actual storage for them if he doesn't want to. Storage space for image data could be maintained somewhere, but that is optional if waiting for the image data to be generated at the last moment before it is needed is acceptable in his use case.
His use case is unclear to me. But he asked a pretty straightforward question initially.
He did. He also had the misfortune to have me responding, instead of someone who didn't go down a garden path 🙂
@U010Z4Y1J4Q how long is segmentation.py?
but you do create a bottleneck with the python GIL that wouldn't be there in the shell case
but tbh if there isn't all that much going on in segmentation.py, you could rewrite it in clojure/java without much fuss
also, as is the case with all performance optimizations, i'd say do the laziest thing that works until it makes the app too slow
@U010Z4Y1J4Q It sure sounds like a premature optimization; running 100,000 scripts at the same time won’t be great no matter what. Also with the way clojure.java.shell works you’re likely going to end up spawning 3x as many threads. Moreover, the OS usually caches disk IO so reading from a file written just a moment ago doesn’t have to mean it’s read from disk. I’m curious what kind of service this is and if there’s a justification for such scaling requirements - I doubt so, especially since you’re actually asking these questions ;); I wouldn’t be surprised if it was more along the lines of 100 r/s
I love libpython-clj. @U3JH98J4R, I can run several instances of python in libpython-clj to overcome the GIL
Yeah, that being said I don't think overcoming the GIL is a supported thing - with good reason
I don't follow you. Why is def a good reason? You can create multiple refs to the python interpreter and run the threads of course
So if you want to use python objects together they need to be from the same environment
Therefore it uses a global reference to an interpreter by default, which is the best user experience
And the first target audience is the data science crowd, which is the place where Python's ecosystem beats the JVM
And that space already uses a single thread to queue off native code hullabaloo which probably runs as parallel as it can
So the GIL and multiple interpreters isn't really a thing that would matter all that much