
Hey y'all, I would like to define a spec with a corresponding custom generator to generate random probability vectors (vector of positive floats that sum to 1 where each element is between 0 and 1).

(s/def ::probability (s/and float? #(< 0 % 1)))
(s/def ::probabilities (s/with-gen (s/and (s/coll-of ::probability)
                                          #(== 1 (reduce + %)))
What custom generating function would make this a possibility? Thanks!


@nicholas.charchut IEEE floating point is going to make that a near impossible bar to achieve surely?


Exactly the need for a custom generator. I was thinking to generate a vector of positive integers in a reasonable range and then divide each of them by the sum (to normalize). I can't seem to find a way to preemptively "access" the generated values to find that normalizing constant, though. Any ideas?


Hmm, good point. We have fmap for generated sequences but I think in this case you'd need to generate specific random sequences and then turn that into the generated result somehow and I'm not sure how to do that with generators.


You could probably design a function f that, when given a small integer, creates a sequence of that many fractions that add up to 1, and then fmap that function over a small integer generator...


One way would be to generate a random sequence of positive values, e.g. perhaps all integers, then have a function that divided every element by the total of the elements. The result is guaranteed to sum to 1 (within roundoff error, for IEEE floats)


Doing it that way seems to me to give you more control of the relative size of the different elements, versus methods that might generate a bunch of values in the range [0, 1], then the first time you generate a value that would make the total go over 1, add (1-total of all earlier elements) as the last element.


The "divide by total" approach also gives you straightforward control over the number of elements in the sequence, if that is important to you.


@U0CMVHBL2 That's a great idea, any way to get that total of all generated elements, so I could fmap to divide each by said total?


(reduce + my-sequence) ?


Whatever function you give to fmap can calculate the total and divide all elements by the total. No need for those operations to be in separate places that I can see.
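A minimal sketch of the "divide by total" idea, assuming test.check is on the classpath (clojure.spec.gen.alpha needs it at generation time). The helper name sums-to-one?, the 1e-9 tolerance, the integer range 1 to 1000, and the vector length of 2 to 10 are all assumptions made for illustration. The function passed to gen/fmap receives the whole generated vector, so it can compute the total and divide in one place. The sum check uses a tolerance rather than (== 1 ...) because of floating-point roundoff.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

(defn normalize-sequence
  "Divides every element by the total and coerces the results to doubles."
  [coll]
  (let [total (reduce + coll)]
    (mapv #(double (/ % total)) coll)))

(defn sums-to-one?
  "True when the elements sum to 1 within an assumed tolerance of 1e-9."
  [ps]
  (< (Math/abs (- 1.0 (reduce + ps))) 1e-9))

(s/def ::probability (s/and float? #(< 0 % 1)))

(s/def ::probabilities
  (s/with-gen
    (s/and (s/coll-of ::probability :kind vector? :min-count 2)
           sums-to-one?)
    ;; Generate 2 to 10 positive integers, then normalize the whole vector.
    ;; With at least two positive elements, every result is strictly below 1.
    #(gen/fmap normalize-sequence
               (gen/vector (gen/choose 1 1000) 2 10))))

(gen/sample (s/gen ::probabilities) 3)
```

Starting from integers keeps the relative sizes easy to reason about, and the length bounds on gen/vector control how many elements each sample has.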


(gen/sample (gen/let [unnorm (gen/vector (s/gen (s/and float? 
                     #(< 0 % 1))))
         Z (reduce + unnorm)]


Sorry for the egregious formatting ^ but it complains about the value of Z. Namely, I get the following error in the vim-iced REPL: (generator? generator)
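A likely cause of the (generator? generator) assertion, assuming gen here is clojure.test.check.generators: every right-hand side in a gen/let binding vector has to be a generator, and (reduce + unnorm) is a plain number. A hedged sketch of one way around it is to bind only the generator in gen/let and compute the total with an ordinary let in the body. The alias tgen, the integer range, and the vector length are assumptions.

```clojure
(require '[clojure.test.check.generators :as tgen])

(def probability-vector-gen
  ;; Only generators go in the tgen/let binding vector.
  (tgen/let [unnorm (tgen/vector (tgen/choose 1 1000) 2 10)]
    ;; Plain values are computed with a regular let inside the body.
    (let [z (reduce + unnorm)]
      (mapv #(double (/ % z)) unnorm))))

(tgen/sample probability-vector-gen 3)
```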


A function like this defined separately from the generator code might help?

user=> (defn normalize-sequence [coll]
(let [sum (reduce + coll)]
  (map #(/ % sum) coll)))
user=> (normalize-sequence [1 2 3])
(1/6 1/3 1/2)


I totally agree. The only reason I'm hell-bent on getting it in generating code is so I can randomly generate a more complex spec.


I mean, define that function somewhere, then use it in your generator. I don't have fmap in my recent mental cache today, so can't quickly fire off a code snippet using a call to the normalize-sequence function above.


My understanding is that fmap applies a function to each element, is that incorrect? If it applies to the whole sequence, we're golden


I thought it was entire generated value, whether that value is a number or an arbitrary collection, just as most things in Clojure allow that range for 'values'

👍 4

The similarity of the name fmap to map is potentially misleading here.

👍 4

Take a look at this example, which I believe is calling sort on a generated collection to produce a sorted collection:


Am I missing something?

Ben Grabow 13:04:22

@nicholas.charchut I think you're missing s/gen.

(gen/sample (s/gen ::probability-distribution))


Of the many errors @U89MMNQRY, that was certainly one of them. Proud to say my understanding of generators is through the roof at this point.


As for the generator, I think you could generate rand repeatedly and take the sequences that added up to no more than 1.0 and just substitute the difference for the last value?


usually the way I end up thinking about writing complicated generators is create some kind of abstract machine that can be used to build whatever complicated value, then generate programs for that abstract machine, and fmap an interpreter over that


so for example, a program for generating probability vectors might start with [:vector-size 10] then be followed by some sequence of something like [:halve-and-add 1 5]


and then you just fmap over that something which creates a vector of 10 elements, the first 1 and the rest 0, and then proceeds to mixing that 1 around into the rest of the elements
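A rough sketch of that "generate a program, fmap an interpreter" idea, again assuming test.check is available. The meaning given to [:halve-and-add from to] here (halve the element at index from and add that half to index to) is a guess at the intent, and run-program, program-gen, and the size limits are hypothetical names and values. Using Clojure ratios keeps the sum exactly 1, though some elements can stay 0.

```clojure
(require '[clojure.spec.gen.alpha :as gen])

(defn halve-and-add
  "Moves half of the mass at index from onto index to."
  [v from to]
  (let [half (/ (nth v from) 2)]
    (-> v
        (assoc from half)
        (update to + half))))

(defn run-program
  "Interprets [[:vector-size n] [:halve-and-add from to] ...]."
  [[[_ size] & ops]]
  (reduce (fn [v [_ from to]] (halve-and-add v from to))
          (into [1] (repeat (dec size) 0)) ; all mass starts at index 0
          ops))

(def program-gen
  ;; Pick a size first, then generate ops whose indices fit that size.
  (gen/bind (gen/choose 1 10)
            (fn [size]
              (let [index (gen/choose 0 (dec size))
                    op    (gen/fmap (fn [[from to]] [:halve-and-add from to])
                                    (gen/tuple index index))]
                (gen/fmap #(into [[:vector-size size]] %)
                          (gen/vector op 0 20))))))

(def probability-vector-gen
  (gen/fmap run-program program-gen))
```

Because the generated value is the program, a failing case shrinks toward a shorter program, which tends to be easier to read than a shrunken vector of floats.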


hi, just a general wondering - is there any mature lib for grpc? grpc seems to get more and more prevalent and was just thinking why clojure community is not jumping the hype, any reasons to remain in REST world? (outside of the obvious things like browser support and slightly more relaxed protocol)


Googled around for an example tutorial, this one appears to focus on using a gRPC java lib from Clojure: There’s also this new tool to ease working with protobufs:


@UCHV4JZ7A thanks, i have seen this guide, i was just wondering why there is no established way to deal with grpc in clojure... thanks anyway

👍 4

speaking for me / the companies I've worked for, we explicitly didn't want an RPC approach and opted instead for event sourcing (specifically using kafka as it is data oriented, ordered, and persistent by default)


@roguas actually Clojure community doesn't just remain in REST world, they actively take the Graphql approach - in fact they even created a Clojure-native version of Graphql called EQL (Edn Query Language)


@myguidingstar sure, but graphql has a very different use case, grpc is for internal low-latency service mesh comms, whereas graphql seems oriented towards abstracting over query/command, offering a single facade endpoint to fetch data


@roguas there’s certainly mature libs for java, no? Why not just use that?


sure, sure I know of that option, was just wondering why it's not getting any buzz in clojure community... and i dont mean it in a bad way - just would like to hear opinions 🙂


Not every Java lib needs a Clojure wrapper, or reimplementation. I do not know if grpc use in Clojure could be made significantly better than using existing Java libs, but if not, or no one has, Java interop is there by design, for that reason


There are lots of other data transfer mechanisms than grpc - some may not like that you have to define all the message fields up front, for example, and there are multiple similar approaches that do not require that


There's a few gRPC clojure things about.


Does anyone have a favorite Clojure by example book?


💯 Not every Java lib needs a Clojure wrapper 🔥


I’ve had some trouble managing my exceptions with Slingshot. According to Slingshot documentation, if you throw an object instead of throwable it is simply wrapped in ex-info . But when trying to use a matcher, it doesn’t match on ex-info exceptions, but it does on thrown objects. Here’s an example:

(try+
  (throw+ {:status 401})
  (catch [:status 401] _ "Caught!"))
This will print “Caught!”
(try+
  (throw+ (ex-info "" {:status 401}))
  (catch [:status 401] _ "Caught!"))
This will not print caught. Does anyone know why not? I’ve been debugging this for a while now and it doesn’t seem to make any sense.

Braden Shepherdson 18:04:16

@mruzekw Clojure for the Brave and True has lots of examples and is available freely online.

🙏 8

Well I’ve never had this problem before. A dependency of a dependency in leiningen includes [slingshot 0.10.3] but at top level I specify [slingshot 0.12.2]. The lein deps :tree tells me I’m using 0.12.2, runtime classpath tells me I am using 0.12.2, the debugger tells me I am using 0.10.3….


Which debugger?


Guessing it’s lying. Go delete the slingshot versions from your m2 cache and then debug again.


So in my stacktrace I have line 65, but in actual source the unwrap function is around like 30, but in 0.10.3 source it is at line 65


also the actual function works like the old version


so debugger isn’t lying, everything else is


Have you checked full class path being used by JVM process, e.g. using something like ps axguwww | grep java or whatever is similar on your system?


I checked

system property in REPL


supposedly it’s using the version 0.12.2, but the behaviour is like in 0.10.3


Can you call source on the function and see what that reports?


  "Returns a Throwable given a context: the object in context if it's
  a Throwable, else a Throwable context wrapper"
=> nil


Also did you try removing the artifacts from the maven cache and see which one ends up there when you restart?


only the correct one is there but behavior isn’t correct


Post a small snippet along with the expected and the observed behavior?


It’ll take a bit


( (ex-info "" {:status 401}))
should return a non-null value, as ex-data is not null


that is, because of missing meta tag, that where evaluates to false and it returns nil


can you call (source support/unwrap)?


  "Returns a Throwable given a context: the object in context if it's
  a Throwable, else a Throwable context wrapper"
=> nil


i'm wondering why your repl is messed up


sling> (source support/unwrap)
(defn unwrap
  "If t is a context wrapper or other IExceptionInfo, returns the
  corresponding context with t assoc'd as the value for :wrapper, else
  returns nil"
  [^Throwable t]
  (if-let [data (ex-data t)]
    (assoc (make-context t)
      :object (if (::wrapper? (meta data)) (:object data) data)
      :wrapper t)))


I’ll try running lein repl


same thing in lein repl


I’ll try to run it with IntelliJ itself


and from lein repl you did

user> (require '[ :as support])
user> (support/unwrap (ex-info "" {:status 401}))


I’ll look at plugins next


try lein deps :plugin-tree


calling lein repl introduces old version of the library


try with lein deps :plugin-tree


i've tried this with clj and lein with simple projects and it works fine. so its not a dep of lein or anything. you seem to have a plugin or perhaps your project does something a bit strange


I’ll try without any plugins


did you try that deps command i listed?


it will show you if a plugin overrides it


yeah I tried, I also tried with no plugins at all


and the source function still doesn’t return source and the doc is all wrong


I don’t know at this point


Another possibility, perhaps already mentioned earlier but I have not followed full thread: some JAR earlier in your class path includes source and/or .class files for the older version of the lib.


how do I find that?


how would the class file be named for


There may be better ways, but one way would be to do jar tvf path/to/myjar.jar on each JAR file before it in the class path, and grep filenames for something you expect to find in the older version of the lib


Best way I know to find out what a class file would be named for a Clojure source file is to compile it locally to class files, and look through file system for what files were added from before that compilation run, to after


e.g. find some/dir > before.txt before compiling to class files, and again after redirected to after.txt, then diff before.txt after.txt


I would not recommend doing jar tvf path/to/myjar.jar | grep class-name with the full class name -- just enough of a name that it looks like it might be distinctive to that library.
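The same search can be sketched from a REPL instead of running jar tvf by hand. This assumes the relevant JARs are listed in the java.class.path system property (anything added by a custom class loader would be missed), and jars-containing is a hypothetical helper name. It matches on a prefix rather than a full class name, in line with the advice above.

```clojure
(require '[clojure.string :as str])
(import '(java.io File)
        '(java.util.jar JarEntry JarFile))

(defn jars-containing
  "Returns the classpath JARs, in classpath order, that hold at least one
  entry whose name starts with entry-prefix."
  [entry-prefix]
  (vec
   (for [^String path (str/split (System/getProperty "java.class.path")
                                 (re-pattern File/pathSeparator))
         ;; Skip directories and missing files; only real JARs are opened.
         :when (and (str/ends-with? path ".jar")
                    (.isFile (File. path)))
         :when (with-open [jar (JarFile. path)]
                 (some (fn [^JarEntry e]
                         (str/starts-with? (.getName e) entry-prefix))
                       (enumeration-seq (.entries jar))))]
     path)))

;; Example: list every JAR that ships something under slingshot/support
(jars-containing "slingshot/support")
```

More than one result suggests that a JAR other than the library's own is bundling its files.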


I don't know full rules for class name munging of things that look like decimal numbers, e.g. in class names, and how repeatable those are.


I think it must be some such thing, I did search the classpath for clj file with the right name, and since source command fails I am sure it’s a compiled class file somewhere


The reason why I make this suggestion is that there have been occasions I have heard about (not dealt with personally, but other Clojurians have) where some JAR file is distributed that uses AOT compilation, and it includes class files for libraries it depends upon.


Identifying which JAR file is the cause is step 1 in replacing that JAR file with something that doesn't do it.


there might even be more than 1 such JAR file


jesus christ this was annoying


You find another JAR earlier in the class path that had class files for the older slingshot lib? Or source?


found a jar that was obviously an uberjar with this class in it


cool. Makes sense.


(-> class .getClassLoader (.getResource "slingshot/support$unwrap.class"))
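A slightly fuller sketch of the same check, assuming the thread's context class loader is the one that sees the project classpath (usually true in a REPL, but not guaranteed). resource-locations is a hypothetical helper name. Unlike getResource, getResources returns every match, so a shadowing JAR shows up next to the expected one.

```clojure
(defn resource-locations
  "Returns the URLs, as strings, of every classpath resource named path."
  [path]
  (let [^ClassLoader loader (.getContextClassLoader (Thread/currentThread))]
    (mapv str (enumeration-seq (.getResources loader path)))))

;; Each URL names the JAR that holds a copy of the compiled function.
(resource-locations "slingshot/support$unwrap.class")
```

The first entry is typically the copy that gets loaded, and an empty result means the function is not AOT-compiled anywhere on the classpath.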


It is always nice after such an endeavor when the world seems sane again.


yeah, and I lost 4 hours trying to solve this instead of working on the project 😕


Having such "war stories" and experiences is the difference between a developer with 3-4 years of experience, and someone fresh to the dev tools.


You get to the point that you can make better guesses of probable causes and experiments to try, so eventually the 4 hours goes down to 30 mins or less. At least for things you guess well on 🙂


From experienced Clojure devs, all of that you just experienced is the raw data and pain behind advice that is summarized as "don't AOT when publishing libraries"


or in a more nuanced detailed form "be very very aware and careful of what you include in AOT compiled libraries"


Thank you guys for helping me solve this


@roklenarcic I'm curious as to which JAR you found it in? Was it an uberjar uploaded to Maven or Clojars?


it was a library by my client, uploaded to a private maven repo, so it’s only a part of the project I’m working on, not a concern for the general public


I’ll have to talk to them about not releasing uberjars as maven artifacts


Is there a way I can return an input stream from running a python script within clojure ring? So basically, I'm performing image processing in open cv using python and then calling that script in clojure ring like so: (shell/sh "python3" "") Currently this python script saves a static file, and then I send the static file using (io/input-stream img.png). But I don't want to save the static file and directly send the input stream. How can I do that?


what’s wrong with saving the static file?


if you really want to avoid saving the static file, you can try having the python script write its output to stdout and use that as the inputstream response


it's redundant. If a hundred thousand requests come in at the same time, I don't know how the server will handle them. What if it creates a hundred thousand files? What happens if two requests come at the same time? Also reading and writing from hard disk is several orders of magnitude slower than reading from the ram


are you actually expecting a 100k requests?


do you know a better way to handle bigger loads?


I don’t know what you’re building


optimizing performance almost always depends on context so it’s hard to give any suggestions without more context.


basically I have an image tag [:img {:src "/my-route"}], and the point is that this img tag shows the processed image that was uploaded half a second ago. at /my-route, I run this python script right?, and I don't want to waste time saving the file, and send the input-stream directly.


directly, as soon as the python script finishes executing


how likely is it for the image to be downloaded more than once?


there's got to be a way to return some kind of a buffer from the python script right?


did you see the suggestion about using stdout ?


more than once? you got to specify a time frame with that question.


no, I didn't. What's the suggestion again?


if it’s going to be downloaded more than once, then you’ll probably want to save the result somewhere anyway


> if you really want to avoid saving the static file, you can try having the python script write its output to stdout and use that as the inputstream response


that's a cute hack but how to do this legitimately?


i’m happy to give advice and explain other options, but I feel like you’re asking rudely and not being very appreciative of the free advice


Another option: Create a "ram-disk" on the server, i.e. a file system backed by RAM, not disk or SSD, and save and serve the file from there.


@U0CMVHBL2, do you know if this can be done on heroku?


I have not tried.


And if you are truly expecting 100K requests in a short period of time, at some point you need multiple servers, no matter what method you use.


I agree that using a method that requires 10 servers is cheaper in machine rental costs than a method that requires 100 servers, so not saying your questions are unwarranted.


is there a way to abstract to multiple servers in clojure ring? I don't really understand how running multiple servers fits into how heroku manages running my backend over multiple machines


in a cluster


servers=machines. I did not mean anything different than that by the term servers.

👍 4

and say if requests come in concurrently, and I have to save and serve these files, but the processed files are different depending on the source of the request. Can that create some kind of blockage because the server has to wait before saving and serving each file for each of those concurrent requests?


or can it happen in parallel?


You want to invoke a Python program, or in general some other Linux process, that will create the file you want to serve, I think?


yup, exactly


If you cannot invoke that program before the HTTP request comes in, because you do not yet know what command line args or input to give to that program, then obviously it cannot start before the HTTP request arrives, and presumably you cannot send the response back until the file is ready.


So that is a lower bound on the latency of responding.


what if I'm willing to save img1.png, img2.png.... img100.png and cycle through them? Does that reduce the bottleneck?


If you want a web server that runs concurrent threads on a multi-CPU-core machine to increase throughput, that can certainly be done. You would need to ensure that the different files you save (if they are written to a file system) have distinct file names, if the file contents should be different. There are well-known ways to generate unique file names from the command line, e.g. mktemp Linux command, but those techniques can be implemented in Clojure or any other programming language, too.


If a single machine is serving many requests at a high rate, I would guess that 100 is too low a number of distinct file names to use, but thankfully it is easy to make that number nearly arbitrarily large, rather than only 100.


Yeah. What's the clojurian way to mktemp?
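One hedged answer is plain Java interop: java.io.File/createTempFile creates a new empty file with a unique name in the default temp directory, so concurrent requests do not collide. The helper name temp-image-file and the "processed-" prefix are assumptions for this sketch.

```clojure
(import '(java.io File))

(defn temp-image-file
  "Creates a uniquely named empty .png file in the default temp directory."
  []
  (doto (File/createTempFile "processed-" ".png")
    ;; Best-effort cleanup when the JVM exits; a long-running server would
    ;; still want to delete files itself once they are no longer needed.
    (.deleteOnExit)))

(str (temp-image-file))
```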


You would need some way of deciding when a file was no longer needed, and to remove it, of course. Deciding when it is no longer needed, or that you are at least willing to regenerate it if you need it again, depend upon your application requirements, probably.


in this case, the application requirement is to not need the file after it has been served


This collection of files is effectively a cache. Whether it will ever happen that you can take advantage of using the same file to serve more than one request also depends upon your application and what these files are for. Whether you can reuse files for multiple requests, or not, you will need to remove them eventually.

💯 4

and regenerate it if needed rather than storing it


But if you are referring to these files using src="filename.jpg" in an HTML response, then I believe the filename can be requested many milliseconds after you send the HTML response, yes?


Or the user can hit reload on your HTML page (again, depending upon your web app whether that is expected to work), and request that file again, long after the HTML was generated.


Others who actually have more experience than I do writing web apps should chime in if I'm talking nonsense, hopefully.


It is definitely not something I've done for a day job, ever.


Actually, marking the file destroyable right after it's been served is probably not a good idea in my case, because I don't want to regenerate the file every time the user refreshes the page (which is critical to the application). Seems like I would need to save the file anyway, and frankly I don't know much about the pros and cons of caching in web servers and how useful it is in practice for performance.


I don't know best practices in this area of web server software dev. Eventually dynamically generated files do need to be deleted, so you can't support reload 478 days later and expect the file to still be sitting in the file system.

😀 4

A redis data-store with logic to destroy-this-file-after-x-seconds and a cap on max-memory.


It doesn't surprise me at all if people have implemented such caching logic N times in N different ways, where N is at least 100


And have had detailed and perhaps even vehement discussion about the advantages and disadvantages of several of them, compared to each other.


Redis is a popular open source framework, and I've heard of it before. Not so much any others really.


also its heroku add-on is free for development/hobby, which is great


I mean, assuming you want to load the whole image in memory anyways, you can use (:out (shell/sh …)). That’s the stdout of the shell process.


by default it returns a byte array, which you can pass to an image lib


(type (:out (shell/sh "ls"))) returns java.lang.String. So it would have to be a b64 output right? That seems hacky to me!


The reason for discussing file system stuff is, if an HTTP server wants to reference an image in a src="filename.jpg" reference, it needs to logically be in a file system on some server somewhere in the world, right? Or is there a way in HTTP to send back HTML with actual in-line image data?


Uh, you just gotta send a bytestream of a jpg back? How could anyone possibly know if it was sitting on the disk or not?


@U010Z4Y1J4Q use :out-enc :bytes


(per the doc string of sh)
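Putting the stdout suggestion and :out-enc :bytes together, a minimal sketch of a Ring-style response might look like this. It assumes the script writes the encoded PNG bytes to stdout. image-response is a hypothetical helper, and process_image.py in the usage comment is a made-up script name, not the one from the question.

```clojure
(require '[clojure.java.shell :as shell])
(import '(java.io ByteArrayInputStream))

(defn image-response
  "Runs command, treating its stdout as PNG bytes for a Ring response map."
  [& command]
  (let [{:keys [exit out err]}
        (apply shell/sh (concat command [:out-enc :bytes]))]
    (if (zero? exit)
      {:status  200
       :headers {"Content-Type" "image/png"}
       ;; out is a byte array because of :out-enc :bytes
       :body    (ByteArrayInputStream. out)}
      {:status  500
       :headers {"Content-Type" "text/plain"}
       :body    (str "image script failed: " err)})))

;; Hypothetical usage inside a handler:
;; (image-response "python3" "process_image.py")
```

sh waits for the process to finish and holds the whole output in memory, so this avoids the temp file but not the latency of running the script.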


@U0CMVHBL2 Pretty sure you set the response headers properly (e.g. Content-Type) and you can put whatever in the body.


File servers just happen to set all of that for you.


I haven't tried it but does that mean that I return an opencv mat from the python script, set the header to image/png in the response and I'm good to go?!


I didn't say disk, I said "file system", but even that is too specific. If you have an element img src="string" link in an HTML file, then the browser is likely to later make a request for the contents of whatever "string" refers to, whether it is on disk or generated on demand, yes?


I suppose that data could be generated on the fly at that time, rather than when the HTML containing img src="string" was sent back.


I guess I need that second coffee today, after all 🙂


@U010Z4Y1J4Q I’m pretty sure, yes.


@U0CMVHBL2 At the end of the day: browser requests an image, server returns some image bytes. Don’t matter where they come from.


Right, understood. In the context of the discussion @U010Z4Y1J4Q has been asking about, it sounds like that means if he is willing to generate the images only when they are actually requested by the clients, then the important thing is to maintain a set of image names, but no actual storage for them if he doesn't want to. Storage space for image data could be maintained somewhere, but that is optional if waiting for the image data to be generated at the last moment before it is needed is acceptable in his use case.


His use case is unclear to me. But he asked a pretty straightforward question initially.


He did. He also had the misfortune to have me responding, instead of someone who didn't go down a garden path 🙂


happens to the best of us 🙂 (and you weren’t the only one)


@U010Z4Y1J4Q how long is


libpython-clj would let you avoid the call to shell and create an actual stream


but you do create a bottleneck with the python GIL that wouldn't be there in the shell case


but tbt if there isnt all that much going on in, you could rewrite it in clojure/java without much fuss


also, as is the case with all performance optimizations, i'd say do the laziest thing that works until it makes the app too slow

👍 4

@U010Z4Y1J4Q It sure sounds like a premature optimization - running 100,000 scripts at the same time won’t be great no matter what. Also with the way clojure.java.shell works you’re likely going to end up spawning 3x as many threads. Moreover, the OS usually caches disk IO so reading from a file written just a moment ago doesn’t have to mean it’s read from disk. I’m curious what kind of service this is and if there’s a justification for such scaling requirements - I doubt so, especially since you’re actually asking these questions ;); I wouldn’t be surprised if it was more along the lines of 100 r/s


I love libpython-clj. @U3JH98J4R, I can run several instances of python in libpython-clj to overcome gil


I love how libpython-clj makes python interop so seamless


Yeah, that being said I don't think overcoming the Gil is a supported thing - with good reason


Def. Try it and profile


Or guestimate


But there's no shame in a little bottlenecking or excessive scaling

😀 4

I don't follow you. Why is def a good reason? You can create multiple refs to the python interpreter and run the threads of course


Well because libpython-clj marries a python environment to the jvm


So if you want to use python objects together they need to be from the same environment


Therefore it uses a global reference to an interpreter by default, which is the best user experience


And the first target audience is the data science crowd, which is the place where Python's ecosystem beats the jvm


And that space already uses a single thread to queue off native code hullabaloo which probably runs as parallel as it can


So the Gil and multiple interpreters isn't really a thing that would matter all that much


That being said, it might be supported I just don't know


Either way I said to def profile Because even with the Gil you probably meet your performance requirements


And it's possible maybe that the filesystem approach is the fastest


You never know except by profiling, and guessing is an imprecise art

💯 4

I guess guessing takes less effort than profiling that's why people are more prone to doing it.