clojure 2020-04-04 | Slack Archive

charch06:04:27

Hey y'all, I would like to define a spec with a corresponding custom generator to generate random probability vectors (vector of positive floats that sum to 1 where each element is between 0 and 1).

(s/def ::probability (s/and float? #(< 0 % 1)))
(s/def ::probabilities (s/with-gen (s/and (s/coll-of ::probability)
                                          #(== 1 (reduce + %)))
                                   GEN-FUNCTION))

What custom generating function would make this a possibility? Thanks!

seancorfield06:04:40

@nicholas.charchut IEEE floating point is going to make that a near impossible bar to achieve surely?

charch06:04:19

Exactly the need for a custom generator. I was thinking to generate a vector of positive integers in a reasonable range and then divide each of them by the sum (to normalize). I can't seem to find a way to preemptively "access" the generated values to find that normalizing constant, though. Any ideas?

seancorfield06:04:44

Hmm, good point. We have fmap for generated sequences but I think in this case you'd need to generate specific random sequences and then turn that into the generated result somehow and I'm not sure how to do that with generators.

seancorfield06:04:34

You could probably design a function f that when given a small integer, it created a sequence that long of fractions that added up to 1, and then fmap that function over a small integer generator...

andy.fingerhut08:04:35

One way would be to generate a random sequence of positive values, e.g. perhaps all integers, then have a function that divided every element by the total of the elements. The result is guaranteed to sum to 1 (within roundoff error, for IEEE floats)

andy.fingerhut08:04:59

Doing it that way seems to me to give you more control of the relative size of the different elements, versus methods that might generate a bunch of values in the range [0, 1], then the first time you generate a value that would make the total go over 1, add (1-total of all earlier elements) as the last element.

andy.fingerhut08:04:34

The "divide by total" approach also gives you straightforward control over the number of elements in the sequence, if that is important to you.

charch20:04:58

@U0CMVHBL2 That's a great idea, any way to get that total of all generated elements, so I could fmap to divide each by said total?

andy.fingerhut20:04:38

(reduce + my-sequence) ?

andy.fingerhut20:04:54

Whatever function you give to fmap can calculate the total and divide all elements by the total. No need for those operations to be in separate places that I can see.

charch20:04:58

(gen/sample (gen/let [unnorm (gen/vector (s/gen (s/and float? 
                     #(< 0 % 1))))
         Z (reduce + unnorm)]
              Z))

charch20:04:16

Sorry for the egregious formatting ^ but it complains about the value of Z. Namely, I get the following error in the vim-iced REPL: (generator? generator)
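
The reported (generator? generator) failure is consistent with how gen/let works: every right-hand side in its binding vector has to be a generator, and Z is bound to a plain number. A minimal sketch of one way around it, assuming test.check is on the class path and swapping the float generator for positive integers (gen/choose) to keep the arithmetic exact; the name normalized-gen is made up for illustration:

```clojure
(require '[clojure.test.check.generators :as gen])

;; Only generators go in the gen/let binding vector.
;; The sum is computed with an ordinary let inside the body.
(def normalized-gen
  (gen/let [unnorm (gen/vector (gen/choose 1 100) 2 5)]
    (let [z (reduce + unnorm)]
      ;; Integer division yields exact ratios, so the result sums to exactly 1.
      (mapv #(/ % z) unnorm))))

(gen/sample normalized-gen 3)
```

The body of gen/let may return a plain value, so no extra gen/return is needed here.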

andy.fingerhut20:04:08

A function like this defined separately from the generator code might help?

user=> (defn normalize-sequence [coll]
         (let [sum (reduce + coll)]
           (map #(/ % sum) coll)))
#'user/normalize-sequence
user=> (normalize-sequence [1 2 3])
(1/6 1/3 1/2)

charch20:04:09

I totally agree. The only reason I'm hell-bent on getting it in generating code is so I can randomly generate a more complex spec.

andy.fingerhut20:04:09

I mean, define that function somewhere, then use it in your generator. I don't have fmap in my recent mental cache today, so can't quickly fire off a code snippet using a call to the normalize-sequence function above.

charch20:04:43

My understanding is that fmap applies a function to each element, is that incorrect? If it applies to the whole sequence, we're golden

andy.fingerhut20:04:52

I thought it was entire generated value, whether that value is a number or an arbitrary collection, just as most things in Clojure allow that range for 'values'

👍 4

andy.fingerhut20:04:27

The similarity of the name fmap to map is potentially misleading here.

👍 4

andy.fingerhut21:04:08

Take a look at this example, which I believe is calling sort on a generated collection to produce a sorted collection: https://github.com/clojure/test.check/blob/master/doc/generator-examples.md#sorted-seq-of-integers
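
Putting the pieces of this thread together, a hedged sketch of the "divide by total" generator wired into the spec might look like the following. It assumes clojure.spec.alpha and test.check are available. The tolerance 1e-9, the integer range 1 to 1000, and the vector length of 2 to 10 are arbitrary choices for illustration, and the exact == 1 check from the original spec is replaced by a tolerance check because of the IEEE round-off mentioned earlier.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.test.check.generators :as gen])

(defn normalize
  "Divides every weight by the total, returning a vector of doubles."
  [weights]
  (let [total (reduce + weights)]
    (mapv #(double (/ % total)) weights)))

;; fmap applies normalize to the whole generated vector, not to each element.
;; At least two elements, so no single probability is exactly 1.
(def probabilities-gen
  (gen/fmap normalize (gen/vector (gen/choose 1 1000) 2 10)))

(s/def ::probability (s/and float? #(< 0 % 1)))

(s/def ::probabilities
  (s/with-gen
    (s/and (s/coll-of ::probability :kind vector?)
           ;; Tolerance instead of (== 1 ...) to allow for round-off.
           #(< (Math/abs (- 1.0 (reduce + %))) 1e-9))
    (fn [] probabilities-gen)))

(gen/sample (s/gen ::probabilities) 3)
```

s/with-gen expects a no-argument function that returns the generator, which is why probabilities-gen is wrapped in (fn [] ...).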

charch19:04:52

Am I missing something?

Ben Grabow13:04:22

@nicholas.charchut I think you're missing s/gen.

(gen/sample (s/gen ::probability-distribution))

charch06:04:58

Of the many errors @U89MMNQRY, that was certainly one of them. Proud to say my understanding of generators is through the roof at this point.

seancorfield06:04:48

As for the generator, I think you could generate rand repeatedly and take the sequences that added up to no more than 1.0 and just substitute the difference for the last value?

hiredman07:04:16

usually the way I end up thinking about writing complicated generators is create some kind of abstract machine that can be used to build whatever complicated value, then generate programs for that abstract machine, and fmap an interpreter over that

hiredman07:04:43

so for example, a program for generating probability vectors might start with [:vector-size 10] then be followed by some sequence of something like [:halve-and-add 1 5]

hiredman07:04:59

and then you just fmap over that something which creates a vector of 10 elements, the first 1 and the rest 0, and then proceeds to mixing that 1 around into the rest of the elements
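
A rough sketch of that "generate a program, then fmap an interpreter" idea, assuming test.check is available. The instruction semantics are a guess, since the chat does not define them: here [:halve-and-add from to] moves half of slot from into slot to, with 0-based indices. The names run-program, program-gen, and probability-vector-gen are hypothetical.

```clojure
(require '[clojure.test.check.generators :as gen])

(defn run-program
  "Interprets a program: starts with [1 0 0 ...] and applies each op in order.
  Assumed semantics: [:halve-and-add from to] moves half of slot `from` to slot `to`."
  [{:keys [size ops]}]
  (reduce (fn [v [_op from to]]
            (let [half (/ (v from) 2)]
              (-> v
                  (update from - half)
                  (update to + half))))
          (into [1] (repeat (dec size) 0))
          ops))

;; Generates a vector size, then a sequence of ops whose indices fit that size.
(def program-gen
  (gen/let [size (gen/choose 2 6)
            ops  (gen/vector
                  (gen/tuple (gen/return :halve-and-add)
                             (gen/choose 0 (dec size))
                             (gen/choose 0 (dec size))))]
    {:size size :ops ops}))

;; The interpreter is fmapped over generated programs.
(def probability-vector-gen
  (gen/fmap run-program program-gen))

(gen/sample probability-vector-gen 3)
```

Because every op only moves mass between slots, the total stays exactly 1 (the values are ratios, not floats). Slots can remain 0 in this sketch, so it would need adjusting for a spec that requires strictly positive entries.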

rmxm11:04:18

hi, just a general wondering - is there any mature lib for grpc? grpc seems to get more and more prevalent and was just thinking why clojure community is not jumping the hype, any reasons to remain in REST world? (outside of the obvious things like browser support and slightly more relaxed protocol)

adamfeldman16:04:16

Googled around for an example tutorial, this one appears to focus on using a gRPC java lib from Clojure: https://blog.jmibanez.com/2018/07/22/grpc-with-clojure-and-leiningen.html There’s also this new tool to ease working with protobufs: https://github.com/bufbuild/buf

rmxm17:04:01

@UCHV4JZ7A thanks, i have seen this guide, i was just wondering why there is no established way to deal with grpc in clojure... thanks anyway

👍 4

noisesmith15:04:00

speaking for me / the companies I've worked for, we explicitly didn't want an RPC approach and opted instead for event sourcing (specifically using kafka as it is data oriented, ordered, and persistent by default)

myguidingstar11:04:05

@roguas actually Clojure community doesn't just remain in REST world, they actively take the Graphql approach - in fact they even created a Clojure-native version of Graphql called EQL (Edn Query Language)

rmxm12:04:48

@myguidingstar sure, but graphql has a very different use case, grpc is for internal low-latency service mesh comms, whereas graphql seems oriented towards abstracting over query/command offering single facade endpoint to fetch data

mkvlr12:04:35

@roguas there’s certainly mature libs for java, no? Why not just use that?

rmxm12:04:11

sure, sure I know of that option, was just wondering why it's not getting any buzz in clojure community... and i dont mean it in a bad way - just would like to hear opinions 🙂

andy.fingerhut12:04:43

Not every Java lib needs a Clojure wrapper, or reimplementation. I do not know if grpc use in Clojure could be made significantly better than using existing Java libs, but if not, or no one has, Java interop is there by design, for that reason

andy.fingerhut12:04:49

There are lots of other data transfer mechanisms than grpc - some may not like that you have to define all the message fields up front, for example, and there are multiple similar approaches that do not require that

dominicm12:04:56

There's a few gRPC clojure things about.

mruzekw16:04:15

Does anyone have a favorite Clojure by example book?

potetm16:04:21

💯 Not every Java lib needs a Clojure wrapper 🔥

roklenarcic18:04:28

I’ve had some trouble managing my exceptions with Slingshot. According to Slingshot documentation, if you throw an object instead of throwable it is simply wrapped in ex-info . But when trying to use a matcher, it doesn’t match on ex-info exceptions, but it does on thrown objects. Here’s an example:

(try+
  (throw+ {:status 401})
  (catch [:status 401] _ "Caught!"))

This will print “Caught!”

(try+
  (throw+ (ex-info "" {:status 401}))
  (catch [:status 401] _ "Caught!"))

This will not print caught. Does anyone know why not? I’ve been debugging this for a while now and it doesn’t seem to make any sense.

Braden Shepherdson18:04:16

@mruzekw Clojure for the Brave and True has lots of examples and is available freely online.

🙏 8

roklenarcic19:04:40

Well I’ve never had this problem before. A dependency of a dependency in leiningen includes [slingshot 0.10.3] but at top level I specify [slingshot 0.12.2]. The lein deps :tree tells me I’m using 0.12.2, runtime classpath tells me I am using 0.12.2, the debugger tells me I am using 0.10.3….

dpsutton19:04:19

Which debugger?

roklenarcic19:04:25

IntelliJ

dpsutton19:04:19

Guessing it’s lying. Go delete the slingshot versions from your m2 cache and then debug again.

roklenarcic19:04:20

So in my stacktrace I have slingshot.support/unwrap line 65, but in actual source the unwrap function is around line 30, but in 0.10.3 source it is at line 65

roklenarcic19:04:42

also the actual function works like the old version

roklenarcic19:04:52

so debugger isn’t lying, everything else is

andy.fingerhut19:04:04

Have you checked full class path being used by JVM process, e.g. using something like ps axguwww | grep java or whatever is similar on your system?

roklenarcic19:04:41

I checked the "java.class.path" system property in REPL

roklenarcic19:04:06

supposedly it’s using the version 0.12.2, but the behaviour is like in 0.10.3

dpsutton19:04:47

Can you call source on the function and see what that reports?

roklenarcic19:04:59

(source slingshot.support/unwrap)
  "Returns a Throwable given a context: the object in context if it's
  a Throwable, else a Throwable context wrapper"
=> nil

dpsutton19:04:00

Also did you try removing the artifacts from the maven cache and see which one ends up there when you restart?

roklenarcic19:04:07

yeah I did

roklenarcic19:04:23

only the correct one is there but behavior isn’t correct

dpsutton20:04:58

Post a small snippet along with the expected and observed behavior?

roklenarcic20:04:16

It’ll take a bit

roklenarcic20:04:49

Given this code in 0.12.2: https://github.com/scgilardi/slingshot/blob/5125a79e2bd6b25384f41e2751608d9e2ee1580b/src/slingshot/support.clj#L45

roklenarcic20:04:26

(slingshot.support/unwrap (ex-info "" {:status 401}))

should return a non-null value, as ex-data is not null

roklenarcic20:04:18

but actually it works like this 0.10.3 version: https://github.com/scgilardi/slingshot/blob/5fd2b1f330dc3bf9a4164f1f2b0de0b162f5ef2e/src/slingshot/support.clj#L66

roklenarcic20:04:43

that is, because of missing meta tag, that where evaluates to false and it returns nil

dpsutton20:04:12

can you call (source support/unwrap)?

roklenarcic20:04:40

(source slingshot.support/unwrap)
  "Returns a Throwable given a context: the object in context if it's
  a Throwable, else a Throwable context wrapper"
=> nil

dpsutton20:04:04

i'm wondering why your repl is messed up

dpsutton20:04:14

sling> (source support/unwrap)
(defn unwrap
  "If t is a context wrapper or other IExceptionInfo, returns the
  corresponding context with t assoc'd as the value for :wrapper, else
  returns nil"
  [^Throwable t]
  (if-let [data (ex-data t)]
    (assoc (make-context t)
      :object (if (::wrapper? (meta data)) (:object data) data)
      :wrapper t)))

roklenarcic20:04:46

I’ll try running lein repl

roklenarcic20:04:02

same thing in lein repl

roklenarcic20:04:33

I’ll try to run it with IntelliJ itself

dpsutton20:04:41

and from lein repl you did

user> (require '[slingshot.support :as support])
nil
user> (support/unwrap (ex-info "" {:status 401}))

roklenarcic20:04:47

yeah

roklenarcic20:04:57

I’ll look at plugins next

dpsutton20:04:42

try lein deps :plugin-tree

roklenarcic20:04:49

calling lein repl introduces old version of the library

dpsutton20:04:03

try with lein deps :plugin-tree

dpsutton20:04:40

i've tried this with clj and lein with simple projects and it works fine. so its not a dep of lein or anything. you seem to have a plugin or perhaps your project does something a bit strange

roklenarcic20:04:13

I’ll try without any plugins

dpsutton20:04:27

did you try that deps command i listed?

dpsutton20:04:36

it will show you if a plugin overrides it

roklenarcic20:04:47

yeah I tried, I also tried with no plugins at all

roklenarcic20:04:58

and the source function still doesn’t return source and the doc is all wrong

roklenarcic20:04:08

I don’t know at this point

andy.fingerhut20:04:18

Another possibility, perhaps already mentioned earlier but I have not followed full thread: some JAR earlier in your class path includes source and/or .class files for the older version of the lib.

roklenarcic20:04:37

how do I find that?

roklenarcic20:04:38

how would the class file be named for slingshot.support/unwrap

andy.fingerhut20:04:18

There may be better ways, but one way would be to do jar tvf path/to/myjar.jar on each JAR file before it in the class path, and grep filenames for something you expect to find in the older version of the lib
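
A complementary check can be run from the REPL instead of the shell: ask the class loader for every class path entry that provides a given resource. This is only a sketch. The helper name all-copies is made up, and it assumes the loader returned by clojure.lang.RT/baseLoader sees the same class path as the running code.

```clojure
(defn all-copies
  "Returns the URLs (as strings) of every class path entry that provides
  resource-path, in class loader order."
  [resource-path]
  (let [^ClassLoader loader (clojure.lang.RT/baseLoader)]
    (->> (.getResources loader resource-path)
         enumeration-seq
         (mapv str))))

;; Example with a resource that ships inside the Clojure JAR itself:
(all-copies "clojure/core__init.class")
```

For the problem in this thread, the resources to look for would be "slingshot/support__init.class" (AOT-compiled namespace) and "slingshot/support.clj" (source). More than one URL in the result, or a URL pointing at an unexpected JAR, would suggest a bundled copy.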

andy.fingerhut20:04:59

Best way I know to find out what a class file would be named for a Clojure source file is to compile it locally to class files, and look through file system for what files were added from before that compilation run, to after

andy.fingerhut20:04:38

e.g. find some/dir > before.txt before compiling to class files, and again after redirected to after.txt, then diff before.txt after.txt

andy.fingerhut20:04:55

I would not recommend doing jar tvf path/to/myjar.jar | grep class-name with the full class name -- just enough of a name that it looks like it might be distinctive to that library.

andy.fingerhut20:04:25

I don't know full rules for class name munging of things that look like decimal numbers, e.g. in class names, and how repeatable those are.

roklenarcic20:04:24

I think it must be some such thing, I did search the classpath for clj file with the right name, and since source command fails I am sure it’s a compiled class file somewhere

andy.fingerhut20:04:32

The reason why I make this suggestion is that there have been occasions I have heard about (not dealt with personally, but other Clojurians have) where some JAR file is distributed that uses AOT compilation, and it includes class files for libraries it depends upon.

andy.fingerhut20:04:12

Identifying which JAR file is the cause is step 1 in replacing that JAR file with something that doesn't do it.

andy.fingerhut20:04:28

there might even be more than 1 such JAR file

roklenarcic20:04:50

found it

roklenarcic20:04:16

jesus christ this was annoying

andy.fingerhut20:04:00

You found another JAR earlier in the class path that had class files for the older slingshot lib? Or source?

roklenarcic20:04:30

found a jar that was obviously an uberjar with this class in it

andy.fingerhut20:04:47

cool. Makes sense.

roklenarcic20:04:52

(-> slingshot.support/unwrap class .getClassLoader (.getResource "slingshot/support$unwrap.class"))

andy.fingerhut20:04:59

It is always nice after such an endeavor when the world seems sane again.

roklenarcic20:04:29

yeah, and I lost 4 hours trying to solve this instead of working on the project 😕

andy.fingerhut21:04:49

Having such "war stories" and experiences is the difference between a developer with 3-4 years of experience, and someone fresh to the dev tools.

andy.fingerhut21:04:50

You get to the point that you can make better guesses of probable causes and experiments to try, so eventually the 4 hours goes down to 30 mins or less. At least for things you guess well on 🙂

andy.fingerhut21:04:08

From experienced Clojure devs, all of that you just experienced is the raw data and pain behind advice that is summarized as "don't AOT when publishing libraries"

andy.fingerhut21:04:09

or in a more nuanced detailed form "be very very aware and careful of what you include in AOT compiled libraries"

roklenarcic20:04:37

Thank you guys for helping me solve this

seancorfield21:04:51

@roklenarcic I'm curious as to which JAR you found it in? Was it an uberjar uploaded to Maven or Clojars?

roklenarcic21:04:51

it was a library by my client, uploaded to a private maven repo, so it’s only a part of the project I’m working on, not a concern for the general public

roklenarcic21:04:55

I’ll have to talk to them about not releasing uberjars as maven artifacts

Spaceman22:04:39

Is there a way I can return an input stream from running a python script within clojure ring? So basically, I'm performing image processing in open cv using python and then calling that script in clojure ring like so: (shell/sh "python3" "segmentation.py") Currently this python script saves a static file, and then I send the static file using (io/input-stream img.png). But I don't want to save the static file and directly send the input stream. How can I do that?

phronmophobic22:04:58

what’s wrong with saving the static file?

phronmophobic22:04:25

if you really want to avoid saving the static file, you can try having the python script write its output to stdout and use that as the inputstream response

Spaceman22:04:43

it's redundant. If a hundred thousand requests come in at the same time, I don't know how the server will handle them. What if it creates a hundred thousand files? What happens if two requests come at the same time? Also reading and writing from hard disk is several orders of magnitude slower than reading from the ram

phronmophobic22:04:45

are you actually expecting a 100k requests?

Spaceman22:04:17

yes

Spaceman22:04:06

do you know a better way to handle bigger loads?

phronmophobic22:04:29

I don’t know what you’re building

phronmophobic22:04:44

optimizing performance almost always depends on context so it’s hard to give any suggestions without more context.

Spaceman22:04:05

basically I have an image tag [:img {:src "/my-route"}], and the point is that this img tag shows the processed image that was uploaded half a second ago. at /my-route, I run this python script right?, and I don't want to waste time saving the file, and send the input-stream directly.

Spaceman22:04:43

directly, as soon as the python script finishes executing

phronmophobic22:04:07

how likely is it for the image to be downloaded more than once?

Spaceman22:04:31

there's got to be a way to return some kind of a buffer from the python script right?

phronmophobic22:04:56

did you see the suggestion about using stdout ?

Spaceman22:04:07

more than once? you got to specify a time frame with that question.

Spaceman22:04:35

no, I didn't. What's the suggestion again?

phronmophobic22:04:46

if it’s going to be downloaded more than once, then you’ll probably want to save the result somewhere anyway

phronmophobic22:04:00

> if you really want to avoid saving the static file, you can try having the python script write its output to stdout and use that as the inputstream response

Spaceman22:04:46

that's a cute hack but how to do this legitimately?

phronmophobic22:04:59

i’m happy to give advice and explain other options, but I feel like you’re asking rudely and not being very appreciative of the free advice

andy.fingerhut22:04:10

Another option: Create a "ram-disk" on the server, i.e. a file system backed by RAM, not disk or SSD, and save and serve the file from there.

Spaceman22:04:05

@U0CMVHBL2, do you know if this can be done on heroku?

andy.fingerhut22:04:24

I have not tried.

andy.fingerhut22:04:55

And if you are truly expecting 100K requests in a short period of time, at some point you need multiple servers, no matter what method you use.

andy.fingerhut22:04:40

I agree that using a method that requires 10 servers is cheaper in machine rental costs than a method that requires 100 servers, so not saying your questions are unwarranted.

Spaceman22:04:02

is there a way to abstract to multiple servers in clojure ring? I don't really understand how running multiple servers fits into how heroku manages running my backend over multiple machines

Spaceman22:04:10

in a cluster

andy.fingerhut22:04:00

servers=machines. I did not mean anything different than that by the term servers.

👍 4

Spaceman22:04:43

and say if requests come in concurrently, and I have to save and serve these files, but the processed files are different depending on the source of the request. Can that create some kind of blockage because the server has to wait before saving and serving each file for each of those concurrent requests?

Spaceman22:04:18

or can it happen in parallel?

andy.fingerhut22:04:47

You want to invoke a Python program, or in general some other Linux process, that will create the file you want to serve, I think?

Spaceman23:04:07

yup, exactly

andy.fingerhut23:04:29

If you cannot invoke that program before the HTTP request comes in, because you do not yet know what command line args or input to give to that program, then obviously it cannot start before the HTTP request arrives, and presumably you cannot send the response back until the file is ready.

andy.fingerhut23:04:36

So that is a lower bound on the latency of responding.

Spaceman23:04:22

what if I'm willing to save img1.png, img2.png.... img100.png and cycle through them? Does that reduce the bottleneck?

andy.fingerhut23:04:40

If you want a web server that runs concurrent threads on a multi-CPU-core machine to increase throughput, that can certainly be done. You would need to ensure that the different files you save (if they are written to a file system) have distinct file names, if the file contents should be different. There are well-known ways to generate unique file names from the command line, e.g. mktemp Linux command, but those techniques can be implemented in Clojure or any other programming language, too.

andy.fingerhut23:04:51

If a single machine is serving many requests at a high rate, I would guess that 100 is too low a number of distinct file names to use, but thankfully it is easy to make that number nearly arbitrarily large, rather than only 100.

Spaceman23:04:37

Yeah. What's the clojurian way to mktemp?
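
One common answer is plain Java interop: java.io.File/createTempFile creates a new, uniquely named file each time it is called. A minimal sketch, where the function name temp-image-file and the "segmented-" prefix are arbitrary choices for illustration:

```clojure
(import 'java.io.File)

(defn temp-image-file
  "Creates a new, uniquely named empty file in the JVM's default temp directory."
  []
  (File/createTempFile "segmented-" ".png"))

(let [f (temp-image-file)]
  (println (.getAbsolutePath f))
  ;; Delete the file once it is no longer needed.
  (.delete f))
```

The path of the created file could then be passed to the Python script as an argument, so concurrent requests never write to the same name.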

andy.fingerhut23:04:38

You would need some way of deciding when a file was no longer needed, and to remove it, of course. Deciding when it is no longer needed, or that you are at least willing to regenerate it if you need it again, depend upon your application requirements, probably.

Spaceman23:04:07

in this case, the application requirement is to not need the file after it has been served

andy.fingerhut23:04:22

This collection of files is effectively a cache. Whether it will ever happen that you can take advantage of using the same file to serve more than one request also depends upon your application and what these files are for. Whether you can reuse files for multiple requests, or not, you will need to remove them eventually.

💯 4

Spaceman23:04:24

and regenerate it if needed rather than storing it

andy.fingerhut23:04:15

But if you are referring to these files using src="filename.jpg" in an HTML response, then I believe the filename can be requested many milliseconds after you send the HTML response, yes?

andy.fingerhut23:04:51

Or the user can hit reload on your HTML page (again, depending upon your web app whether that is expected to work), and request that file again, long after the HTML was generated.

andy.fingerhut23:04:22

Others who actually have more experience than I do writing web apps should chime in if I'm talking nonsense, hopefully.

andy.fingerhut23:04:35

It is definitely not something I've done for a day job, ever.

Spaceman23:04:22

Actually, marking the file destroyable right after it's been served is probably not a good idea in my case: I don't want to regenerate the file every time the user refreshes the page (which is critical to the application). Seems like I would need to save the file anyway, and frankly I don't know much about the pros and cons of caching in web servers and how useful it is in practice for performance.

andy.fingerhut23:04:24

I don't know best practices in this area of web server software dev. Eventually dynamically generated files do need to be deleted, so you can't support reload 478 days later and expect the file to still be sitting in the file system.

😀 4

Spaceman23:04:08

Check this out: https://github.com/ptaoussanis/carmine

Spaceman23:04:33

A redis data-store with logic to destroy-this-file-after-x-seconds and a cap on max-memory.

andy.fingerhut23:04:21

It doesn't surprise me at all if people have implemented such caching logic N times in N different ways, where N is at least 100

andy.fingerhut23:04:44

And have had detailed and perhaps even vehement discussion about the advantages and disadvantages of several of them, compared to each other.

Spaceman23:04:47

Redis is a popular open source framework, and I've heard of it before. Not so much any others really.

Spaceman23:04:03

also its heroku add-on is free for development/hobby, which is great

potetm00:04:05

I mean, assuming you want to load the whole image in memory anyways, you can use (:out (shell/sh …)). That’s the stdout of the shell process.

potetm00:04:25

by default it returns a byte array, which you can pass to an image lib

Spaceman00:04:06

(type (:out (shell/sh "ls"))) returns java.lang.String. So it would have to be a b64 output right? That seems hacky to me!

andy.fingerhut00:04:56

The reason for discussing file system stuff is, if an HTTP server wants to reference an image in a src="filename.jpg" reference, it needs to logically be in a file system on some server somewhere in the world, right? Or is there a way in HTTP to send back HTML with actual in-line image data?

potetm00:04:34

Uh, you just gotta send a bytestream of a jpg back? How could anyone possibly know if it was sitting on the disk or not?

potetm00:04:08

@U010Z4Y1J4Q use :out-enc :bytes

potetm00:04:25

(per the doc string of sh)
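
As a sketch of how that could fit into a Ring handler: the snippet below runs a command with :out-enc :bytes and wraps the resulting byte array in an input stream for the response body. It assumes the script writes raw PNG bytes to stdout, which the segmentation.py described above does not do yet. The function name script->image-response is hypothetical.

```clojure
(require '[clojure.java.shell :as shell])

(defn script->image-response
  "Runs cmd, treating its stdout as PNG bytes, and returns a Ring-style response map.
  Assumes the command writes the encoded image to stdout."
  [& cmd]
  (let [{:keys [exit out err]} (apply shell/sh (concat cmd [:out-enc :bytes]))]
    (if (zero? exit)
      {:status  200
       :headers {"Content-Type" "image/png"}
       ;; With :out-enc :bytes, :out is a byte array rather than a String.
       :body    (java.io.ByteArrayInputStream. out)}
      {:status  500
       :headers {"Content-Type" "text/plain"}
       ;; :err is always returned as a String.
       :body    (str "script failed: " err)})))

;; Hypothetical usage inside a handler:
;; (script->image-response "python3" "segmentation.py")
```

Note that shell/sh waits for the process to exit and holds the whole output in memory, so this avoids the temp file but does not stream the image incrementally.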

potetm01:04:36

@U0CMVHBL2 Pretty sure you set the response headers properly (e.g. Content-Type) and you can put whatever in the body.

potetm01:04:51

File servers just happen to set all of that for you.

Spaceman01:04:34

I haven't tried it but does that mean that I return an opencv mat from the python script, set the header to image/png in the response and I'm good to go?!

andy.fingerhut01:04:52

I didn't say disk, I said "file system", but even that is too specific. If you have an element img src="string" link in an HTML file, then the browser is likely to later make a request for the contents of whatever "string" refers to, whether it is on disk or generated on demand, yes?

andy.fingerhut01:04:15

I suppose that data could be generated on the fly at that time, rather than when the HTML containing img src="string" was sent back.

andy.fingerhut01:04:08

I guess I need that second coffee today, after all 🙂

potetm01:04:36

@U010Z4Y1J4Q I’m pretty sure, yes.

potetm01:04:16

@U0CMVHBL2 At the end of the day: browser requests an image, server returns some image bytes. Don’t matter where they come from.

andy.fingerhut01:04:14

Right, understood. In the context of the discussion @U010Z4Y1J4Q has been asking about, it sounds like that means if he is willing to generate the images only when they are actually requested by the clients, then the important thing is to maintain a set of image names, but no actual storage for them if he doesn't want to. Storage space for image data could be maintained somewhere, but that is optional if waiting for the image data to be generated at the last moment before it is needed is acceptable in his use case.

potetm01:04:36

His use case is unclear to me. But he asked a pretty straightforward question initially.

andy.fingerhut01:04:25

He did. He also had the misfortune to have me responding, instead of someone who didn't go down a garden path 🙂

potetm01:04:21

happens to the best of us 🙂 (and you weren’t the only one)

emccue04:04:55

@U010Z4Y1J4Q how long is segmentation.py?

emccue04:04:27

libpython-clj would let you avoid the call to shell and create an actual stream

emccue04:04:47

but you do create a bottleneck with the python GIL that wouldn't be there in the shell case

emccue04:04:39

but tbt if there isnt all that much going on in segmentation.py, you could rewrite it in clojure/java without much fuss

emccue04:04:43

also, as is the case with all performance optimizations, i'd say do the laziest thing that works until it makes the app too slow

👍 4

jumar06:04:52

@U010Z4Y1J4Q It sure sounds like a premature optimization- running 100,000 scripts at the same time won’t be great no matter what. Also with the way clojure java shell works you’re likely going to end up spawning 3x as many threads. Moreover, OS usually caches disk IO so reading from a file written just a moment ago doesn’t have to mean it’s read from disk. I’m curious what kind of service is this and if there’s a justification for such scaling requirements - I doubt so, especially since you’re actually asking these questions ;); I wouldn’t be surprised if it was more along the lines 100 r/s

Spaceman13:04:51

I love libpython-clj. @U3JH98J4R, I can run several instances of python in libpython-clj to overcome gil

Spaceman13:04:43

I love how libpython-clj makes python interop so seamless

emccue14:04:59

Yeah, that being said I don't think overcoming the Gil is a supported thing - with good reason

emccue14:04:12

Def. Try it and profile

emccue14:04:30

Or guesstimate

emccue14:04:54

But there's no shame in a little bottlenecking or excessive scaling

😀 4

Spaceman14:04:45

I don't follow you. Why is def a good reason? You can create multiple refs to the python interpreter and run the threads of course

emccue03:04:51

Well because libpython-clj marries a python environment to the jvm

emccue03:04:16

So if you want to use python objects together they need to be from the same environment

emccue03:04:06

Therefore it uses a global reference to an interpreter by default, which is the best user experience

emccue03:04:17

And the first target audience is the data science crowd, which is the place where pythons ecosystem beats the jvm

emccue03:04:51

And that space already uses a single thread to queue off native code hubaloo which probably runs as parallel as it can

emccue03:04:24

So the Gil and multiple interpreters isn't really a thing that would matter all that much

emccue03:04:37

That being said, it might be supported I just don't know

emccue03:04:35

Either way I said to def profile Because even with the Gil you probably meet your performance requirements

emccue03:04:00

And it's possible maybe that the filesystem approach is the fastest

emccue03:04:29

You never know except by profiling, and guessing is an imprecise art

💯 4

Spaceman12:04:36

I guess guessing takes less effort than profiling that's why people are more prone to doing it.
