#clojure
2020-08-22
mac01021 01:08:24

Anyone using no.disassemble with a deps.edn project? What does one do to make the bytecode available when not running via leiningen?

seancorfield 02:08:31

@mac01021 I can't speak to no.disassemble but I have used this from a deps.edn project: https://github.com/clojure-goes-fast/clj-java-decompiler

seancorfield 02:08:32

(there's an alias for that in my dot-clojure repo's deps.edn file)

seancorfield 02:08:02

Oh, I see you've opened an issue there... so it's about specifying a JVM option with the actual path to the no.disassemble JAR...

mac01021 02:08:17

@seancorfield Yes, that's right. But thanks for the pointer to clj-java-decompiler! It looks like if I can't get the Java agent thing to work, then your link will meet my needs.

seancorfield 02:08:55

I answered you in that issue on the repo.

seancorfield 02:08:04

I copy/pasted what worked for me locally.

seancorfield 02:08:41

I'll paste it in here for anyone else:

$ clj -Sdeps '{:deps {nodisassemble/nodisassemble {:mvn/version "0.1.3"}}}' -J-javaagent:$HOME/.m2/repository/nodisassemble/nodisassemble/0.1.3/nodisassemble-0.1.3.jar
Initializing NoDisassemble Transformer
Clojure 1.10.1
user=> (require '[no.disassemble :as nope])
nil
user=> (println (nope/disassemble (fn [])))
// Compiled from NO_SOURCE_FILE (version unknown : 52.0, super bit)
public final class user$eval360$fn__361 extends clojure.lang.AFunction {
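
For a project with its own deps.edn, the same two pieces (the dependency and the -javaagent JVM option) could probably live in an alias instead of on the command line. This is only a sketch: the alias name :nodisassemble and the /home/you prefix are placeholders, and it assumes that strings in :jvm-opts are passed to the JVM literally, so $HOME would not be expanded there and the path has to be written out in full.

```clojure
;; deps.edn (sketch; alias name and home directory are hypothetical)
{:aliases
 {:nodisassemble
  {:extra-deps {nodisassemble/nodisassemble {:mvn/version "0.1.3"}}
   ;; Assumption: :jvm-opts strings reach the JVM unchanged, so use an
   ;; absolute path to the JAR in your local Maven repository.
   :jvm-opts ["-javaagent:/home/you/.m2/repository/nodisassemble/nodisassemble/0.1.3/nodisassemble-0.1.3.jar"]}}}
```

Starting the REPL with `clj -A:nodisassemble` should then print the same "Initializing NoDisassemble Transformer" line as the one-off command above.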

mac01021 20:08:33

Thank you! Sorry I didn't reply earlier. I had to run away at that moment to deal with my 3-year-old.

pavel.klavik 19:08:24

Hi, I am trying to pinpoint why sending larger WebSocket messages (about 400 kB) is so slow on our production server. All the data to send is already loaded in memory, so I call the Sente send! function immediately. I am using transit as the message protocol. The data gets to the client after multiple seconds, frequently taking over 30 seconds. When I tried to send a similar amount of data from my development server on localhost, it was almost instant. Any tips on what could be going wrong or how to debug this better?

rutledgepaulv 04:08:09

To me this sounds like maybe you have response buffering enabled on an intermediate proxy / load balancer.

pavel.klavik 19:08:45

Why do you think this is responsible? What should we look for? We are using nginx.

rutledgepaulv 19:08:47

Because I've seen similar things before and some proxies by default accumulate packets from the upstream server before sending any to the client. This is not behavior you want when using web sockets. Sounds like this is true of nginx as well: https://serverfault.com/questions/789417/should-proxy-buffering-be-disabled-in-nginx-to-support-sockjs-xhr-streaming

rutledgepaulv 20:08:09

So sounds like you want to disable the proxy_buffering nginx directive

pavel.klavik 21:08:41

So we measured the situation by running tcpdump between our Clojure server and the nginx in front of it, and it seems that the delay occurs in the Clojure server; once the data reaches nginx, it is forwarded immediately to the client.

pavel.klavik 21:08:03

We tested this just to be sure, but it does not seem to improve the speed (it could probably improve latency, but that is not our problem).

rutledgepaulv 04:08:46

Got it. Curious problem. I'm not familiar enough with sente and that stack to know what might be the cause. I just use ring-jetty9 websocket support and a thin adapter layer for core.async to implement the sending

rutledgepaulv 04:08:32

If core.async is involved via sente and you're using core.async yourselves, could there be a dispatch pool starvation problem because of blocking operations being run on dispatch threads? I've seen that manifest itself in strange ways like this. jstack is useful in that case to understand what the core.async dispatch threads are doing.
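
The starvation scenario described above can be sketched roughly as follows. This is an illustration, not code from the project discussed here: slow-query is a hypothetical stand-in for a blocking database call, and the example assumes core.async is on the classpath.

```clojure
(require '[clojure.core.async :as async])

;; Hypothetical stand-in for a blocking call such as a DB query.
(defn slow-query [id]
  (Thread/sleep 100)
  {:id id})

;; Risky: the body of a go block runs on the small, fixed dispatch pool
;; (8 threads by default). A blocking call here ties up one of those
;; threads, and enough concurrent calls stall every other go block.
(defn risky-fetch [id]
  (async/go (slow-query id)))

;; Safer: async/thread runs the body on a separate thread and returns
;; a channel that receives the result.
(defn safer-fetch [id]
  (async/thread (slow-query id)))

;; Blocking take from ordinary (non-go) code.
(async/<!! (safer-fetch 1))
;; => {:id 1}
```

Both functions return a channel, so callers look the same either way; the difference only shows up under load, which may be why this kind of problem tends to appear in production rather than on a development machine.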

pavel.klavik 14:08:58

Sounds like it, was looking into Sente documentation and it seems to be a possible problem. The issue is that the precomputed values I am sending are computed lazily and their realization requires accessing DB on a different machine multiple times. Will rewrite the code so it works faster.
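
A minimal sketch of the lazy-versus-realized difference mentioned above, using only clojure.core. The function fetch-row is a hypothetical stand-in for a remote DB read and is not taken from the actual application.

```clojure
;; Hypothetical stand-in for a DB read on another machine.
(defn fetch-row [id]
  (Thread/sleep 10)
  {:id id})

;; Lazy: no row is fetched until something consumes the sequence,
;; for example the serializer at send time.
(def lazy-rows (map fetch-row (range 5)))

;; Eager: doall walks the whole sequence now, so the cost is paid
;; when the data is precomputed rather than when it is sent.
(def eager-rows (doall (map fetch-row (range 5))))

(realized? lazy-rows)   ;; => false
(realized? eager-rows)  ;; => true
```

Note that doall only forces the top-level sequence. Lazy sequences nested inside the elements (for example as map values) would still be unrealized and need to be forced separately, for instance by building them with mapv or into.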

pavel.klavik 19:08:16

Thanks a lot for your help and for discussing potential issues. There were two problems in the end: computation and blocking database I/O when sending the response, and blocking of all threads in the Sente pool, so the server stopped responding to otherwise quick messages. I fixed the problems by setting Sente to use its own thread pool (as discussed here: https://github.com/ptaoussanis/sente/issues/265) and by realizing all lazy sequences before sending them to Sente (in the case of administration, we were precomputing data every 15 minutes, but they contained lazy sequences which needed to read further values from the DB for realization). The resulting speed looks like this:

lanejo01 19:08:23

Glad you got things resolved @pavel.klavik!

lanejo01 19:08:26

@pavel.klavik Are you being throttled by your network provider? Can you use something like https://github.com/websockets/wscat to see if it's related to the browser, your network provider (AWS, for example) or your server? (For example, can you hit your ws endpoint from within your prod VPC and see similar timings?)

lanejo01 19:08:13

Is it possible you have a large number of connections in your prod env and you're looping over them to find the right one / broadcasting to all? If you're broadcasting and looping over conns, is it possible your algorithm is just O(n)?

pavel.klavik 19:08:25

The problem should not be with my internet connection, it is slow everywhere. Also the app downloads a lot of images and other resources while running and they are very fast. Small WS messages run quickly as shown in the screenshot.

pavel.klavik 19:08:56

No, this particular message is sent to just a single connection and overall I don't expect to have more than ~10 connections active at any moment.

lanejo01 19:08:01

Sure, but maybe the provider throttles larger WS messages.

lanejo01 19:08:29

Also, for transit, are you using the :json format or :json-verbose?

pavel.klavik 19:08:26

Not sure, setting it up like this:

(sente/make-channel-socket!
  (aleph/get-sch-adapter)
  {:user-id-fn (fn [req] (:client-id req))
   :packer     (sente-transit/get-transit-packer)})

lanejo01 19:08:31

Can you actually identify how many conns you have in prod? Maybe you're not cleaning them up?

pavel.klavik 19:08:47

so it seems to be :json, from checking the code

pavel.klavik 19:08:42

is throttling larger WS messages common?

lanejo01 19:08:34

It wouldn't surprise me, half a MB of transit is a lot of data.

pavel.klavik 19:08:22

hmm, so by checking :connected-uids, we have 5 connections at the moment, we don't really have high traffic on the server and the same happens after restarting it

pavel.klavik 19:08:37

how would you use wscat to pinpoint the problem?

smith.adriane 19:08:32

chrome devtools has good support for showing websocket connection info. have you narrowed down whether the issue is on the browser side or the server side?

pavel.klavik 19:08:08

What should I look for? Not sure where the problem is, I will definitely try to look at my server traffic to see how long it takes before the message is sent.

pavel.klavik 19:08:04

I got this in headers tab, and there are frames available

smith.adriane 20:08:35

@pavel.klavik, check the messages tab. It should give the timing of the messages. if the message finishes sending quickly, then it's probably a server issue. if it's slow, then it's probably the browser

pavel.klavik 20:08:57

It takes about 30 seconds there from asking for the data till receiving it.

smith.adriane 20:08:17

seems like it's a server side issue

pavel.klavik 20:08:03

client-side is very unlikely since I am doing very little there, just displaying the data

pavel.klavik 20:08:50

so it either happens on the server-side, or in the network in between, will try to find out tomorrow when our devops guy comes back from vacation

rdcoold 19:08:34

Is it only for ws, or for any large message? Maybe you have a small tcp_sndbuf (to handle many connections). Also, as far as I remember, local tests don't use the network adapter at all.

pavel.klavik 19:08:00

Happens only for large WS messages. Downloading or uploading images or even large files works fine.

lanejo01 19:08:55

Spin up a machine in the same datacenter / VPC as your prod server, install wscat on it, simulate the ws connection like above, time it using unix time or a stopwatch. You need to bisect the problem.

lanejo01 19:08:28

@pavel.klavik To confirm, when running locally you can send that large payload no problem?

pavel.klavik 19:08:39

Ya, that is in the second screenshot, it took about 200 ms

lanejo01 19:08:04

Nginx might be your issue. Did this "just start happening" or did you try this in prod for the first time now?

pavel.klavik 20:08:13

I think it was always slower but became much more noticeable recently as the number of our users/data is growing

pavel.klavik 20:08:19

It might be something involving our nginx configuration, we will need to test everything to see where the problem could be.

lanejo01 20:08:26

Ok. Bisect the problem by doing the wscat test mentioned above in your higher environment. Then you will know if the slowness is in your server code or not.

lanejo01 20:08:13

Going directly to the server, not through nginx in the higher env.

pavel.klavik 20:08:14

sure, plus we can look into nginx logs and network data there to see how fast it is, thx for pointers

lanejo01 20:08:51

you could also instrument your code for observability and redeploy to prod.
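
One lightweight way to instrument a suspect step, sketched here with plain clojure.core and no observability library. The macro name timed and the labels are hypothetical; a real setup would more likely report to whatever metrics or logging system is already in place.

```clojure
;; Evaluates body, prints how long it took in milliseconds,
;; and returns the body's value unchanged.
(defmacro timed [label & body]
  `(let [start#  (System/nanoTime)
         result# (do ~@body)
         ms#     (/ (- (System/nanoTime) start#) 1e6)]
     (println ~label "took" ms# "ms")
     result#))

;; Example: time the realization of a (stand-in) payload.
(timed "build payload"
  (doall (map inc (range 1000))))
```

Wrapping both the payload construction and the send call separately would show which of the two accounts for the delay.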

lanejo01 20:08:42

Or clone prod data locally (depending on your industry) and try it again locally with the same amount of data and see if it's still 200ms or not. Good luck!

pavel.klavik 20:08:52

we also have a staging server, so we can play there, data should not be very different from our testing dev data I have

pavel.klavik 20:08:00

Btw, either Sente or WS itself is merging multiple messages into a single frame. Is there a way to avoid that?

pavel.klavik 20:08:24

In the figures above, I am sending two messages but get a combined reply in a single frame.

lanejo01 20:08:35

No idea, sorry!

pavel.klavik 21:08:05

So we did some digging and, by running tcpdump between our Clojure server and nginx, we found out that the delay is caused by the Clojure server

pavel.klavik 21:08:35

Further, we did an experiment on our staging server running the same code and it is much faster there, so I am quite puzzled

pavel.klavik 21:08:35

By reading the code, it seems that sente send-fn! is async, not sure how to get further insight

lanejo01 23:08:13

Add some instrumentation. What are your observability capabilities? Are you running out of memory? Disk? Is your VPS provider throttling that environment?

pavel.klavik 23:08:53

After more digging, I think it is just related to this https://github.com/ptaoussanis/sente/issues/265 and that the precomputed data are stored lazily, so they are realized when we ask for them, costing the extra delay. I will need to do some further experiments with it.