#clojure
2020-08-22
mac0102101:08:24

Anyone using no.disassemble with a deps.edn project? What does one do to make the bytecode available when not running via leiningen?

seancorfield02:08:31

@mac01021 I can't speak to no.disassemble but I have used this from a deps.edn project: https://github.com/clojure-goes-fast/clj-java-decompiler

seancorfield02:08:32

(there's an alias for that in my dot-clojure repo's deps.edn file)

seancorfield02:08:02

Oh, I see you've opened an issue there... so it's about specifying a JVM option that points to the actual path of the no.disassemble JAR...

mac0102102:08:17

@seancorfield Yes that's right. But thanks for the pointer to clj-java-decompiler! It looks like if I can't get the java agent thing to work, then your link will meet my needs

seancorfield02:08:55

I answered you in that issue on the repo.

seancorfield02:08:04

I copy/pasted what worked for me locally.

seancorfield02:08:41

I'll paste it in here for anyone else:

$ clj -Sdeps '{:deps {nodisassemble/nodisassemble {:mvn/version "0.1.3"}}}' -J-javaagent:$HOME/.m2/repository/nodisassemble/nodisassemble/0.1.3/nodisassemble-0.1.3.jar
Initializing NoDisassemble Transformer
Clojure 1.10.1
user=> (require '[no.disassemble :as nope])
nil
user=> (println (nope/disassemble (fn [])))
// Compiled from NO_SOURCE_FILE (version unknown : 52.0, super bit)
public final class user$eval360$fn__361 extends clojure.lang.AFunction {
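
For anyone who wants the same setup in a project's deps.edn instead of on the command line, here is a minimal sketch of an alias. The alias name :nodis and the /home/you prefix are placeholders, not something from the thread. As far as I know, deps.edn does not expand $HOME inside :jvm-opts, so the agent path is written out in full.

```clojure
;; deps.edn (sketch; :nodis and /home/you are hypothetical placeholders)
{:aliases
 {:nodis
  {:extra-deps {nodisassemble/nodisassemble {:mvn/version "0.1.3"}}
   ;; Same -javaagent option as the one-liner above, minus the -J prefix.
   ;; Replace /home/you with the absolute path to your own home directory.
   :jvm-opts ["-javaagent:/home/you/.m2/repository/nodisassemble/nodisassemble/0.1.3/nodisassemble-0.1.3.jar"]}}}
```

With that in place, `clj -A:nodis` should start a REPL with the agent loaded, assuming the JAR has already been downloaded to that location.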

mac0102120:08:33

Thank you! Sorry I didn't reply earlier. I had to run away at that moment to deal with my 3-year-old.

Pavel Klavík19:08:24

Hi, I am trying to pinpoint why sending larger WebSocket messages (about 400 kB) is so slow on our production server. All the data to send are already loaded in memory, so I immediately call Sente's send! function. I am using transit as the message protocol. The data reach the client only after multiple seconds, frequently more than 30 seconds. When I tried to send a similar amount of data from my development server on localhost, it was almost instant. Any tips on what could be going wrong or how to debug this better?

rutledgepaulv04:08:09

To me this sounds like maybe you have response buffering enabled on an intermediate proxy / load balancer.

Pavel Klavík19:08:45

@U5RCSJ6BB Why do you think this is responsible? What should we look for? We are using nginx.

rutledgepaulv19:08:47

Because I've seen similar things before and some proxies by default accumulate packets from the upstream server before sending any to the client. This is not behavior you want when using web sockets. Sounds like this is true of nginx as well: https://serverfault.com/questions/789417/should-proxy-buffering-be-disabled-in-nginx-to-support-sockjs-xhr-streaming

rutledgepaulv20:08:09

So sounds like you want to disable the proxy_buffering nginx directive

Pavel Klavík21:08:41

So we were measuring the situation by running tcpdump between our Clojure server and the nginx in front of it, and it seems that the delay occurs in the Clojure server; once the data reach nginx, they are forwarded immediately to the client

Pavel Klavík21:08:03

We tested disabling proxy_buffering just to be sure, but it does not seem to improve the speed (it could probably improve latency, but that is not our problem)

rutledgepaulv04:08:46

Got it. Curious problem. I'm not familiar enough with sente and that stack to know what might be the cause. I just use ring-jetty9 websocket support and a thin adapter layer for core.async to implement the sending

rutledgepaulv04:08:32

If core.async is involved via sente and you're using core.async yourselves could there be a dispatch pool starvation problem because of blocking operations being run on dispatch threads? I've seen that manifest itself in strange ways like this. Jstack is useful in that case to understand what the core.async dispatch threads are doing
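
A minimal sketch of the starvation effect described above, using a plain one-thread java.util.concurrent pool rather than core.async or Sente (that substitution is an assumption made to keep the example dependency-free). The promise named `release` stands in for a slow blocking call such as a remote DB read; all names here are hypothetical.

```clojure
(import '(java.util.concurrent Executors ExecutorService))

(defn starvation-demo
  "Runs a blocking task and a quick task on a one-thread pool and reports
  whether the quick task was stuck behind the blocking one."
  []
  (let [^ExecutorService pool (Executors/newFixedThreadPool 1)
        release (promise)   ; stands in for a slow DB call finishing
        quick   (promise)]  ; delivered by the otherwise-instant task
    (.execute pool (fn [] @release))                     ; occupies the only thread
    (.execute pool (fn [] (deliver quick :quick-done)))  ; queued behind it
    (let [stuck? (not (realized? quick))]  ; still pending: no free thread
      (deliver release :db-finished)       ; unblock the slow task
      (let [result (deref quick 5000 :timed-out)]
        (.shutdown pool)
        {:quick-was-stuck? stuck? :quick-result result}))))

(starvation-demo)
;; => {:quick-was-stuck? true, :quick-result :quick-done}
```

The quick task does no work of its own, yet it cannot finish until the blocking task lets go of the thread. A small fixed pool shared by all message handlers behaves the same way once every thread is blocked.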

Pavel Klavík14:08:58

Sounds like it, was looking into Sente documentation and it seems to be a possible problem. The issue is that the precomputed values I am sending are computed lazily and their realization requires accessing DB on a different machine multiple times. Will rewrite the code so it works faster.
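
To illustrate the laziness problem described here, a small sketch using only clojure.core. `fetch-row` is a hypothetical stand-in for a remote DB read, and the counter exists only to show when the reads actually happen.

```clojure
(def db-reads (atom 0))

(defn fetch-row
  "Hypothetical stand-in for a remote DB read; counts how often it is called."
  [id]
  (swap! db-reads inc)
  {:id id})

(defn precompute-lazy  [ids] (map fetch-row ids))   ; returns at once, reads nothing
(defn precompute-eager [ids] (mapv fetch-row ids))  ; does every read before returning

(reset! db-reads 0)
(def cached-lazy (precompute-lazy [1 2 3]))
(def reads-after-lazy @db-reads)    ; 0: the cost is still pending

(def cached-eager (precompute-eager [1 2 3]))
(def reads-after-eager @db-reads)   ; 3: paid during precomputation

(count cached-lazy)                 ; walking the seq, as a serializer would, triggers the reads
(def reads-after-walk @db-reads)    ; 6
```

The "precomputed" lazy value only pays for its DB reads when something walks it, which in the scenario above would be at send time. Note that `mapv` or `doall` force only the top-level sequence; lazy sequences nested inside the elements need to be forced as well.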

Pavel Klavík19:08:16

@U0CJ19XAM @U5RCSJ6BB Thanks a lot for your help and for discussing potential issues. There were two problems in the end: computation and blocking database I/O when sending the response, and blocking of all threads in the Sente pool, so the server stopped responding to otherwise quick messages. I fixed the problems by setting Sente to use its own thread pool (as discussed here: https://github.com/ptaoussanis/sente/issues/265) and by realizing all lazy sequences before sending them to Sente (in the case of administration, we were precomputing data every 15 minutes, but they contained lazy sequences which needed to read further values from the DB for realization). The resulting speed looks like this:

Joe Lane19:08:23

Glad you got things resolved @pavel.klavik!

Joe Lane19:08:26

@pavel.klavik Are you being throttled by your network provider? can you use something like https://github.com/websockets/wscat to see if it's related to the browser, your network provider (aws, for example) or your server (can you hit your ws endpoint from within your prod vpc, for example and see similar timings?)

Joe Lane19:08:13

Is it possible you have a large number of connections in your prod env and you're looping over them to find the right one / broadcasting to all? If you're broadcasting and looping over conns, is it possible your algorithm is just O(n)?

Pavel Klavík19:08:25

The problem should not be with my internet connection, it is slow everywhere. Also the app downloads a lot of images and other resources while running and they are very fast. Small WS messages run quickly as shown in the screenshot.

Pavel Klavík19:08:56

No, this particular message is sent to just a single connection and overall I don't expect to have more than ~10 connections active at any moment.

Joe Lane19:08:01

Sure, but maybe the provider throttles larger WS messages.

Joe Lane19:08:29

Also, for transit, are you using the :json format or :json-verbose?

Pavel Klavík19:08:26

Not sure, setting it up like this:

(sente/make-channel-socket!
                      (aleph/get-sch-adapter)
                      {:user-id-fn (fn [req] (:client-id req))
                       :packer     (sente-transit/get-transit-packer)})

Joe Lane19:08:31

Can you actually identify how many conns you have in prod? Maybe you're not cleaning them up?

Pavel Klavík19:08:47

so it seems to be :json, judging by the code

Pavel Klavík19:08:42

is throttling larger WS messages common?

Joe Lane19:08:34

It wouldn't surprise me, half a mb of transit is a lot of data.

Pavel Klavík19:08:22

hmm, so by checking :connected-uids, we have 5 connections at the moment, we don't really have high traffic on the server, and the same happens after restarting it

Pavel Klavík19:08:37

how would you use wscat to pinpoint the problem?

phronmophobic19:08:32

chrome devtools has good support for showing websocket connection info. have you narrowed down whether the issue is on the browser side or the server side?

Pavel Klavík19:08:08

What should I look for? Not sure where the problem is; I will definitely look at my server traffic to see how long it takes before the message is sent.

Pavel Klavík19:08:04

I got this in headers tab, and there are frames available

phronmophobic20:08:35

@pavel.klavik , check the messages tab. It should give the timing of the messages. if the message finishes sending quickly, then it's probably a server issue. if it's slow, then it's probably the browser

Pavel Klavík20:08:57

It takes about 30 seconds there from asking for the data till receiving it.

phronmophobic20:08:17

seems like it's a server side issue

Pavel Klavík20:08:03

client-side is very unlikely since I am doing very little there, just displaying the data

Pavel Klavík20:08:50

so it either happens on the server-side, or in the network in between, will try to find out tomorrow when our devops guy comes back from vacation

Alexander G19:08:34

Is it only for ws, or for any large message? Maybe you have a small tcp_sndbuf (to handle many connections). Also, as far as I remember, local tests don't use the network adapter at all.

Pavel Klavík19:08:00

Happens only for large WS messages. Downloading or uploading images or even large files works fine.

Joe Lane19:08:55

Spin up a machine in the same datacenter / VPC as your prod server, install wscat on it, simulate the ws connection like above, time it using unix time or a stopwatch. You need to bisect the problem.

Joe Lane19:08:28

@pavel.klavik To confirm, when running locally you can send that large payload no problem?

Pavel Klavík19:08:39

Ya, that is in the second screenshot, it took about 200 ms

Joe Lane19:08:04

Nginx might be your issue. Did this "just start happening" or did you try this in prod for the first time now?

Pavel Klavík20:08:13

I think it was always slower but became much more noticeable recently as the number of our users and the amount of data grow

Pavel Klavík20:08:19

It might be something involving our nginx configuration, we will need to test everything to see where the problem could be.

Joe Lane20:08:26

Ok. Bisect the problem by doing the wscat test mentioned above in your higher environment. Then you will know whether the slowness is in your server code or not.

Joe Lane20:08:13

Going directly to the server, not through nginx in the higher env.

Pavel Klavík20:08:14

sure, plus we can look into nginx logs and network data there to see how fast it is, thx for pointers

Joe Lane20:08:51

you could also instrument your code for observability and redeploy to prod.
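
One lightweight way to start on that kind of instrumentation is a timing wrapper around each stage (build the payload, serialize, send). This is only a sketch: `timed` and the label are hypothetical names, and a real setup would likely log to a metrics system rather than println.

```clojure
(defn timed
  "Calls the zero-arg function f and returns its result together with the
  elapsed wall-clock time in milliseconds. `label` is only used for logging."
  [label f]
  (let [start      (System/nanoTime)
        result     (f)
        elapsed-ms (/ (- (System/nanoTime) start) 1e6)]
    (println label "took" elapsed-ms "ms")
    {:result result :elapsed-ms elapsed-ms}))

;; Hypothetical usage: wrap each stage separately to see which one is slow.
(def payload (timed "build payload" #(mapv inc (range 1000))))
```

One caveat: if the wrapped function returns a lazy sequence, the measured time covers only creating it, not realizing it, so the stage may look fast even when it is not. The example uses `mapv` so the work happens inside the timed call.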

Joe Lane20:08:42

Or clone prod data locally (depending on your industry) and try it again locally with the same amount of data and see if it's still 200ms or not. Good luck!

Pavel Klavík20:08:52

we also have a staging server, so we can play there, data should not be very different from our testing dev data I have

Pavel Klavík20:08:00

Btw, either Sente or WS itself is merging multiple messages into a single frame. Is there a way to prevent that?

Pavel Klavík20:08:24

In the figures above, I am sending two messages but get a combined reply in a single frame.

Joe Lane20:08:35

No idea, sorry!

Pavel Klavík21:08:05

@U0CJ19XAM So we did some digging, and by running tcpdump between our Clojure server and nginx, we found out that the delay is caused by the Clojure server

Pavel Klavík21:08:35

Further, we did an experiment on our staging server running the same code and it is much faster there, so I am quite puzzled

Pavel Klavík21:08:35

From reading the code, it seems that Sente's send-fn! is async; not sure how to get further insight

Joe Lane23:08:13

Add some instrumentation. What are your observability capabilities? Are you running out of memory? Disk? Is your VPS provider throttling that environment?

Pavel Klavík23:08:53

After more digging, I think it is just related to this https://github.com/ptaoussanis/sente/issues/265 and that the precomputed data are stored lazily, so they are realized when we ask for them, costing the extra delay. I will need to do some further experiments with it.