#clj-otel
2024-07-01
steffan 18:07:01

📣 clj-otel release 0.2.7 is out
🔭 Bumped OpenTelemetry deps to 1.39.0
🌡️ Support for creating and setting gauges that take measurements synchronously
📊 Option to set explicit bucket boundaries advice for histograms
🧹 Removed wrapper modules for obsolete Jaeger exporters, since Jaeger natively supports OTLP
📈 Added Grafana visualization for microservices examples, with pre-configured dashboards
Major refactor of microservices examples: embedded nREPL, state management, simpler & faster build
🐘 Added example with instrumented PostgreSQL database access
🗒️ See the changelog at https://github.com/steffan-westcott/clj-otel/blob/master/CHANGELOG.adoc for more information on changes in this release
https://github.com/steffan-westcott/clj-otel

🙌 2
Bill Adams 18:07:43

Thanks for the release Steffan.

Bill Adams 18:07:44

Right now I'm having trouble getting the basic examples to work. I checked out tag 0.2.7 and downloaded the 2.5.0 version of the agent (for the 1.39.0 SDK) into the examples folder. I cleaned out any Docker images I had lying around and just tried starting cube, divisor, et al. from the command line (via up.sh), but it looks like I'm getting:

[otel.javaagent 2024-07-02 13:52:50:205 -0500] [main] INFO io.opentelemetry.javaagent.tooling.VersionLogger - opentelemetry-javaagent - version: 2.5.0
Clojure 1.11.3
user=> [otel.javaagent 2024-07-02 13:52:55:355 -0500] [OkHttp ] ERROR io.opentelemetry.exporter.internal.http.HttpExporter - Failed to export metrics. The request could not be executed. Full error message: unexpected end of stream on ...
java.io.IOException: unexpected end of stream on ...
	at okhttp3.internal.http1.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.kt:210)
	at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.kt:110)
	at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.kt:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:34)
Ubuntu 22.04.4 LTS, openjdk version "17" 2021-09-14, Clojure CLI version 1.11.3.1463. I checked prior to starting compose up - nothing was running or listening on that port. 🤷‍♂️

steffan 19:07:37

That is odd. As a sanity check, I've just tried the cube and divisor examples and both are working as expected. Can you show the output of ./up.sh?

steffan 19:07:28

For reference, this is the output of ./up.sh for the divisor-app example:

❯ ./up.sh
[+] Running 3/3
✔ Network divisor-app_default         Created
✔ Container divisor-app-prometheus-1  Started
✔ Container divisor-app-jaeger-1      Started

Bill Adams 19:07:07

Same

✔ Network divisor-app_default         Created    0.2s
✔ Container divisor-app-prometheus-1  Started    0.5s
✔ Container divisor-app-jaeger-1      Started
docker containers look ok
CONTAINER ID   IMAGE                             COMMAND                  CREATED          STATUS          PORTS                                                                                                                                                                   NAMES
29a51a0dc6bf   prom/prometheus:latest            "/bin/prometheus --c…"   24 seconds ago   Up 24 seconds   0.0.0.0:9090->9090/tcp, :::9090->9090/tcp                                                                                                                               divisor-app-prometheus-1
78f07e4bb026   jaegertracing/all-in-one:latest   "/go/bin/all-in-one-…"   24 seconds ago   Up 24 seconds   4318/tcp, 5775/udp, 5778/tcp, 9411/tcp, 14250/tcp, 0.0.0.0:4317->4317/tcp, :::4317->4317/tcp, 0.0.0.0:16686->16686/tcp, :::16686->16686/tcp, 14268/tcp, 6831-6832/udp   divisor-app-jaeger-1  

Bill Adams 19:07:44

oh this is weird.... it stopped crapping out. hold on

Bill Adams 19:07:30

cube is still a problem ... hmm

steffan 19:07:53

Note that the OTLP port and protocol used differ when using agent 2.x.x: the default changed from gRPC on port 4317 to HTTP/protobuf on port 4318. All example apps have been updated to assume agent 2.x.x is used. Confusingly, the default OTLP exporter when using the SDK alone is unchanged: gRPC on port 4317.
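For anyone who wants to pin this explicitly, the standard OTLP exporter environment variables control it. A minimal sketch using stock OpenTelemetry settings (nothing clj-otel-specific here):

# Agent 2.x.x defaults, stated explicitly
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

# To revert to the 1.x.x behaviour (gRPC on 4317) instead:
# export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
# export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317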

Bill Adams 19:07:08

interesting - fwiw I am using agent v2.5.0.

steffan 19:07:53

To clarify, divisor-app is working but cube-app is not?

Bill Adams 19:07:20

cube app is not able to connect for the reasons stated above. divisor comes up and is able to export metrics, but it does complain about:

2024-07-02T14:20:40,510 WARN io.opentelemetry.exporter.internal.grpc.GrpcExporter Failed to export spans. Server responded with gRPC status code 2. Error message: Broken pipe

And as expected I can't find the span in the local Jaeger (service is not listed).

steffan 19:07:00

Before running ./up.sh, are any other containers running?

Bill Adams 19:07:47

None. I made sure no process was listening on any of those ports either. I could go through it again.
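For reference, one way to double-check that on Linux (assuming ss is available; this should print nothing when the ports are free):

ss -ltnp | grep -E ':(4317|4318|9090|16686)'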

steffan 19:07:19

I wonder if the Docker images are up to date. Before doing ./up.sh, try docker compose pull

Bill Adams 19:07:37

pulled - returned almost instantaneously.

steffan 19:07:31

After running ./up.sh for cube-app, what output do you get for docker ps?

Bill Adams 19:07:36

CONTAINER ID   IMAGE                             COMMAND                  CREATED              STATUS              PORTS                                                                                                                                                                   NAMES
43c65356179c   prom/prometheus:latest            "/bin/prometheus --c…"   About a minute ago   Up About a minute   0.0.0.0:9090->9090/tcp, :::9090->9090/tcp                                                                                                                               divisor-app-prometheus-1
7fc82707b374   jaegertracing/all-in-one:latest   "/go/bin/all-in-one-…"   About a minute ago   Up About a minute   4318/tcp, 5775/udp, 5778/tcp, 9411/tcp, 14250/tcp, 0.0.0.0:4317->4317/tcp, :::4317->4317/tcp, 0.0.0.0:16686->16686/tcp, :::16686->16686/tcp, 14268/tcp, 6831-6832/udp   divisor-app-jaeger-1

steffan 19:07:27

That is the output for divisor-app. Each example has its own Docker compose file.

Bill Adams 19:07:42

eek! hold on. sorry - long day - fighting (other) fires. 🫢

Bill Adams 19:07:17

CONTAINER ID   IMAGE                                 COMMAND                  CREATED         STATUS         PORTS                                                                                                                             NAMES
19f3ed6b89ea   otel/opentelemetry-collector:latest   "/otelcol --config=/…"   4 minutes ago   Up 2 seconds   4317/tcp, 55678-55679/tcp, 0.0.0.0:4318->4318/tcp, :::4318->4318/tcp                                                              cube-app-otel-collector-1
46072a33b68d   prom/prometheus:latest                "/bin/prometheus --c…"   4 minutes ago   Up 2 seconds   0.0.0.0:9090->9090/tcp, :::9090->9090/tcp                                                                                         cube-app-prometheus-1
ba2f7e644be6   jaegertracing/all-in-one:latest       "/go/bin/all-in-one-…"   4 minutes ago   Up 2 seconds   4317-4318/tcp, 5775/udp, 5778/tcp, 9411/tcp, 14250/tcp, 14268/tcp, 6831-6832/udp, 0.0.0.0:16686->16686/tcp, :::16686->16686/tcp   cube-app-jaeger-1

steffan 19:07:22

Looks very strange. The agent running in cube-app should be exporting OTLP to port 4318, and the docker ps output shows that this should be directed to port 4318 of the collector instance. But for some reason, this communication is blocked.

steffan 19:07:34

Is this the first time you've tried running with agent 2.x.x? Also, what happens if you try to run one of the microservices/auto-instrument examples?

Bill Adams 19:07:16

2nd time. Before this I was on 0.2.6 using agent v2.0.0 - I haven't run the examples in several weeks.

Bill Adams 19:07:33

I'll try the microservice example now

Bill Adams 19:07:19

Auto-instrument docker ps:

5240b0df1ca5   clojure:temurin-22-tools-deps-alpine          "entrypoint clojure …"   42 seconds ago   Up 19 seconds             0.0.0.0:8092->8092/tcp, :::8092->8092/tcp   middleware-sentence-summary-load-gen-1
0705a70177f2   clojure:temurin-22-tools-deps-alpine          "entrypoint clojure …"   42 seconds ago   Up 30 seconds (healthy)   0.0.0.0:8090->8090/tcp, :::8090->8090/tcp   middleware-sentence-summary-service-1
aaa17c6929f9   clojure:temurin-22-tools-deps-alpine          "entrypoint clojure …"   42 seconds ago   Up 41 seconds (healthy)   0.0.0.0:8091->8091/tcp, :::8091->8091/tcp   middleware-word-length-service-1
0225611309c9   grafana/grafana-oss:11.1.0                    "/run.sh"                42 seconds ago   Up 41 seconds             0.0.0.0:3000->3000/tcp, :::3000->3000/tcp   middleware-grafana-1
47d8ea08dbdc   otel/opentelemetry-collector-contrib:latest   "/otelcol-contrib --…"   42 seconds ago   Up 41 seconds             4317/tcp, 55678-55679/tcp                   middleware-otel-collector-1
c2d64c834bcf   prom/prometheus:v2.53.0                       "/bin/prometheus --c…"   43 seconds ago   Up 41 seconds             9090/tcp                                    middleware-prometheus-1
ff3eb2e42207   grafana/tempo:2.5.0                           "/tempo -config.file…"   43 seconds ago   Up 41 seconds                                                         middleware-tempo-1

steffan 20:07:59

Looks good so far. Now navigate to http://localhost:3000/ and you should see (among other things) a list of traces. Try clicking on one for route GET /summary and you should see a trace

Bill Adams 20:07:49

grafana's up but empty (No data). Not even JVM metrics. weird

steffan 20:07:36

What do you get with docker -v?

Bill Adams 20:07:51

Docker version 27.0.2, build 912c1dd

Bill Adams 20:07:30

I don't have Docker Desktop installed at all either - it messed me up a while ago.

steffan 20:07:42

Something seems to be fundamentally affecting docker on your machine. Can you recall any other docker compose setup that works?

steffan 20:07:52

(On a side note: for the microservices examples, it takes around a minute of uptime for meaningful output to appear in Grafana.)

Bill Adams 20:07:13

yeah - we use docker (with compose) at work and I can stand up our application (it comes in 3 parts) - I could try some sample app.

steffan 20:07:27

In any of the microservices apps, after starting with ./up.sh, you could try tailing the logs of the container running image otel/opentelemetry-collector-contrib:latest, using the command docker logs -f <containerid>. Normal output for the collector should settle down to something like this:

2024-07-02T20:16:07.345Z        info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 19, "data points": 50}
2024-07-02T20:16:08.219Z        info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 40}
2024-07-02T20:16:08.822Z        info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 53}
2024-07-02T20:16:09.553Z        info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 3, "metrics": 15, "data points": 45}
2024-07-02T20:16:12.968Z        info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 18, "data points": 46}
2024-07-02T20:16:13.241Z        info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 22}
2024-07-02T20:16:13.843Z        info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 26}

steffan 20:07:36

If there are no messages like this, then no messages are reaching the Collector, which would be very odd.

Bill Adams 20:07:48

Yeah nothing is hitting the otel-collector. I could see the generator requesting and getting responses ... still digging.

Bill Adams 20:07:23

found something.

Bill Adams 20:07:48

... 18 more
[otel.javaagent 2024-07-02 20:23:05:616 +0000] [OkHttp http://otel-collector:4318/...] ERROR io.opentelemetry.exporter.internal.http.HttpExporter - Failed to export metrics. The request could not be executed. Full error message: Failed to connect to otel-collector/172.20.0.5:4318
java.net.ConnectException: Failed to connect to otel-collector/172.20.0.5:4318
	at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.kt:297)
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.kt:207)
	at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.kt:226)
	at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.kt:106)
	at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.kt:74)
	at okhttp3.internal.connection.RealCall.initExchange$okhttp(RealCall.kt:255)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:32)

on middleware-sentence-summary-service-1

Bill Adams 20:07:34

same on middleware-word-length-service-1 too.

Bill Adams 20:07:13

oh...

1b84f1ed180c   otel/opentelemetry-collector-contrib:latest   "/otelcol-contrib --…"   3 minutes ago   Up 3 minutes   4317/tcp, 55678-55679/tcp   middleware-otel-collector-1

steffan 20:07:18

Something is up. I'm having problems now

Bill Adams 20:07:28

I'm trying something locally in the otel collector config.

Bill Adams 20:07:47

I fixed it I think.

Bill Adams 20:07:19

"Fixed" is a stretch of course 😉 its publishing metrics to grafana now.

steffan 20:07:59

I think some changes have crept into the Collector. I'm analysing it now.

👍 1
Bill Adams 20:07:11

in the meantime - that extra line in the otel-collector config fixed cube-app.

steffan 21:07:17

Thanks! I've also discovered changes in Jaeger, so I have several updates to make.

👍 1
Bill Adams 21:07:43

want me to put up a PR on the otel-collector config(s) change?

steffan 21:07:31

I'd like to investigate further as there are several issues at play. I've just discovered that the latest Jaeger build breaks things, but the most recent tagged build works.

👍 1
Bill Adams 21:07:50

sure thing np.

steffan 21:07:51

Thanks for the offer though

👍 1
Bill Adams 21:07:37

I've got 0.2.7 in our staging env - testing it out now.

Bill Adams 21:07:00

got to use the synchronous gauge - that was a win.

Bill Adams 21:07:26

and the new histogram bucket advice (without having to override it with a hack)
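A rough sketch of how these 0.2.7 features might be used from clj-otel. The :explicit-bucket-boundaries-advice key and the set! function are inferred from the changelog wording rather than checked against the API docs, so treat those names as assumptions:

(require '[steffan.westcott.clj-otel.api.metrics.instrument :as instrument])

;; Synchronous gauge - the instrument function and opts-map style follow
;; clj-otel's existing instrument API; set! is assumed from the 0.2.7 notes.
(defonce queue-depth
  (instrument/instrument {:name "app.queue-depth"
                          :instrument-type :gauge
                          :measurement-type :long}))

(instrument/set! queue-depth {:value 17})

;; Histogram with explicit bucket boundaries advice - the option key is a
;; guess based on the changelog wording.
(defonce request-duration
  (instrument/instrument {:name "app.request-duration"
                          :instrument-type :histogram
                          :unit "s"
                          :explicit-bucket-boundaries-advice [0.01 0.05 0.1 0.5 1 5]}))

(instrument/record! request-duration {:value 0.42})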

steffan 23:07:23

The Collector (and contrib Collector) had a breaking change introduced yesterday, where the OTLP receiver by default binds to localhost instead of 0.0.0.0. https://github.com/open-telemetry/opentelemetry-collector/releases/tag/v0.104.0 Many thanks for finding the fix! 🙇🏼
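For reference, the kind of change needed is to bind the Collector's OTLP receiver back to all interfaces. A sketch of the standard otlp receiver section (the exact layout in the example configs may differ):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318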

steffan 23:07:01

I've pushed your fix, along with a fix for the broken Jaeger image. You should be fine using clj-otel 0.2.7, as those fixes affect only the examples, not the clj-otel modules themselves.

🙌 1
Bill Adams 16:07:50

@U2C6SPLDS you are very welcome! Thank you for taking the time to 'plod' through this with me. Cheers!