clj-otel

steffan 2024-07-01T18:02:01.075959Z

πŸ“£ clj-otel release 0.2.7 is out πŸ”­ Bumped OpenTelemetry deps to 1.39.0 🌑️ Support for creating and setting gauges that take measurements synchronously πŸ“Š Option to set explicit bucket boundaries advice for histograms 🧹 Removed wrapper modules for obsolete Jaeger exporters, since Jaeger natively supports OTLP πŸ“ˆ Added Grafana visualization for microservices examples, with pre-configured dashboards ⭐ Major refactor of microservices examples: Embedded nREPL, state management, simpler & faster build 🐘 Added example with instrumented PostgreSQL database access πŸ—’οΈ See the https://github.com/steffan-westcott/clj-otel/blob/master/CHANGELOG.adoc for more information on changes in this release https://github.com/steffan-westcott/clj-otel

πŸ™Œ 2
Bill Adams 2024-07-03T16:05:50.426399Z

@steffan you are very welcome! Thank you for taking the time to 'plod' through this with me. Cheers!

Bill Adams 2024-07-02T18:21:43.591209Z

Thanks for the release Steffan.

πŸ™‡πŸΌ 1
Bill Adams 2024-07-02T18:54:44.369059Z

Right now I'm having trouble getting the basic examples to work. I checked out tag 0.2.7, downloaded the 2.5.0 version of the agent (for the 1.39.0 SDK) into the examples folder. I cleaned out any, docker images I had laying around and just tried starting cube, divisor, et al from the command line (via up.sh ) but it looks like I'm getting:

[otel.javaagent 2024-07-02 13:52:50:205 -0500] [main] INFO io.opentelemetry.javaagent.tooling.VersionLogger - opentelemetry-javaagent - version: 2.5.0
Clojure 1.11.3
user=> [otel.javaagent 2024-07-02 13:52:55:355 -0500] [OkHttp ] ERROR io.opentelemetry.exporter.internal.http.HttpExporter - Failed to export metrics. The request could not be executed. Full error message: unexpected end of stream on ...
java.io.IOException: unexpected end of stream on ...
	at okhttp3.internal.http1.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.kt:210)
	at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.kt:110)
	at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.kt:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:34)
Ubuntu 22.04.4 LTS , openjdk version "17" 2021-09-14, Clojure CLI version 1.11.3.1463 I checked prior to starting compose up - nothing was running or listening on that port. πŸ€·β€β™‚οΈ

steffan 2024-07-02T19:07:37.521749Z

That is odd. As a sanity check, I've just tried cube and divisor examples and both are working as expected. Can you show the output of ./up.sh ?

steffan 2024-07-02T19:10:28.547629Z

For reference, this is the output of ./up.sh for the divisor-app example:

❯ ./up.sh
[+] Running 3/3
βœ” Network divisor-app_default         Created                                                                                                                                  βœ” Container divisor-app-prometheus-1  Started                                                                                                                                  βœ” Container divisor-app-jaeger-1      Started

Bill Adams 2024-07-02T19:13:07.968909Z

Same

βœ” Network divisor-app_default         Created                                                                                                                                                                                                                                                                    0.2s 
 βœ” Container divisor-app-prometheus-1  Started                                                                                                                                                                                                                                                                    0.5s 
 βœ” Container divisor-app-jaeger-1      Started 
docker containers look ok
CONTAINER ID   IMAGE                             COMMAND                  CREATED          STATUS          PORTS                                                                                                                                                                   NAMES
29a51a0dc6bf   prom/prometheus:latest            "/bin/prometheus --c…"   24 seconds ago   Up 24 seconds   0.0.0.0:9090->9090/tcp, :::9090->9090/tcp                                                                                                                               divisor-app-prometheus-1
78f07e4bb026   jaegertracing/all-in-one:latest   "/go/bin/all-in-one-…"   24 seconds ago   Up 24 seconds   4318/tcp, 5775/udp, 5778/tcp, 9411/tcp, 14250/tcp, 0.0.0.0:4317->4317/tcp, :::4317->4317/tcp, 0.0.0.0:16686->16686/tcp, :::16686->16686/tcp, 14268/tcp, 6831-6832/udp   divisor-app-jaeger-1  

Bill Adams 2024-07-02T19:14:44.586169Z

oh this is weird.... it stopped crapping out. hold on

Bill Adams 2024-07-02T19:15:30.206129Z

cube still is a problem ... hmm

steffan 2024-07-02T19:20:53.445399Z

Note that the OTLP port and protocol used differs when using agent 2.x.x. The default was changed from gRPC on port 4317 to HTTP/protobuf on port 4318. All example apps have been updated to assume agent 2.x.x is used. Confusingly, the default OTLP when using the SDK is unchanged from port 4317.

Bill Adams 2024-07-02T19:23:08.516699Z

interesting - fwiw I am using agent v 2.5.0.

πŸ‘πŸΌ 1
steffan 2024-07-02T19:24:53.643069Z

To clarify, divisor-app is working but cube-app is not?

Bill Adams 2024-07-02T19:28:20.528419Z

cube app is not able to connect given the reasons stated above. divisor comes up and is able to export metrics, it does complain about > 62024-07-02T14:20:40,510 WARN (r)etry.exporter.internal.grpc.GrpcExporter Failed to export spans. Server responded with gRPC status code 2. Error message: Broken pipe And as expected I can't find the span in the local jaeger. (service is not listed)

steffan 2024-07-02T19:29:00.812879Z

Before running, ./up.sh, are any other containers running?

Bill Adams 2024-07-02T19:29:47.054759Z

None. I made sure no process was listening on any of those ports either. I could go through it again.

steffan 2024-07-02T19:32:19.526749Z

I wonder if the Docker images are up to date. Before doing ./up.sh, try docker compose pull

Bill Adams 2024-07-02T19:33:37.828269Z

pulled - returned almost instantaneously.

steffan 2024-07-02T19:36:31.341539Z

After running ./up.sh for cube-app, what output do you get for docker ps?

Bill Adams 2024-07-02T19:37:36.901849Z

CONTAINER ID   IMAGE                             COMMAND                  CREATED              STATUS              PORTS                                                                                                                                                                   NAMES
43c65356179c   prom/prometheus:latest            "/bin/prometheus --c…"   About a minute ago   Up About a minute   0.0.0.0:9090->9090/tcp, :::9090->9090/tcp                                                                                                                               divisor-app-prometheus-1
7fc82707b374   jaegertracing/all-in-one:latest   "/go/bin/all-in-one-…"   About a minute ago   Up About a minute   4318/tcp, 5775/udp, 5778/tcp, 9411/tcp, 14250/tcp, 0.0.0.0:4317->4317/tcp, :::4317->4317/tcp, 0.0.0.0:16686->16686/tcp, :::16686->16686/tcp, 14268/tcp, 6831-6832/udp   divisor-app-jaeger-1

steffan 2024-07-02T19:40:27.299789Z

That is the output for divisor-app. Each example has its own Docker compose file.

Bill Adams 2024-07-02T19:41:42.099889Z

eek! hold on. sorry - long day - fighting (other) fires. 🫒

Bill Adams 2024-07-02T19:42:17.404699Z

ONTAINER ID   IMAGE                                 COMMAND                  CREATED         STATUS         PORTS                                                                                                                             NAMES
19f3ed6b89ea   otel/opentelemetry-collector:latest   "/otelcol --config=/…"   4 minutes ago   Up 2 seconds   4317/tcp, 55678-55679/tcp, 0.0.0.0:4318->4318/tcp, :::4318->4318/tcp                                                              cube-app-otel-collector-1
46072a33b68d   prom/prometheus:latest                "/bin/prometheus --c…"   4 minutes ago   Up 2 seconds   0.0.0.0:9090->9090/tcp, :::9090->9090/tcp                                                                                         cube-app-prometheus-1
ba2f7e644be6   jaegertracing/all-in-one:latest       "/go/bin/all-in-one-…"   4 minutes ago   Up 2 seconds   4317-4318/tcp, 5775/udp, 5778/tcp, 9411/tcp, 14250/tcp, 14268/tcp, 6831-6832/udp, 0.0.0.0:16686->16686/tcp, :::16686->16686/tcp   cube-app-jaeger-1

steffan 2024-07-02T19:49:22.351719Z

Looks very strange. The agent running in cube-app should be exporting OTLP to port 4318. The docker ps shows that this should be directed to port 4318 of the collector instance. But for some reason, this communication is blocked.

steffan 2024-07-02T19:51:34.100249Z

Is this the first time you've tried running with agent 2.x.x? Also, what happens if you try to run one of the microservices/auto-instrument examples?

Bill Adams 2024-07-02T19:53:16.904259Z

2nd time. Before this I was on 0.2.6 using agent v.2.0.0 - I haven't run the examples in several weeks.

Bill Adams 2024-07-02T19:53:33.379279Z

I'll try the microservice example now

Bill Adams 2024-07-02T19:58:19.032569Z

Auto instrument docker ps

5240b0df1ca5   clojure:temurin-22-tools-deps-alpine          "entrypoint clojure …"   42 seconds ago   Up 19 seconds             0.0.0.0:8092->8092/tcp, :::8092->8092/tcp   middleware-sentence-summary-load-gen-1
0705a70177f2   clojure:temurin-22-tools-deps-alpine          "entrypoint clojure …"   42 seconds ago   Up 30 seconds (healthy)   0.0.0.0:8090->8090/tcp, :::8090->8090/tcp   middleware-sentence-summary-service-1
aaa17c6929f9   clojure:temurin-22-tools-deps-alpine          "entrypoint clojure …"   42 seconds ago   Up 41 seconds (healthy)   0.0.0.0:8091->8091/tcp, :::8091->8091/tcp   middleware-word-length-service-1
0225611309c9   grafana/grafana-oss:11.1.0                    "/run.sh"                42 seconds ago   Up 41 seconds             0.0.0.0:3000->3000/tcp, :::3000->3000/tcp   middleware-grafana-1
47d8ea08dbdc   otel/opentelemetry-collector-contrib:latest   "/otelcol-contrib --…"   42 seconds ago   Up 41 seconds             4317/tcp, 55678-55679/tcp                   middleware-otel-collector-1
c2d64c834bcf   prom/prometheus:v2.53.0                       "/bin/prometheus --c…"   43 seconds ago   Up 41 seconds             9090/tcp                                    middleware-prometheus-1
ff3eb2e42207   grafana/tempo:2.5.0                           "/tempo -config.file…"   43 seconds ago   Up 41 seconds                                                         middleware-tempo-1

steffan 2024-07-02T20:01:59.247609Z

Looks good so far. Now navigate to http://localhost:3000/ and you should see (among other things) a list of traces. Try clicking on one for route GET /summary and you should see a trace

Bill Adams 2024-07-02T20:06:49.799179Z

grafana's up but empty - (No data), Not even JVM metrics. weird

steffan 2024-07-02T20:07:36.199689Z

What do you get with docker -v ?

Bill Adams 2024-07-02T20:07:51.439269Z

Docker version 27.0.2, build 912c1dd

πŸ‘πŸΌ 1
Bill Adams 2024-07-02T20:08:30.836359Z

I don't have desktop installed at all either - messed me up awhile ago.

steffan 2024-07-02T20:10:42.263189Z

Something seems to be fundamentally affecting docker on your machine. Can you recall any other work with docker compose that is working?

steffan 2024-07-02T20:11:52.933999Z

(on a side note: For the microservices examples, it takes around a minute of uptime for meaningful output to appear in Grafana)

Bill Adams 2024-07-02T20:12:13.301019Z

yeah - we use docker (with compose at) work and I can stand up our application (comes in 3 parts) - I could try some sample app.

steffan 2024-07-02T20:19:27.774319Z

In any of the microservices apps, after starting with ./up.sh, you could try tailing the logs of the container running image otel/opentelemetry-collector-contrib:latest. Use command docker logs -f <containerid> Normal output for the collector should settle down to output like this:

2024-07-02T20:16:07.345Z        info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 19, "data points": 50}
2024-07-02T20:16:08.219Z        info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 40}
2024-07-02T20:16:08.822Z        info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 53}
2024-07-02T20:16:09.553Z        info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 3, "metrics": 15, "data points": 45}
2024-07-02T20:16:12.968Z        info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 18, "data points": 46}
2024-07-02T20:16:13.241Z        info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 22}
2024-07-02T20:16:13.843Z        info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 26}

steffan 2024-07-02T20:20:36.530529Z

If there are no messages like this, then no messages are reaching the Collector, which would be very odd.

Bill Adams 2024-07-02T20:22:48.213799Z

Yeah nothing is hitting the otel-collector. I could see the generator requesting and getting responses ... still digging.

Bill Adams 2024-07-02T20:23:23.273109Z

found something.

Bill Adams 2024-07-02T20:23:48.400519Z

> ... 18 more > [otel.javaagent 2024-07-02 20:23:05:616 +0000] [OkHttp http://otel-collector:4318/...] ERROR io.opentelemetry.exporter.internal.http.HttpExporter - Failed to export metrics. The request could not be executed. Full error message: Failed to connect to otel-collector/172.20.0.5:4318 > java.net.ConnectException: Failed to connect to otel-collector/172.20.0.5:4318 > at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.kt:297) > at okhttp3.internal.connection.RealConnection.connect(RealConnection.kt:207) > at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.kt:226) > at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.kt:106) > at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.kt:74) > at okhttp3.internal.connection.RealCall.initExchange$okhttp(RealCall.kt:255) > at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:32) on : middleware-sentence-summary-service-1

Bill Adams 2024-07-02T20:24:34.084129Z

same on the middleware-word-length-service-1 too .

Bill Adams 2024-07-02T20:25:13.693489Z

oh... > 1b84f1ed180c otel/opentelemetry-collector-contrib:latest "/otelcol-contrib --…" 3 minutes ago Up 3 minutes 4317/tcp, 55678-55679/tcp middleware-otel-collector-1

steffan 2024-07-02T20:42:18.983119Z

Something is up. I'm having problems now

Bill Adams 2024-07-02T20:43:28.784469Z

I'm trying something locally in the otel collector config.

Bill Adams 2024-07-02T20:43:47.858129Z

I fixed it I think.

Bill Adams 2024-07-02T20:46:19.473249Z

"Fixed" is a stretch of course πŸ˜‰ its publishing metrics to grafana now.

Bill Adams 2024-07-02T20:46:46.330029Z

Bill Adams 2024-07-02T20:47:22.467949Z

steffan 2024-07-02T20:56:59.476839Z

I think some changes have crept into the Collector. I'm analysing it now.

πŸ‘ 1
Bill Adams 2024-07-02T20:59:11.753729Z

in the meantime - that extra line in the otel-collector config fixed cube-app.

steffan 2024-07-02T21:02:17.396879Z

Thanks! I've also discovered changes in Jaeger, so I've several updates to make.

πŸ‘ 1
Bill Adams 2024-07-02T21:05:43.464499Z

want me to put up a PR on the otel-collector config(s) change?

steffan 2024-07-02T21:11:31.895479Z

I'd like to investigate further as there are several issues at play. I've just discovered that the latest Jaeger build breaks things, but the most recent tagged build works.

πŸ‘ 1
Bill Adams 2024-07-02T21:11:50.856379Z

sure thing np.

steffan 2024-07-02T21:11:51.143739Z

Thanks for the offer though

πŸ‘ 1
Bill Adams 2024-07-02T21:12:37.729289Z

I've got 0.2.7 in our staging env - testing it out now.

Bill Adams 2024-07-02T21:13:00.356579Z

got to use the synchronous gauge - that was a win.

Bill Adams 2024-07-02T21:13:26.395879Z

and the new histogram bucket advice (without having to override it with a hack)

steffan 2024-07-02T23:05:23.477869Z

The Collector (and contrib Collector) had a breaking change introduced yesterday, where the OTLP receiver by default uses localhost instead of 0.0.0.0. https://github.com/open-telemetry/opentelemetry-collector/releases/tag/v0.104.0 Many thanks for finding the fix! πŸ™‡πŸΌ

steffan 2024-07-02T23:29:01.362039Z

I've pushed your fix, along with a fix for the broken Jaeger image. You should be fine using clj-otel 0.2.7, as those fixes address just the examples and not clj-otel modules.

πŸ™Œ 1