clj-otel

dharrigan 2024-10-22T18:24:08.136519Z

I'm having a hard time getting my ring/compojure (sweet) data into xray on aws

dharrigan 2024-10-22T18:26:08.969649Z

I have this in my middleware (middleware [trace-http/wrap-compojure-route]) and I'm using the java agent for automatic instrumentation. I am seeing xray data for other things, like sns/sqs and database access, but nothing at all for the routes that I want to trace. I'm sure I have something set up incorrectly!

dharrigan 2024-10-22T18:26:46.157509Z

Any help would be appreciated! 🙂

steffan 2024-10-22T19:59:19.657139Z

I don't have experience using AWS X-Ray, but I'll ask some questions as we need a fuller explanation. Are you not seeing any HTTP server traces (for your HTTP application) at all in AWS X-Ray, or are they showing up without HTTP route data?

dharrigan 2024-10-23T08:50:26.846929Z

I'm unsure, as it looks to me like the current context from the agent isn't bound in a synchronous route.

dharrigan 2024-10-23T08:51:01.681149Z

if create-span? is false, there's a call to wrap-existing-server-span

dharrigan 2024-10-23T08:51:39.090829Z

but the agent context is only bound on an asynchronous route, not a synchronous route.

dharrigan 2024-10-23T08:51:46.432459Z

But if I modify the code a bit, to this:

dharrigan 2024-10-23T08:51:52.538099Z

(defn- wrap-existing-server-span
  [handler]
  (fn
    ([request]
     (let [context (context/dyn)]
       (handler (assoc request :io.opentelemetry/server-span-context context))))
    ([request respond raise]
     (let [context (context/dyn)]
       (handler (assoc request :io.opentelemetry/server-span-context context)
                respond
                (fn [e]
                  (span/add-exception! e {:context context})
                  (raise e)))))))

dharrigan 2024-10-23T08:51:58.427949Z

then the agent context is bound to the request

steffan 2024-10-23T09:07:48.478779Z

I'm sorry, I'm not at all clear on the problem you are reporting. Can we please take a step back and identify the scope of your issue? First, are you seeing any server traces at all in the X-Ray console?

dharrigan 2024-10-23T09:10:12.612979Z

Apologies. Yes, I am seeing xray reports, but only for lower level stuff, like database access, sns, sqs requests and so on.

dharrigan 2024-10-23T09:10:29.706309Z

I do not see any route traces.

dharrigan 2024-10-23T09:10:52.695449Z

So I don't know if /foo/bar/baz is being called.

dharrigan 2024-10-23T09:11:46.073419Z

When I modified the wrap-existing-server-span, as above, I was then able to query (locally running jaeger) for url.path.

dharrigan 2024-10-23T09:14:09.752369Z

However, I'm not sure yet whether that is just a red herring that I'm chasing

steffan 2024-10-23T09:15:42.613329Z

OK, so your issue isn't to do with the Compojure integration, as that just decorates server spans with extra detail.

steffan 2024-10-23T09:19:09.019869Z

What HTTP server library does your application use e.g. Jetty, http-kit...?

dharrigan 2024-10-23T09:20:19.895419Z

Jetty 12.0.14

steffan 2024-10-23T09:21:14.015949Z

OK, Jetty is supported by the OpenTelemetry Java instrumentation agent. Are you using that, or some other agent?

dharrigan 2024-10-23T09:21:21.050299Z

opentelemetry

steffan 2024-10-23T09:23:11.945479Z

OK, are you able to run your application (or something similar to your application) without clj-otel at all, but with the agent? You should be getting traces without any manual instrumentation.

dharrigan 2024-10-23T09:24:50.928809Z

I can attempt that. I have a locally running jaeger (as deploying to aws ecs is tedious each time). I can spin up my application and see what I get with the agent running

steffan 2024-10-23T09:25:51.028029Z

As you may appreciate, there are a lot of configuration details to nail down. Manual instrumentation (the major use case for clj-otel) is the icing on the cake, so to speak.

dharrigan 2024-10-23T09:27:03.072939Z

Yeah, it's quite tricky for sure

steffan 2024-10-23T09:28:35.620299Z

Establishing traces in X-Ray through automatic instrumentation for your application would be a significant milestone towards your ultimate goal of a manually instrumented application.

dharrigan 2024-10-23T09:29:10.680329Z

....manual is best?

dharrigan 2024-10-23T09:31:57.933639Z

Okay, so application running with agent and no clj-otel. Jaeger running. I can see traces in jaeger of database requests, subscriptions to sns and sqs going on.

steffan 2024-10-23T09:31:58.795969Z

Recalling the clj-otel tutorial (https://github.com/steffan-westcott/clj-otel/blob/master/doc/tutorial.adoc), automatic instrumentation is good, but enriched instrumentation (automatic plus manual) enables extra insight.

dharrigan 2024-10-23T09:33:12.028989Z

As expected, attempting to query this url.path=/foo/bar/baz shows no traces

steffan 2024-10-23T09:34:33.609639Z

I'm confused again. Are you deploying locally or on AWS? Are you using X-Ray to view, or deploying Jaeger on AWS?

dharrigan 2024-10-23T09:35:48.150279Z

I am deploying to AWS. However, as mentioned above, deploying to AWS ECS is tedious each time and takes several minutes. So, in order to simulate the behaviour, I'm using jaeger, which respects the otlp protocol (like aws xray). So what I get in jaeger should be close enough to what I see in xray

dharrigan 2024-10-23T09:36:34.053159Z

I am running locally for this experimentation to see why with the middleware (wrap-server-span) set in my application, I do not see route requests - both on aws xray and on jaeger

steffan 2024-10-23T09:37:54.944309Z

Your issue is more related to networking, rather than clj-otel functionality. This is why I believe you should focus on getting the automatic instrumentation working first.

dharrigan 2024-10-23T09:38:28.352779Z

It is working, I do see traces of everything except route requests.

dharrigan 2024-10-23T09:38:41.516009Z

Maybe partially working? 🙂

steffan 2024-10-23T09:40:57.291979Z

The instrumentation agent should be providing trace telemetry for your HTTP application. From what you've said, it looks like only AWS API calls are getting instrumented, so I think there's a problem with the configuration of the agent.

dharrigan 2024-10-23T09:41:29.792379Z

Indeed, tis a head scratcher.

dharrigan 2024-10-23T09:41:43.775349Z

I'll dig some more...

steffan 2024-10-23T09:43:06.564069Z

Make sure you can see `err`/`out` from the application. The agent will complain if it's unable to export OTLP data.
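
One way to make the agent more talkative on standard error is its debug flag. A minimal sketch as a `deps.edn` fragment, assuming the application is launched through a Clojure CLI alias; the alias name `:otel` and the jar path are placeholders, not taken from this thread:

```clojure
;; deps.edn fragment (sketch). The alias name and agent jar path are assumptions.
{:aliases
 {:otel
  {:jvm-opts ["-javaagent:path/to/opentelemetry-javaagent.jar"
              ;; Turns on the agent's debug logging, which includes
              ;; export failures and which instrumentations get applied.
              "-Dotel.javaagent.debug=true"]}}}
```

The debug output is verbose, so it is probably best kept to local experiments.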

👍 1
steffan 2024-10-23T10:00:46.850009Z

In case you are not aware, AWS provides a distro for OpenTelemetry. As part of this, they offer a customised agent. https://aws-otel.github.io/docs/getting-started/java-sdk

dharrigan 2024-10-23T10:01:48.365689Z

Yes I use that

dharrigan 2024-10-23T10:01:50.537709Z

it's my sidecar

steffan 2024-10-23T10:04:16.507289Z

One important detail is that the AWS X-Ray propagator is not enabled by default in the vanilla agent. I imagine (but do not know) it's enabled by default in the ADOT version.
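
For the vanilla agent, the propagator list is set with the `otel.propagators` property (or the `OTEL_PROPAGATORS` environment variable). A minimal sketch as a `deps.edn` fragment; the alias name and jar path are placeholders, and whether ADOT needs this at all is not confirmed here:

```clojure
;; deps.edn fragment (sketch). The alias name and agent jar path are assumptions.
{:aliases
 {:otel
  {:jvm-opts ["-javaagent:path/to/opentelemetry-javaagent.jar"
              ;; Keep W3C trace context and baggage, and add AWS X-Ray
              ;; header propagation alongside them.
              "-Dotel.propagators=tracecontext,baggage,xray"]}}}
```

Note this is an agent (application side) setting, separate from anything configured on the Collector.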

dharrigan 2024-10-23T10:05:27.072019Z

I have an open telemetry yml which enables it

dharrigan 2024-10-23T10:05:55.518189Z

on the otel collector (aws distro)

dharrigan 2024-10-23T10:08:10.223799Z

now I'm confused

dharrigan 2024-10-23T10:53:51.859519Z

An update. Only by adding (wrap-server-span {:create-span? true}) do I now get lovely paths and lots of juicy data from requests in amazon xray. It looks like, for whatever reason, the aws-opentelemetry-agent.jar is not instrumenting jetty(?) to get the paths out.
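
For anyone finding this later, the arrangement looks roughly like the following. This is a minimal sketch using plain Compojure rather than my actual application; the namespace name and the route are made up for illustration:

```clojure
(ns example.handler
  (:require [compojure.core :refer [GET routes wrap-routes]]
            [steffan-westcott.clj-otel.api.trace.http :as trace-http]))

;; Hypothetical route, for illustration only.
(def app-routes
  (routes
    (GET "/foo/bar/baz" [] {:status 200 :body "ok"})))

(def handler
  (-> app-routes
      ;; Adds the matched Compojure route to the server span data.
      (wrap-routes trace-http/wrap-compojure-route)
      ;; Outermost: clj-otel creates the server span itself, since the
      ;; agent is not producing one for this server.
      (trace-http/wrap-server-span {:create-span? true})))
```

If the agent does start creating server spans, `:create-span?` would presumably go back to `false` so that requests don't get two server spans.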

dharrigan 2024-10-23T10:54:20.197689Z

I'm happy enough to allow clj-otel to create the span for me - it'll do and seems to work fine. I can look into the java agent later.

steffan 2024-10-23T11:55:23.054099Z

I'm glad you've got something working, I know it is tough to get off the ground 😄 Your experiments so far have verified that networking between your application, agent, OpenTelemetry Collector and AWS X-Ray is working. That's a lot!

dharrigan 2024-10-23T11:58:17.006499Z

Thanks for your guidance and help 🙂

steffan 2024-10-23T11:58:18.447379Z

As you've noted, getting the agent to automatically instrument the application would be the next step. My guess would be some setting in ADOT needs attention. Perhaps the traces signal is disabled by default?

dharrigan 2024-10-23T11:58:50.812119Z

I'll look into it and if I find something, I'll report back to add to the collective knowledge 🙂

steffan 2024-10-23T12:01:47.465009Z

Yes, please do. It would be most useful for folk in general to give some experience reports of using clj-otel in the wild, like deployments in AWS and GCP. Many of the questions I get are related to configuration and deployment.

👍 1
Sam H 2024-10-24T13:09:27.598229Z

https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/10575/files Looks like jetty12 instrumentation was added earlier in the year. Not sure how you'd check if the AWS agent has that code or not

dharrigan 2024-10-24T13:13:17.529329Z

Interesting, it may not be in the 1.X release branch, as the commit seems to be on main, which looks like a 2.X branch. Amazon AWS otel java instrumentation is only on 1.X still.

Sam H 2024-10-24T13:13:47.640889Z

https://github.com/aws-observability/aws-otel-java-instrumentation/releases that references quite an old version of the otel java one

Sam H 2024-10-24T13:14:15.781049Z

guess you could downgrade the jetty server to validate if you wanted to check

Sam H 2024-10-24T13:20:45.867179Z

https://github.com/aws-observability/aws-otel-java-instrumentation/blob/main/dependencyManagement/build.gradle.kts#L29-L34 In case you have any appetite to upgrade it. Though that could be a rabbit hole. For reference, here's the supported libraries page for the version it's using, v1.32.1: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/v1.32.1/docs/supported-libraries.md#application-servers which doesn't seem to have the changes for jetty 12

dharrigan 2024-10-24T13:31:13.820929Z

thank you! 🙂

dharrigan 2024-10-24T13:31:35.061519Z

mucho appreciated! I'll devote some cycles tomorrow to look at this.

👍 1
steffan 2024-10-24T18:51:15.785489Z

Beware, there are some breaking changes between agent 1.x and 2.x so be sure to check the release notes. Off the top of my head, I remember some changes to default OTLP settings, and a load of semantic conventions.

dharrigan 2024-10-25T10:29:13.092259Z

Thank you. I'll stick to 1.X at the moment, as that is the version that aws-opentelemetry-agent supports. It appears that support for Jetty 12 is only in the 2.X branch for sure (2.6.0 onwards).

dharrigan 2024-10-25T10:29:24.749659Z

I can live without the spans for the moment 🙂