clj-otel

lepistane 2024-06-04T14:06:47.428339Z

Hey, super mega giga beginner here. This is my first time encountering OpenTelemetry and clj-otel. We've used a Prometheus collector registry and a bunch of counters to get basic metrics for our needs. I am looking for the most basic example possible, but the examples provided in the repo seem quite advanced. I am looking at some Python examples here: https://opentelemetry.io/docs/languages/python/exporters/ and the usage of OTLPMetricExporter seems to be at the level of my understanding and current needs. Is it possible to do this with clj-otel? Edit: it seems like https://github.com/steffan-westcott/clj-otel/blob/93df3ce6de4cf7af11efd9f13b273f960e769732/examples/factorial-app/src/example/factorial_app.clj is the way to go 🙌

lepistane 2024-06-10T11:19:04.942989Z

@steffan I am trying to use the agent together with manual counters I added (and autoconfig), but I am having trouble using both.

:jvm-opts ["-javaagent:opentelemetry-javaagent.jar"
as per the example, plus
"-Dotel.java.global-autoconfigure.enabled=true"
"-Dotel.metrics.exporter=prometheus"
"-Dotel.metric.export.interval=5000"
"-Dotel.logs.exporter=none"
I get OpenTelemetry metrics on localhost:9464 - great! But when I do
(instrument/add! @custom-count {:value 1})
i get
Jun 10, 2024 1:12:17 PM io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdkBuilder build
INFO: Error encountered during autoconfiguration. Closing partially configured components.
Jun 10, 2024 1:12:17 PM io.opentelemetry.api.GlobalOpenTelemetry maybeAutoConfigureAndSetGlobal
SEVERE: Error automatically configuring OpenTelemetry SDK. OpenTelemetry will not be enabled.
io.opentelemetry.sdk.autoconfigure.spi.ConfigurationException: Unexpected configuration error
...
Caused by: java.io.UncheckedIOException: Could not create Prometheus HTTP server
Any idea what I am doing wrong? My impression was that the agent and SDK autoconfigure are able to share the Prometheus server, so you can scrape anything you want: you get free stuff from the agent plus whatever you specifically need from manual instruments. Or maybe this is a scenario where using the agent plus manual SDK config is a must (another Prometheus server on a different port)? For example, the agent's metrics would be scraped from 9464 and mine would be scraped from 9465?

lepistane 2024-06-10T11:41:01.917209Z

Found: https://github.com/steffan-westcott/clj-otel/blob/0291d04e62efb3659c88c41e4a17e6a7f01eeb84/examples/divisor-app/src/example/divisor_app.clj#L4 where you are using runtime instead of the agent. Let me see if that works for me.

lepistane 2024-06-10T11:52:21.153609Z

Code from the example works like i hoped it would

lepistane 2024-06-10T12:24:30.590549Z

But what are the options for agent + autoconfig? Is that even possible? Should it even be done?

steffan 2024-06-10T13:01:35.875289Z

The agent uses autoconfig as its method of configuration i.e. it is configured using system properties (or environment variables). The agent provides automatic instrumentation for your application, such as JVM metrics and support for a large number of frameworks and libraries. To enrich (add to) this instrumentation with manual instrumentation, you use the OpenTelemetry API. With clj-otel, this means you add clj-otel-api to your application. You do not need to add any other dependencies, such as the SDK, the autoconfigure SDK extension, JVM runtime telemetry or Prometheus exporter, as the agent already has its dependencies built in. You also do not need to manage starting the SDK, as the agent handles this also. Please take a close look at cube-app for an example that shows an application run with the agent that has enriched instrumentation. https://github.com/steffan-westcott/clj-otel/tree/master/examples/cube-app

steffan 2024-06-10T13:06:22.172569Z

The tutorial at https://github.com/steffan-westcott/clj-otel/blob/master/doc/tutorial.adoc is another example of an application run with the agent and enriched (manual + automatic) instrumentation.

steffan 2024-06-10T13:11:15.458479Z

In the case of manual metrics instrumentation, they are exported along with the automatic metrics instrumentation. They use the same exporter and do not end up on another HTTP endpoint.

👍 1
steffan 2024-06-10T13:20:32.768309Z

Also, do not use otel.java.global-autoconfigure.enabled as it is discouraged and may cause configuration failures.

👍 1
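
For reference, a minimal deps.edn sketch of the agent setup described above: clj-otel-api is the only OpenTelemetry dependency, the agent is configured through system properties, and otel.java.global-autoconfigure.enabled is deliberately absent. The :otel alias name and the my-app service name are hypothetical, and disabling the traces exporter is an optional assumption rather than something from the thread.

```clojure
;; deps.edn: sketch only. The :otel alias and "my-app" service name are made up.
{:paths ["src"]
 :deps  {org.clojure/clojure {:mvn/version "1.11.3"}
         ;; API only: the agent JAR bundles the SDK, autoconfigure and exporters
         com.github.steffan-westcott/clj-otel-api {:mvn/version "0.2.6"}}
 :aliases
 {:otel {:jvm-opts ["-javaagent:opentelemetry-javaagent.jar"
                    "-Dotel.service.name=my-app"
                    "-Dotel.metrics.exporter=prometheus"
                    ;; optional assumption: no trace backend is running locally
                    "-Dotel.traces.exporter=none"
                    "-Dotel.logs.exporter=none"]}}}
```

With this in place, manual counters created through clj-otel-api should appear on the same endpoint as the agent's metrics (localhost:9464 in the messages above). Only one JVM can bind that port at a time, so a second process started with the same options fails to create the Prometheus HTTP server.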
lepistane 2024-06-10T14:08:31.074889Z

Oh OK, I was on the right track then, but I failed to realize that my local setup has issues with the CIDER version I am using: it basically never gets a CIDER REPL 😄 So metrics were running, the agent was running and everything, but the REPL was missing, and when I ran cider-jack-in again I got bound port errors. That's where the confusion happened. Good to know that the agent and clj-otel-api can work together. Seems very nice. Will go with runtime-otel for now. Thank you so much for the help 🙇

steffan 2024-06-10T14:14:43.751339Z

I'm glad you have something working now 😄 Making sense of the options and configuring Java OpenTelemetry is not simple, so you have come a long way!

steffan 2024-06-04T20:19:53.513349Z

Thank you for your interest in clj-otel 😄

steffan 2024-06-04T20:21:45.175769Z

An even more basic example would be one that uses the OpenTelemetry instrumentation agent, as all the dependencies are included in the agent JAR. Take a look at cube-app for a small example that uses the agent. https://github.com/steffan-westcott/clj-otel/tree/master/examples/cube-app

steffan 2024-06-04T20:25:54.229289Z

The factorial-app that you looked at is a small example, but note it shows programmatic configuration of the SDK. Alternatively using the autoconfigure SDK extension is a bit easier to get going, as that uses system properties (or environment variables) rather than program code. The OpenTelemetry instrumentation agent uses the autoconfigure SDK extension, so all examples with the agent show this manner of configuration, such as cube-app.

steffan 2024-06-04T20:29:04.561229Z

See https://cljdoc.org/d/com.github.steffan-westcott/clj-otel-api/0.2.6/doc/concepts#_using_the_opentelemetry_sdk for a brief overview of the different ways the SDK can be configured.

lepistane 2024-06-05T14:12:40.541589Z

Hey @steffan, thanks for the quick response. I actually want to avoid adding JVM options and would like to set this up programmatically. I've managed to use the factorial example to set up metrics -> collector -> Prometheus locally. Works great. Thanks for this! I have an even simpler setup in mind for which I wasn't able to find an example in the repo. Is it possible to set things up so that Prometheus scrapes metrics without the need for a collector?

:meter-provider {:readers [{:metric-reader (meter/periodic-metric-reader
                                                {:metric-exporter (prometheus/http-server {:port 1237
                                                                                           :path "/metrics2"})})}
this doesn't seem to work.
Execution error (ClassCastException) at steffan-westcott.clj-otel.sdk.meter-provider/periodic-metric-reader (meter_provider.clj:15).
class io.opentelemetry.exporter.prometheus.PrometheusHttpServer cannot be cast to class io.opentelemetry.sdk.metrics.export.MetricExporter (io.opentelemetry.exporter.prometheus.PrometheusHttpServer and io.opentelemetry.sdk.metrics.export.MetricExporter are in unnamed module of loader 'app')
The cause is obvious, but it's not obvious how to solve it.

lepistane 2024-06-05T14:19:20.290079Z

https://github.com/steffan-westcott/clj-otel/blob/master/examples/divisor-app/deps.edn is using the agent, which I don't want to use.

lepistane 2024-06-05T14:37:36.770079Z

(sdk/init-otel-sdk!  ;; The service name is the minimum resource information.
   "test-server"

   {
    :resources [(res/host-resource)
                (res/os-resource)
                (res/process-resource)
                (res/process-runtime-resource)]
    :metric-exporter (prometheus/http-server {;;:port 1239
                                              :path "/metrics2"})
    })
This one doesn't throw exceptions, but I am not sure it's working: http://localhost:9464/metrics2 gives me a valid response, but it's empty. It's like (instrument/add! @count {:value 1}) doesn't do anything.

steffan 2024-06-05T15:31:21.962839Z

You are almost there! (prometheus/http-server) returns a MetricReader which is used directly in the SDK configuration, like this:

(sdk/init-otel-sdk!
   "my-app"
   {:resources      [(res/host-resource) (res/os-resource) (res/process-resource) (res/process-runtime-resource)]
    :meter-provider {:readers [{:metric-reader (prometheus/http-server)}]}})

steffan 2024-06-05T15:35:15.661699Z

Here is a modified version of divisor-app as an example:

(ns example.divisor-app
  (:require [steffan-westcott.clj-otel.api.metrics.instrument :as instrument]
            [steffan-westcott.clj-otel.resource.resources :as res]
            [steffan-westcott.clj-otel.sdk.otel-sdk :as sdk]
            [steffan-westcott.clj-otel.exporter.prometheus :as prometheus]))

(defonce gcd-count
  (delay (instrument/instrument {:name "app.divisor.gcd-count"
                                 :instrument-type :counter
                                 :unit "{greatest common divisors}"
                                 :description
                                 "The number of greatest common divisors calculated"})))

(defn- gcd*
  [x y]
  (if (zero? x)
    y
    (recur (mod y x) x)))

(defn gcd
  [x y]
  (instrument/add! @gcd-count {:value 1})
  (gcd* x y))

(defn init-otel!
  []
  (sdk/init-otel-sdk!
   "divisor-app"
   {:resources      [(res/host-resource) (res/os-resource) (res/process-resource)
                     (res/process-runtime-resource)]
    :meter-provider {:readers [{:metric-reader (prometheus/http-server)}]}}))

(comment
  (init-otel!)
  (gcd 18 24))

steffan 2024-06-05T15:36:06.456059Z

Use this deps.edn

{:paths ["src"]

 :deps  {org.clojure/clojure {:mvn/version "1.11.3"}
         com.github.steffan-westcott/clj-otel-api {:mvn/version "0.2.6"}
         com.github.steffan-westcott/clj-otel-sdk {:mvn/version "0.2.6"}
         com.github.steffan-westcott/clj-otel-instrumentation-resources {:mvn/version "0.2.6"}
         com.github.steffan-westcott/clj-otel-exporter-prometheus {:mvn/version "0.2.6"}}}
...and this prometheus.yaml
global:
  scrape_interval: 10s
  evaluation_interval: 10s
scrape_configs:
  - job_name: localhost
    static_configs:
      - targets:
          - host.docker.internal:9464
...and this compose.yaml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yaml:/etc/prometheus.yaml
    command:
      - --config.file=/etc/prometheus.yaml
    ports:
      - "9090:9090"    # Prometheus web interface
    extra_hosts:
      - "host.docker.internal:host-gateway"

lepistane 2024-06-05T15:40:11.766169Z

Nice! The key was :meter-provider {:readers [{:metric-reader (prometheus/http-server)}]}. I could swear I tried this config. Will (sdk/close-otel-sdk!) shut down the server, so that I can call (init-otel!) to start it again? That way I can iterate quickly through configs.

steffan 2024-06-05T15:41:34.427099Z

You should be able to close and open the SDK instance as you please. This isn't the case when using the OpenTelemetry instrumentation agent.

👍 1
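
A minimal sketch of that close-and-reopen loop at the REPL, assuming the programmatic Prometheus setup shown earlier in the thread. restart-otel! is a hypothetical helper, not part of clj-otel; the zero-argument (sdk/close-otel-sdk!) call is taken from the question above.

```clojure
(ns example.repl-restart
  (:require [steffan-westcott.clj-otel.exporter.prometheus :as prometheus]
            [steffan-westcott.clj-otel.resource.resources :as res]
            [steffan-westcott.clj-otel.sdk.otel-sdk :as sdk]))

(defn init-otel!
  "Starts the SDK with a Prometheus HTTP server as the metric reader."
  []
  (sdk/init-otel-sdk!
   "divisor-app"
   {:resources      [(res/host-resource)]
    :meter-provider {:readers [{:metric-reader (prometheus/http-server)}]}}))

(defn restart-otel!
  "Hypothetical helper: closes the current SDK, then starts a fresh one."
  []
  (sdk/close-otel-sdk!)
  (init-otel!))

(comment
  (init-otel!)      ; first start
  (restart-otel!))  ; after editing the config map above
```

One thing to verify when iterating this way: an instrument cached in a delay may have been built against the closed SDK, so it is probably safest to re-evaluate the defonce'd instruments after a restart. This caveat is an assumption, not something confirmed in the thread.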
steffan 2024-06-05T15:42:39.119259Z

Out of interest, why do you want to avoid using the autoconfigure module? Programmatic configuration is the harder option to use successfully.

lepistane 2024-06-05T15:50:02.658209Z

We are exploring all the options of clj-otel. We have a lot of services with different use cases and are looking for a plug-and-play setup that works in multiple/all scenarios. Yesterday I managed to get the collector setup running and it worked nicely. So far we are leaning away from that way of doing things, because we want control over how often metrics are fetched without having to specify this as service-specific config (talking about :interval [10 TimeUnit/SECONDS]). Today I managed to do it with Prometheus with your help!

Got metrics there! Yay, I just have to set up a local Prometheus to scrape it! 😄 Honestly, it's possible I am not seeing all the benefits of the autoconfigure module. Do you think it's a superior method of setting this up?
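
For contrast, a sketch of the push-style setup where the export interval lives in the service config, which is the coupling described above. It assumes the OTLP gRPC exporter namespace steffan-westcott.clj-otel.exporter.otlp.grpc.metrics and its metric-exporter function, as used in the factorial-app example; treat both names as assumptions to check against the clj-otel docs.

```clojure
(ns example.push-metrics
  (:require [steffan-westcott.clj-otel.exporter.otlp.grpc.metrics :as otlp-grpc-metrics] ; assumed namespace
            [steffan-westcott.clj-otel.resource.resources :as res]
            [steffan-westcott.clj-otel.sdk.meter-provider :as meter]
            [steffan-westcott.clj-otel.sdk.otel-sdk :as sdk])
  (:import (java.util.concurrent TimeUnit)))

(defn init-otel!
  "Pushes metrics to a Collector every 10 seconds.
   Changing the interval means changing and redeploying this service."
  []
  (sdk/init-otel-sdk!
   "test-server"
   {:resources      [(res/host-resource)]
    :meter-provider {:readers [{:metric-reader
                                (meter/periodic-metric-reader
                                 {:metric-exporter (otlp-grpc-metrics/metric-exporter)
                                  :interval        [10 TimeUnit/SECONDS]})}]}}))
```

With the pull-based (prometheus/http-server) reader there is no :interval in the service at all: the cadence is set by scrape_interval on the Prometheus side.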

steffan 2024-06-05T16:00:39.103539Z

There are many things to consider when configuring telemetry in a system. For example, using the Collector gives you centralised control over the flow of telemetry data. This aspect becomes acute should the number of sources of data (application instances) become large.

steffan 2024-06-05T16:03:36.519019Z

Using the autoconfigure SDK extension should be the preferred option if it supports the options you need. If your needs are exotic, programmatic configuration is available but not as easy to set up correctly.

lepistane 2024-06-05T16:08:40.736319Z

Let's say we commit to using the collector: is it possible to avoid the service-specific configuration of how often metrics are sent? Using Prometheus allows us to change how often metrics are fetched without redeploying the service.

"Using the autoconfigure SDK extension should be the preferred option"
When we talk about autoconfigure, we are talking about using the agent + JVM options + https://github.com/steffan-westcott/clj-otel/blob/master/examples/divisor-app/src/example/divisor_app.clj#L37, correct? The Prometheus setup I was looking toward looks kind of like this:
(defn init-otel! []
  (sdk/init-otel-sdk!
   "test-server"

   {:resources [(res/host-resource)
                (res/os-resource)
                (res/process-resource)
                (res/process-runtime-resource)]

    :meter-provider
    {:readers [{:metric-reader (prometheus/http-server {:path "/metrics2"})}]}}))
What would we gain by going the agent route?

steffan 2024-06-05T16:10:54.676179Z

The point of the Collector is to decouple exporters (sources) from telemetry backends (destinations). So by using a Collector you are making fewer assumptions on what backends are present e.g. Prometheus instances.

👍 1
steffan 2024-06-05T16:12:10.663899Z

The autoconfigure SDK extension and the OpenTelemetry instrumentation agent are two different things. The agent uses autoconfig.

steffan 2024-06-05T16:20:04.978449Z

You'll need to investigate options when it comes to exporting Prometheus telemetry data. The example above is pull based, meaning the Prometheus server scrapes targets according to its scrape_config. An alternative is using Prometheus remote write, where application instances push data. The clj-otel microservices examples show this option.

lepistane 2024-06-05T16:21:10.679929Z

I need a pull-based method. This seems to be the option preferred by the tech lead over here.

steffan 2024-06-05T16:26:14.740329Z

None of the current clj-otel examples show this, but you can use prometheus instead of prometheusremotewrite in the Collector config. See https://opentelemetry.io/docs/collector/configuration/#exporters

👍 1
lepistane 2024-06-05T16:28:58.343069Z

"The autoconfigure SDK extension and the OpenTelemetry instrumentation agent are two different things. The agent uses autoconfig."
So autoconfigure is this? https://github.com/steffan-westcott/clj-otel/blob/master/clj-otel-sdk-extension-autoconfigure/src/steffan_westcott/clj_otel/sdk/autoconfigure.clj#L1

steffan 2024-06-05T16:30:39.255369Z

I'll refer you to a point in the docs I mentioned earlier: https://cljdoc.org/d/com.github.steffan-westcott/clj-otel-api/0.2.6/doc/concepts#_using_the_opentelemetry_sdk explains the 3 main options.

👀 1
lepistane 2024-06-05T16:36:14.353089Z

I read this part of the documentation, but it went over my head. Thank you for sharing and thank you for your time! If I am understanding everything correctly (https://github.com/open-telemetry/opentelemetry-java/tree/main/sdk-extensions/autoconfigure#prometheus-exporter), just setting these env variables would spin up an endpoint that Prometheus can scrape data from, and then all I need to do is

(defonce count
  (delay (instrument/instrument {:name "count"
                                 :instrument-type :counter
                                 :unit "{count}"
                                 :description "The number of messages"})))
(instrument/add! @count {:value 1})
and everything would work, so I get the best of both worlds: autoconfigure + Prometheus. Thank youuu!! 🙇

steffan 2024-06-05T16:40:01.995669Z

Yes, you can use the SDK autoconfigure extension like you say. Almost all the clj-otel examples use autoconfig.

👍 1
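
A deps.edn sketch of that autoconfigure-without-agent setup, using system properties in place of the equivalent environment variables. The :otel alias name is hypothetical, and the clj-otel-sdk-extension-autoconfigure artifact name is inferred from the module directory linked above and assumed to be published at the same version as the other modules. The application would still need to initialise the SDK once at startup; the linked divisor-app does this through the steffan-westcott.clj-otel.sdk.autoconfigure namespace, so check that file for the exact call.

```clojure
;; deps.edn: sketch only. The :otel alias name is hypothetical.
{:paths ["src"]
 :deps  {org.clojure/clojure {:mvn/version "1.11.3"}
         com.github.steffan-westcott/clj-otel-api {:mvn/version "0.2.6"}
         ;; assumed artifact name and version, see the note above
         com.github.steffan-westcott/clj-otel-sdk-extension-autoconfigure {:mvn/version "0.2.6"}
         com.github.steffan-westcott/clj-otel-exporter-prometheus {:mvn/version "0.2.6"}}
 :aliases
 {:otel {:jvm-opts ["-Dotel.service.name=test-server"
                    "-Dotel.metrics.exporter=prometheus"
                    ;; 9464 is the port seen earlier in the thread
                    "-Dotel.exporter.prometheus.port=9464"
                    "-Dotel.traces.exporter=none"
                    "-Dotel.logs.exporter=none"]}}}
```

No agent and no programmatic SDK map are involved here; the counter code stays exactly as in the message above.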
steffan 2024-06-05T16:43:18.987239Z

To answer a previous question you had, using the agent gives you lots of high quality telemetry for no effort, for supported frameworks and libraries. For example, the tutorial shows how by adding the agent (and changing no code) you automatically get server spans for the Jetty server. The agent uses the SDK autoconfigure extension (plus some more options) for its configuration.

lepistane 2024-06-05T17:26:21.810869Z

oh my god... autoconfigure is SO EASY TO DOOOO! and it works like a charm 😄 @steffan thank you so much for the time and effort!

🙇🏼 1