Hey
Super mega giga beginner here.
First time encountering OpenTelemetry and clj-otel.
We've used a Prometheus collector registry and a bunch of counters to get basic metrics for our needs.
I am looking for the most basic example possible, but the examples provided in the repo seem quite advanced.
I am looking at some Python examples here: https://opentelemetry.io/docs/languages/python/exporters/
The usage of OTLPMetricExporter there seems to be at the level of my understanding and current needs.
Is it possible to do this with clj-otel?
edit
seems like https://github.com/steffan-westcott/clj-otel/blob/93df3ce6de4cf7af11efd9f13b273f960e769732/examples/factorial-app/src/example/factorial_app.clj is the way to go 🙌
@steffan i am trying to use agent + manual counters i added (and auto config) but i am having trouble using both.
:jvm-opts ["-javaagent:opentelemetry-javaagent.jar" ; as per the example, plus:
           "-Dotel.java.global-autoconfigure.enabled=true"
           "-Dotel.metrics.exporter=prometheus"
           "-Dotel.metric.export.interval=5000"
           "-Dotel.logs.exporter=none"]
I get open telemetry metrics on localhost:9464 - great!
but when i do
(instrument/add! @custom-count {:value 1})
i get
Jun 10, 2024 1:12:17 PM io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdkBuilder build
INFO: Error encountered during autoconfiguration. Closing partially configured components.
Jun 10, 2024 1:12:17 PM io.opentelemetry.api.GlobalOpenTelemetry maybeAutoConfigureAndSetGlobal
SEVERE: Error automatically configuring OpenTelemetry SDK. OpenTelemetry will not be enabled.
io.opentelemetry.sdk.autoconfigure.spi.ConfigurationException: Unexpected configuration error
...
Caused by: java.io.UncheckedIOException: Could not create Prometheus HTTP server
Any idea what i am doing wrong?
My impression was that the agent and SDK autoconfigure are able to share the Prometheus endpoint, so you can scrape everything from one place: the free stuff from the agent plus whatever you specifically need as manual instruments?
Or maybe this is the scenario where using agent + SDK manual config is a must (another Prometheus server on a different port)? For example, the agent's metrics would be scraped from 9464 and mine from 9465?
Found: https://github.com/steffan-westcott/clj-otel/blob/0291d04e62efb3659c88c41e4a17e6a7f01eeb84/examples/divisor-app/src/example/divisor_app.clj#L4 where you are not using the agent, but the runtime. Let me see if that works for me.
Code from the example works like i hoped it would
But what are the options for agent + autoconfig? Is that even possible? Should it even be done?
The agent uses autoconfig as its method of configuration i.e. it is configured using system properties (or environment variables). The agent provides automatic instrumentation for your application, such as JVM metrics and support for a large number of frameworks and libraries. To enrich (add to) this instrumentation with manual instrumentation, you use the OpenTelemetry API. With clj-otel, this means you add clj-otel-api to your application. You do not need to add any other dependencies, such as the SDK, the autoconfigure SDK extension, JVM runtime telemetry or Prometheus exporter, as the agent already has its dependencies built in. You also do not need to manage starting the SDK, as the agent handles this also. Please take a close look at cube-app for an example that shows an application run with the agent that has enriched instrumentation.
https://github.com/steffan-westcott/clj-otel/tree/master/examples/cube-app
The https://github.com/steffan-westcott/clj-otel/blob/master/doc/tutorial.adoc is another example of an application run with the agent and enriched (manual + automatic) instrumentation.
In the case of manual metrics instrumentation, they are exported along with the automatic metrics instrumentation. They use the same exporter and do not end up on another HTTP endpoint.
Also, do not use otel.java.global-autoconfigure.enabled as it is discouraged and may cause configuration failures.
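A minimal deps.edn sketch of the setup described above, assuming the agent JAR sits in the project root as in the :jvm-opts earlier in the thread. The :otel alias name is hypothetical, and the version numbers are the ones used elsewhere in this thread. Only clj-otel-api is added as a dependency, and otel.java.global-autoconfigure.enabled is left out.

```clojure
;; deps.edn (sketch; the :otel alias name is hypothetical)
{:paths ["src"]
 :deps  {org.clojure/clojure {:mvn/version "1.11.3"}
         ;; Only the API is needed: the agent brings the SDK, autoconfigure
         ;; and the Prometheus exporter with it.
         com.github.steffan-westcott/clj-otel-api {:mvn/version "0.2.6"}}
 :aliases
 {:otel {:jvm-opts ["-javaagent:opentelemetry-javaagent.jar"
                    ;; Agent configuration via system properties.
                    "-Dotel.metrics.exporter=prometheus"
                    "-Dotel.logs.exporter=none"]}}}
```

Started with something like clj -M:otel, counters created with instrument/instrument should then be exported alongside the agent's own metrics on the same scrape endpoint (localhost:9464 in the setup above), rather than on a second HTTP server.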
Oh ok, I was on the right track then, but failed to realize that my local setup has issues with the CIDER version I am using: it basically never gets a CIDER REPL 😄 So metrics were running, the agent was running and everything, but the REPL was missing, and when I ran cider-jack-in again I got bound port errors. So that's where the confusion happened.
Good to know that agent and clj-otel-api can work together. Seems very nice.
Will go with runtime-otel for now.
Thank you so much for the help 🙇
I'm glad you have something working now 😄 Making sense of the options and configuring Java OpenTelemetry is not simple, so you have come a long way!
Thank you for your interest in clj-otel 😄
An even more basic example would be one that uses the OpenTelemetry instrumentation agent, as all the dependencies are included in the agent JAR. Take a look at cube-app for a small example that uses the agent. https://github.com/steffan-westcott/clj-otel/tree/master/examples/cube-app
The factorial-app that you looked at is a small example, but note it shows programmatic configuration of the SDK. Alternatively using the autoconfigure SDK extension is a bit easier to get going, as that uses system properties (or environment variables) rather than program code. The OpenTelemetry instrumentation agent uses the autoconfigure SDK extension, so all examples with the agent show this manner of configuration, such as cube-app.
See this https://cljdoc.org/d/com.github.steffan-westcott/clj-otel-api/0.2.6/doc/concepts#_using_the_opentelemetry_sdk for a brief overview of the different ways the SDK can be configured.
hey @steffan thanks for the quick response. I actually want to avoid adding JVM options and would like to set this up programmatically. I've managed to use the factorial example to set up metrics -> collector -> Prometheus locally. Works great. Thanks for this! I've got maybe an even simpler setup in mind that I wasn't able to find an example of in the repo. Is it possible to set things up so that Prometheus scrapes the metrics directly, without the need for a collector?
:meter-provider {:readers [{:metric-reader (meter/periodic-metric-reader
{:metric-exporter (prometheus/http-server {:port 1237
:path "/metrics2"})})}
this doesn't seem to work.
Execution error (ClassCastException) at steffan-westcott.clj-otel.sdk.meter-provider/periodic-metric-reader (meter_provider.clj:15).
class io.opentelemetry.exporter.prometheus.PrometheusHttpServer cannot be cast to class io.opentelemetry.sdk.metrics.export.MetricExporter (io.opentelemetry.exporter.prometheus.PrometheusHttpServer and io.opentelemetry.sdk.metrics.export.MetricExporter are in unnamed module of loader 'app'
Which is obvious, but it's not obvious how to solve it.
https://github.com/steffan-westcott/clj-otel/blob/master/examples/divisor-app/deps.edn is using the agent, which I don't want to use.
(sdk/init-otel-sdk! ;; The service name is the minimum resource information.
"test-server"
{
:resources [(res/host-resource)
(res/os-resource)
(res/process-resource)
(res/process-runtime-resource)]
:metric-exporter (prometheus/http-server {;;:port 1239
:path "/metrics2"})
})
This one doesn't throw exceptions but i am not sure it's working since
http://localhost:9464/metrics2 gives me valid response but it's empty.
it's like (instrument/add! @count {:value 1}) doesn't do anything
You are almost there! (prometheus/http-server) returns a MetricReader, not a MetricExporter, so it is used directly in the SDK configuration rather than wrapped in a periodic metric reader, like this:
(sdk/init-otel-sdk!
  "my-app"
  {:resources [(res/host-resource) (res/os-resource) (res/process-resource) (res/process-runtime-resource)]
   :meter-provider {:readers [{:metric-reader (prometheus/http-server)}]}})
Here is a modified version of divisor-app as an example:
(ns example.divisor-app
  (:require [steffan-westcott.clj-otel.api.metrics.instrument :as instrument]
            [steffan-westcott.clj-otel.resource.resources :as res]
            [steffan-westcott.clj-otel.sdk.otel-sdk :as sdk]
            [steffan-westcott.clj-otel.exporter.prometheus :as prometheus]))

(defonce gcd-count
  (delay (instrument/instrument {:name "app.divisor.gcd-count"
                                 :instrument-type :counter
                                 :unit "{greatest common divisors}"
                                 :description
                                 "The number of greatest common divisors calculated"})))

(defn- gcd*
  [x y]
  (if (zero? x)
    y
    (recur (mod y x) x)))

(defn gcd
  [x y]
  (instrument/add! @gcd-count {:value 1})
  (gcd* x y))

(defn init-otel!
  []
  (sdk/init-otel-sdk!
    "divisor-app"
    {:resources [(res/host-resource) (res/os-resource) (res/process-resource)
                 (res/process-runtime-resource)]
     :meter-provider {:readers [{:metric-reader (prometheus/http-server)}]}}))

(comment
  (init-otel!)
  (gcd 18 24))
Use this deps.edn
{:paths ["src"]
 :deps {org.clojure/clojure {:mvn/version "1.11.3"}
        com.github.steffan-westcott/clj-otel-api {:mvn/version "0.2.6"}
        com.github.steffan-westcott/clj-otel-sdk {:mvn/version "0.2.6"}
        com.github.steffan-westcott/clj-otel-instrumentation-resources {:mvn/version "0.2.6"}
        com.github.steffan-westcott/clj-otel-exporter-prometheus {:mvn/version "0.2.6"}}}
...and this prometheus.yaml
global:
  scrape_interval: 10s
  evaluation_interval: 10s
scrape_configs:
  - job_name: localhost
    static_configs:
      - targets:
          - host.docker.internal:9464
...and this compose.yaml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yaml:/etc/prometheus.yaml
    command:
      - --config.file=/etc/prometheus.yaml
    ports:
      - "9090:9090" # Prometheus web interface
    extra_hosts:
      - "host.docker.internal:host-gateway"
Nice!
the key was :meter-provider {:readers [{:metric-reader (prometheus/http-server)}]}})
i could swear i tried this config.
Will (sdk/close-otel-sdk!) shut down the server, so that I can call (init-otel!) to start it again?
This way i can iterate fast through configs.
You should be able to close and open the SDK instance as you please. This isn't the case when using the OpenTelemetry instrumentation agent.
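A small REPL sketch of that close and reopen loop, assuming the example.divisor-app namespace from earlier in the thread is loaded (init-otel!, gcd and the sdk alias all come from it). It is meant to be evaluated form by form, not run as a script.

```clojure
(comment
  ;; Start the SDK; the Prometheus HTTP server starts listening.
  (init-otel!)

  ;; Record a measurement, then check the scrape endpoint.
  (gcd 18 24)

  ;; Shut the SDK down, which should also stop the HTTP server
  ;; and release its port.
  (sdk/close-otel-sdk!)

  ;; Start again, e.g. after editing the options map in init-otel!.
  (init-otel!))
```

One thing worth checking when iterating this way: an instrument that was already realized through its delay before the SDK was closed may still be bound to the old SDK instance, so measurements recorded through it might not show up after the restart.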
Out of interest, why do you want to avoid using the autoconfigure module? Programmatic configuration is the harder option to use successfully.
We are exploring all the options of clj-otel.
We have a lot of services with different use cases and are looking for a plug-and-play setup that works in multiple (ideally all) scenarios.
Yesterday I managed to get the collector setup going and it worked nicely. So far we are leaning away from that way of doing things, because we want to have control over how often metrics are fetched without having to specify this as service-specific config (talking about :interval [10 TimeUnit/SECONDS]).
Today i managed to do it with prometheus with your help!
got metrics there ! yay
I just gotta set up local Prometheus to scrape it! 😄
Honestly it's possible I am not seeing all the benefits of the autoconfigure module.
Do you think it's a superior method of setting this up?
There are many things to consider when configuring telemetry in a system. For example, using the Collector gives you centralised control over the flow of telemetry data. This aspect becomes acute should the number of sources of data (application instances) become large.
Using the autoconfigure SDK extension should be the preferred option if it supports the options you need. If your needs are exotic, programmatic configuration is available but not as easy to set up correctly.
Let's say we commit to using the Collector: is it possible to avoid the service-specific configuration of how often metrics are sent? Using Prometheus allows us to change how often metrics are fetched without redeploying the service.
Using the autoconfigure SDK extension should be the preferred option
When we talk about autoconfigure we are talking about using agent + jvm options + https://github.com/steffan-westcott/clj-otel/blob/master/examples/divisor-app/src/example/divisor_app.clj#L37
Correct?
the prometheus setup i was looking toward looks kinda like this
(defn init-otel! []
(sdk/init-otel-sdk!
"test-server"
{:resources [(res/host-resource)
(res/os-resource)
(res/process-resource)
(res/process-runtime-resource)]
:meter-provider
{:readers [{:metric-reader (prometheus/http-server {:path "/metrics2"})}]}}))
What would we gain by going the agent route?
The point of the Collector is to decouple exporters (sources) from telemetry backends (destinations). So by using a Collector you are making fewer assumptions on what backends are present e.g. Prometheus instances.
The autoconfigure SDK extension and the OpenTelemetry instrumentation agent are two different things. The agent uses autoconfig.
You'll need to investigate options when it comes to exporting Prometheus telemetry data. The example above is pull based, meaning the Prometheus server scrapes targets according to its scrape_configs. An alternative is using Prometheus remote write, where application instances push data. The clj-otel microservices examples show this option.
I need a pull-based method. This seems to be the option preferred by the tech lead over here.
None of the current clj-otel examples show this, but you can use prometheus instead of prometheusremotewrite in the Collector config. See https://opentelemetry.io/docs/collector/configuration/#exporters
The autoconfigure SDK extension and the OpenTelemetry instrumentation agent are two different things. The agent uses autoconfig.
so autoconfigure is this?
https://github.com/steffan-westcott/clj-otel/blob/master/clj-otel-sdk-extension-autoconfigure/src/steffan_westcott/clj_otel/sdk/autoconfigure.clj#L1
I'll refer you to a point in the docs I mentioned earlier: the page at https://cljdoc.org/d/com.github.steffan-westcott/clj-otel-api/0.2.6/doc/concepts#_using_the_opentelemetry_sdk explains the 3 main options.
I read this part of the documentation but it went over my head. Thank you for sharing and thank you for your time! If I am understanding everything correctly, based on https://github.com/open-telemetry/opentelemetry-java/tree/main/sdk-extensions/autoconfigure#prometheus-exporter, just setting these env variables would spin up an endpoint that Prometheus can scrape, and then all I need to do is
(defonce count
(delay (instrument/instrument {:name "count"
:instrument-type :counter
:unit "{count}"
:description "The number of messages"})))
(instrument/add! @count {:value 1})
and all would work so i get best of both worlds.
Autoconfigure + prometheus.
Thank youuu!! 🙇
Yes, you can use the SDK autoconfigure extension like you say. Almost all the clj-otel examples use autoconfig.
To answer a previous question you had, using the agent gives you lots of high quality telemetry for no effort, for supported frameworks and libraries. For example, the tutorial shows how by adding the agent (and changing no code) you automatically get server spans for the Jetty server. The agent uses the SDK autoconfigure extension (plus some more options) for its configuration.
oh my god... autoconfigure is SO EASY TO DOOOO! and it works like a charm 😄 @steffan thank you so much for the time and effort!