Hello! I'm writing a cron-scheduling service and wanted some feedback on my use of transducers. How much overhead am I paying by creating the transducers on each call, and should I bother pulling them out of the function (i.e. moving the `map` to a `def`)?
```clojure
(defmacro parser [[^long min ^long max] expression translator]
  `(transduce
    (map #(parse-fragment % ~min ~max ~translator))
    compact
    ;; for some cases we waste 1 bit for convenience
    (new BitSet ~(inc max))
    (string/split ~expression #",")))

(defn parse-seconds [^String expression]
  (parser [0 59] expression {}))

(defn parse-minutes [^String expression]
  (parser [0 59] expression {}))

(defn parse-hours [^String expression]
  (parser [0 23] expression {}))

(defn parse-days [^String expression]
  (parser [1 31] expression {}))

(defn parse-months [^String expression]
  (parser [1 12] expression months-translator))

(defn parse-weekdays [^String expression]
  (parser [0 6] expression weekdays-translator))
```
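A minimal sketch of what "moving `map` to a `def`" could look like if the macro became a plain function factory. `parse-fragment` and `compact` below are hypothetical stand-ins (the real ones aren't shown) that only handle `*`, single numbers and `a-b` ranges. Note that `transduce` still calls `(xf compact)` on every call to build the reducing function, so hoisting only saves allocating the `map` transducer and its anonymous fn; the gain is likely to be small.

```clojure
(ns cron.hoisted-sketch
  (:require [clojure.string :as string])
  (:import (java.util BitSet)))

;; Hypothetical stand-in for the author's parse-fragment. It handles
;; only "*", a single number "n" and a range "a-b"; the translator is
;; ignored and values are not validated against lo/hi.
(defn parse-fragment [^String fragment lo hi _translator]
  (cond
    (= "*" fragment)
    (range lo (inc hi))

    (string/includes? fragment "-")
    (let [[a b] (map #(Long/parseLong %) (string/split fragment #"-"))]
      (range a (inc b)))

    :else
    [(Long/parseLong fragment)]))

;; Hypothetical stand-in for `compact`: sets every parsed value in the
;; BitSet. The 1-arity is the completion step transduce calls last.
(defn compact
  ([^BitSet bits] bits)
  ([^BitSet bits values]
   (doseq [v values]
     (.set bits (int v)))
   bits))

;; Build the `map` transducer once per field and close over it.
;; transduce still calls (xf compact) on every invocation.
(defn make-parser [lo hi translator]
  (let [xf (map #(parse-fragment % lo hi translator))]
    (fn [^String expression]
      (transduce xf compact (BitSet. (int (inc hi)))
                 (string/split expression #",")))))

(def parse-seconds (make-parser 0 59 {}))
(def parse-hours (make-parser 0 23 {}))
```

For example, `(parse-seconds "0,5,10-12")` returns a `BitSet` with bits 0, 5, 10, 11 and 12 set.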
I doubt either way it's going to be noticeable (at least not with `time`), but I would like to be more aware of what I'm doing here 🙂

For performance recommendations, you can also try cross-posting in #performance. `time` isn't that useful for benchmarking unless the function is fairly slow. I recommend https://github.com/hugoduncan/criterium for benchmarking. You can also use https://github.com/clojure-goes-fast/clj-async-profiler to identify where functions spend most of their time.
I doubt there's much benefit to making `parser` a macro. You can also experiment with `:inline`: https://blog.janetacarr.com/clojure-inline-explained/.
As mentioned at the bottom of the post:
> by wrapping that stuff in thunks (which always have boxed args), you're missing some of the biggest benefits of inlining, which have to do with using inlined functions together with primitives in a local context, so this is not a useful test to demonstrate
> — Alex Miller (@puredanger)
I don't think using a macro in your example bypasses the boxing, so I doubt you would notice much of a difference.
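A minimal sketch of `:inline`, using a hypothetical `between?` helper that is not part of the original code. The compiler replaces direct calls with the `:inline` expansion, so when the arguments are primitive (e.g. literal bounds bound in a `let`) the comparisons can stay on primitives in the caller's local context, which is the case the quote above describes. When the function is used as a value (e.g. via `apply`), the regular body runs instead.

```clojure
(ns cron.inline-sketch)

(defn between?
  "Hypothetical helper: true when lo <= n <= hi."
  ;; The compiler calls this fn with the argument *forms* and compiles
  ;; the returned form in place of a normal fn invocation. The let keeps
  ;; left-to-right evaluation and evaluates each argument exactly once.
  {:inline (fn [lo n hi]
             `(let [lo# ~lo n# ~n hi# ~hi]
                (and (<= lo# n#) (<= n# hi#))))}
  [lo n hi]
  ;; Used when between? is passed around as a value.
  (and (<= lo n) (<= n hi)))

(between? 0 30 59)                   ; direct call: inlined
(filter #(between? 1 % 12) [0 5 13]) ; inlined inside the lambda
(apply between? [0 60 59])           ; fn value: regular body
```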
One technique that would probably speed things up is to replace `transduce` with a `loop`/`recur` that does everything directly.
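A minimal sketch of the `loop`/`recur` approach, written as a hypothetical `parse-field` function (not the author's code) that walks the split array by index and sets bits directly, with no transducer and no intermediate seqs. It only handles `*`, single numbers and `a-b` ranges, with no validation; ranges use `BitSet.set(from, toExclusive)`.

```clojure
(ns cron.loop-sketch
  (:import (java.util BitSet)))

;; Hypothetical loop/recur field parser. String.split returns a
;; String[], so aget/alength below resolve without reflection.
(defn parse-field [^String expression lo hi]
  (let [bits  (BitSet. (int (inc hi)))
        parts (.split expression ",")
        n     (alength parts)]
    (loop [i 0]
      (if (< i n)
        (let [^String frag (aget parts i)
              dash (.indexOf frag "-")]
          (cond
            ;; "*": every value in [lo, hi]
            (= "*" frag)
            (.set bits (int lo) (int (inc hi)))

            ;; single number
            (neg? dash)
            (.set bits (Integer/parseInt frag))

            ;; "a-b": set [a, b] in one call (the upper bound is exclusive)
            :else
            (.set bits
                  (int (Integer/parseInt (subs frag 0 dash)))
                  (int (inc (Integer/parseInt (subs frag (inc dash)))))))
          (recur (inc i)))
        bits))))
```

For example, `(parse-field "0,5,10-12" 0 59)` sets bits 0, 5, 10, 11 and 12. Whether this beats the transducer version is something to confirm with criterium rather than assume.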
Thanks for recommending #performance, I hadn't noticed it! I'm aware of criterium and the profiler; they are great tools 😄 Yeah, the macro doesn't skip the boxing. It was mostly there to have a nicer interface more than anything.
That said, the timing differences are probably still minor, and I wouldn't bother unless the code showed up in profiling output.
Yeah, that's reasonable. I'll give `loop` a try out of curiosity, if nothing else.
You can see some of the differences between `reduce` (and therefore `transduce`) and `loop` at https://aphyr.com/posts/360-loopr-a-loop-reduction-macro-for-clojure.
Those are some good resources to put things into perspective 🙂 I appreciate them, I'll get reading.
Assuming this is a parser of String->CronSchedule, why not extend-type String to a custom IParseCronSchedule protocol? Anything that does not parse as a CronSchedule Record can return an error (or throw an exception, if that's what you want).
Given such a function of a string to a cron schedule, you are free to transduce it at will.
It will also be protocols-fast (better than functions-fast, once HotSpot kicks in).
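A minimal sketch of that suggestion. `IParseCronSchedule` comes from the message above; the `CronSchedule` fields, the six-field whitespace-separated format and the error map are assumptions for illustration. Each raw field would still need to go through a per-field parser such as `parse-seconds`. Returning an error map rather than throwing is one of the two options mentioned.

```clojure
(ns cron.protocol-sketch
  (:require [clojure.string :as string]))

;; Protocol name from the suggestion; everything else is hypothetical.
(defprotocol IParseCronSchedule
  (parse-cron-schedule [this]
    "Parses this into a CronSchedule, or returns an error map."))

;; Fields are kept as raw strings here; a real version would run each
;; one through its own field parser (seconds, minutes, ...).
(defrecord CronSchedule [seconds minutes hours days months weekdays])

(extend-type String
  IParseCronSchedule
  (parse-cron-schedule [s]
    (let [fields (string/split (string/trim s) #"\s+")]
      (if (= 6 (count fields))
        (apply ->CronSchedule fields)
        {:error ::wrong-field-count :input s}))))

;; With a plain String -> schedule function, transducing is easy:
;; keep only the inputs that parsed successfully.
(defn valid-schedules [exprs]
  (into []
        (comp (map parse-cron-schedule)
              (filter #(instance? CronSchedule %)))
        exprs))
```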
I've generally stayed away from protocols, as they never felt very idiomatic. Could you explain a bit more about what you mean with protocols-fast?
If you're not dispatching on type, then I don't see how protocols would help. As far as I know, they would add unnecessary indirection and would be slower. Here's a benchmark as an example:
```clojure
;; add-libs is available at the REPL in Clojure 1.12+
(add-libs '{criterium/criterium {:mvn/version "0.4.6"}})

(require '[criterium.core
           :as criterium
           :refer [bench quick-bench]])

(defprotocol IFoo
  (foo-protocol [s]))

(extend-type String
  IFoo
  (foo-protocol [s]
    (.length ^String s)))

(defn foo-fn [s]
  (.length ^String s))

(bench
 (foo-fn "foo"))
;; Evaluation count : 15012733500 in 60 samples of 250212225 calls.
;; Execution time mean : 2.180314 ns
;; Execution time std-deviation : 0.014121 ns
;; Execution time lower quantile : 2.164483 ns ( 2.5%)
;; Execution time upper quantile : 2.205879 ns (97.5%)
;; Overhead used : 1.830422 ns

(bench
 (foo-protocol "foo"))
;; Evaluation count : 11461432320 in 60 samples of 191023872 calls.
;; Execution time mean : 3.415150 ns
;; Execution time std-deviation : 0.014690 ns
;; Execution time lower quantile : 3.400873 ns ( 2.5%)
;; Execution time upper quantile : 3.444830 ns (97.5%)
;; Overhead used : 1.830422 ns
```

It would be interesting to hear otherwise.
If you are dispatching on type, you may find this previous discussion interesting (see thread and subsequent channel messages): https://clojurians.slack.com/archives/C03L9H1FBM4/p1678093600267009
> what you mean with protocols-fast

I assumed type-based dispatch.