beginners

2025-10-18T09:26:49.723959Z

Hi, I'd like to share my experiment using Clojure in competitive programming. My reasoning was as follows: since Java is the second most popular programming language for competitions, and Clojure's performance is comparable to Java, I can use Clojure in competitive programming. I started with the AtCoder Beginner Selection problems, and everything went well—I solved 8 problems, but then I encountered the "Time limit exceeded" limit on the ninth problem, "Toshidama" (out of 11 problems total). I optimized the algorithm and reduced the number of tests that exceeded the time limit from 6 to 2. The exceedance is minor: the slowest test took 2035 ms compared to the 2000 ms limit. Here's the problem: there are N banknotes—x 10,000-yen bills, y 5,000-yen bills, and z 1,000-yen bills, for a total of T yen. Find the combination x, y, z if x 10k + y 5k + z * 1k = T Here's my best solution (2 tests exceeded the time limit):

(require '[clojure.string :as str])

(let [input (slurp *in*)
        in (str/split input #" |\n")
        n (Integer. (get in 0))
        t (Integer. (get in 1))
        res (first
             (filter (fn [[x y z]] (= t (+ (* x 10000) (* y 5000) (* z 1000))))
                     (filter #(= n (apply + %))
                             (for [x (range (inc n))
                                   y (range (inc n))
                                   :let [z (if (neg? (- n x y)) 0 (- n x y))]]
                               [x y z]))))]
    (if (nil? res)
      (print -1 -1 -1)
      (apply print res)))
It's strange that further optimization—`(range (inc (min n (quot t 10000))))` etc.—led to an increase in the number of tests exceeding the time limit. Therefore, I have to use C++ for time-consuming problems or choose language-agnostic contests where you submit test output instead of code.

p-himik 2025-10-18T09:40:08.319829Z

Without delving into the algorithm itself (I have no idea what the proper algorithm is and I actively resist being nerd-sniped :D), the way res is created is most definitely sub-optimal. You create three lazy sequences just to find a single value, you become a victim of chunking (lazy sequences, depending on the context, can evaluate items in chunks and not 1-by-1, which usually leads to better performance, but not in this case). Something like a simple loop or reduce would be better. Or even just (first (for ...)) since for supports not just :let but also :when and :while, although combining them could be counter-intuitive. Another thing is boxing. When you don't specify primitive types, everything is an object - you even used Integer yourself. So a simple value like 17 ends up being wrapped in an object which makes it significantly slower. Try (set! *unchecked-math* :warn-on-boxed) before evaluating that code above and see what it gives you. > It's strange that further optimization—(range (inc (min n (quot t 10000)))) etc. How would the code look with that optimization?

👍 1
2025-10-18T09:43:41.408799Z

(require '[clojure.string :as str])

(let [input (slurp *in*)
        in (str/split input #" |\n")
        n (Integer. (get in 0))
        t (Integer. (get in 1))
        max-x (min n (quot t 10000))
        max-y (min n (quot t 5000))
        max-z (min n (quot t 1000))
        res (first
             (filter (fn [[x y z]] (= t (+ (* x 10000) (* y 5000) (* z 1000))))
                     (filter #(= n (apply + %))
                             (for [x (range (inc max-x))
                                   y (range (inc max-y))
                                   :let [z (if (neg? (- n x y)) 0 (min max-z (- n x y)))]]
                               [x y z]))))]
    (if (nil? res)
      (print -1 -1 -1)
      (apply print res)))

p-himik 2025-10-18T09:46:38.888089Z

Can you give a sample input for which the original code is faster than the optimized version?

p-himik 2025-10-18T09:49:25.111899Z

I just tried with 2 11000 - the optimized version is slightly faster.

2025-10-18T09:50:23.446989Z

mmm, AtCoder only disclose a few first tests, but here is the slowest of disclosed: 1000 1234000

2025-10-18T09:53:03.139239Z

I probably was lucky (low CPU usage etc) to get only 2 failed tests with not finally optimized version

p-himik 2025-10-18T09:53:13.898909Z

The optimized version is still faster on my end, as expected (`f` is the original f2 is optimized):

(crit/quick-bench
  (f 1000 1234000))
Evaluation count : 1260 in 6 samples of 210 calls.
             Execution time mean : 562.057965 µs
    Execution time std-deviation : 139.024057 µs
   Execution time lower quantile : 478.888005 µs ( 2.5%)
   Execution time upper quantile : 795.856552 µs (97.5%)
                   Overhead used : 6.098224 ns

Found 1 outliers in 6 samples (16.6667 %)
	low-severe	 1 (16.6667 %)
 Variance from outliers : 64.7749 % Variance is severely inflated by outliers
=> nil
(crit/quick-bench
  (f2 1000 1234000))
Evaluation count : 4578 in 6 samples of 763 calls.
             Execution time mean : 133.713648 µs
    Execution time std-deviation : 3.044836 µs
   Execution time lower quantile : 130.066537 µs ( 2.5%)
   Execution time upper quantile : 136.497486 µs (97.5%)
                   Overhead used : 6.098224 ns
=> nil

👍 1
2025-10-18T09:56:16.602129Z

I’ll try to write a “loop” version tonight

p-himik 2025-10-18T09:56:47.760869Z

I can give some general notes on that code, not performance-related, if you want.

2025-10-18T09:57:13.191919Z

That would be great

p-himik 2025-10-18T10:02:30.791539Z

From the very start, it's much better to write things in a composable manner. It's not harder to do so, but it's better if you do any sort of experiments. Some specific things I'd do • Wrap let in a function that accepts n and t as arguments and returns res without any checks or printing. Reading the input, processing it, deciding the default value in case of a failure, printing - all are concepts unrelated to each other • Use so-called "rich comments" to run experiments with that function instead of feeding it different stdin • For the purpose of the online test, just call that function from that let form Much, much easier to experiment that way and exclude unrelated things like stdin and printing from benchmarking. Also, check out ->>. Your code would be much more readable with it - it allows you to chain collection-related functions (when coll is the last argument) without nesting. Similar, -> also exists for when something is the first argument, but here it wouldn't be useful.

👍 1
p-himik 2025-10-18T10:05:30.393459Z

By trying out many random combinations in a loop, found the value for which the optimized version is consistently slower. But the original timing is still well within a single SD from it. N = 1000, T = 9283000.

🔥 1
p-himik 2025-10-18T10:07:28.142619Z

Repeated that test many more times - the consistency is gone, both f and f2 take time that's within SD of each other, in any direction. Probably an outlier case that could be slower because of CPU branch prediction or something like that.

2025-10-18T10:07:48.867909Z

I wonder if there is a smart algorithm instead of my brute force solution

p-himik 2025-10-18T10:08:10.125359Z

In any case, you'll see incomparably smaller times when you get rid of lazy collections in combination with first and boxed math.

👍 1
2025-10-18T10:08:48.870999Z

What is SD?

2025-10-18T10:09:15.932799Z

I mean abbreviation

p-himik 2025-10-18T10:09:21.720199Z

Standard Deviation.

👍 1
p-himik 2025-10-18T10:21:01.666649Z

> I wonder if there is a smart algorithm instead of my brute force solution Most likely, it's a popular problem.

p-himik 2025-10-18T11:32:53.224199Z

Oh, almost forgot - that z (if ...) can be rewritten as z (max (- n x y) 0). Tried out of curiosity optimizing the very first version, without any changes to the algorithm. Using a nested loop made it 5 times faster. Avoiding box math made it 2 times faster. So, a whole order of magnitude.

2025-10-18T11:33:43.739769Z

Wow, can you post the solution?

p-himik 2025-10-18T11:33:57.578299Z

Sure:

(defn f3 [^long n ^long t]
  (loop [xs (range (inc n))]
    (when-first [^long x xs]
      (or (loop [ys (range (inc n))]
            (when-first [^long y ys]
              (let [z (max (- n x y) 0)]
                (if (= (+ (* x 10000)
                          (* y 5000)
                          (* z 1000))
                       t)
                  [x y z]
                  (recur (next ys))))))
          (recur (next xs))))))

1
👀 1
🔥 1
p-himik 2025-10-18T11:34:36.736069Z

It uses ^long instead of ^int because in Clojure number literals are longs by default, not ints.

2025-10-18T11:37:38.202459Z

I’ll test it tonight (far from the laptop now). It should pass the time constraint

p-himik 2025-10-18T11:46:33.540549Z

Ah, but we don't really need the ranges and iterating over collections, do we? This version goes 20 times faster than the previous one.

(defn f4 [^long n ^long t]
  (loop [x 0]
    (when (<= x n)
      (or (loop [y 0]
            (when (<= y n)
              (let [z (max (- n x y) 0)]
                (if (= (+ (* x 10000)
                          (* y 5000)
                          (* z 1000))
                       t)
                  [x y z]
                  (recur (inc y))))))
          (recur (inc x))))))
Tested a plain Java version - it's at exactly the same time as the above version. So we ended up with a Clojure function that has almost no type hints (since there are no generic collections, types of most bindings can be deduced) and is as performant as an equivalent Java function.

🔥 1
p-himik 2025-10-18T11:47:43.936019Z

The equivalent Java code:

public class X {
    public static long[] f(long n, long t) {
        for (long x = 0; x <= n; x++) {
            for (long y = 0; y <= n; y++) {
                long z = Math.max(n - x - y, 0);
                if (x * 10000 + y * 5000 + z * 1000 == t) {
                    return new long[] { x, y, z };
                }
            }
        }
        return null;
    }
}
Used as:
(import 'X)

(defn f-java [^long n ^long t]
  (vec (X/f n t)))
The results:
(crit/quick-bench
    (f4 1000 9283000))
Evaluation count : 936 in 6 samples of 156 calls.
             Execution time mean : 647.384538 µs
    Execution time std-deviation : 2.895507 µs
   Execution time lower quantile : 643.851417 µs ( 2.5%)
   Execution time upper quantile : 650.939564 µs (97.5%)
                   Overhead used : 6.098224 ns
=> nil
(crit/quick-bench
    (f-java 1000 9283000))
Evaluation count : 912 in 6 samples of 152 calls.
             Execution time mean : 665.660780 µs
    Execution time std-deviation : 572.906808 ns
   Execution time lower quantile : 665.104704 µs ( 2.5%)
   Execution time upper quantile : 666.351677 µs (97.5%)
                   Overhead used : 6.098224 ns
=> nil

2025-10-18T11:51:11.055489Z

Awesome!

p-himik 2025-10-18T12:01:05.964579Z

Heh, wrote an equivalent C++ function. It is 2x slower than the Java version. It's been more than a decade since the last time I touched C++ seriously, so still investigating.

🤨 1
p-himik 2025-10-18T12:06:33.053339Z

It could be that the constant inputs that I provide make the JIT overeagerly optimize it. Maybe @alexyakushev knows for sure what would be the best way to test it properly, without making JIT optimize for inputs or doing something else that wouldn't be useful in a situation where the inputs aren't known in advance.

p-himik 2025-10-18T12:16:36.948669Z

Nah, changing the inputs on every iteration still shows 10% faster execution than an equivalent C++ code. Apparently I could improve the performance of that code by using likely/`unlikely` with if and some fast integer types. Maybe some extra compilation flags would make it faster. Asked a friend who knows C++ well, he'll get back to me later.

oyakushev 2025-10-18T12:22:10.487099Z

It's hard to say what JIT is doing without inspecting the produced assembly. However, since this is competitive programming, the "correct" way to measure performance here is an antipattern, precisely opposite of what we always say to do. You would need to spin a fresh REPL and run the test only once measured by time , because that's what the testing systems in those competitions usually do. I don't know, maybe the testing protocols have changed and JIT warmup is permitted these days, but I have no idea how they control for caching then, besides, JIT is a kind of a cache you can say.

jaihindhreddy 2025-10-18T14:49:57.758549Z

You can get rid of the nested loop entirely. After dividing t by 1000, the constraints of the original problem are:

x +  y + z = n
10x + 5y + z = t
We pick a value for x in the outermost loop, which leaves two variables and two equations, which can be solved directly, yielding:
y = (t-n-9x)/4
z = n-x-y
The same thing in Clojure:
(set! *unchecked-math* :warn-on-boxed)

(defn f5 [^long n ^long t]
  (let [t (quot t 1000)
        max-x-count (quot t 10)]
    ; if t is not b/w n and 10n, there's no point.
    (when (and (<= n t) (<= t (* 10 n)))
      (loop [x 0]
        (when (<= x max-x-count)
          (let [tmp (- t n (* 9 x))]
            (or (when (zero? (rem tmp 4))
                  (let [y (quot tmp 4)
                        z (- n x y)]
                    (when (and (>= y 0) (>= z 0))
                      [x y z])))
              (recur (inc x)))))))))
And it's ~700x as fast as the f4 above:
user=> (crit/quick-bench (f4 1000 9283000))
Evaluation count : 1056 in 6 samples of 176 calls.
nil
             Execution time mean : 571.315100 µs
    Execution time std-deviation : 407.084473 ns
   Execution time lower quantile : 570.681813 µs ( 2.5%)
   Execution time upper quantile : 571.724933 µs (97.5%)
                   Overhead used : 1.903179 ns
user=> (crit/quick-bench (f5 1000 9283000))
Evaluation count : 779106 in 6 samples of 129851 calls.
nil
             Execution time mean : 773.858834 ns
    Execution time std-deviation : 4.238559 ns
   Execution time lower quantile : 768.838230 ns ( 2.5%)
   Execution time upper quantile : 778.421931 ns (97.5%)
                   Overhead used : 1.903179 ns

🔥 2
p-himik 2025-10-18T14:55:47.600669Z

Yeah, I wasn't trying to change the algorithm at all, just focused on the Clojure aspect of things. A better algorithm is almost always better than just head-on performance tweaks. :)

💯 1
jaihindhreddy 2025-10-18T15:00:59.077859Z

Yup. The first thing I noticed though, even without changing the algorithm, is to take some work from the innermost loop to the outer. After we freeze x, we don't have to do (* 10000 x) in the innermost loop each time.

(defn f4-b [^long n ^long t]
  (loop [x 0]
    (when (<= x n)
      (let [t (- t (* x 10000))
            n (- n x)]
        (or (loop [y 0]
              (when (<= y n)
                (let [z (max (- n y) 0)]
                  (if (= (+ (* y 5000)
                           (* z 1000))
                        t)
                    [x y z]
                    (recur (inc y))))))
          (recur (inc x)))))))
This makes f4 about twice as fast on my machine.

jaihindhreddy 2025-10-18T15:02:05.531989Z

The division of t by 1000 is a similar improvement, because we don't be doing (* z 1000) in the innermost loop.

Bob B 2025-10-18T16:18:26.557399Z

I'm used to bring nerd-sniped, so I'll take a shot... Taking a very different approach, after dividing t by 1000 (`t'`), I started at the ones... we know the ones number will be in (range (mod t' 5) (inc n) 5), because it'd be impossible to get e.g. a '2' in the last place with 3 ones. So, that should cut the possibilities down significantly. I think (based on very vague benchmarking with a couple best-ish and worst-ish case scenarios) that this (limiting the possiblities for ones, and then checking the possible fives and tens from there) is a bit faster overall. Then, I think if you wanted to go wild, you could maybe even turn the fives and tens into a binary search situation - for each possible value of ones you've got remaining bills and a remaining total they need to make; I think you could split the remaining bills, calculate the total, and if the total is low, move bills to the tens side, if it's high, move bills to the fives side, and cut the possibilities in half each time.

👀 1
2025-10-18T19:16:30.335889Z

finally submitted f5: execution time dropped to 1200 ms

pithyless 2025-10-18T22:50:42.585109Z

> Without delving into the algorithm itself (I have no idea what the proper algorithm is and I actively resist being nerd-sniped :D) 👀 Reading the rest of the thread and thinking: That escalated quickly 🤣 But kudos for the code insights that people can hopefully take out of this and apply to other problems.

p-himik 2025-10-19T03:32:21.730769Z

I don't consider Clojure-related tasks nerd-sniping. :) Most of the things that I did in the thread are just mechanical transformations. Even Java and C++-related ones (for which I asked an LLM to write the code for me and only changed it slightly, so no human brain was involved).