2026-02-18 clojure | Clojure Slack Archive

clojure 2026-02-18

2026-02-18T17:20:59.015899Z

So for next.jdbc, I am running into an issue where a connection cannot be constructed: it attempts to set a property to a default value before setting the value I am setting in the dbspec, and this fails because the driver I am using, the Google Cloud BigQuery driver, doesn't provide a method to get a default for that property (in this case ProjectID). Is there a way to get it to not attempt to set defaults before constructing the "real" properties? nvm, this appears to be a bug in the driver

seancorfield 2026-02-18T17:37:44.255249Z

Also: #sql is more likely to get targeted help for next.jdbc etc.

2026-02-18T17:38:23.108339Z

thanks, i had looked for a jdbc channel but it was archived

seancorfield 2026-02-18T17:41:55.542549Z

I set the topic on that to "Archived. Join #sql instead." to help future folks. The description already had that but it was less visible I think.

👍 1

2026-02-18T19:56:31.832779Z

I create lots of big PDFs in Clojure, and the generation always follows a pattern, which I implement abstractly like below. I generate a seq of byte arrays (the bytes are the PDF content), most often via various map calls. This runs out of memory very quickly if I have, let's say, 1000 files of 100 MB each. But in reality I only need one byte array in memory at any given point in time: after it has been written to disk, it could be discarded.

(require '[clojure.java.io :as io])

(->>
 (map-indexed (fn [index _]
                {:index index
                 :bytes (byte-array (repeat 100000000 1))})
              (range 1000))
 (run!
  (fn [info]
    (with-open [out (io/output-stream (format "/tmp/%s.bin" (:index info)))]
      (io/copy (:bytes info) out)))))

I know I could get this by doing the full file generation inside the run!, but that doesn't seem like idiomatic Clojure, since it's procedural.

(run!
 (fn [index]
   (println :index index)
   (let [info {:index index
               :bytes (byte-array (repeat 100000000 1))}]
     (with-open [out (io/output-stream (format "/tmp/%s.bin" (:index info)))]
       (io/copy (:bytes info) out))))
 (range 1000))

Is there a pattern that solves this and is more idiomatic Clojure? I wonder about a "lazy sequence" that somehow "forgets" a value after it has been consumed once....

Alex Miller (Clojure team) 2026-02-18T19:58:09.343449Z

you want to avoid holding the head of a large sequence, which will prevent the whole thing from being garbage collected. run! is a good approach to do so

Alex Miller (Clojure team) 2026-02-18T19:59:10.618599Z

it achieves this by using reduce instead of ->> - you could also do the same
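A minimal sketch of doing the same with `reduce` directly, assuming a hypothetical `make-bytes` stands in for the PDF library, and using a caller-supplied directory and small sizes so it can run anywhere. Each array is created and written inside the reducing function, so no sequence of arrays exists for anything to hold on to.

```clojure
(require '[clojure.java.io :as io])

(defn make-bytes
  "Hypothetical stand-in for the PDF library: one array of `size` bytes."
  [size]
  (byte-array size (byte 1)))

(defn write-all!
  "Writes files 0.bin ... (n-1).bin into `dir` and returns how many were
  written. Only the array for the current index is referenced at each step."
  [dir n size]
  (reduce (fn [written index]
            (let [content (make-bytes size)]
              (with-open [out (io/output-stream (io/file dir (str index ".bin")))]
                (io/copy content out))
              (inc written)))
          0
          (range n)))
```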

dpsutton 2026-02-18T20:00:40.889579Z

is the first example here OOMing on chunking?

2026-02-18T20:02:28.110459Z

Not sure. I just think that at some point in time I will have 1000 * 100,000,000 bytes in memory. I don't think using a lazy sequence prevents this.

dpsutton 2026-02-18T20:03:16.598949Z

i’m claiming that using the first form in this thread

(->>
 (map-indexed (fn [index _]
                {:index index
                 :bytes (byte-array (repeat 100000000 1))})
              (range 1000))
 (run!
  (fn [info]
    (with-open [out (io/output-stream (format "/tmp/%s.bin" (:index info)))]
      (io/copy (:bytes info) out)))))

you will probably have only ~32 byte arrays in memory at once due to chunking of the lazy sequence
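A small way to observe the chunking dpsutton describes is to count how often the `map-indexed` function runs when only the first element is requested. `(range 1000)` produces a chunked seq, so the whole first chunk (32 elements on current Clojure versions) gets mapped; `iterate` is not chunked, so only one call happens. The counting atom is an illustration device, not part of the original code.

```clojure
(defn calls-for-first
  "Counts how many times map-indexed's fn runs when only `first` is taken."
  [coll]
  (let [calls (atom 0)]
    (first (map-indexed (fn [i _] (swap! calls inc) i) coll))
    @calls))

(calls-for-first (range 1000))    ; chunked source: the whole first chunk is mapped
(calls-for-first (iterate inc 0)) ; unchunked source: a single call
```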

2026-02-18T20:07:44.177629Z

This is exactly one of my questions... But even if so, ideally I want "at most 1", not 32.

dpsutton 2026-02-18T20:08:48.848049Z

ah. gotcha. i bet @hiredman has a transducing function that does this immediately.

2026-02-18T20:09:32.474519Z

I don't think it is a specific transducing function, but just using transducers

2026-02-18T20:09:55.114779Z

map-indexed has a transducer arity
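For reference, calling `map-indexed` with just the function returns a transducer, which can then be used with `into`, `transduce`, `eduction`, and so on:

```clojure
;; One-arg map-indexed returns a transducer.
(def with-index (map-indexed (fn [i x] {:index i :value x})))

(into [] with-index [:a :b :c])
;; => [{:index 0, :value :a} {:index 1, :value :b} {:index 2, :value :c}]
```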

2026-02-18T20:10:45.726789Z

transducers tend to be better than lazy-seqs when you really care about when something happens; in this case you care because you can't have more than one in memory at once
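One way to see the "when something happens" difference is to record, at each reducing step, how many elements the `(map ...)` stage has produced so far. The `produced` atom is instrumentation added for illustration: with `transduce` the counts go 1, 2, 3, ..., while reducing over a lazy `map` of a chunked `range` already sees 32 produced at the very first step.

```clojure
(defn produced-so-far-eager
  "transduce: for each step, how many elements (map ...) has produced."
  [n]
  (let [produced (atom 0)]
    (transduce (map (fn [x] (swap! produced inc) x))
               (completing (fn [acc _] (conj acc @produced)))
               []
               (range n))))

(defn produced-so-far-lazy
  "Same measurement, but reducing over a lazy (map ...) seq."
  [n]
  (let [produced (atom 0)]
    (reduce (fn [acc _] (conj acc @produced))
            []
            (map (fn [x] (swap! produced inc) x) (range n)))))
```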

2026-02-18T20:14:15.279259Z

https://clojurians.slack.com/archives/C053AK3F9/p1700418176500259?thread_ts=1700416714.072009&cid=C053AK3F9

2026-02-18T20:17:24.726059Z

I think transducers are indeed the answer:

(def process-xf
  (comp
   (map (fn [index]
         {:index index
          :bytes (byte-array (repeat 100000000 1))}))))

(transduce
 process-xf
 (completing
  (fn [_ info]
    (println :index (:index info))
    (with-open [out (io/output-stream (format "/tmp/%s.bin" (:index info)))]
      (io/copy (:bytes info) out))))
 nil
 (range 1000))

2026-02-18T20:21:29.103429Z

This runs through while using "stable memory", it seems (but in parallel, can that be?). Is it correct to assume that this then holds either 1 or at most (number of cores) of my big byte arrays in memory?

2026-02-18T20:23:35.643869Z

If so, then this is the general pattern I was looking for.

dpsutton 2026-02-18T20:23:59.266569Z

i don’t believe there is anything that fans out to cores. and it can garbage collect but it might have more than one in memory at once

2026-02-18T20:24:42.889689Z

I see in htop that multiple cores are active... but I do get results in order....

2026-02-18T20:26:52.715659Z

That was an "artifact" of htop... The run! version shows the same, and that's definitely not parallel.

2026-02-18T20:27:59.396719Z

Right, "it can garbage collect", but of course that doesn't guarantee "only 1" in memory.

2026-02-18T20:29:20.229469Z

Same as the run! version. But my "first" version doesn't allow garbage collection, since I "hold on to the head".... I learned something... and found a potential use case for transducers.

exitsandman 2026-02-18T20:33:29.762459Z

doesn't run! on an eduction -as opposed to a map-indexed chunked seq- do the trick here?

2026-02-18T20:35:53.604889Z

Yes, that works as well. Docu says:

;; This will run out of memory eventually,
;; because the entire seq is realized,
;; because the head of the lazy seq is retained.
(let [s (range 100000000)]
  (do (apply print s) (first s)))

;; This iterates through the lazy seq without realizing the seq.
(let [s (eduction identity (range 100000000))]
  (do (apply print s) (first s)))

ghadi 2026-02-18T20:52:23.875509Z

"Docu says"?

2026-02-18T20:52:39.357379Z

https://clojuredocs.org/clojure.core/eduction

ghadi 2026-02-18T20:53:26.693659Z

yeah it's very misleading

2026-02-18T20:53:46.328839Z

why?

ghadi 2026-02-18T20:59:12.264069Z

user=> (let [x (eduction identity (range 100))] (map System/identityHashCode [(seq x) (seq x)]))
(602830277 296204898)
user=> (let [x (range 100)] (map System/identityHashCode [(seq x) (seq x)]))
(1938259481 1938259481)

ghadi 2026-02-18T20:59:32.621879Z

they are different seqs in the eduction case

ghadi 2026-02-18T20:59:47.834229Z

it's iterating through the first seq, throwing it away, then creating another seq
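ghadi's point can be checked by counting how often the mapping function runs. Each `seq` of an eduction starts a fresh pass over the source, whereas a lazy seq caches what it has already realized. The counting atom and the `build` callback are illustration devices only.

```clojure
(defn mapping-calls-for-two-walks
  "Builds a collection with `build` (given a counting mapping fn), fully
  walks (seq coll) twice, and returns how often the mapping fn ran."
  [build]
  (let [calls (atom 0)
        coll  (build (fn [x] (swap! calls inc) x))]
    (doall (seq coll))
    (doall (seq coll))
    @calls))

;; eduction: the transform runs again for every seq, so 6 calls
(mapping-calls-for-two-walks (fn [f] (eduction (map f) (range 3))))
;; lazy seq: realized values are cached, so 3 calls
(mapping-calls-for-two-walks (fn [f] (map f (range 3))))
```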

ghadi 2026-02-18T21:00:53.090959Z

you can make this work easily with seqs as mentioned above:
1. write a fn that takes one pdf
2. don't hold the entire collection of pdfs
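A sketch of those two steps, assuming a hypothetical `generate-pdf-bytes` in place of the real PDF library (which, per the thread, returns bytes). One function handles exactly one PDF, and the loop walks only the cheap indices, so no collection of PDFs is ever built.

```clojure
(require '[clojure.java.io :as io])

(defn generate-pdf-bytes
  "Hypothetical stand-in for the PDF library call."
  [index]
  (byte-array 10 (byte index)))

(defn write-pdf!
  "Step 1: handles exactly one PDF; the array is local to this call."
  [dir index]
  (with-open [out (io/output-stream (io/file dir (str index ".pdf")))]
    (io/copy (generate-pdf-bytes index) out)))

(defn write-pdfs!
  "Step 2: walks only the indices; no seq of arrays is ever built."
  [dir n]
  (run! #(write-pdf! dir %) (range n)))
```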

ghadi 2026-02-18T21:02:12.660559Z

the clojuredocs comment is wrong

2026-02-18T21:07:51.174809Z

This does work in my case. I can write 1000 files of one GB each, so it does garbage collect in between.

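;; NOTE: eduction is lazy. At the REPL, printing the result is what forces it;
;; elsewhere, consume it explicitly, e.g. (run! identity (eduction ...)).
(eduction
 (map (fn [index]
        {:index index
         :bytes (byte-array (repeat 1000000000 1))}))
 (map
  (fn [info]
    (println :index (:index info))
    (with-open [out (io/output-stream (format "/tmp/%s.bin" (:index info)))]
      (io/copy (:bytes info) out))))
 (range 1000))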

ghadi 2026-02-18T21:28:56.531889Z

you do not need to use eduction

ghadi 2026-02-18T21:29:50.110369Z

plain seq/coll fns will work

2026-02-18T21:32:11.365569Z

eduction also stops chunking

2026-02-18T21:33:04.458339Z

so if you took the original code and replaced (range 1000) with (eduction identity (range 1000)) there is a good chance it would also stop running out of memory

2026-02-18T21:37:53.041159Z

I suspect @ghadi is keying off the mention of head holding, but that doesn't sound like what is happening here: it looks like chunking is causing 32 or so gigabyte-sized byte arrays to try to exist in memory at once. You can do stuff (like the eduction thing with identity) to try to avoid chunking (range is chunked, so you have to unchunk, which eduction does; but things like map and even for pass chunking through), but the best way to have complete control over what is realized and what isn't is using transducers (and I would say processing with transduce; using transducers in eduction has complicating factors).

ghadi 2026-02-18T21:40:12.080719Z

^ yes I didn't catch the chunking concerns

ghadi 2026-02-18T21:40:38.560789Z

what hiredman said

exitsandman 2026-02-18T22:05:24.612119Z

Fwiw what I was proposing was

(run! F (eduction (map-indexed G) Xs))

as opposed to

(run! F (map-indexed G Xs))

with the assumption that the resource consumption happens in G. As a reference, this is what eduction's reduce looks like:

(reduce [_ f init]
     ;; NB (completing f) isolates completion of inner rf from outer rf
     (transduce xform (completing f) init coll))
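exitsandman's comparison can be instrumented. Below, a hypothetical `g` marks an item as "live" when it is produced and `f` releases it when consumed, and the peak number of live items is returned. Because `run!` reduces, and an eduction's reduce is a `transduce`, the eduction version should peak at 1; the plain `map-indexed` version over a chunked `range` peaks at a full chunk.

```clojure
(defn peak-live
  "Runs (run! f ...) over xs via an eduction or a lazy map-indexed seq and
  returns the peak number of produced-but-not-yet-consumed items."
  [xs via-eduction?]
  (let [live (atom 0)
        peak (atom 0)
        g    (fn [i x] (swap! live inc) (swap! peak max @live) [i x])
        f    (fn [_] (swap! live dec))]
    (if via-eduction?
      (run! f (eduction (map-indexed g) xs))
      (run! f (map-indexed g xs)))
    @peak))

(peak-live (range 100) true)  ; one item at a time
(peak-live (range 100) false) ; a whole chunk at a time
```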

dpsutton 2026-02-18T22:09:39.562739Z

i think the signature of f that run! expects gets a bit funky with transducers right?

exitsandman 2026-02-18T22:12:56.491039Z

wdym?

dpsutton 2026-02-18T22:13:38.070839Z

run! expects a function that takes a single arg, whereas reducing functions treat that as the completion arity. so it gets a bit wonky, i thought. i swapped back to transduce rather than run! for this

dpsutton 2026-02-18T22:14:10.341009Z

but you baked the transducer into the eduction (which feels weird to me, but works here)

exitsandman 2026-02-18T22:14:13.528059Z

that doesn't track for me, I never had problems run!-ing eductions

dpsutton 2026-02-18T22:14:50.822069Z

yes. i was thinking (run! (xf rf) seq) which is what i wanted to reach for rather than (run! f (eduction xf seq))

exitsandman 2026-02-18T22:15:58.283049Z

yeah that wouldn't work because run! goes into a plain reduce

Alex Miller (Clojure team) 2026-02-18T22:17:34.686629Z

https://ask.clojure.org/index.php/13153/could-gain-another-arity-cover-common-case-feeding-eduction

💡 2

➕ 2
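The linked Ask Clojure question is about `run!` gaining another arity for the eduction case. Until something like that exists, a small helper can express the same thing with `transduce`; the name `run-xf!` and its argument order are made up for this sketch and are not the proposal's actual signature. `completing` supplies the one-argument completion arity dpsutton mentions, so `f` itself only ever receives items.

```clojure
(defn run-xf!
  "Hypothetical helper, not in clojure.core: calls `f` for side effects on
  each item that transducer `xf` produces from `coll`, one at a time.
  Intended to behave like (run! f (eduction xf coll))."
  [f xf coll]
  (transduce xf (completing (fn [_ item] (f item) nil)) nil coll)
  nil)

(run-xf! println (map-indexed vector) [:a :b])
;; prints [0 :a] then [1 :b]
```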

igrishaev 2026-02-19T09:35:27.058329Z

As far as I understand, you're trying to share the same byte array across multiple PDFs. This is fragile and might break if you switch from map to pmap (or any kind of parallel execution). It would be simpler to use a dedicated ByteArrayOutputStream for each file; it grows dynamically on demand. Once you've processed your PDF, the stream gets garbage-collected.

igrishaev 2026-02-19T09:39:19.930309Z

Also, consider an output stream pointing to a temp file. It's slower of course but you won't saturate memory when processing PDFs in parallel

☝️ 1
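A sketch of the temp-file suggestion, assuming the PDF library can write to an `OutputStream` (represented here by a hypothetical `render!` argument). Each document streams into its own temp file, so memory use does not grow with document size, even with several running in parallel.

```clojure
(require '[clojure.java.io :as io])

(defn write-via-temp-file!
  "Hypothetical: `render!` writes one document to the OutputStream it is
  given. Returns the temp File holding the output."
  [render!]
  (let [tmp (java.io.File/createTempFile "doc-" ".pdf")]
    (with-open [out (io/output-stream tmp)]
      (render! out))
    tmp))
```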

2026-02-19T12:32:38.185019Z

Not sharing. I want to be sure that the byte array can be garbage collected after I've written it to disk

2026-02-19T12:34:10.918019Z

Which run! does allow, but I was wondering about a different pattern. The referenced Ask Clojure question looks like what I was looking for

igrishaev 2026-02-19T12:34:16.254759Z

It makes no sense to allocate an array in advance, as you don't know the exact size of the output for sure

igrishaev 2026-02-19T12:35:01.890779Z

Just use ByteArrayOutputStream if you need raw bytes. Write to it, close it and then invoke the method called .toByteArray
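A minimal version of that approach, again with a hypothetical `render!` standing in for whatever writes the PDF. The stream starts small and grows as bytes are written, and `.toByteArray` copies out exactly what was written.

```clojure
(import '(java.io ByteArrayOutputStream))

(defn render->bytes
  "Collects whatever `render!` writes into a fresh ByteArrayOutputStream
  (one per document) and returns the bytes."
  [render!]
  (let [baos (ByteArrayOutputStream.)]
    (with-open [out baos]
      (render! out))
    ;; toByteArray still works after close; closing a BAOS has no effect
    (.toByteArray baos)))
```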

2026-02-19T12:35:31.722419Z

But I don't want to allocate. The code is simplified. I use a library which produces PDFs as bytes

☝️ 1

exitsandman 2026-02-19T12:36:14.825369Z

perhaps the lib already supports outputting to a stream?

igrishaev 2026-02-19T12:36:26.809669Z

Are you sure there is no way to pass an output stream?

2026-02-19T12:36:46.522699Z

That might be a solution, indeed.

2026-02-19T12:37:02.080819Z

I will look for that.

igrishaev 2026-02-19T12:37:31.821979Z

Input and output streams are bread and butter for Java; there must be a method that accepts one of these

2026-02-19T12:39:05.031289Z

Maybe I was focusing too much on simple values, as used in Clojure, so I ended up with byte arrays

2026-02-19T13:17:22.916839Z

Yes, I think this was the right direction. My usage of byte-arrays was wrong, using streams instead is better.

🎉 1

geoff 2026-02-18T00:57:14.288519Z

Anyone else obsessed with layers of macro DSL's?

😂 1

🙅 2

🙅🏼 1

mloughlin 2026-03-31T11:21:52.601999Z

macros are great fun for showing off how clever you are

Ed 2026-02-18T09:54:07.597189Z

> Nothing says "scr*w you" like a DSL • Stuart Halloway - https://youtu.be/LEZv-kQUSi4?si=JJQp05ZKs8Q71ofl&t=483 😉

teodorlu 2026-02-18T12:27:21.394129Z

Data is better! 😁

teodorlu 2026-02-18T12:27:53.642229Z

I'd much rather have a linear data pipeline than DSL magic.

geoff 2026-02-18T00:59:05.441459Z

Like a lasagna with layers of cheesy macro compilers mapping between language and semantic domains?

geoff 2026-02-18T00:59:39.532479Z

6, 7 layers deep ??

🤣 1

7️⃣ 1

6️⃣ 1

seancorfield 2026-02-18T01:12:53.482629Z

Nope 🙂

john 2026-02-18T01:48:26.795189Z

Not me, never... Why, ya got any?

2026-02-18T02:19:54.614699Z

https://okmij.org/ftp/Scheme/macros.html#ck-macros

2026-02-18T03:35:33.523049Z

Not me. My ideas that start as macros often just end up as a simple function or a DSL over Clojure data

👍 1
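As a small illustration of "a DSL over Clojure data" (the op names and pipeline shape are invented for this sketch): the program is a vector of steps and an ordinary function interprets it, so it can be built, inspected, and transformed with normal collection functions instead of macros.

```clojure
;; Hypothetical op table: each op takes the current value and one argument.
(def ops
  {:add      (fn [x n] (+ x n))
   :multiply (fn [x n] (* x n))})

(defn run-pipeline
  "Interprets `steps`, a vector of [op arg] pairs, starting from `x`."
  [steps x]
  (reduce (fn [acc [op arg]] ((ops op) acc arg)) x steps))

(run-pipeline [[:add 2] [:multiply 10]] 1) ; => 30
```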

geoff 2026-02-19T08:10:47.733369Z

Wow, very surprised by this response from Lisp people

teodorlu 2026-02-19T09:00:24.907149Z

Clojure's got its peculiarities! But I suspect you might find similar opinions from Common Lisp people who have been exposed to macro spaghetti written by other people!

teodorlu 2026-02-19T09:00:40.478719Z

@gcoumessos curious about your take? for/against macros?

Clojurians Log v2