#clojure-europe
2023-11-24
Eugen04:11:07

neața' / mornin'

Mario Trost07:11:06

Good morning and tgif

ray08:11:13

Good morning. Thank goodness it’s leaf week.

🍂 1
schmalz08:11:17

Morning all.

thomas08:11:45

Morning... so why haven't you left yet @raymcdermott? 😉

😅 1
Thierry09:11:08

Hi everyone, I need some help with handling zip files and keeping the extracted entries with their content in memory (in a vector or map or anything) for further processing. I have the function below (based on https://stackoverflow.com/a/5428265). It works fine if called directly, but as soon as I want to do anything with the entries from the zip, the stream is closed. To make sure the file stays open I have a with-open, and a doall to make sure the lazy sequence is initialized. What am I missing that could be causing the stream to close here? I have tried several things, but all seem to come back to either the stream closing prematurely or the zipfile being closed before anything can be done with it.

(defn unzip
  [{:keys [^File file path filename]}]
  (let [zipfile (or file (io/file (str path filename)))]
    (with-open [zf (ZipFile. zipfile)]
      (doall ; initialize the lazy sequence immediately
       (map
        #(assoc {}
                :filename (.getName %)
                :data     (map
                           (fn [entry]
                             (string/trim entry))
                           (line-seq (io/reader (.getInputStream zf %)))))
        (enumeration-seq
         (.entries zf)))))))

👀 1
ray09:11:26

💡 move the with-open on the line with io/file

ray09:11:56

or maybe the file being passed in is being closed?

ray09:11:25

try and keep all the side-effects in one place if poss

Thierry09:11:48

When I call it like this I can get all entries just fine:

(unzip {:file (io/file "path/to/file.zip")})
As soon as I try anything that creates a new lazy sequence, the stream is closed prematurely.

Thierry09:11:54

Let me try moving the with-open

Thierry09:11:15

Switching the let and with-open results in No matching field found: close for class java.io.File

ray09:11:26

any difference if you only pass the name not the file object ... cos that should be wrapped with-open too 🙂

Thierry09:11:37

ok I'll try

ray09:11:57

slurp also works :)

Thierry09:11:17

not using io/file in the call results in the same

Thierry09:11:51

slurp as a replacement for io/file? That throws an exception with the actual bytes of the zip contents in it, haha. Execution error (InvalidPathException) at sun.nio.fs.UnixPath/checkNotNul (UnixPath.java:90).

emilaasa09:11:12

What is it that you want to do? Open the zipfile, mutate some stuff and then zip it again?

Thierry09:11:06

No, open the zip, get the contents (xml files) and send each entry (in memory) as a sequence to another function

emilaasa09:11:22

Ah so only reading it in memory, but line by line?

emilaasa09:11:40

I.e. it's not good enough to fully read the file and then start working on it?

emilaasa09:11:08

Sorry I mean entry by entry I guess 🙂

Thierry09:11:44

I don't mind either way, as each file (entry) in the zip would be sent off to another function. The xml will be read as a line-seq after getting it from the zip anyway. I now have this inside the unzip.

emilaasa09:11:14

You probably need to keep the inputstream open as well

emilaasa09:11:50

Something like this should work:

(defn process-zip [zip-file-path f]
  (with-open [zip-file (java.util.zip.ZipFile. zip-file-path)]
    (let [entries (.entries zip-file)]
      (doseq [entry (enumeration-seq entries)]
        (with-open [entry-stream (.getInputStream zip-file entry)]
          (f entry-stream))))))

(process-zip "example.zip" println)

Thierry10:11:24

let me try that

emilaasa10:11:15

The let didn't add much:

(defn process-zip [zip-file-path f]
  (with-open [zip-file (java.util.zip.ZipFile. zip-file-path)]
    (doseq [entry (enumeration-seq (.entries zip-file))]
      (with-open [entry-stream (.getInputStream zip-file entry)]
        (f entry-stream)))))

Thierry10:11:35

This has the side effect of not returning anything, due to using doseq, tho. That means f should contain everything that needs to be fiddled with 🙂 This is something I can work with; I'm just wondering if there is a way to get the data out, to be able to continue processing outside of the doseq.
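
One possible way to get the data out, offered as a sketch and not as what the thread settled on: swap doseq for mapv so that each call to f contributes a return value. The names read-zip and entry->map are hypothetical, and f has to finish reading the stream before it returns, because the stream is closed right afterwards.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])
(import '(java.util.zip ZipFile))

(defn read-zip
  "Hypothetical variant of process-zip that returns a vector holding the
   result of (f entry entry-stream) for every entry in the zip."
  [zip-file-path f]
  (with-open [zip-file (ZipFile. (io/file zip-file-path))]
    ;; mapv is eager, so every entry is processed before the zip is closed
    (mapv (fn [entry]
            (with-open [entry-stream (.getInputStream zip-file entry)]
              (f entry entry-stream)))
          (enumeration-seq (.entries zip-file)))))

(defn entry->map
  "Hypothetical callback: reads all lines of one entry into a vector."
  [entry entry-stream]
  {:filename (.getName entry)
   ;; mapv again, so no lazy seq outlives the stream
   :data     (mapv string/trim (line-seq (io/reader entry-stream)))})
```

Calling (read-zip "path/to/file.zip" entry->map) would then return plain, fully realised data that can be passed on after both the streams and the zip file are closed.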

genmeblog10:11:59

The :data entry is a lazy sequence. doall realizes the outer map, but the structure it creates contains another map, which stays lazy after the file is closed.
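
The effect described here can be reproduced without a zip file. The sketch below is an illustration under assumptions: a StringReader stands in for a zip entry stream, and read-all and realising-fails? are hypothetical names. The outer doall forces the seq of maps, while each :data value is still an unrealised lazy seq when with-open closes its reader.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])
(import '(java.io StringReader))

(defn read-all
  "Builds one map per input string. Only the outer seq is realised."
  [strings]
  (doall                                            ; realises the outer seq only
   (map (fn [s]
          (with-open [r (io/reader (StringReader. s))]
            {:data (map string/trim (line-seq r))})) ; inner seq stays lazy
        strings)))

(defn realising-fails?
  "True when walking xs ends up reading from an already-closed reader."
  [xs]
  (try
    (dorun xs)
    false
    (catch java.io.IOException _ true)))
```

With two-line inputs such as (read-all ["a\nb" "c\nd"]), the outer result can be counted without trouble, but walking any :data value reaches the closed reader on its second line.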

Thierry10:11:23

@U1EP3BZ3Q adding another doall for the data entry doesn't solve it tho

Thierry10:11:47

@U6T7M9DBR I got it working with this, thanks! Moved everything inside f and added another binding for the entry name

Thierry10:11:00

(defn process-zip [zip-file-path f]
  (with-open [zip-file (java.util.zip.ZipFile. zip-file-path)]
    (doseq [entry (enumeration-seq (.entries zip-file))]
      (with-open [entry-stream (.getInputStream zip-file entry)]
        (f entry entry-stream)))))

👍 2
🚀 1
Ed11:11:41

I realise that you've got it sorted now, but I didn't spot in the thread where you'd worked out what the cause of the problem was in your original code? If you have, feel free to ignore me 😉 ... but you had a lazy seq of lazy seqs and you were only doalling the outer one ...

(defn unzip
  [{:keys [^File file path filename]}]
  (let [zipfile (or file (io/file (str path filename)))]
    (with-open [zf (ZipFile. zipfile)]
      (doall (map ;; <- outer lazy seq that is realised inside the dynamic scope
        #(assoc {}
                :filename (.getName %)
                :data     (map ;; <- inner lazy seq that isn't realised
                           (fn [entry]
                             (string/trim entry))
                           (line-seq (io/reader (.getInputStream zf %)))))
        (enumeration-seq
         (.entries zf)))))))
You could fix that by changing the second map to mapv, or by using slurp there, or something? Does that make sense? Or am I talking nonsense?

Thierry11:11:49

@U0P0TMEFJ I have indeed solved it using @U6T7M9DBR’s example. Haha, can't believe it was that simple 🤦‍♂️ I changed the inner map to mapv and that magically solves the issue. I tried so many different approaches yesterday that I could no longer see what I had and hadn't tried. However, I did try using a doall for the inner map, but that didn't fix it.

👍 1
Thierry11:11:10

I was so close, yet so far away in solving it

Ed11:11:01

That's a very familiar feeling 😉

👀 1
💯 1
Thierry11:11:05

So what's different in mapv from map, other than that it doesn't return a lazy sequence but an initialized vector? Because doall map doesn't work

ray11:11:44

vectors are not lazy
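
This point can be checked in isolation. In the sketch below (hypothetical helper names, with a StringReader standing in for a zip entry stream), doall around map and mapv both read every line before with-open closes the reader. What differs is the type of the result: a realised seq in one case, a vector in the other.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])
(import '(java.io StringReader))

(defn lines-doall
  "Forces the lazy seq while the reader is still open; returns a seq."
  [s]
  (with-open [r (io/reader (StringReader. s))]
    (doall (map string/trim (line-seq r)))))

(defn lines-mapv
  "Builds a vector eagerly while the reader is still open."
  [s]
  (with-open [r (io/reader (StringReader. s))]
    (mapv string/trim (line-seq r))))
```

Both results are safe to consume after the reader is closed, so for the purpose of keeping data around past with-open the two forms are interchangeable.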

Ed11:11:54

(defn unzip
  [{:keys [^java.io.File file path filename]}]
  (let [zipfile (or file (clojure.java.io/file (str path filename)))]
    (with-open [zf (java.util.zip.ZipFile. zipfile)]
      (doall
       (map
        #(assoc {}
                :filename (.getName %)
                :data     (doall
                           (map
                            (fn [entry]
                              (clojure.string/trim entry))
                            (line-seq (clojure.java.io/reader (.getInputStream zf %))))))
        (enumeration-seq
         (.entries zf)))))))
this works for me ... you need doall on both maps

Thierry11:11:18

@raymcdermott I know they aren't, is that the only thing that just makes it work?

Thierry11:11:39

Just wondering why a second doall on the nested map didnt work for me

pez11:11:56

I wonder that too.

Ed11:11:56

yeah ... me too ... searching and replacing map with mapv works for me in the code you posted ... I tend to use mapv and friends over things like doall when mixing with io or dynamically scoped things, to try and avoid these sorts of problems ...

Thierry11:11:32

I'll have a look at that doall again after lunch #omnomnom. I usually skip doall as much as I can too, as I also never use for.

Ed11:11:52

hope lunch is nice ... I've got a cold today, so I'm having a lemsip 😉

pez11:11:30

Why do you avoid for?

Thierry11:11:17

Imho it's slow and ugly 🤭 I replace it with keep 👀 or just reduce

Ed11:11:17

yeah ... I tend to only use for when I want inner loops ... I find filter, map etc easier to read ... thanks @U02CX2V8PJN

Thierry12:11:58

Indeed, if I need inner loops I use keep inside reduce tho

Thierry12:11:38

... 'nu breekt mijn klomp' (Dutch, roughly 'well, I'll be damned'). Seriously tho, now it works using doall, wth. I guess I tried so many approaches that I got things mixed up or something

👍 2
🥴 1
🙏 1
pez12:11:48

Thanks for updating about that. I started to wonder if I had doall all wrong. 😃

🙌 1
Thierry12:11:39

It's really weird, I am sure I tried it, but it works now so I'm happy 🙂

👍 1
Thierry12:11:48

I can use the approach I wanted to use in the first place now

🎉 3
emilaasa13:11:25

Well fought Thierry! 🙂

catjam 1
borkdude14:11:54

@U02CX2V8PJN babashka.fs also has unzip

babashka 2
borkdude 2
Thierry16:11:33

Good to know!

Ed11:11:07

Morning