#clojure
2020-03-16
pinkfrog03:03:52

will (sort coll) on a seq achieve O(n log n) time complexity?

seancorfield03:03:38

@i according to the source, it uses java.util.Arrays/sort after converting the coll to an array with to-array

seancorfield03:03:49

So whatever that guarantees...
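
A minimal sketch of the approach described above, assuming the function name sort-via-array (hypothetical, not part of clojure.core). It copies the collection into an object array and hands it to java.util.Arrays/sort, which for object arrays is documented as a stable merge sort variant with O(n log n) comparisons in the worst case.

```clojure
;; Hypothetical re-creation of the idea behind clojure.core/sort:
;; copy to an Object[] and let java.util.Arrays/sort do the work.
(defn sort-via-array
  [coll]
  (if (seq coll)
    (let [arr (to-array coll)]
      ;; Clojure functions implement java.util.Comparator,
      ;; so `compare` can be passed directly.
      (java.util.Arrays/sort arr compare)
      (seq arr))
    ()))

(sort-via-array [3 1 2])
```

The copy into an array is O(n), so the sort step dominates the total cost.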

butterguns13:03:01

I have a function that takes an optional extra arg. What is more idiomatic Clojure: (defn foo [req & [opt]]) or (defn foo ([req] ...) ([req opt] ...))?

vlaaad13:03:42

second one
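
A small sketch of the multi-arity form, using a made-up function greet for illustration. The shorter arity supplies the default and delegates to the full one, so the default lives in a single place.

```clojure
;; `greet` and its default greeting are illustrative only.
(defn greet
  ([name] (greet name "Hello"))            ; optional arg omitted: use a default
  ([name greeting] (str greeting ", " name "!")))

(greet "Ada")
(greet "Ada" "Welcome")
```

Unlike the & [opt] form, a call with too many arguments fails with an arity error instead of silently ignoring the extras.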

👍 24
grounded_sage17:03:38

Can I update deps without restarting my repl with tools.deps?

bfabry17:03:10

no, you need something like pomegranate https://github.com/clj-commons/pomegranate

petterik17:03:28

I'm not sure how helpful this is but there's an unreleased feature in tools.deps called add-lib which you can read about here: https://insideclojure.org/2018/05/04/add-lib/ Github branch: https://github.com/clojure/tools.deps.alpha/tree/add-lib

seancorfield17:03:27

I use the add-lib branch of t.d.a. locally so I can add new dependencies easily without restarting my REPL. You can see how I do this in next.jdbc with a "Rich Comment Form" containing code to bring the test dependencies in from that repo when I'm working on other projects: https://github.com/seancorfield/next-jdbc/blob/master/test/next/jdbc/test_fixtures.clj#L133-L155

👏 8
seancorfield17:03:42

My dot clojure file has an alias for this, and a comment showing how to use it to add a git-based dependency (pulling in master of any project): https://github.com/seancorfield/dot-clojure/blob/master/deps.edn#L147-L160 ^ @grounded_sage

grounded_sage17:03:30

You are a clj community super hero @seancorfield

😊 4
Ramon Rios17:03:19

Friends, how do i call a function that is inside of a defrecord ?

Ramon Rios17:03:06

(defrecord ActiveMQConnection [url]
  c/Lifecycle
  (start [this]
    (log/info "Creating ActiveMQ connection to" url)
    (assoc this :connection
           (doto (amq/connect {:url url})
             (jms/start!))))

  (stop [this]
    (when-let [c (:connection this)]
      (jms/disconnect! c))
    (dissoc this :connection)))

tjb20:03:28

can you not invoke it by (your-method-here) ?

Ramon Rios17:03:08

I would like to call this start function

Derek17:03:41

Call the protocol method c/start on an instance of your record

Derek17:03:00

(c/start (->ActiveMQConnection "some-url"))

noisesmith17:03:16

the biggest gotcha in my experience is people expect start to be namespaced to the namespace implementing the record, when it's actually namespaced to the namespace where the protocol was created

hiredman17:03:52

that is the thing: people tend to think of, and commonly talk about, the functions that are part of protocols as methods

hiredman17:03:33

but they are functions, as if created by defn, just with special dispatch

💯 4
lilactown18:03:26

I usually implement my protocols as -start or some other private-looking thing, and then a start function in the namespace that makes sense (like core etc.)

lukasz18:03:27

btw, you can call protocol methods directly on record instances implementing them: (.start (->ActiveMQConnection "some-url")) (not a good idea for Components though, as that doesn't do the dependency injection part)
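
A self-contained sketch of the points above. Lifecycle and Connection here are stand-ins defined locally, not the Component library's protocol or the ActiveMQ record from the question. It shows that start is an ordinary function living in the namespace where defprotocol was evaluated, and that the interop-style method call also works on a record that implements the protocol inline.

```clojure
;; Stand-in protocol; in the discussion this role is played by c/Lifecycle.
(defprotocol Lifecycle
  (start [this])
  (stop [this]))

;; Stand-in record; the :connection value is a placeholder string.
(defrecord Connection [url]
  Lifecycle
  (start [this] (assoc this :connection (str "connected to " url)))
  (stop [this] (dissoc this :connection)))

;; Call the protocol function like any other function.
(def started (start (->Connection "some-url")))

;; The var belongs to the namespace that defined the protocol,
;; not to the namespace that defined the record.
(ns-name (:ns (meta #'start)))

;; Java-style method call, possible because the record implements
;; the protocol's interface directly.
(def started-via-method (.start (->Connection "some-url")))
```

If the protocol were defined in another namespace, the call would need that namespace's alias, as in (c/start ...) above.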

Ramon Rios18:03:20

It worked; my problem now is something else. Thank you everyone

grounded_sage21:03:38

How do I write directly to disk from a BufferedInputStream?

noisesmith21:03:09

io/copy it to a FileOutputStream?

👍 4
noisesmith21:03:27

@grounded_sage

user=> (def uid (java.util.UUID/randomUUID))
#'user/uid
user=> uid
#uuid "b8a1753f-cbac-40bf-9b9d-cc9cd5f018c9"
user=> (defn foo-is [] (io/input-stream (java.io.ByteArrayInputStream. (.getBytes (pr-str uid) "UTF-8"))))
#'user/foo-is
user=> (type (foo-is))
java.io.BufferedInputStream
user=> (with-open [o (io/output-stream (io/file "foo-31"))] (io/copy (foo-is) o))
nil
user=> (slurp "foo-31")
"#uuid \"b8a1753f-cbac-40bf-9b9d-cc9cd5f018c9\""

noisesmith21:03:36

oh - and io/copy is happy to take a File directly, so explicitly calling output-stream was redundant

noisesmith21:03:01

so this suffices (io/copy promises to close things it opens)

user=>  (io/copy (foo-is) (io/file "foo-31"))
nil

grounded_sage21:03:41

Oh so I don’t need with-open nice!

andy.fingerhut21:03:30

The main thing to be careful of with an InputStream is to ensure that io/copy treats the destination as an OutputStream. io/copy might do that under the hood already. I guess it wouldn't make much sense if it tried to create a Writer instead of an OutputStream in this case.

noisesmith21:03:22

@andy.fingerhut good point - I think this is the multimethod dispatch that gets hit, and it would do the right thing https://github.com/clojure/clojure/blob/master/src/clj/clojure/java/io.clj#L319

noisesmith21:03:38

the inputstream/file combo makes an outputstream out of the file and dispatches to inputstream/outputstream inside with-open

noisesmith21:03:25

and of course the outputstream implementation is used directly with no danger of a Writer wrapping it

grounded_sage21:03:28

I ran out of memory doing the copy

andy.fingerhut21:03:39

That seems a bit odd, unless the InputStream was somehow allocating memory as you were reading from it. What is the source of data of the InputStream, or its specific type?

grounded_sage21:03:44

The large file I am copying is fine on its own. But when I do all the files there seems to be some memory leak

grounded_sage21:03:00

Large CSVs from S3

noisesmith21:03:08

there's an optional arg to io/copy that lets you specify a buffer size
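
A short sketch of that option, assuming an in-memory payload and a temp file as stand-ins for the real S3 stream and destination. io/copy accepts :buffer-size as a keyword option; the value 4096 is arbitrary.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io ByteArrayInputStream File))

;; Stand-in data; a real caller would have a stream from elsewhere.
(def payload (apply str (repeat 1000 "a,b,c\n")))
(def dest (File/createTempFile "copy-demo" ".csv"))

(with-open [in (ByteArrayInputStream. (.getBytes ^String payload "UTF-8"))]
  ;; :buffer-size controls how many bytes are moved per read/write cycle.
  (io/copy in dest :buffer-size 4096))

(= payload (slurp dest))
```

The buffer only bounds the memory used by the copy loop itself; it does not change how much the source stream holds in memory.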

hiredman21:03:27

what s3 client are you using?

andy.fingerhut21:03:44

That buffer size is more likely to be an issue if you were running out of memory for a single copy, not across many files. How many files would you estimate are involved?

hiredman21:03:16

my guess is you are using the cognitect one, which if I recall pulls s3 blobs entirely into memory when downloading

grounded_sage21:03:30

But after each one it would close and release the memory?

hiredman21:03:31

what s3 client are you using?

noisesmith21:03:34

oh yeah, you might want to make sure you are reusing a single instance of the s3 client (even if working concurrently), and consuming the stream directly instead of a wrapper that puts everything in an array or string

grounded_sage21:03:40

Yes cognitect one

andy.fingerhut21:03:45

I do not know whether io/copy closes its inputs when it completes -- maybe not. Explicitly closing them yourself may help, but that seems difficult to imagine causing a memory problem with only 11 files.

noisesmith21:03:25

io/copy only closes things it itself opened, so yes this is a concern
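
A sketch that makes this observable, using a hypothetical tracked-stream helper that records when close is called. A bare io/copy leaves the caller's input stream open; wrapping the stream in with-open closes it.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io ByteArrayInputStream File))

(def closed? (atom false))

;; Hypothetical helper: an InputStream that flags when it is closed.
(defn tracked-stream [^String s]
  (proxy [ByteArrayInputStream] [(.getBytes s "UTF-8")]
    (close [] (reset! closed? true))))

(def dest (File/createTempFile "close-demo" ".txt"))

;; io/copy opens (and closes) the FileOutputStream for dest,
;; but it did not open the input, so it leaves it alone.
(io/copy (tracked-stream "first") dest)
(def closed-after-plain-copy @closed?)

;; The caller owns the input stream, so the caller closes it.
(with-open [in (tracked-stream "second")]
  (io/copy in dest))
(def closed-after-with-open @closed?)
```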

hiredman21:03:59

how big are the files, what is the max heap on your jvm?

grounded_sage21:03:44

(defn S3-CSV->local-disk
  [csv-file]
  (let [folder (str "partner-data/temp/" (str/upper-case (:partner-id @config)))]
    (if (fs/directory? folder)
      (io/copy (get-csv csv-file) (io/file (str folder "/" csv-file)))
      (do (fs/mkdirs folder)
          (S3-CSV->local-disk csv-file)))))

noisesmith21:03:50

small aside about constructing file objects from parts of a path:

user=> (io/file "foo" "bar/baz.txt")
#object[java.io.File 0x6e0cff20 "foo/bar/baz.txt"]
user=> (io/file "foo/bar" "baz.txt")
#object[java.io.File 0x191a709b "foo/bar/baz.txt"]

👍 4
noisesmith21:03:03

which is to say, you don't need that str call

noisesmith21:03:42

also you could skip the if and self-call, and call mkdirs directly and unconditionally (it's a no-op if the dirs already exist)

noisesmith21:03:54

(that concatenation also works when appending a file with a string btw, so both str calls can be eliminated)

grounded_sage23:03:45

Not quite sure what you mean by eliminating both str calls

noisesmith00:03:15

from (io/file (str folder "/" csv-file)) to (io/file folder csv-file), and from (str "partner-data/temp/" (str/upper-case (:partner-id @config))) to (io/file "partner-data/temp" (str/upper-case (:partner-id @config)))

noisesmith00:03:55

and you don't need fs anymore - you can just unconditionally call (.mkdirs folder)
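
Putting those suggestions together, a possible shape for the function. copy-into-folder! and its parameters are hypothetical: the input stream, root directory, and partner id are passed in explicitly here rather than coming from get-csv and @config as in the original.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])
(import '(java.io ByteArrayInputStream))

;; Hypothetical variant of S3-CSV->local-disk with its inputs made explicit.
(defn copy-into-folder!
  [in root partner-id file-name]
  (let [folder (io/file root (str/upper-case partner-id)) ; no str concatenation
        target (io/file folder file-name)]
    (.mkdirs folder)        ; returns false, without error, if the dirs exist
    (io/copy in target)
    target))

;; Usage with stand-in data and the JVM temp directory as the root.
(def written
  (with-open [in (ByteArrayInputStream. (.getBytes "a,b\n" "UTF-8"))]
    (copy-into-folder! in
                       (io/file (System/getProperty "java.io.tmpdir") "partner-data-demo")
                       "acme"
                       "sample.csv")))
```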

noisesmith00:03:21

it's not directly related to your question of course, just a driveby bikeshed

hiredman21:03:37

the s3 client you are using holds the entire contents of the file in memory before returning your input stream

😬 4
hiredman21:03:51

it has nothing to do with copy or closing or leaks or whatever

grounded_sage21:03:46

Hmm that is a shame

hiredman21:03:21

I don't recall if the s3 api itself exposes ranged gets, or if you have to generate a signed url and do a range get against that

ghadi21:03:34

it does support ranged gets

ghadi21:03:47

it's a param to GetObject, IIRC

grounded_sage22:03:01

Sorry could you please point me to an example or where in the source I can find this?

grounded_sage22:03:56

Figured it out :)

hiredman21:03:35

yeah, so you can do that, and even do that in parallel

ghadi21:03:44

the cognitect.http-client is not fit for all aws-api purposes, and we're slowly trying to extricate it from aws-api

grounded_sage21:03:54

This would change my io/copy though right?

grounded_sage21:03:28

I don’t even know where to start here haha. All this IO stuff trips me up

ghadi21:03:29

downloading chunks in parallel is recommended by S3, I think

grounded_sage21:03:50

Yes it is. Was trying to avoid it for a first pass

hiredman22:03:25

the simplest thing then is to go back to the io/copy version with the with-open around it for the outputstream, but put a loop inside the with-open that downloads each part and copies it into the outputstream
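
A rough sketch of that loop. fetch-part is a hypothetical stand-in for a ranged download: here it slices an in-memory byte array, whereas a real version would issue a ranged GetObject request and return the response body stream. The byte ranges are written out by hand and are inclusive on both ends.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io ByteArrayInputStream File))

;; Stand-in for the remote object.
(def fake-object (.getBytes "0123456789abcdefghij" "UTF-8"))

;; Hypothetical: returns an InputStream over bytes start..end (inclusive).
(defn fetch-part [start end]
  (ByteArrayInputStream. fake-object (int start) (int (inc (- end start)))))

(defn download-in-parts!
  "Copies each [start end] range, in order, into a single output stream."
  [ranges dest]
  (with-open [out (io/output-stream dest)]
    (doseq [[start end] ranges]
      (with-open [in (fetch-part start end)]
        (io/copy in out)))))

(def dest (File/createTempFile "parts-demo" ".txt"))
(download-in-parts! [[0 7] [8 15] [16 19]] dest)
(slurp dest)
```

Only one part is in flight at a time, so peak memory is bounded by the part size rather than the whole object.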

grounded_sage22:03:26

Ok I think I follow

andy.fingerhut22:03:21

And perhaps bump up your JVM's max heap size, if that is needed for one single large file...
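
A quick way to check the current limit from the REPL before changing anything. The deps.edn alias shown in the comment is an assumption about the project setup (the alias name :big-heap and the 4g figure are placeholders), not something stated in the discussion.

```clojure
;; Assumed deps.edn fragment for raising the limit (placeholder values):
;;   {:aliases {:big-heap {:jvm-opts ["-Xmx4g"]}}}

;; Report the maximum heap the running JVM will attempt to use, in MB.
(defn max-heap-mb []
  (quot (.maxMemory (Runtime/getRuntime)) (* 1024 1024)))

(println "Max heap (MB):" (max-heap-mb))
```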

ghadi22:03:58

(defn parts
  "given the total byte size and desired chunk size,
   return [start end] bounds for each chunk"
  [total chunk]
  (let [step (fn step
               [i]
               (when (< i total)
                 (cons [i (min (+ i chunk) total)]
                       (step (+ i chunk)))))]
    (step 0)))

ghadi22:03:22

if you need to request smaller byte ranges @grounded_sage ^

ghadi22:03:47

I'm sure there's a better way of expressing that function, but my brain is fried

ghadi22:03:09

@hiredman usually has a beautifully succinct way

hiredman22:03:09

I usually flail at the keyboard until whatever api I am using tells me the chunks are not too large or too small and then walk away

ghadi22:03:26

oof range requests are inclusive on the end param

ghadi22:03:55

[0, 499] <--- is the first 500 bytes

hiredman22:03:14

couldn't you do something with range and a step?

ghadi22:03:27

(defn parts
  "given the total byte size and desired chunk size,
   return [start end] bounds for each chunk"
  [total chunk]
  (->> (range 0 total chunk)
       (map (fn [start]
              [start (dec (min (+ start chunk) total))]))))

grounded_sage23:03:14

Is the min necessary here? @ghadi

ghadi23:03:55

for the last, possibly short, chunk
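
A worked example of the range-based version, showing why the min matters. parts is repeated from above so the block runs on its own; range-header is a hypothetical helper that formats a pair as an HTTP Range value, which uses inclusive byte offsets.

```clojure
(defn parts
  "given the total byte size and desired chunk size,
   return [start end] bounds for each chunk"
  [total chunk]
  (->> (range 0 total chunk)
       (map (fn [start]
              [start (dec (min (+ start chunk) total))]))))

;; Hypothetical helper: format an inclusive byte range for a Range header.
(defn range-header [[start end]]
  (str "bytes=" start "-" end))

;; 10 bytes in chunks of 4: the last chunk is short.
(parts 10 4)
;; => ([0 3] [4 7] [8 9])

(map range-header (parts 10 4))
;; => ("bytes=0-3" "bytes=4-7" "bytes=8-9")
```

Without the min, the last pair would be [8 11], which points past the end of a 10-byte object.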

grounded_sage23:03:26

I was doing a comparison xD