#beginners
2020-01-06
grounded_sage10:01:52

How do you read in a large EDN file? I've got a large JSON response from a server that I want to inspect with the REPL. I've converted it to EDN and put it on disk, but every time I try to load it the REPL locks up and I have to restart.

ludvikgalois10:01:55

is the top level a giant map?

grounded_sage10:01:03

It's only 740kb

grounded_sage10:01:37

@ludvikgalois yes top level is a map

ludvikgalois10:01:14

@grounded_sage 740kb isn't exactly giant though...

grounded_sage10:01:51

Which is why I'm confused as to why it keeps locking up

grounded_sage10:01:58

I'm using VS Code and Calva

borkdude10:01:33

@grounded_sage One other alternative would be to use jet from the command line with a query

borkdude10:01:54

It's how I usually inspect large chunks of EDN on disk. https://github.com/borkdude/jet

grounded_sage10:01:02

This is a good temporary fix. But longer term this needs to be part of an application. I can't figure out why this is freezing my repl.

andy.fingerhut10:01:54

Are the contents of the EDN file publishable for others to try to reproduce? How long have you let it run? Have you monitored the JVM process to see if it is running out of memory, and if so, tweaked the -Xmx command line option when starting the JVM to give it more memory?

grounded_sage10:01:01

Seems all I had to do was use println; I was sending the data straight to the REPL.

andy.fingerhut10:01:01

Yeah, you do not want to print out that much data in a REPL. def-ing a Var to hold the result is a good approach there.
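A minimal sketch of the def-a-Var approach, assuming the file holds a single EDN map. It uses clojure.edn/read with a PushbackReader (a swap-in for the read-string call used elsewhere in this thread); the helper name read-edn-file and the file name are hypothetical.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])

(defn read-edn-file
  "Reads a single EDN form from path (a path string or File).
  clojure.edn/read reads data only; clojure.core/read-string is meant for
  code and can evaluate #= forms when *read-eval* is true."
  [path]
  (with-open [r (java.io.PushbackReader. (io/reader path))]
    (edn/read r)))

(comment
  ;; def returns the Var, so the REPL prints only #'user/data
  (def data (read-edn-file "response.edn")) ; hypothetical file name
  (count data)            ; number of top-level keys
  (take 5 (keys data)))   ; peek at a few keys instead of printing everything
```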

grounded_sage10:01:38

Yeah, it was a rookie error haha. I guess it runs out of memory because it's holding that info in the REPL.

valtteri10:01:26

740kb shouldn’t cause trouble in the REPL

andy.fingerhut10:01:40

It can depend on whether the REPL is in a terminal, or an editor buffer like in Emacs or similar. The latter tend to behave less well with large output than terminals, although terminals can have trouble, too.
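When you do want to look at a big value in an editor REPL, one option is to cap what the printer emits. *print-length* and *print-level* are standard Clojure dynamic vars; the helper name preview below is hypothetical. At the REPL you can also set them globally with (set! *print-length* 20).

```clojure
(defn preview
  "Returns a printed preview of x that shows at most n items per collection
  and at most depth levels of nesting (deeper levels print as #)."
  [x n depth]
  (binding [*print-length* n
            *print-level*  depth]
    (pr-str x)))

(preview (range 1000) 3 2)          ; => "(0 1 2 ...)"
(preview {:a {:b {:c 1}}} 10 2)     ; => "{:a {:b #}}"
```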

valtteri10:01:34

du -h /tmp/file.edn
896K	/tmp/file.edn
And then in repl (emacs+CIDER)
(def f (read-string (slurp "/tmp/file.edn")))
(first f)
=> {:key0 0}

valtteri10:01:19

How are you ‘loading the file’ @grounded_sage?

ramon.rios11:01:03

Guys, I have a nested map and I'm trying to remove one of its keys. I'm using re-frame. Which is the best way to dissoc this? I'm using dissoc but it's not passing my test yet.

lucio11:01:38

dissoc only works on top-level keys. If the map is nested, you can use

(update-in db [:nested :map] dissoc :your-key)
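A small sketch contrasting the two, with a hypothetical db shape. In a re-frame reg-event-db handler the same update-in expression would simply be the handler's return value. For a single level, (update db :user dissoc :token) is equivalent.

```clojure
(def db {:user {:name "Ana" :token "abc"} :page :home}) ; hypothetical app-db

;; dissoc on the outer map only looks at top-level keys, so nothing changes:
(dissoc db :token)                    ; => db, unchanged

;; update-in walks the path and applies dissoc to the inner map:
(update-in db [:user] dissoc :token)  ; => {:user {:name "Ana"}, :page :home}

;; Caveat: if the path does not exist, update-in creates it with a nil value.
(update-in {} [:user] dissoc :token)  ; => {:user nil}
```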

heyrutvik14:01:11

Hey folks, I was wondering how to achieve something like the following in a transformation step. Let's assume I have a collection of strings (words). I apply some filter and map to it, then I want to apply something like takeWhile, but I need to remember all the previous strings, do some operation on them (combining them into a single string), pass that down the pipeline, and then continue with the rest of the strings using the same step. Ultimately I'll have a collection of strings (sentences). It looks like I need state within the pipeline, in that takeWhile-like step. Does it make sense? Any thought-provoking direction will be appreciated. Thanks.

hindol.adhya18:01:00

What is your condition in take while?

hindol.adhya18:01:03

Why not just take-while in one step and reduce/into/join in the next step?
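A sketch of the two-step idea, assuming (hypothetically) that a sentence ends at a word ending in ".". Note that take-while stops before the first word that fails the predicate, so the word carrying the "." is not included.

```clojure
(require '[clojure.string :as str])

(def words ["the" "cat" "sat." "on" "the" "mat."]) ; hypothetical input

;; Step 1: words up to, but not including, the first one ending in "."
(def head (take-while #(not (str/ends-with? % ".")) words))
;; => ("the" "cat")

;; Step 2: combine them in a separate step
(str/join " " head) ; => "the cat"
```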

heyrutvik07:01:35

hi @, it turned out I can achieve my requirements using reduce. The only difference is that my reduce doesn't reduce the sequence to a single value, but to another sequence.

hindol.adhya07:01:47

You mean a list/vector? I might be wrong but I don't think reduce can generate a sequence.

heyrutvik08:01:43

Yes, vector. Reduce can generate them. 🙂

heyrutvik08:01:48

Think about it this way: it can produce whatever type of value you feed it as the seed.

heyrutvik08:01:45

Want an example? Say you want to double each element in a vector. You might think of map, but reduce can do the same:

user=> (def v1 [1 2 3])
#'user/v1
user=> (reduce (fn [acc x] (conj acc (* x 2))) [] v1)
[2 4 6]

heyrutvik08:01:51

Notice that we pass [] as the seed, not a number. The seed is an empty vector, so each conj adds to a vector and reduce returns a vector.
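A short sketch of the "whatever you feed it as the seed" point: the same vector v1 reduced with a vector, map, or set seed. mapv is shown only for comparison.

```clojure
(def v1 [1 2 3])

;; Empty-vector seed: reduce builds a vector, same as mapv
(reduce (fn [acc x] (conj acc (* x 2))) [] v1)    ; => [2 4 6]
(mapv #(* % 2) v1)                                ; => [2 4 6]

;; Map seed: reduce builds a map
(reduce (fn [acc x] (assoc acc x (* x 2))) {} v1) ; => {1 2, 2 4, 3 6}

;; Set seed: reduce builds a set
(reduce conj #{} v1)                              ; => #{1 2 3}
```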

heyrutvik08:01:01

Does it make sense, @?

hindol.adhya08:01:43

Sorry, I wasn't clear. I mean reduce can generate a list, vector, set, map etc. but not a (lazy) sequence. It can of course generate a vector.

hindol.adhya08:01:37

Whereas map, filter, etc. will give you a sequence.

hindol.adhya08:01:02

All collections in Clojure are seqable, but most of them (vectors, maps, sets) are not sequences themselves.
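The seqable/sequence distinction can be checked at the REPL with seq? and seqable? (the latter needs Clojure 1.9+). A list is itself a sequence; a vector or map becomes one only when you call seq on it.

```clojure
(seq? '(1 2 3))        ; => true, a list is a sequence
(seq? [1 2 3])         ; => false, a vector is only seqable
(seq? (seq [1 2 3]))   ; => true, seq gives a sequence view of the vector
(seq? (map inc [1 2])) ; => true, map returns a (lazy) sequence
(seqable? {:a 1})      ; => true
(seq {:a 1})           ; => ([:a 1]), a sequence of map entries
```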

heyrutvik08:01:35

and sequences are lazy, right?

hindol.adhya19:01:41

Mostly, yes: map, filter, range and friends return lazy sequences (not every seq is lazy; a list, for example, is fully realized), and a lazy sequence can be realized eagerly with into, reduce, count and many more ways.

hindol.adhya19:01:50

Basically, any result that needs to scan through the sequence will realize the whole sequence eagerly.

hindol.adhya19:01:22

Because these sequences are lazy, you can easily represent infinite sequences in Clojure and realize only as much as is actually needed. For example, (range) returns an infinite sequence from 0 to positive infinity, but this is valid code:

(take 100 (range)) ;; => (0 1 2 ... 99)
Evaluating (range) on its own at the REPL, however, tries to print an infinite sequence and never finishes.
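A small sketch of laziness in practice: taking from infinite sequences, and using realized? to see whether a lazy sequence has been forced yet.

```clojure
;; Only the requested elements of an infinite sequence are computed
(take 5 (range))              ; => (0 1 2 3 4)
(take 5 (iterate #(* 2 %) 1)) ; => (1 2 4 8 16)

;; realized? reports whether a lazy seq has been forced
(def xs (map inc (range 10)))
(realized? xs)                ; => false
(count xs)                    ; count walks, and so realizes, the whole seq
(realized? xs)                ; => true
```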

heyrutvik04:01:45

Right. Totally makes sense. Thanks @!

rakyi14:01:17

not sure I understand, but maybe you want let and split-with?
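split-with returns both halves at once, [(take-while pred coll) (drop-while pred coll)], which suits the "take a sentence, then continue with the rest" shape. A sketch with a hypothetical next-sentence helper, again assuming a sentence ends at a word ending in ".":

```clojure
(require '[clojure.string :as str])

(def words ["the" "cat" "sat." "on" "the" "mat."]) ; hypothetical input

(defn next-sentence
  "Hypothetical helper: returns [first-sentence-words remaining-words],
  where the first sentence includes the word that ends in \".\"."
  [words]
  (let [[before after] (split-with #(not (str/ends-with? % ".")) words)]
    [(concat before (take 1 after)) (rest after)]))

(next-sentence words)
;; => [("the" "cat" "sat.") ("on" "the" "mat.")]
```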

valtteri14:01:11

To me it sounds like you want to reduce words into sentences. (clue included) 🙂
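One way to reduce words into sentences, sketched under the assumption that a word ending in "." closes a sentence: the accumulator carries the finished sentences plus the words of the sentence in progress, which is the "state in the pipeline" from the original question. The name words->sentences is hypothetical.

```clojure
(require '[clojure.string :as str])

(defn words->sentences
  "Groups words into sentences, treating a word that ends in \".\" as the
  end of a sentence (an assumed rule). Leftover words without a final \".\"
  form one last sentence."
  [words]
  (let [{:keys [sentences current]}
        (reduce (fn [{:keys [sentences current]} word]
                  (let [current (conj current word)]
                    (if (str/ends-with? word ".")
                      {:sentences (conj sentences (str/join " " current))
                       :current   []}
                      {:sentences sentences :current current})))
                {:sentences [] :current []}
                words)]
    ;; flush a trailing, unterminated sentence if there is one
    (cond-> sentences
      (seq current) (conj (str/join " " current)))))

(words->sentences ["the" "cat" "sat." "on" "the" "mat." "done"])
;; => ["the cat sat." "on the mat." "done"]
```

Because it is an ordinary reduce, any earlier filter or map steps can simply run on the input first.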

heyrutvik14:01:18

Thanks @rakyi and @valtteri. I'll look into those functions. 👍

grounded_sage15:01:45

I've got a request I am making with clj-http and I want to write the JSON response to disk but it is running out of memory when it makes the request. How do I stream it to disk?

dromar5618:01:47

Haven't tried it, but you can tell clj-http to return the response as a stream:

;; Return the body as a stream
(client/get "" {:as :stream})
;; Note that the connection to the server will NOT be closed until the
;; stream has been read

grounded_sage20:01:46

@ yeah, that’s what I was doing, but how do I turn that stream into a string? There was a lot about IO/write etc. when I was googling earlier. Going to try to slurp it next time I’m at the computer.
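Turning the stream into a string (for example with slurp) would pull the whole body back into memory, which is what :as :stream is meant to avoid. A minimal sketch that instead copies the InputStream straight to a file with clojure.java.io/copy; the helper name stream->file! and the file name are hypothetical, and the clj-http call only appears in a comment.

```clojure
(require '[clojure.java.io :as io])

(defn stream->file!
  "Copies everything from input-stream to dest (a path string or File) in
  buffered chunks, so the whole body is never held in memory.
  Closes both streams, which also releases the HTTP connection."
  [^java.io.InputStream input-stream dest]
  (with-open [in  input-stream
              out (io/output-stream dest)]
    (io/copy in out)))

(comment
  ;; With clj-http aliased as client, where url is your endpoint:
  (stream->file! (:body (client/get url {:as :stream})) "response.json"))
```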

jr0cket16:01:19

I'd like to update all the values in a map using a function and return a new map with the same keys and updated values. I am sure I have done this lots of times, but my mind has gone blank

{:a 2 :b 3.5 :c 2.5 :d 3} 
I can use map with an inline function and update the val but that only returns the sequence of values and not the updated map
(map #(- 10 (val %)) {:a 2 :b 3.5 :c 2.5 :d 3} )
I thought I could do this with an inline function. I guess I could use for or into instead. Any suggestions? Thank you

shan17:01:07

oh that’s what the gist actually has :face_palm:

jp31219:01:34

You could also simply do

(into {} (map (fn [[k v]] [k (myfn v)])) mymap)
This works because a map is seqable as a sequence of map entries, and map entries destructure like two-element vectors. At the same time you can conj pairs (two-element vectors) into maps, e.g. (conj {} [:a :b])
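Applied to jr0cket's map, a sketch of two equivalent ways to update every value: into with a map transducer, and reduce-kv, which passes key and value as separate arguments. On Clojure 1.11 or later, clojure.core/update-vals does the same thing: (update-vals scores #(- 10 %)).

```clojure
(def scores {:a 2 :b 3.5 :c 2.5 :d 3})

;; into + map transducer: each [k v] entry becomes a new [k v] pair
(into {} (map (fn [[k v]] [k (- 10 v)])) scores)
;; => {:a 8, :b 6.5, :c 7.5, :d 7}

;; reduce-kv: assoc each updated value into a fresh map
(reduce-kv (fn [m k v] (assoc m k (- 10 v))) {} scores)
;; => {:a 8, :b 6.5, :c 7.5, :d 7}
```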

michael.e.loughlin21:01:49

I want to write a small program that has a single writer thread, and multiple reader threads. Is this something that Clojure concurrency primitives are good at or should I be looking at interop?

ghadi21:01:42

you can always use java.util.concurrent, or channels from Clojure core.async @michael.e.loughlin

michael.gaare21:01:25

Just futures will work if you have a known number of reader threads

bfabry21:01:40

Java threads + core.async, possibly managed using a thread pool, would be my preferred solution

hiredman21:01:58

Clojure's model for identity is in some ways ideal for that case

hiredman21:01:04

Because you put immutable data in a mutable ref, multiple concurrent readers can get and read that immutable data without impeding other reads or even stopping writes

ghadi21:01:11

1) make a queue or channel 2) pass that queue or channel to a producer 3) pass the same queue or channel to the consumers 4) __ 5) profit!
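The queue recipe, sketched with java.util.concurrent.LinkedBlockingQueue (a swap-in for a core.async channel) and plain futures as consumers. The function name run-queue and the ::done "poison pill" sentinel are hypothetical; each consumer stops at the first sentinel it takes, so the producer puts one per consumer.

```clojure
(import '(java.util.concurrent LinkedBlockingQueue))

(defn run-queue
  "One producer (the calling thread) puts items on a shared queue;
  n-consumers futures take from it until they see the sentinel.
  Returns every item taken, in no particular order."
  [items n-consumers]
  (let [q         (LinkedBlockingQueue.)
        sentinel  ::done
        consumers (doall
                   (for [_ (range n-consumers)]
                     (future
                       (loop [taken []]
                         (let [x (.take q)] ; blocks until an item arrives
                           (if (= x sentinel)
                             taken
                             (recur (conj taken x))))))))]
    ;; producer: all items first, then one sentinel per consumer
    (doseq [x items] (.put q x))
    (dotimes [_ n-consumers] (.put q sentinel))
    (vec (mapcat deref consumers))))
```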

hiredman21:01:45

You likely just need an atom

andy.fingerhut21:01:58

Out of curiosity, what "thing" is it you were hoping to have a single writer, and multiple readers? If the answer is "an arbitrary immutable object, e.g. a Clojure map, or vector", then a Clojure atom enables all of that and more, as long as you are fine with readers getting snapshots of the immutable value, and having to go back and deref the atom again later if you again want the new latest value.
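A sketch of the snapshot behaviour described above, with a hypothetical key-to-offset index in an atom: the writer swaps in a new immutable map, and a reader that already dereferenced the atom keeps its old value until it derefs again.

```clojure
(def index (atom {})) ; hypothetical index: key -> file offset

(defn put-offset!
  "Called only by the single writer; swap! installs a new immutable map."
  [k offset]
  (swap! index assoc k offset))

(defn snapshot
  "Readers deref to get an immutable map that later writes cannot change."
  []
  @index)

(put-offset! "a" 0)
(let [before (snapshot)]
  (put-offset! "b" 4)
  [(get before "b") (get (snapshot) "b")])
;; => [nil 4]
```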

michael.e.loughlin21:01:53

I'm implementing a toy database based on chapter 3 of Designing Data-Intensive Applications. It consists of a hash index that lives in an atom, but the underlying "db" is an append only text file that occasionally gets replaced

andy.fingerhut21:01:16

If the answer is "some arbitrary mutable thing", then there are all kinds of details about that mutable thing, e.g. thread safety, that are all part of the answer, and will often impose constraints on your code that you will need to check manually yourself as you initially develop, and probably later update, your code.

andy.fingerhut21:01:26

"occasionally gets replaced"? Meaning the hash index in the atom sometimes gets updated to have a new file name in it, no longer the one it had been using for a while?

andy.fingerhut21:01:34

If you meant "occasionally instead of appending to the file, the contents are erased to empty and we start appending from that again", that is notably different.

andy.fingerhut21:01:47

I guess the hash index contains offsets into the file?

michael.e.loughlin21:01:48

when the append-only text file grows to an arbitrary size, I "compact" it by taking the most recent values of all my keys (it's a key-value DB) and dropping them into a new "main file"
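A sketch of that compaction step, assuming a hypothetical line format of key,value per record: reading the log in order and assoc-ing into a map keeps only the last value written for each key, which is then written to the new file.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn compact!
  "Hypothetical compaction: rewrites the key,value log at from into to,
  keeping only the most recent value for each key. Returns that map."
  [from to]
  (let [latest (with-open [r (io/reader from)]
                 ;; later lines overwrite earlier ones for the same key
                 (reduce (fn [m line]
                           (let [[k v] (str/split line #"," 2)]
                             (assoc m k v)))
                         {}
                         (line-seq r)))]
    (with-open [w (io/writer to)]
      (doseq [[k v] latest]
        (.write w (str k "," v "\n"))))
    latest))
```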

andy.fingerhut21:01:09

It seems like as long as you know the ways to flush/sync the contents of the append operation so that it is visible to all readers, and/or to ensure that a new created file is guaranteed to be visible to all readers, you can do those operations in the writer, and only when they are complete, update an atom containing the index, assuming the index is an immutable collection like a map
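A sketch of "do the I/O in the writer, then update the atom", assuming a single writer thread, a hypothetical key,value line format, and ASCII keys and values without commas or newlines. Closing the writer flushes the bytes to the operating system (it does not fsync), and only then is the new byte offset published for readers. Names like log-index and append! are hypothetical.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io File RandomAccessFile))

;; key -> byte offset of that key's latest record
(def log-index (atom {}))

(defn append!
  "Appends a key,value line to f, then publishes its offset.
  Only ever called from the single writer thread."
  [^File f k v]
  (let [offset (.length f)]
    ;; with-open closes the writer, which flushes the line to the OS
    (with-open [w (io/writer f :append true)]
      (.write w (str k "," v "\n")))
    ;; readers only see the new offset after the write is complete
    (swap! log-index assoc k offset)))

(defn read-value
  "Looks up k in the current index snapshot and reads that record's value."
  [^File f k]
  (when-let [offset (get @log-index k)]
    (with-open [raf (RandomAccessFile. f "r")]
      (.seek raf (long offset))
      (let [line (.readLine raf)]
        (subs line (inc (.indexOf line ",")))))))
```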

noisesmith21:01:10

this almost sounds like a good use of an agent, since agents act as a queue of operations to perform on a state (thus will ensure that only one thread is using the i/o connection at a time)

noisesmith21:01:36

so each write or compact would be sent to the agent, and would be free to set any metadata about the file you are abstracting after each operation

noisesmith21:01:59

the drawback is that operations on agents are non-blocking (this might not be the right semantics)

noisesmith21:01:17

but I'd be suspicious of using an atom, as i/o and retries don't mix
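A sketch of the agent approach, with hypothetical names: the agent's state is metadata about the append-only file, every write is an action sent to it, and since actions on one agent run one at a time the file I/O is serialized. send-off is used rather than send because the action blocks on I/O; callers that need to wait can use await.

```clojure
(defn make-log-agent
  "Hypothetical: an agent whose state describes an append-only file."
  [path]
  (agent {:path path :records 0}))

(defn- append-line
  [state line]
  ;; actions on a single agent never run concurrently, so this I/O is serialized
  (spit (:path state) (str line "\n") :append true)
  (update state :records inc))

(defn write!
  "Queues a write; returns immediately (agent sends are non-blocking)."
  [log-agent line]
  (send-off log-agent append-line line))
```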

andy.fingerhut21:01:29

If there is a single writer, there will be no retries.

andy.fingerhut21:01:40

but point taken and agreed with

andy.fingerhut21:01:37

My "no retries" comment assumed the single writer was always from the same dedicated writer thread of the program

noisesmith21:01:47

yeah - that's also valid, I like the use of agent here because it bakes that assumption into the behavior of the container used, but using discipline to only write from a dedicated thread also works (though you likely end up implementing much of what agent gives you for free, e.g. you now need a queue to get input from other threads etc...)

andy.fingerhut21:01:42

sure. And regarding your comment that operations on agents are non-blocking, I haven't used await before, but it is an option the writer in this scenario could use to block until all pending operations have completed.

noisesmith22:01:14

the problem there (in my experience) is that (do (send a f) (await a)) can end up waiting on some g sent from another thread

noisesmith22:01:22

but it's true, await usually works fine

noisesmith22:01:32

my workaround is, instead of await, do (let [done (promise)] (send a (fn [state] (let [new-state (f state)] (deliver done true) new-state))) @done)

noisesmith22:01:06

but that pattern is awkward
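A runnable sketch of waiting for one specific agent action rather than for everything pending (which is what await does): the action itself delivers a promise, so the caller unblocks exactly when its own action has run. The name send-and-wait! is hypothetical; errors are delivered through the promise so the caller does not block forever when the action throws.

```clojure
(defn send-and-wait!
  "Sends (apply f state args) to agent a and blocks until that particular
  action has run. Returns the new state, or rethrows the action's exception."
  [a f & args]
  (let [done (promise)]
    (send-off a (fn [state]
                  (try
                    (let [new-state (apply f state args)]
                      (deliver done {:state new-state})
                      new-state)
                    (catch Throwable t
                      (deliver done {:error t})
                      (throw t)))))
    (let [{:keys [state error]} @done]
      (if error (throw error) state))))
```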

andy.fingerhut22:01:32

In this single-writer, multiple reader scenario, only the writer would be doing send calls, yes?

andy.fingerhut22:01:22

Interesting you mention about the "waiting on g sent from another thread" thing. The doc string for await seems to imply that it wouldn't do that.

andy.fingerhut22:01:12

Only imply that, not promise it, I mean. An implementation that waits longer than the minimum necessary implied by the doc string is still meeting what it says, if perhaps prone to misinterpretation on how soon await returns.

noisesmith22:01:49

waiting on g happens because of a race: the send and the await are not atomic

noisesmith23:01:21

await doesn't wait for a specific action, it waits on all running / pending actions

dpsutton23:01:24

the io could be in a watch?

dpsutton23:01:06

or a watch could queue the new entries for writing?
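A sketch of doing the I/O in a watch, with hypothetical names and line format. The watch function runs after each successful swap!, so retried swap! attempts never reach it; it compares old and new state and appends only changed or added keys (removed keys are not logged in this sketch). Watches run on the thread that changed the atom, so with several writers the calls could interleave; with a single writer they run in order.

```clojure
(def store (atom {})) ; hypothetical key -> value store

(defn persist-changes!
  "Adds a watch that appends every changed or added key,value to path."
  [path]
  (add-watch store ::persist
             (fn [_key _ref old-state new-state]
               (doseq [[k v] new-state
                       :when (not= v (get old-state k))]
                 (spit path (str k "," v "\n") :append true)))))
```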

noisesmith00:01:27

fair point - a watch avoids retry noise