#clojure
2021-03-08
cch114:03:53

Here’s a code challenge that, if resolved, will help me test a library I am writing: how can one write an infinitely deep data structure in Clojure? Infinitely wide data structures are common. And it’s not even too difficult to generate a recursive data structure infinitely (but it never returns…). My goal is to find the analogue of the lazy seq, but in depth. An example would be “a list of a list of a list of a list of …”. A characteristic of this data structure is that I could iterate first on it forever.

Ben Sless15:03:38

Won't something like

(defn lazy-tree
  []
  (lazy-seq
   (list
    (lazy-tree)
    (lazy-tree))))
satisfy this condition? It stack overflows, though

borkdude15:03:10

I think any such data structure will stack overflow.

borkdude15:03:26

(defn x [] [(x)]) (x)

Ben Sless15:03:03

Not if it's lazy, then maybe you can walk it

Ben Sless15:03:21

seems like my lazy-tree doesn't stack overflow as long as you don't print anything

Ben Sless15:03:29

(def f (first (lazy-tree)))
^ this is fine

borkdude15:03:32

it's fine as long as you don't do anything with f

borkdude15:03:38

which is pretty useless ;)

cch115:03:18

You guys are fast. This was my first attempt, but it burns up the CPU…: ((fn f [] (lazy-seq (f))))

cch115:03:28

But attempting to get even the first element locks up. I’m not sure why, since the very first element is not aggressively computed (you can call class on the return of that fn call above).

Ben Sless15:03:56

Let's ask the big question - why?

☝️ 3
cch115:03:32

I’m OK with blowing out the stack on printing or attempting to walk infinitely deep - that’s kinda the point of this structure. I want to be satisfied that my code lazily walks an infinitely deep data structure.

cch115:03:42

Just as first forces realization of only the first element of an infinitely wide data structure, I want to prove that my accessor (a variant of a zipper) forces instantiation of only a finite number of recursions of an infinitely nested data structure.

cch115:03:05

@UK0810AQ2’s attempt and mine have a common shape: a recursive call to a lazy seq with a single element that is a call to a lazy…

Ben Sless15:03:30

Actually, my example is both infinitely deep and infinitely wide

cch115:03:57

Yep. Mine is not, and just infinitely deep is OK.

Ben Sless15:03:58

its width increases exponentially

cch115:03:09

I’ve already got the width test working.

Ben Sless15:03:43

but what's the difference between an infinite lazy seq and an infinitely deep seq? an infinite lazy sequence is like a linked list. Isn't a linked list deep? or do you consider it wide?

cch115:03:44

But the infinitely deep part seems like it should work, yet it clearly does not.

cch115:03:57

I think the difference can best be explained with an example accessor: (iterate first data) should run infinitely long on an infinitely deep structure. It would not (necessarily) on an infinitely wide lazy seq.

cch115:03:07

Upon further reflection, here is my litmus test: (iterate first data) should neither error nor return only on an infinitely deep data structure, while (iterate rest data) should neither error nor return only on an infinitely wide data structure (both modulo stack overflow or OOM conditions).
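
As a concrete sketch of that litmus test, bounding the walk with nth so it terminates for any finite n (data stands for whatever structure is under test):

;; depth test: forcing any finite number of levels should neither error nor exhaust
(nth (iterate first data) 1000)
;; width test: forcing any finite number of tails should neither error nor exhaust
(nth (iterate rest data) 1000)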

ghadi15:03:48

still haven't answered why

ghadi15:03:16

at first reading this is classic https://xyproblem.info

cch115:03:01

Why is easy: to test that a zipper-like constructor does not accidentally walk infinitely deep (and I know it does now; it’s not trivial to fix and it’s easy to regress - thus a unit test seems prudent).

cch115:03:59

…plus it seems like, for all its elegant handling of laziness, Clojure would have a tool for lazy recursive (in depth) data structures.

ghadi15:03:08

several of the code snippets above will generate recursive data structures

cch115:03:31

But none seem to be able to do so in finite time.

ghadi15:03:32

whether or not you can print them is a separate concern

ghadi15:03:47

I do not understand that assertion

cch115:03:03

Maybe I missed something…. checking my first solution again…

cch115:03:39

Nope. Mine (for sure) and Ben’s solution (I think; more testing required) both cannot pass the test of (class (first data)).

cch115:03:09

(class data) => LazySeq

cch115:03:22

(class (first data)) => ….

cch115:03:20

Looks like @UK0810AQ2’s solution (`lazy-tree`) passes the “access the first element lazily” test…

cch115:03:05

Here’s a slightly simplified version (only infinite vertically) of Ben’s lazy-tree that does indeed solve the problem:

cch115:03:36

(def lazy-tree
  ((fn f [] (lazy-seq
            (list
             (f))))))

cch115:03:45

Thanks, Ben.

cch115:03:07

Correctly blows out the stack on (iterate first lazy-tree).
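
For the record: this shape works because the recursive call is wrapped in a list, so realizing one level yields a concrete cell, whereas a lazy-seq whose body is directly another lazy-seq gets unwrapped in a loop by seq and spins forever. A bounded walk of lazy-tree terminates, since each first realizes exactly one level (a quick REPL check):

;; forcing 1000 levels is fine; only printing the result overflows
(class (nth (iterate first lazy-tree) 1000))
;; => clojure.lang.LazySeq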

pinkfrog15:03:46

Basically I want the setting to go with the project, instead of residing in $HOME.

Endre Bakken Stovner16:03:14

I have the following function which recursively reads all forms in a file:

(import '[java.io PushbackReader])
(require '[clojure.java.io :as io])
(defn read-all
  [file]
  (let [rdr (-> file io/file io/reader PushbackReader.)]
    (loop [forms []]
      (let [form (try (read rdr) (catch Exception e nil))]
        (if form
          (recur (conj forms form))
          forms)))))
It works well; however, I want to read the maps within the forms as sorted maps. One inelegant way I can do that is to switch all instances of {:bla "bla"} in the file to (sorted-map :bla "bla") before sending it to read-all.
(spit "test.json" (into (sorted-map) (sort (zipmap (range 50 0 -1) (repeat 1)))))
(read-all "test.json") ;; [{7 1, 20 1, 27 1, ...}] ;; Darn!
Can you think of a better way to get the items in hash-maps in the order they are given? Is it possible with clojure.edn/read and using the :readers opt?

ghadi16:03:03

maps have no order

ghadi16:03:23

if you want to post-process them into a sorted-map, that is possible

🙏 3
ghadi16:03:41

if you know that the keys are sortable ahead of time
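
A minimal sketch of that post-processing with clojure.walk/postwalk (hypothetical helper name), assuming all keys within any one map are mutually comparable:

(require '[clojure.walk :as walk])

;; rebuild every map in the forms as a sorted-map
(defn sort-all-maps [forms]
  (walk/postwalk
   (fn [x] (if (map? x) (into (sorted-map) x) x))
   forms))

(sort-all-maps (read-all "test.json"))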

Endre Bakken Stovner16:03:00

I'd love to be able to read {4 1, 2 3, 6 7} from a file into an array-map so the order given is preserved.

borkdude16:03:58

@endrebak85 The array map threshold is an implementation detail. Just store a seq of tuples.

Endre Bakken Stovner17:03:39

This is what I thought too, but an answer with 11 upvotes on Stack Overflow suggested using array-map so I thought perhaps I was mistaken.

borkdude17:03:14

Afaik array-map is just an optimization detail

Endre Bakken Stovner17:03:28

Yeah, 8 seems to be the magic cutoff on my machine:

user=> (into (array-map) (into (sorted-map) (zipmap (range 9 0 -1) (repeat 1))))
{7 1, 1 1, 4 1, 6 1, 3 1, 2 1, 9 1, 5 1, 8 1}
user=> (into (array-map) (into (sorted-map) (zipmap (range 8 0 -1) (repeat 1))))
{1 1, 2 1, 3 1, 4 1, 5 1, 6 1, 7 1, 8 1}

borkdude16:03:21

Or use e.g. clj-commons/ordered

ghadi17:03:19

what are you really trying to do by preserving written order?

Endre Bakken Stovner17:03:24

I am creating a little DSL where {} preserves the order written. So when I read a snippet of a file written in my DSL, stuff enclosed in braces becomes a seq of tuples (as per @borkdude’s suggestion).

(def example-dsl-snippet "{1 1 2 1 3 1 4 1 5 1 6 1 7 1 8 1 9 1 10 1 11 1 12 1}") 
(read-string example-dsl-snippet) ;; this loses order: {7 1, 1 1, 4 1, 6 1, 3 1, 12 1, 2 1, 11 1, 9 1, 5 1, 10 1, 8 1}
I'd love a function that could read a string describing a map so it becomes (seq '([1 1] [2 1] [3 1] ...)), not a map.

borkdude17:03:07

@endrebak85 Is that DSL used from Clojure programs? Then that may be very confusing

Endre Bakken Stovner17:03:59

Yes, but the DSL is basically json. It is not meant for Clojure programmers, but for people with experience with a Python DSL called https://snakemake.readthedocs.io/. For those users, this would be expected behavior.

borkdude17:03:36

But you want to parse this using a Clojure parser?

Endre Bakken Stovner17:03:48

Yes. I guess I should just use a vector of tuples: [[kw val] [kw2 val2] ...] instead

Endre Bakken Stovner17:03:37

Thanks for the suggestions and help btw.

hiredman17:03:46

yeah, you just can't assume order for maps like that, so you can't use maps for a dsl like that

👍 3
borkdude17:03:54

you could also store the order within the map {:a 1 :b 2 :order [:a :b]}

Endre Bakken Stovner17:03:07

But this would require my users to add the :order key themselves. I guess preprocessing the file to switch {-1 5 0 3} into (seq '([-1 5] [0 3])) is the best way to do it. Or just use a vec of [kw val] tuples.

p-himik18:03:49

Alternatively, you can explicitly use array maps. The order will be preserved. Do note, however, that assoc, dissoc, and any other function that returns a new map may change its type to a hash map.
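
A quick REPL check of that type change (the 8-entry cutoff observed above is the implementation detail in play):

(type (into (array-map) (map vector (range 8) (repeat 1))))
;; => clojure.lang.PersistentArrayMap
(type (into (array-map) (map vector (range 9) (repeat 1))))
;; => clojure.lang.PersistentHashMap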

borkdude17:03:35

If this language is only used from, say, Python, but you want to parse / use it from Clojure (for whatever reason), I would just roll my own parser for this and not try to parse this as EDN.

🙏 3
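
A crude sketch of such a hand-rolled reader for the flat case, along the lines of the brace-rewriting idea above (rewrite braces to brackets, read as EDN, pair up; it breaks on braces inside strings, and nested maps would need recursive handling):

(require '[clojure.edn :as edn]
         '[clojure.string :as str])

(defn read-ordered-map [s]
  (->> (edn/read-string (-> s (str/replace "{" "[") (str/replace "}" "]")))
       (partition 2)
       (map vec)))

(read-ordered-map "{1 1 2 1 3 1 4 1}")
;; => ([1 1] [2 1] [3 1] [4 1])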
quadron17:03:58

why do we have [vswap! swap! alter send] for [volatile atom ref agent]?

quadron17:03:13

is there a reason for nonexistence of a polymorphic name for different ref-like-values?

Ed17:03:56

because they have different semantics? ... refs need to be used in a transaction. Agents are asynchronous. Volatiles and atoms have different threading guarantees. I'm not sure that anyone should be pretending that they're the same to write to ... only that you can deref them to read.
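
For reference, the four pairs side by side (a minimal sketch):

(def v (volatile! 0)) (vswap! v inc)         ;; volatile: visibility only, no atomic swap
(def a (atom 0))      (swap! a inc)          ;; atom: atomic compare-and-swap, synchronous
(def r (ref 0))       (dosync (alter r inc)) ;; ref: must run inside a transaction
(def ag (agent 0))    (send ag inc)          ;; agent: asynchronous, returns immediately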

quadron17:03:44

it's not about pretending they are the same

quadron17:03:22

it's about having an abstract operator with a unique name that has different semantics based on the value it is called with

Ed18:03:41

Abstractions are about pretending a group of things are the same in a specific way. Clojure's reference types are the same in that they can always be read without a significant cost (unlike a remote actor which would require a network request, for example) because they're in ram and (ideally) wrap a persistent data structure. But they all have different semantics for writing to, so what's the point of an abstract function? Isn't that just a common name that obscures the important differences in why you're using one reference type over another?

Ed18:03:13

I'm not sure I understand why that would be a positive thing

quadron18:03:04

obscures it for whom? why would an expression have to locally demonstrate the semantics of its execution if the semantics don't matter locally?

quadron18:03:36

at the local point where swap!/vswap!/... is called, the knowledge about the type of the ref does not matter

quadron18:03:26

after all, is the ref itself not telling you that difference?

quadron18:03:06

(circle-area circle) (square-area square)

vemv18:03:20

Let's say there was a protocol for writes of ref-like things. Then the underlying call sites would be "megamorphic", which makes function inlining less likely. Not sure if this is the precise reason though, but searching for "megamorphic call site" returns some hits.

Ed18:03:03

> why would an expression have to locally demonstrate the semantics of its execution if the semantics doesn't matter locally?
Because it matters in the execution of the program. If it's an agent, then I can't deref it and expect it to have the value I've just sent it; if it's a ref, I have to have a transaction open to make a change.
> at the local point where swap!/vswap!/... is called, the knowledge about the type of the ref does not matter
I'm not sure I can agree with that. I think at that point it very much matters: my send to an agent has completed, so what can I do with that agent, vs the equivalent on an atom?
> after all, is the ref itself not telling you that difference?
Yes, the constructors are different. But so are the update semantics, and also the reasons that an update might fail or be retried. These differences are what would be obscured by an abstraction that allowed you to easily exchange one for another. It would also need to take into account things like transactions, asynchrony, livelocks and deadlocks (for example), and complicate the ease of use of things like atoms. Also, how would it deal with alter vs commute? Would it need to support both update strategies on all reference types?

Ed18:03:10

I think history is also a reason that this doesn't exist. The STM predates protocols in Clojure.

quadron18:03:20

your take would have made sense to me if one of swap!/vswap!/alter/send had implementations for more than one ref-type

Ed18:03:25

I'm not sure I understand why that makes more sense ... can you explain further, please?

quadron18:03:15

all these functions are always called in the presence of the ref-like value, so there is no confusion there. With respect to the alter/commute issue: in the absence of knowledge, it would call whichever operation is more generic (expects less structure from the data).

Ed19:03:19

> all these functions are always called in the presence of the ref-like value, so there is no confusion there
I think that's what I'm getting at. There is confusion. Everything about calling one of these update functions is different, apart from the fact that they all do some form of state management.
> with respect to the alter/commute issue, in the absence of knowledge, it would call whichever operation is more generic (expects less structure from data)
alter updates a ref in a compare-and-swap kind of way (but consistently within the transaction), whereas commute allows commutable operations to be reordered if there's a transactional conflict. This isn't supported by any of the other reference types and is an example of where the semantics of updating a ref differ from an atom, for example.
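
Concretely, the alter/commute distinction (a sketch):

(def counter (ref 0))

;; alter: uses the in-transaction value; conflicting transactions retry
(dosync (alter counter inc))

;; commute: the fn is re-run at commit time with the latest value,
;; so commutative updates need not force a retry
(dosync (commute counter inc))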

quadron17:03:44

or perhaps i am not aware of that protocol?

borkdude17:03:41

There is no protocol for atoms in Clojure, only interfaces (IAtom and IAtom2)

seancorfield17:03:42

Q about maintaining backward compatibility: if I have some protocol that declares three functions and may have additional user-supplied implementations out in the wild, and I add a fourth function to that protocol (and implement it in my code), could that break users' existing code?

seancorfield17:03:07

My immediate reaction is that "changing a protocol" is breaking, since the protocol is part of the public API -- but if folks only provide a partial implementation and never call the unimplemented functions anyway, it wouldn't affect them?

ghadi17:03:04

"changing a protocol" is breaking -- this isn't quite precise does adding the new method break existing users? -> no, so it's not breaking

ghadi17:03:32

unless there is something about the way the unimplemented method is used -- it's tricky stuff

seancorfield17:03:39

(I'm agonizing over whether to extend next.jdbc/execute-batch! the same way execute! et al operate so you can pass a connectable or sourceable instead of just a PreparedStatement)

Alex Miller (Clojure team)18:03:23

is there an actual api doc anywhere for next.jdbc? not seeing it on the readme

seancorfield18:03:42

I guess I should add a more obvious link to the API docs then...

seancorfield18:03:43

(done; also link to older docs under old group ID)

ghadi17:03:55

ostensibly something is calling the unimplemented method

Kelsey Sorrels18:03:14

Might also depend on how much weight one puts behind Hyrum's Law

Alex Miller (Clojure team)18:03:32

so you're talking about adding -execute-batch to Executable and calling it from execute-batch! ?

Alex Miller (Clojure team)18:03:38

are there known implementors of Executable outside next.jdbc?

Alex Miller (Clojure team)18:03:18

a quick skim through http://grep.app leads me to believe the answer is no

seancorfield18:03:32

Yeah, that's the specific case here but I was also sort of asking in the more general case about whether expanding the list of functions in a protocol is accretive or breaking, and under which circumstances.

seancorfield18:03:10

(and thanks for checking http://grep.app for implementations -- I hadn't gotten far enough down the path to do that work)

Alex Miller (Clojure team)18:03:21

I would have to think harder about whether this is a breaking change in the face of compilation and all the scenarios there, but from a source pov, I think you are only going to call the new method if someone passes you a connectable/sourceable to execute-batch! (and those weren't possible before, other than PreparedStatement, which presumably you will fix up).

seancorfield18:03:38

Yes, the intention is to try to do this in a purely accretive way -- so new uses become possible, without breaking any existing uses.

Alex Miller (Clojure team)18:03:12

in general, adding a method to an interface in Java is NOT considered a change that breaks binary compatibility

Alex Miller (Clojure team)18:03:32

(mentioning b/c protocols have a backing Java interface ofc)

Alex Miller (Clojure team)18:03:23

given that protocols already handle the case of missing protocol methods, I think from a Clojure POV, it would just be seen in the same way
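
A small illustration of that behavior, with hypothetical protocol and record names:

(defprotocol Runner
  (run-one [this])
  (run-batch [this]))    ;; the newly added method

(defrecord OldClient []  ;; written before run-batch existed
  Runner
  (run-one [_] :ok))

(run-one (->OldClient))   ;; => :ok, existing calls are unaffected
(run-batch (->OldClient)) ;; throws, but only if actually called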

Alex Miller (Clojure team)18:03:09

so I want to say - this is not a breaking change in that sense

seancorfield18:03:20

Thanks, Alex. That's good feedback and justification, and makes me feel more comfortable about it.

Alex Miller (Clojure team)18:03:17

it's hard for me to see any case in what you describe where something that used to work now does not work; merely that new additional things may need to be added to make external extensions work with the new api

Alex Miller (Clojure team)18:03:37

and I don't even see any cases of that in the wild

kah0ona18:03:49

Hello folks, just a quick question. I have a ring handler that does some long I/O operation. I want it to start the operation and then return immediately with a URL, which a client can poll to see progress. The progress is stored in an atom. Now I simply wrap the long operation in a future, but is that idiomatic? If I never deref the future, would that be a problem, i.e. would that mean that even though the future goes out of scope, it would still linger/run somewhere?
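
A minimal sketch of that setup (hypothetical names; the handler returns immediately and a second endpoint derefs the atom):

(def progress (atom 0))

(defn start-long-job! []
  (future
    (doseq [i (range 100)]
      (Thread/sleep 100)        ;; stand-in for the slow I/O step
      (reset! progress (inc i)))))

(defn handler [_req]
  (start-long-job!)
  {:status 202, :headers {"Location" "/progress"}, :body "started"})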

hiredman18:03:22

a future is another thread, it has a life of its own once started regardless of being deref'ed or not

kah0ona18:03:49

ok, so if that doseq that’s in there finishes, the thread will be destroyed?

hiredman18:03:35

(a future actually runs in a threadpool, so in theory instead of being destroyed the thread may run another future)

kah0ona18:03:45

what if like 8 requests or so come in at roughly the same time

hiredman18:03:02

multithreading man

hiredman18:03:05

it's a thing

kah0ona18:03:21

I know haha 🙂 ok

kah0ona18:03:39

i was just wondering if it could fill up the threadpool

hiredman18:03:53

in general it is not good practice to just fire off futures willy nilly, because usually you will end up wanting more control over their execution, but if you don't want more control then there is nothing wrong with it

kah0ona19:03:29

I’ll use the well-let’s-refactor-if-it-goes-boom approach then

kah0ona19:03:20

(ie. my definition of pragmatic ;-))

borkdude19:03:52

@kah0ona alternatively you could put the work on a queue and have a worker that constantly consumes work from that queue

borkdude19:03:16

but I've used the "`future` and forget" approach myself a lot as well ;)
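
A sketch of that queue-and-worker alternative, using a LinkedBlockingQueue and a single daemon thread (hypothetical names):

(import '[java.util.concurrent LinkedBlockingQueue])

(def work-queue (LinkedBlockingQueue.))

(defonce worker
  (doto (Thread.
         (fn []
           (loop []
             (let [job (.take work-queue)] ;; blocks until a 0-arg fn arrives
               (try (job)
                    (catch Exception e
                      (println "job failed:" (.getMessage e)))))
             (recur))))
    (.setDaemon true)
    (.start)))

(defn enqueue! [job] (.put work-queue job))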

kah0ona19:03:07

yeah, that’ll be the approach i’ll take when everything has screeched to a halt, if that ever happens 🙂 but it’s a quite infrequently used endpoint, so i’m winging it

emccue20:03:20

it can definitely fill up the threadpool - that is a thing

emccue20:03:37

but the abstraction is still what you want

phronmophobic20:03:10

there's also the https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executor.html interface from the JVM with a few options that provide more control
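
For example, a fixed-size pool via Executors bounds how many jobs run at once (a sketch; 4 is an arbitrary size):

(import '[java.util.concurrent Callable Executors ExecutorService])

(def ^ExecutorService pool (Executors/newFixedThreadPool 4))

;; like `future`, but at most 4 jobs execute concurrently;
;; extra submissions queue up inside the pool
(defn submit! [f] (.submit pool ^Callable f))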

Alex Miller (Clojure team)20:03:46

you cannot "fill up" the future threadpool; it will grow without limit

👍 3
Alex Miller (Clojure team)20:03:02

(and shrink automatically)