#clojure
2020-04-05
didibus05:04:24

It seems like the transducer for filter when applied with sequence isn't as lazy as the lazy variant of filter:

(defn printer
  [xf]
  (fn
    ([] (xf))
    ([result] (xf result))
    ([result input]
     (print input "")
     (xf result input))))

(def s
  (sequence
   (comp (filter odd?)
         printer)
   (range 100)))

(def l
  (->> (range 100)
       (filter odd?)
       (map #(do (print % "") %))))

(take 1 s)
;; Prints: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65
(take 1 l)
;; Prints: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31

didibus05:04:32

Anyone know why?

didibus05:04:59

This isn't the case for all transducers, for example:

(def s
  (sequence
   (comp (map inc)
         printer)
   (range 100)))
(take 1 s)
;; Prints: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33

phronmophobic05:04:29

it seems like the chunk size is the same

(defn printer
  [xf]
  (fn
    ([] (xf))
    ([result] (xf result))
    ([result input]
     (print input "")
     (xf result input))))
(def s
  (sequence
   (comp (filter (constantly true))
         printer)
   (range 100)))
(def l
  (->> (range 100)
       (filter (constantly true))
       (map #(do (print % "") %))))
(take 1 s)
;; prints 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 
(take 1 l)
;; prints 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
here, the only difference is filtering with (constantly true) vs odd?

didibus06:04:18

Hum, interesting

didibus06:04:20

That almost confuses me more

phronmophobic06:04:33

if you think that makes no sense, try:

(defn printer
  [xf]
  (fn
    ([] (xf))
    ([result] (xf result))
    ([result input]
     (print input "")
     (xf result input))))
(def pred #(> % 50))
(def s
  (sequence
   (comp (filter pred)
         printer)
   (range 100)))
(def l
  (->> (range 100)
       (filter pred)
       (map #(do (print % "") %))))
(take 1 s)
;; 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 
(take 1 l)
;; 51 52 53 54 55 56 57 58 59 60 61 62 63 

didibus06:04:04

Hum... actually, this might make me think of something. It seems that with transducer, it grabs 32 result elements, while with lazy-seq it will grab 32 elements from the input coll maybe?

phronmophobic06:04:38

I think the moral of the story is that relying on a certain amount of laziness is asking for trouble. I don’t think there are any guarantees and it may even change from one version of clojure to another

didibus06:04:28

Ya, I do know that, and I'm not too worried if 32 becomes 50 or 84 as long as the magnitude doesn't change, like going from 100 to 1000 would be more problematic

didibus06:04:37

Just curious in this case

phronmophobic06:04:12

i think it’s like you were saying, when you’re doing (filter pred coll), filter itself is chunking on the input whereas when you use (filter pred) with sequence, then take is doing the chunking and it’s not even seeing some elements

phronmophobic06:04:58

(defn printer
  [xf]
  (fn
    ([] (xf))
    ([result] (xf result))
    ([result input]
     (print input "")
     (xf result input))))
(def pred1 #(zero? (mod % 2)))
(def pred2 #(zero? (mod % 4)))
(def pred3 #(zero? (mod % 8)))
(def s
  (sequence
   (comp (filter pred1)
         (filter pred2)
         (filter pred3)
         printer)
   (vec (range 200))))
(def l
  (->> (vec (range 200))
       (filter pred1)
       (filter pred2)
       (filter pred3)
       (map #(do (print % "") %))))
(take 1 s)
;; 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152 160 168 176 184 192 
(take 1 l)
;; 0 8 16 24 

phronmophobic06:04:13

so in this example, the lazy version is looking at one chunk of 32 input numbers and only 4 of them pass the filters. with the transducer, it’s still filling a chunk of up to 32, but the filtering happens first, so the chunk holds filtered numbers

didibus06:04:37

Ya, it seems to be the case

phronmophobic06:04:15

it’s actually one of the key features of transducers that they don’t box and unbox on every step. so when a transducer filters, “nothing happens”, but it can change the chunking behavior for lazy operations
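One way to check the explanation above is to count how many inputs the predicate actually sees when only the first element is requested. This is a minimal sketch: the helper names `counting-pred` and `inputs-examined` are made up for illustration, and the exact counts are implementation details that may differ between Clojure versions.

```clojure
(defn counting-pred
  "Wraps pred so that every call increments the counter atom."
  [counter pred]
  (fn [x]
    (swap! counter inc)
    (pred x)))

(defn inputs-examined
  "Calls make-seq with a counting version of pred, realizes only the
  first element of the result, and returns how many inputs pred saw."
  [pred make-seq]
  (let [counter (atom 0)]
    (first (make-seq (counting-pred counter pred)))
    @counter))

;; Lazy filter: chunks on the input seq, so it examines one input chunk.
(def lazy-count
  (inputs-examined odd? #(filter % (range 100))))

;; Transducer via sequence: buffers a chunk of results, so the predicate
;; keeps being called until enough inputs have passed the filter.
(def xf-count
  (inputs-examined odd? #(sequence (filter %) (range 100))))

(println "lazy filter examined" lazy-count "inputs")
(println "sequence + transducer examined" xf-count "inputs")
```

If the reasoning in the thread holds, the second count should be noticeably larger than the first whenever the predicate rejects a good share of the input.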

didibus06:04:00

Well, thanks for helping, glad we figured it out

colinkahn16:04:41

What are the options for reading edn data in non-clojure languages? I thought this was what transit was for (specifically transit-js) but seems like it isn't. I guess it's because the edn format supports things that most languages don't have like keywords?

didibus16:04:59

Hum, it's more that Clojure isn't too popular yet and most other languages haven't adopted EDN really

didibus16:04:22

There are some to find in the wild, but I don't know of a list. Such as for python: https://github.com/swaroopch/edn_format

potetm16:04:26

This was one of the goals of transit. It’s encoded in JSON for reach.

potetm16:04:54

Well, actually, yeah EDN has nothing to do w/ transit.

potetm16:04:05

But you can use transit instead of EDN for easier reach.

potetm16:04:51

Even if a language doesn’t have a transit encoder, you can layer one on top of a JSON encoder relatively easily.

potetm16:04:27

afaik transit supports arbitrary data types (e.g. keywords and dates)

didibus17:04:38

Hum, I think some caveats about this advice need to be noted.

didibus17:04:54

Transit isn't meant to be persisted if I remember correctly

didibus17:04:38

It's only a transport protocol for exchanging data between systems, which uses JSON under the hood (and some of its binary variants) I think

didibus17:04:47

But ya, it's good if you want to pass data between a Clojure/Script app and other apps in different languages. Since it uses JSON-based protocols, most languages will have a fast and good parser for it

didibus17:04:29

But for say storing data in your DB, or exchanging documents I don't think it is as good

didibus17:04:54

> NOTE: Transit is intended primarily as a wire protocol for transferring data between applications. If storing Transit data durably, readers and writers are expected to use the same version of Transit and you are responsible for migrating/transforming/re-storing that data when and if the transit format changes.

didibus17:04:33

> The design of Transit is focused on program-to-program communication, as opposed to human readability. While it does support an explicit verbose mode for representing Transit elements in JSON (called JSON-Verbose), Transit is not targeted for situations where human readability is paramount.

didibus17:04:02

To me those two things are the main difference with EDN

didibus17:04:05

EDN is more of an alternative for JSON, where transit seems to target more MessagePack (which it uses), protobuf, ion, avro, thrift, etc.

didibus17:04:50

Whereas I see EDN more like JSON and XML

potetm17:04:31

He asked about transit 😄

potetm17:04:50

Not a lot of info on the use case.

potetm18:04:44

But yeah, transit specifically says that it should not be used for persisted data because the spec might change. https://github.com/cognitect/transit-format#implementations

colinkahn18:04:40

From reading this it seems like transit, since it is its own format and can be extended with custom tags, is probably a better option than using a specific language's edn parser?

didibus18:04:45

It depends what you want to do

didibus18:04:11

If you want to communicate from a Clojure/Script app to an app in a different language, I'd say Transit is better than trying to use EDN yes.

didibus18:04:18

If you want to, say, export some data from a Clojure/Script app so it can be imported in some other app in some other language, EDN might be an option, though the lack of good support in other languages might mean you'd be better off using XML or JSON, or CSV, etc.

potetm18:04:58

@U0CLLU3QT EDN can be extended with custom tags as well.
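As a small illustration of custom tags on the EDN side, here is a sketch using the built-in `clojure.edn` reader. The tag names `#my/point` and `#other/thing` and the handler are hypothetical, invented only for this example.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical tag and handler, made up for this illustration:
;; #my/point [x y] is read into a plain map.
(def readers {'my/point (fn [[x y]] {:x x :y y})})

(def point
  (edn/read-string {:readers readers} "#my/point [1 2]"))

;; Tags without a registered reader can be kept as data instead of
;; throwing, by supplying a :default handler of two args (tag, value).
(def unknown
  (edn/read-string {:default tagged-literal} "#other/thing {:a 1}"))

(println point)
(println (:tag unknown) (:form unknown))
```

A reader in another language would need an equivalent tag-handler hook, which is where the quality of the available EDN libraries matters.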

potetm18:04:47

The primary differences are: 1. Transit is layered on top of JSON for easier performant translation to other langs. 2. Transit has compression built-in.

potetm18:04:27

and 3. Transit is still not finalized (and I’m not sure it ever will be).

didibus22:04:27

#3 is true of EDN too I guess 🤷

didibus22:04:02

There's also Nippy and Fressian to consider

mkvlr06:04:26

transit is very stable, it’s absolutely fine to use it for durable storage imo

mkvlr06:04:01

> CBOR is a binary encoding with the goal of small code size, compact messages, and extensibility without the need for version negotiation. This makes it a good alternative to https://github.com/edn-format/edn for storing and transmitting Clojure data in a more compact form.

didibus07:04:02

CBOR and MessagePack are very similar in all aspects. Both good choices.

didibus07:04:09

I still think it depends. When you store data durably, binary formats are always at a disadvantage in my opinion, so I'd favour XML, CSV, JSON or EDN personally unless you really can't deal with the size. Especially for archival.

didibus07:04:28

Because a binary format isn't self explanatory, you need to have support for a deserializer for it, which if you don't, is harder to write yourself, and if you can't tell what binary format you're dealing with it becomes much harder to figure out what to use to decode it

didibus07:04:36

Whereas give JSON or EDN to someone, and without any knowledge they can probably figure out what they're dealing with and write a parser themselves

didibus07:04:34

I'm not sure why the Transit readme mentions that, but it does. Either because of its use of MessagePack as a binary format, or I'm guessing it reserves itself the right to change in backward incompatible ways in the future, for example changing from MessagePack to CBOR, or making changes to the protocol.

didibus07:04:27

Dunno, but ya, I guess in practice it hasn't had a backward breaking change and newer versions can still read old versions, so there's that, just like many things in Clojure, stable and working but not explicitly committed

potetm13:04:48

@U5H74UNSF No. When Rich et al say something isn’t finalized and is subject to change, they mean it. Just because you “feel” like it’s stable, doesn’t make it so.

potetm13:04:37

If you’re willing to either take the outright risk, or if you’re willing to organize version changes (via e.g. mass migrations), or you’re willing to potentially never upgrade: then yes. Use it for durable storage.

potetm13:04:53

But don’t pretend like it will be cost-free.

potetm13:04:05

To be clear: The tradeoffs here are not very subtle. I just listed them all out, and they’re manageable. But transit is not “very stable,” and using it for durable storage does come with those tradeoffs.

mkvlr13:04:36

only time will tell

mkvlr13:04:26

transit hasn’t seen a format breaking change in years, that’s pretty stable to me

potetm13:04:28

You’re opting for the “take the risk” option. Might pan out. Might not.

mkvlr13:04:43

> The Transit format has thus far had only one version (0.8) and has not changed in several years.

mkvlr13:04:02

read the commit message from Alex

potetm13:04:11

Risk is not bad, but IMO not really worth it in this example (when you can presumably use EDN).

potetm13:04:26

I mean, I think Alex’s note just clarifies the risks. It doesn’t remove any risk whatsoever.

mkvlr14:04:48

I bet we won’t see a breaking change to transit in the next decade

mkvlr14:04:32

I’m even willing to give you 2:1 odds on that bet 😼

potetm14:04:14

I suspect there’s a reason Alex’s commit does not bump it to 1.0. I strongly suspect they have an idea that might cause a breaking change.

potetm14:04:27

I have no idea if it’ll happen in the next decade or not.

potetm14:04:42

But I also see zero reasons to take that bet 😄

potetm14:04:50

When you can, for zero risk, use EDN

teodorlu17:04:38

I've got a Java object I want to explore with the REPL. I haven't worked that much with Java from Clojure. Are there any nice functions for REPL explorations I could use? In Clojure, I'd reach for doc and dir. Thanks!

teodorlu06:04:38

Thanks! I totally forgot about the REPL guide. I appreciate the qualitative discussion. If I remember correctly, you had a hand in writing that guide?

val_waeselynck11:04:13

Yes I did (but wasn't trying to do any self-promotion)

teodorlu12:04:18

Then I'll use the chance to say thanks. It reads well, and is precise!

😊 4
seancorfield17:04:41

@teodorlu Depending on exactly what you mean by "explore", you might find org.clojure/java.data useful as a library.

seancorfield17:04:39

Or there's bean built-in (but it does a lot less -- see the comparison in the README https://github.com/clojure/java.data#feature-comparison-to-clojurecorebean )

💯 4
manutter5117:04:51

There's also javadoc in the REPL.

👍 4
💯 4
teodorlu17:04:29

Thanks! (javadoc my-object) managed to take me to a google search which found the class. Not sure if that was intended behavior, but it sure worked. Thanks! (didn't find javadoc at first, was looking just in the clojure.repl namespace)

teodorlu17:04:02

> exactly what you mean

E.g. I'd like to know that I could have called .getType and .getValue on a PGobject I got

teodorlu17:04:14

Thanks for the replies, I'll have a look at both 🙂

seancorfield17:04:18

Ah, so you want reflection...

teodorlu17:04:39

The combination of javadoc, reflect and bean was perfect -- the first for static documentation and the other two for dynamic exploration. Thanks! Fully qualified names, for use from an editor that doesn't have clojure.repl and such preloaded:

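;; These namespaces may not be loaded yet outside a standard REPL session
(require 'clojure.java.javadoc 'clojure.reflect)

(def e (return-some-java-obj))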

;; Look for static docs, might redirect to Google search for java class
(clojure.java.javadoc/javadoc e)

;; Dynamically explore all the info we've got in a data structure
(clojure.reflect/reflect e)

;; Dynamically explore a "tight" data model of e
(clojure.core/bean e)
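To get straight to the "which methods can I call" question, the reflection data can be narrowed down to public method names. This is a sketch with a hypothetical helper, `public-method-names`, and it uses a String as a stand-in since the PGobject from the thread is not available here.

```clojure
(require '[clojure.reflect :as reflect])

(defn public-method-names
  "Returns a sorted set of the public method names declared on obj's class.
  Hypothetical helper for illustration; pass :ancestors true to
  reflect/reflect if inherited methods are wanted as well."
  [obj]
  (->> (reflect/reflect obj)
       :members
       (filter #(instance? clojure.reflect.Method %))
       (filter #(contains? (:flags %) :public))
       (map :name)
       (into (sorted-set))))

;; A String stands in for the Java object being explored.
(println (take 10 (public-method-names "hello")))
```

Called on the object from the thread, the same helper should list names such as getType and getValue if the class declares them as public methods.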

em20:04:29

I'm trying to understand the details of ref's :max-history and :min-history options. It makes sense that dosync transactions retry on alter for refs that have changed in value in the meantime, so the only use I can think of for these options is on deref, and not commit (since it retries given any change). I'm not sure how the STM/MVCC is implemented behind the scenes, but I'm imagining a timestamp-like value recorded at the beginning of the transaction. Because you can't really know (at least I think this is true) what refs will be accessed during the transaction (let's say you have some long computation before derefing the ref), I'm guessing the history queue is there to provide a sliding buffer for values of refs in case they're being modified in another thread. If no value at or before that timestamp exists in the queue by the time the deref happens inside a dosync, the transaction retries. I'm guessing that each of these read-only retries grows the history queue by 1, and that's why the :min-history option exists to provision a queue upfront. Am I on the right track? The docstring for ref is a little terse and this is the best interpretation I could come up with
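The guess above can be probed with a small experiment: a slow read-only transaction runs while another thread keeps committing to the same ref, and the number of attempts and the resulting history length are inspected afterwards. This is only a sketch. The sleep durations are arbitrary, the outcome depends on timing, and `slow-read` is a made-up helper.

```clojure
;; No history provisioned up front; history may grow up to 10 entries.
(def r (ref 0 :min-history 0 :max-history 10))

(defn slow-read
  "Derefs r inside a deliberately slow transaction and returns how
  many times the transaction body ran (1 means no retry)."
  []
  (let [attempts (atom 0)]
    (dosync
      (swap! attempts inc)   ; side effect on purpose: counts attempts
      (Thread/sleep 100)     ; stand-in for a long computation
      @r)
    @attempts))

;; Start the slow reader, then commit to r several times meanwhile.
(def reader (future (slow-read)))

(dotimes [_ 5]
  (dosync (alter r inc))
  (Thread/sleep 20))

(println "read attempts:" @reader)
(println "history count:" (ref-history-count r)
         "min:" (ref-min-history r)
         "max:" (ref-max-history r))
```

Repeating the run with a larger :min-history is a quick way to see whether provisioning history up front avoids the retry on a given machine.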

cpmcdaniel22:04:57

I must be missing something... is there a way to set the version in pom.xml as generated by clj -Spom ?

seancorfield22:04:42

No. I added options to clj-new that let you control more things when it generates the initial pom.xml file (but that's separate from clj -Spom)

cpmcdaniel22:04:59

OK, so it's a manual process then. I mainly worry about keeping dependencies up-to-date in large projects, so regenerating the pom and setting the project version prior to publishing the jar is the way to go, I presume.

seancorfield23:04:06

clj -Spom updates just the dependencies. It doesn't touch the rest of the file.

seancorfield23:04:29

@cpmcdaniel If you intend to deploy a JAR to Clojars and, especially, if you plan to use http://cljdoc.org for that library, you need quite a bit more in the pom.xml file than clj -Spom creates initially. That's why clj-new creates a full-featured pom.xml file.

seancorfield23:04:33

(but, yeah, then you need to manually update the version and the tag elements for each new release, and you need to run clj -Spom to update pom.xml if you change the project's dependencies)