clojure 2021-02-15 | Slack Archive

Quest18:02:18

My team would like to standardize on a Clojure library for API and/or function validation. • Clojure.spec can be used for these purposes but we've found difficulties in usage that make it unappealing for our org. • We're heavily considering Malli as our standardization target. I'm responsible for demo'ing this and other possibilities soon. Would anyone suggest other libraries or things to consider during this task?

Quest18:02:36

I'm aware of Prismatic Schema but have never used it myself. We were previously having some success with Orchestra & Spec Tools, but we've generally seen low adoption into Clojure.spec.

borkdude18:02:23

I'm hoping spec2 will come out some time, I think the alpha status of spec1 and spec2 slows down momentum

borkdude18:02:45

Malli seems to be actively developed and embraced

Quest18:02:11

That was our take on looking at the library. We're a larger enterprise and must be a little more risk adverse, but Malli seems very reasonable

dharrigan18:02:06

I've been using malli for several of my projects (in tandem with Reitit) and it works very well imho.

👍 3

Quest18:02:46

In my own personal work, I've done some really hacky work on clojure.spec to get runtime specs working. I'm also really looking forward to spec2, but my fingers have been crossed for a long time there 🙂

dpsutton18:02:03

Cognitect is committed to spec as part of the library. It’s inconceivable that spec won’t be supported. That’s not necessarily the case with malli

borkdude18:02:05

Even Schema has been maintained while the company behind it pivoted.

Quest18:02:11

Good point, though I don't believe I'd have success trying to force adoption of spec1. I'll need to come to a conclusion by end of month, and I haven't checked but I'm assuming spec2 is still a ways out

borkdude18:02:26

As long as the community has a big interest in it, it will probably be maintained

borkdude18:02:48

And Metosin has a good track record I'd say

Quest18:02:09

Yeah, I'll ➕1 that. Most of our team was pleasantly surprised to see Metosin was the publisher

ikitommi19:02:08

If you decide go with Malli, there is #malli to get help with your demo. Also, 0.3.0 around the corner (parsers, function & sequence schemas).

🎉 6

👍 3

vemv22:02:34

https://github.com/nedap/speced.def takes bit of a Schema-like approach (i.e. no instrumentation, using what essentially boils down to a pre/postcondition system) while using Spec1 as its backing validation system. The magic being that Spec1 is just an impl detail. I have a branch where I replaced it with Schema (because it's relevant for my day job). I could do the same for Spec2, Malli, etc. It's bit of an underdog but also a completely unique, robust proposition when considered carefully

👍 3

Quest00:02:18

Huh, we were looking for something with instrumentation built in but this is exactly the sort of smaller library I was hoping someone would bring up. Thank you @U45T93RA6 & everyone else who contributed here 🙇

🙂 3

Leonid Korogodski20:02:05

Does there exist a ready Clojure solution for the following? I want to zip several very large files while streaming the result on the fly to a consumer. The ZIP compression algorithm allows sending chunks of the output even before the entire zip archives is complete, and I don't want to hold the entire thing in memory at any given time. Obviously, Java's ZipOutputStream is unsuitable for that. Alpakka has a solution for this problem in Java and Scala: https://doc.akka.io/docs/alpakka/current/file.html#zip-archive However, while I can certainly call Alpakka from Clojure, I don't want to drag in the dependency and Akka and have to initialize its actor systems just for this little thing. Any suggestions?

noisesmith20:02:14

what makes you think ZipOutputStream requires holding the entire output in memory?

noisesmith20:02:35

surely if the target it writes to isn't in memory, it doesn't need to hold the output in memory

Leonid Korogodski20:02:58

Well, you cannot stream the results from its buffers as an input stream until it's done.

noisesmith20:02:22

oh, that's counterintuitive, I'll read up, pardon my ignorance

Leonid Korogodski20:02:51

Unless you can and it's me who's ignorant, of course. 🙂

Leonid Korogodski20:02:10

I'd love to be proven wrong.

noisesmith20:02:37

it takes an outputstream as a constructor argument (that part I did expect)

Leonid Korogodski20:02:03

I want an input stream constructed out of several input streams, which simply zips their contents.

noisesmith20:02:21

an input stream is something you read from :D

noisesmith20:02:35

(simple typo I'm sure)

Leonid Korogodski20:02:41

Yup. So I could read the zipped contents from even before the entire files are done being zipped.

Leonid Korogodski20:02:34

Here's how Alpakka does it in Java:

Source<ByteString, NotUsed> source1 = ...
Source<ByteString, NotUsed> source2 = ...

Pair<ArchiveMetadata, Source<ByteString, NotUsed>> pair1 =
    Pair.create(ArchiveMetadata.create("akka_full_color.svg"), source1);
Pair<ArchiveMetadata, Source<ByteString, NotUsed>> pair2 =
    Pair.create(ArchiveMetadata.create("akka_icon_reverse.svg"), source2);

Source<Pair<ArchiveMetadata, Source<ByteString, NotUsed>>, NotUsed> source =
    Source.from(Arrays.asList(pair1, pair2));

Sink<ByteString, CompletionStage<IOResult>> fileSink = FileIO.toPath(Paths.get("logo.zip"));
CompletionStage<IOResult> ioResult = source.via(Archive.zip()).runWith(fileSink, mat);

noisesmith20:02:34

oh - so you want data -> zipper -> (dupe) -> output, where the dupe step creates something you can read

noisesmith20:02:05

that's a limit algorithmically - with that kind of buffering you always need to hold onto the unread backlog, no matter what you do

noisesmith20:02:12

(unless I still misunderstand)

Leonid Korogodski20:02:32

Yeah, kind of. If you look at how ZipOutputStream works, it reads a chunk at a time, deflates it, adds it to a buffer, updates the checksum, gets another chunk, etc. In the meantime, what's already processed can already be send downstream. That's what I want.

noisesmith20:02:34

of course the (dupe) step can be implemented so it only holds backlog and doesn't keep read state

noisesmith20:02:56

deflates?

Leonid Korogodski20:02:07

That's their terminology for compression.

dpsutton20:02:56

> The ZIP file format permits a number of compression algorithms, though DEFLATE is the most common.

Leonid Korogodski20:02:20

So, something like this: Input: Files A.a, B.b, C.c. Zipped state: filename -> send right away downstream. first chunk -> compress -> send downsteam. second chunk -> compress -> send downstream. filename -> ...

Leonid Korogodski20:02:40

Don't wait to send the entire thing downstream when everything is complete.

Leonid Korogodski20:02:02

The files are too big for that.

noisesmith20:02:11

it seems like the composition of output streams should allow this, if you use an IO based stream and don't use in-memory if I understand your sitution, the gotcha is that you want to be able to "tee" the output, you need something with sensible buffering behavior (if you control both producer and consumer you can make sure the buffer never goes too large in simple use cases) more generally you can get into backpressure and document that the stream will stall if the secondary consumer doesn't read fast enough or use a dropping buffer and document that the secondary reader will not see all the data if it reads too slowly

noisesmith20:02:38

this is something that can be done with outputstreams

Leonid Korogodski20:02:55

Well, I can get to solving the backpressure problem later. To begin with, what do you mean by "composition of output streams" in this particular scenario?

noisesmith20:02:21

what I mean is that by chaining streams you can have two destinations for the zipped data - one you can read back in process, and one that gets sent to IO

noisesmith20:02:36

if you don't need two destinations, your problem is simpler than I thought

noisesmith20:02:03

I might have misunderstood your usage of the term "input stream" relating to the zipped data

noisesmith20:02:55

based on your step by step, all you need is for the output stream you supply to ZipOutputStream to be an IO stream and not an in-memory buffer

Leonid Korogodski20:02:58

Yes, I want something like ZipOutputStream (in the sense of building up a zipped data in its buffers) that can also be queried for available zipped data and read from--to wit, that also functions as an input stream.

Leonid Korogodski20:02:35

I suppose that channels can be used for that, no?

noisesmith20:02:02

channels?

Leonid Korogodski20:02:25

core.async

noisesmith20:02:48

this is totally unrelated to core.async, that will just make your problem more complicated

noisesmith20:02:12

you can make an OutputStream that writes data to two other OutputStreams

phronmophobic20:02:18

you can use PipedInputStream and PipedOutputStream to turn your ZipOutputstream into an InputStream

💯 3

noisesmith20:02:37

you can make an OuputStream that sends to an InputStream

Leonid Korogodski20:02:38

Thanks. I'll take a look at the piped streams.

phronmophobic20:02:07

the big caveat with the Piped*Streams is that you need two threads, whereas you might be able to get away with a single thread

noisesmith20:02:32

@lkorogodski essentially, even without the Piped stuff, you can reify http://java.io.OutputStream and in the implementation simply delegate by writing to two other streams

noisesmith20:02:42

this only needs one thread

💯 3

👍 3

noisesmith20:02:54

(but will block if one reader blocks)

Leonid Korogodski20:02:43

Ok, that will work, I think.

noisesmith20:02:21

the reason I so glibly rejected core.async here is that the synchronization here is not actually complex and the mechanics of putting the problem into core.async leads to potential pitfalls (blocking IO should not be in go blocks for example)

noisesmith20:02:59

and the core.async solution wouldn't simplify the management of the data consumption, just displace the complexity

dpsutton20:02:13

also, solve one problem at a time

Leonid Korogodski21:02:13

Thanks!

Leonid Korogodski21:02:36

Yeah, backing a ZipOutputStream by an OutputStream that uses a CircularByteBuffer would do the trick, as the circular buffer has an associated input stream, too.

2021-02-15

Channels