#clojure
2023-02-05
anovick10:02:01

Could somebody help explain to me the differences between these libraries? • Schema • Spec • Spec 2 alpha • Malli

Ben Sless11:02:00

That's a bit of a long answer. They all let you do validation; where they differ is semantics, performance, and additional capabilities.
• Support: Spec is built into Clojure.
• Syntax: Schema's syntax results in schemas looking a lot like the data they reflect. Spec is very macro/predicate heavy and looks close to BNF in s-expressions. Malli uses a hiccup-style data DSL, with added support for forms more similar to Schema.
• Semantics: the notable exception is Malli, where schemas are data and can be persisted and manipulated.
• Registry semantics: Schema: n/a. Spec: global. Malli: flexible, can have it whichever way you want.
• Performance: Malli is fastest, if that's important for you.
• Coercion: built in for Malli.
Besides that, Malli has plenty of additional modules and capabilities built in.
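To make the syntax differences concrete, here is a hedged sketch of the same map shape in each of the three libraries, assuming `prismatic/schema`, `clojure.spec.alpha` (built in), and `metosin/malli` are available under the usual aliases:

```clojure
(require '[schema.core :as sch]
         '[clojure.spec.alpha :as s]
         '[malli.core :as m])

;; Plain Schema: the schema looks like the data it describes.
(def PersonSchema {:name sch/Str :age sch/Int})

;; clojure.spec: macro/predicate based, registered globally by keyword.
(s/def ::name string?)
(s/def ::age int?)
(s/def ::person (s/keys :req-un [::name ::age]))

;; Malli: hiccup-style data; the schema is a plain value you can
;; store, walk, and transform like any other data.
(def person-schema
  [:map [:name :string] [:age :int]])

(m/validate person-schema {:name "Ada" :age 36}) ;; => true
```

The names `PersonSchema`, `::person`, and `person-schema` are illustrative, not from any of the libraries' docs.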

anovick18:02:13

Probably the phrasing of my question was a bit misleading. After re-watching the "Spec-ulation" talk, I surmise that the design decisions of Spec and Spec 2 address a wider problem than just "data validation": communicating dependencies across the code ecosystem.

seancorfield18:02:36

Does https://corfield.org/blog/2019/09/13/using-spec/ help? It talks about how we use Spec in production (and dev/test) at World Singles Networks.

anovick22:02:26

@U04V70XH6 That's helpful, thanks for sharing, helps to look at examples 🙂 Would you be able to compare it to the Malli library, in case you've had the chance to use it?

seancorfield22:02:06

I haven't used Malli. I used Schema a bit, before Spec existed. I generally prefer to use things the core team provide and maintain, where I can.

hifumi12301:02:28

Out of curiosity, what library are you using for data coercion? I like clojure.spec a lot for data validation, but the core team makes it very clear that it is not intended to be used for data coercion whatsoever, so libraries like spec-tools are technically not something one should use (or, if one does, don't expect stability). For this reason, I've been slowly phasing out clojure.spec in favor of malli in my codebase, since it offers both validation and coercion (while still maintaining them as separate processes)
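A minimal sketch of what "separate processes" looks like in Malli, assuming `metosin/malli` is on the classpath (the `Order` schema is illustrative):

```clojure
(require '[malli.core :as m]
         '[malli.transform :as mt])

(def Order [:map [:qty :int]])

;; Validation answers yes/no and never changes the value.
(m/validate Order {:qty 3})    ;; => true
(m/validate Order {:qty "3"})  ;; => false

;; Coercion (decoding) is a separate, explicit step driven by a transformer.
(m/decode Order {:qty "3"} (mt/string-transformer))
;; => {:qty 3}
```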

seancorfield01:02:40

We have actually used conformers in Spec for some data coercion in the past but we're slowly moving away from that and have been experimenting with https://github.com/exoscale/coax and I like that approach.

👍 2
jrychter09:02:15

FWIW, I use spec, mostly because I try to avoid external dependencies as much as I can, though I have to admit after several years that my expectations for spec development were slightly higher. For coercion, I use a rather simple piece of code that I got from someone on the internet (no longer remember where). I adapted it to use multiple registries, so that I can coerce things to model form and to database form. This works very well for me.

jrychter09:02:29

In the past, I made the mistake of thinking that conformers are intended to be used for data coercion. They are definitely not. In general, I only use the validation part of spec.
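A small sketch (using only `clojure.spec.alpha`, which ships with Clojure) of why a conformer used for coercion is a trap: once the conformer parses, "is it valid?" silently starts meaning "did coercion succeed?". The `::port` spec is illustrative.

```clojure
(require '[clojure.spec.alpha :as s])

;; A conformer that parses strings into longs - coercion disguised as a spec.
(s/def ::port
  (s/conformer (fn [x]
                 (cond (int? x)    x
                       (string? x) (try (Long/parseLong x)
                                        (catch Exception _ ::s/invalid))
                       :else       ::s/invalid))))

(s/conform ::port "8080") ;; => 8080, a *changed* value
(s/valid? ::port "8080")  ;; => true, even though "8080" is a string:
                          ;; validation and coercion are now entangled
```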

steveb8n17:02:27

Q: I need to find a way to efficiently create a unique hash for large objects (some clj(s) data structures, JSON data and byte arrays) to create a referentially transparent caching mechanism for expensive fns using these objects as args. Looking for ideas in thread….

steveb8n18:02:40

for CLJ and JSON, a naive technique would be to convert to string and hash the string
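A JVM-side sketch of that naive technique: print the value to a string and SHA-256 it. Note the caveat that `pr-str` key ordering for large maps is not guaranteed, so a real implementation would need a canonical (sorted) printing step first.

```clojure
(import '[java.security MessageDigest])

(defn sha256-hex
  "Hex-encoded SHA-256 of a string."
  [^String s]
  (let [d (.digest (MessageDigest/getInstance "SHA-256")
                   (.getBytes s "UTF-8"))]
    (apply str (map #(format "%02x" %) d))))

(defn naive-hash
  "Naive content hash: print to a string, then hash the string."
  [data]
  (sha256-hex (pr-str data)))

(naive-hash [1 2 {:a "b"}]) ;; => 64-char hex digest
```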

steveb8n18:02:00

but I need this to work in CLJ and CLJS and produce the same hash

steveb8n18:02:09

same for byte arrays

steveb8n18:02:24

any suggestions much appreciated

zimablue18:02:57

Hashp rings a bell as a library option

p-himik18:02:14

A hash can't be unique by definition. How exactly to tackle it depends on what you need that hash for. Also, just in case - byte arrays and JSON data are mutable. But if you hash something like that, and then mutate it, it won't correspond to the hash anymore.

steveb8n18:02:48

true, I can tolerate some (rare) collisions

steveb8n18:02:07

I can guarantee the JSON is immutable

steveb8n18:02:05

I cannot use identical? for clj because values are (de)serialized from earlier copies and that would only work for clj anyway

steveb8n18:02:26

while looking for the hashp lib I found this https://github.com/replikativ/hasch which might work

zimablue18:02:55

That might be what I'm thinking of sorry, it's used in datahike I think

steveb8n18:02:33

yeah, they mention that. thanks. I’ll try it

steveb8n18:02:28

it looks like exactly what I need (with expected cljs perf issues) so I can benchmark. much appreciated
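For reference, a hedged sketch of how hasch is typically used, assuming `io.replikativ/hasch` is on the classpath: `hasch.core/uuid` derives a deterministic UUID-5 from edn data, designed to produce the same result on the JVM and in ClojureScript.

```clojure
(require '[hasch.core :as h])

;; Equal values yield the same UUID across CLJ and CLJS runtimes.
(h/uuid {:user 42 :tags #{:a :b}})
```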

zimablue18:02:02

• If performance is a significant issue, and these cached functions need the entire JSON object (which might vary slightly at the leaves), one alternative is to generate a unique id per object on the creation side. Then you get false inequality for re-generated but identical JSON objects, in exchange for much improved speed. That's roughly what I think Clojure objects do under the hood, and it's equivalent to Python's id() function
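The creation-side-id idea can be sketched in plain Clojure by tagging each value once with a counter id in its metadata and caching on that id instead of hashing the contents. All names here are illustrative, and this only works for values that support metadata:

```clojure
(defonce next-id (atom 0))

(defn tag
  "Attach a unique cache id to a (metadata-capable) value at creation time."
  [obj]
  (vary-meta obj assoc ::id (swap! next-id inc)))

(defn cache-key
  "O(1) cache key: the id assigned at creation, not a content hash."
  [obj]
  (::id (meta obj)))

(def order (tag {:items [:a :b]}))
(cache-key order) ;; => an integer id, stable for this object
```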

steveb8n18:02:11

I could see some kind of Merkle tree down the road for this when seeking perf

zimablue18:02:45

Immutable.js and clojure data structures must both have this implemented efficiently on production side somehow

steveb8n18:02:53

agreed, must be there for any hashed data structure but I doubt the impls for jvm vs node produce the same hash values

delaguardo08:02:22

https://github.com/DotFox/jsonista.jcs I recently made an extension for the Clojure jsonista library that produces the same JSON output as JavaScript.

zimablue18:02:04

Generated symbol ids get so long, I wish they were per prefix, and do they reset per compile? Or I wish there was some single-pass post-macroexpand transform one could use to reset them to minimal values

zimablue18:02:35

It sounds completely mad but when reading complex macro generated code with lots of let blocks it does become significant for me

p-himik18:02:19

Perhaps something like rewrite-clj can be used to rename symbols in a macroexpansion result to make them more readable.

ghadi18:02:43

An example would help understanding the issue in context

p-himik19:02:58

A plain (macroexpand '(clojure.core.async/go)) will have a lot of symbols with 4-digit numbers in them. old-frame__6905__auto__ is a bit hard to grasp with a quick look.
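One dependency-free sketch of the "post-macroexpand transform" idea: postwalk the expansion and rename every auto-gensym-looking symbol to a short sequential name, purely for readability (the regex and naming scheme are illustrative, and the result is for reading, not re-evaluation):

```clojure
(require '[clojure.walk :as walk]
         '[clojure.string :as str])

(defn shorten-gensyms
  "Rewrite symbols like old-frame__6905__auto__ to short stable names."
  [form]
  (let [names (atom {})]
    (walk/postwalk
     (fn [x]
       (if (and (symbol? x)
                (re-find #"__\d+(__auto__)?$" (name x)))
         (or (get @names x)
             (let [stem (str/replace (name x) #"__\d+(__auto__)?$" "")
                   s    (symbol (str stem "-" (count @names)))]
               (swap! names assoc x s)
               s))
         x))
     form)))

(shorten-gensyms '(let [x__123__auto__ 1] (+ x__123__auto__ 2)))
;; => (let [x-0 1] (+ x-0 2))
```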

Ben Lieberman20:02:55

So Java IO is not my strong suit, but I ran into an issue that surprised me. Using the AWS API, I made a successful API call and wanted to write the resulting InputStream to a file, but I kept getting IOException: Stream closed. I spent more time than I care to admit on this, but it turns out all I had to do was make another API call and then immediately try writing the stream? Java's streams don't "time out", do they? Does the stream get GC'd after a certain time?

phronmophobic20:02:45

Generally, I think the stream you get back is based on the API's network connection. Your aws library probably has a timeout, but even if it didn't, the API endpoint almost certainly has a timeout.

Ben Lieberman20:02:47

ohh, right I guess streams are lazy? and if the socket gets closed the stream is gone?

👍 2
phronmophobic20:02:17

I think the reality is that there is no "stream". There are packets and buffers. If you don't read from the buffer (ie. "stream"), then the packets are either never sent or will stop being sent.

lukasz20:02:05

You might also get this error if you read the stream (for example with slurp, or, depending on your setup, just by printing it in the REPL) and then try to read it again

didibus02:02:42

I believe streams are read-once, similar to an iterator. Also, once they're closed by calling .close on them, they cannot be used anymore.
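The read-once behavior is easy to demonstrate locally with a `ByteArrayInputStream` (no network involved); closing is a separate matter, and many stream types throw `IOException: Stream closed` on any use after `.close`:

```clojure
(import '[java.io ByteArrayInputStream])

(def in (ByteArrayInputStream. (.getBytes "hello")))

(def first-read  (slurp in)) ;; => "hello"
(def second-read (slurp in)) ;; => "" - the stream is exhausted, not reusable
```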

emilaasa09:02:33

You could try doing the API call in a with-open form, which should force you to handle the streams correctly.
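A sketch of that approach: consume the stream exactly once inside `with-open`, copying it straight to a file with `clojure.java.io/copy`, so the stream is closed deterministically when the body exits. The function name and the `response-stream` var are illustrative.

```clojure
(require '[clojure.java.io :as io])

(defn save-stream!
  "Write an InputStream to a file, closing the stream when done."
  [^java.io.InputStream in file]
  (with-open [in in]
    (io/copy in (io/file file))))

;; usage, assuming `response-stream` came back from the API call:
;; (save-stream! response-stream "payload.json")
```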