#clj-commons
2022-09-13
grzm21:09:58

‘lo all. Looks like there’s a regression in clj-yaml 0.7.109 and 0.7.110 (current). Number-like strings aren’t quoted:

% clj -Sdeps '{:deps {clj-commons/clj-yaml {:mvn/version "0.7.108"}}}' -M -e "(require '[clj-yaml.core :as yaml]) (doseq [x [\"083\" {:x \"083\"}]] (print (yaml/generate-string x)))"
'083'
{x: '083'}
 % clj -Sdeps '{:deps {clj-commons/clj-yaml {:mvn/version "0.7.109"}}}' -M -e "(require '[clj-yaml.core :as yaml]) (doseq [x [\"083\" {:x \"083\"}]] (print (yaml/generate-string x)))"
083
{x: 083}
 % clj -Sdeps '{:deps {clj-commons/clj-yaml {:mvn/version "0.7.110"}}}' -M -e "(require '[clj-yaml.core :as yaml]) (doseq [x [\"083\" {:x \"083\"}]] (print (yaml/generate-string x)))"
083
{x: 083}

grzm21:09:50

I suspect it’s in the upstream snakeyaml library, but haven’t confirmed.

borkdude21:09:55

you could try to bump that in the newest to confirm?

grzm21:09:15

I’ll give it a shot.

🙏 1
grzm21:09:30

Looks like the current version of clj-yaml (0.7.110) uses the latest version of snakeyaml (1.32)

dpsutton21:09:07

i’m seeing the same behavior on 110 and 109. And the quoted behavior on 108

grzm21:09:12

@U11BV7MTK Thanks for confirming.

grzm21:09:32

Looks like the regression was introduced between 1.29 and 1.30

borkdude21:09:34

maybe it's JavaScript semantics or so? 083 in a Node REPL is just 83

borkdude21:09:46

or YAML spec weirdness

grzm21:09:47

YAML spec weirdness, I suspect.

borkdude21:09:05

not sure, online yaml converters do not just automatically change strings into numbers

grzm21:09:23

(and whatever is going on with the node REPL is just being bad)

borkdude21:09:44

I think the leading 0 being octal is just Java

grzm21:09:17

The leading 0 is just a red-herring. It works (or rather doesn’t) with other numbers, too. Actually, maybe it’s not a red herring.

borkdude21:09:18

I think it's worth posting an issue about in snakeyaml

grzm21:09:58

I meant YAML spec weirdness in that it’s so lax that people are often surprised by its behavior and as a result screw up their parsers/generators.

grzm21:09:23

My Java is so freakin’ weak. What’s the quickest way to make a repro for a Java library?

borkdude21:09:57

I think your best bet is to look into the clj-commons library and just inline all the java interop into one blob

grzm21:09:57

Oh, that I can do. What I’ll have trouble doing is building the durned thing 🙂

borkdude21:09:27

building? oh right

borkdude21:09:46

what I do:

javac --classpath $(clojure -Spath) Foo.java

borkdude21:09:01

and then java --classpath "$(clojure -Spath):." Foo

borkdude21:09:17

nowadays java also supports running a .java file (since java 11)

grzm21:09:55

Coolio. Yeah, that sounds good. I was thinking of adding a test to their suite.

borkdude22:09:50

can you post a link in the clj-yaml issue when you've created one? going to sleep now, good luck!

👍 1
grzm22:09:25

For reference:

% cat NumberLikeString.java                                                                                
package com.example;

import org.yaml.snakeyaml.Yaml;

class NumberLikeString {
    public static void main(String[] args) {
        String data = args[0];
        Yaml yaml = new Yaml();
        String output = yaml.dump(data);
        System.out.print(output);
    }
}
% java -classpath $HOME/.m2/repository/org/yaml/snakeyaml/1.29/snakeyaml-1.29.jar NumberLikeString.java 083
'083'
% java -classpath $HOME/.m2/repository/org/yaml/snakeyaml/1.30/snakeyaml-1.30.jar NumberLikeString.java 083
083
% java -classpath $HOME/.m2/repository/org/yaml/snakeyaml/1.32/snakeyaml-1.32.jar NumberLikeString.java 083
083
That wasn’t terrible. Thanks for the reminder.

grzm11:09:26

Well, that was fun.

borkdude11:09:48

fucking hell

grzm11:09:13

You read my mind.

grzm11:09:14

A couple of things: • I'm wondering if it's worth my time trying to parse the Yaml 1.1 spec. • I'm wondering if we should look at using snakeyaml-engine, which is supposed to be Yaml 1.2 compliant, and what the Yaml 1.2 spec says about this case (and what other surprises await there).

grzm11:09:35

(I guess that's four things, not just a couple)

borkdude11:09:45

Another thing: • Do the custom thing where we preserve behavior of pre 1.30

borkdude11:09:23

I'd be fine with checking out snakeyaml-engine but there might be other breaking changes we'd introduce. Perhaps clj-yaml 2.0 then

grzm11:09:32

If clj-yaml chooses to keep snakeyaml with its Yaml 1.1 "compliance", whether clj-yaml should use a custom resolver to "patch" this regression. For my particular use case with babashka, I don't think I can provide a custom resolver in the script itself: those are Java classes, I believe.

borkdude11:09:54

yes, we could make that an option

borkdude11:09:06

and I personally would always use that option

💯 1
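
For illustration, a minimal sketch of what such a "preserve the pre-1.30 quoting" option might decide. Everything here is hypothetical (the class `ConservativeQuoting`, the `NUMBER_LIKE` pattern, and `emitScalar` are not part of clj-yaml or snakeyaml), and it only covers the number-like case discussed above; a real emitter has many more quoting rules.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not clj-yaml or snakeyaml code: decides whether a
// string scalar should be single-quoted so no reader takes it for a number.
class ConservativeQuoting {
    // Deliberately broad: optional sign, digits (underscores allowed),
    // optional fraction, optional exponent. Matches "083", "1_000", "1.5", "1e3".
    static final Pattern NUMBER_LIKE = Pattern.compile(
        "[-+]?[0-9][0-9_]*(?:\\.[0-9_]*)?(?:[eE][-+]?[0-9]+)?");

    static boolean looksNumeric(String s) {
        return NUMBER_LIKE.matcher(s).matches();
    }

    // Number-like strings never contain a quote character, so wrapping them
    // in single quotes needs no escaping. Other strings are passed through
    // untouched here, which is an assumption made to keep the sketch short.
    static String emitScalar(String s) {
        return looksNumeric(s) ? "'" + s + "'" : s;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"083", "1.5", "abc"}) {
            System.out.println(emitScalar(s));
        }
    }
}
```

The pattern errs on the side of quoting too much, which matches the "I would always use that option" stance: an extra pair of quotes is harmless, a string silently read back as a number is not.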
borkdude11:09:54

I don't think anyone is really interested in yaml 1.1 and yaml 1.2: just use a subset and be done with the fucking yaml

grzm11:09:06

Yeah, snakeyaml-engine would likely mean a new lib. And then, the babashka case: include both? Replacement would likely mean other behavioral differences.

borkdude11:09:35

I think it would be worth investigating the 2.0 option and see how many breakages there would be in practice. I will only include 1 yaml library

borkdude11:09:50

I won't spend any more megabytes on this bullshit

borkdude11:09:12

As you might have noticed, YAML really pisses me off every time

grzm11:09:46

Yeah. I'm surprised both how these kinds of breaking changes are tolerated in many communities and how violently I now react against them.

borkdude11:09:44

we could make a snakeyaml 2.0 pod as well or have the other one as a pod

borkdude11:09:51

this is always an option

grzm11:09:59

Me, too. By far my primary use case for yaml is working with AWS Cloudformation templates. A subset of that is Typescript CDK.

grzm12:09:28

Re: Yaml 1.1, I think this is the controlling text (https://yaml.org/spec/1.1/#id865585):
> Tag resolution is specific to the application, hence a YAML processor should provide a mechanism allowing the application to specify the tag resolution rules. It is recommended that nodes having the “`!`” non-specific tag should be resolved as “`tag:yaml.org,2002:seq`”, “`tag:yaml.org,2002:map`” or “`tag:yaml.org,2002:str`” depending on the node’s kind. This convention allows the author of a YAML character stream to exert some measure of control over the tag resolution process. By explicitly specifying a plain scalar has the “`!`” non-specific tag, the node is resolved as a string, as if it was quoted or written in a block style. Note, however, that each application may override this behavior. For example, an application may automatically detect the type of programming language used in source code presented as a non-plain scalar and resolve it accordingly.

borkdude12:09:38

I'm not sure what is relevant in this blob of text?

grzm12:09:26

I can read that as saying "whether or not it's quoted, if it doesn't include a tag, it should be interpreted as a seq, map, or string, depending on its structure"

grzm12:09:07

There's a lot of wiggle room in there, too.

grzm12:09:04

This is not how I wanted to spend my day.

pithyless12:09:19

Hi there! I was just driving by on #babashka and like an accident on the highway I could not look away from the unfolding chaos. Just wanted to send some virtual hugs your way @U0EHU1800 and @U04V15CAJ - I've dealt with yaml issues before and they have scarred me for life! 🤗

❤️ 2
grzm13:09:49

FWIW, my particular use case does appear to be a difference between YAML 1.1 and YAML 1.2. I’m emitting YAML via babashka/clj-yaml/snakeyaml (YAML 1.1) and reading it with js-yaml v4 (https://github.com/nodeca/js-yaml) which is a YAML 1.2 processor, which notes the change in behavior from v3 to v4 (https://github.com/nodeca/js-yaml/blob/ab31bba6b41f58390f431123ffec5031b986edf5/migrate_v3_to_v4.md#loading-in-v4-documents-previously-dumped-in-v3).

grzm13:09:34

I haven’t found yet if v3 is specifically YAML 1.1.

grzm13:09:49

Actually, it looks like v3 is supposed to be YAML 1.2 as well? https://github.com/nodeca/js-yaml/tree/v3
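
One possible reading of the 1.1 vs 1.2 difference, not confirmed anywhere in this thread: the YAML 1.1 int type treats a leading `0` as octal, so `083` is not an integer there and an emitter may leave it unquoted, while the YAML 1.2 core schema accepts any run of decimal digits. The sketch below only compares the two published integer patterns; the class and method names are made up, and this is not the actual code of snakeyaml or js-yaml.

```java
import java.util.regex.Pattern;

// Sketch only: compares how number-like strings resolve under the YAML 1.1
// int type and the YAML 1.2 core schema. Names are hypothetical.
class YamlIntSchemas {
    // YAML 1.1 int type: binary, octal (leading 0), decimal, hex, sexagesimal.
    static final Pattern YAML_1_1_INT = Pattern.compile(
        "[-+]?0b[0-1_]+"
        + "|[-+]?0[0-7_]+"
        + "|[-+]?(?:0|[1-9][0-9_]*)"
        + "|[-+]?0x[0-9a-fA-F_]+"
        + "|[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+");

    // YAML 1.2 core schema: decimal with leading zeros allowed, 0o octal, hex.
    static final Pattern YAML_1_2_INT = Pattern.compile(
        "[-+]?[0-9]+|0o[0-7]+|0x[0-9a-fA-F]+");

    static boolean isYaml11Int(String s) {
        return YAML_1_1_INT.matcher(s).matches();
    }

    static boolean isYaml12Int(String s) {
        return YAML_1_2_INT.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"083", "073", "83", "0o17"}) {
            System.out.printf("%-5s 1.1 int: %-5b 1.2 int: %b%n",
                s, isYaml11Int(s), isYaml12Int(s));
        }
    }
}
```

Under these patterns `083` is an int only for the 1.2 side, which would explain a 1.1 emitter writing it plain and a 1.2 reader parsing it as a number.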

borkdude13:09:18

have you tried json/generate-string? apparently it generates valid yaml 1.2 since json is a subset of yaml

grzm14:09:56

Been thinking about that, and want to explore it more. While it is a subset of yaml, it does have different usability characteristics.
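
A rough sketch of why the JSON route sidesteps the quoting question: JSON always double-quotes strings, and a double-quoted YAML scalar is read as a string. The `JsonScalar` class below is a hypothetical, minimal encoder written for illustration; it is not what json/generate-string does internally.

```java
// Hypothetical minimal JSON string encoder, for illustration only.
class JsonScalar {
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters as \\uXXXX escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // Emits a one-entry JSON object with both key and value quoted.
        System.out.println("{" + quote("x") + ": " + quote("083") + "}");
    }
}
```

The trade-off mentioned above still applies: the output is unambiguous but reads less comfortably than block-style YAML.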

grzm17:09:36

Throwing this incomplete thought out there (considering things might be on the table): my core use-cases for YAML are
• AWS CloudFormation templates (mostly reading). For reading AWS CloudFormation I’m currently using https://github.com/owainlewis/yaml in Clojure because of its yaml.reader/passthrough-constructor (https://github.com/owainlewis/yaml/blob/master/src/yaml/reader.clj#L11-L15) which allows me to easily handle AWS’s intrinsic functions (in particular, the short forms). https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/intrinsic-function-reference.html I’d love to see this in clj-commons/clj-yaml and babashka.
• Docker Dockerfile (though I currently mostly manage these manually rather than programmatically)
• JavaScript/TypeScript interop (haven’t found a good EDN parser in JavaScript (not ClojureScript), though I haven’t looked extensively). I should probably use JSON here instead. This is the case where I’m having issues right now with the snakeyaml 1.29 to 1.30 behavior changes.
• Kubernetes resource file generation (more in the future, and yes I’m aware there is some k8s file Clojure library out there)

borkdude18:09:14

@U0EHU1800 I think we should support passthrough constructor

borkdude18:09:20

issue + PR welcome

borkdude18:09:13

if you're doing JavaScript/TypeScript + YAML, you could also consider #nbb + a Node.js yaml lib

borkdude18:09:18

I could also expose some of these YAML classes in bb. But I don't want to have both snakeyaml and snakeyaml-engine in bb because of the size

borkdude18:09:27

I don't think those projects build on the same sources

borkdude18:09:56

so continuing with the current lib is maybe best + some options to make it behave saner

grzm19:09:00

> so continuing with the current lib is maybe best + some options to make it behave saner
That’s my gut reaction, too.

grzm19:09:19

That, and continue to minimize my YAML exposure. It’s like lead or mercury, right? It accumulates and becomes increasingly toxic?

borkdude19:09:45

We should invent noml. Which is the same as EDN, but just with a hyped marketing site

😆 1
grzm03:09:56

Well, here's something for PassthroughConstructor: https://github.com/clj-commons/clj-yaml/pull/38

grzm03:09:15

Haven't really wrapped my head around the quoting issue I first raised.
