clj-commons

grzm 2022-09-13T21:25:58.975029Z

‘lo all. Looks like there’s a regression in clj-yaml 0.7.109 and 0.7.110 (current). Number-like strings aren’t quoted:

% clj -Sdeps '{:deps {clj-commons/clj-yaml {:mvn/version "0.7.108"}}}' -M -e "(require '[clj-yaml.core :as yaml]) (doseq [x [\"083\" {:x \"083\"}]] (print (yaml/generate-string x)))"
'083'
{x: '083'}
% clj -Sdeps '{:deps {clj-commons/clj-yaml {:mvn/version "0.7.109"}}}' -M -e "(require '[clj-yaml.core :as yaml]) (doseq [x [\"083\" {:x \"083\"}]] (print (yaml/generate-string x)))"
083
{x: 083}
% clj -Sdeps '{:deps {clj-commons/clj-yaml {:mvn/version "0.7.110"}}}' -M -e "(require '[clj-yaml.core :as yaml]) (doseq [x [\"083\" {:x \"083\"}]] (print (yaml/generate-string x)))"
083
{x: 083}

grzm 2022-09-14T11:46:26.916109Z

Well, that was fun.

borkdude 2022-09-14T11:47:48.161449Z

fucking hell

grzm 2022-09-14T11:48:13.249989Z

You read my mind.

grzm 2022-09-14T11:50:14.290179Z

A couple of things:
• I'm wondering if it's worth my time trying to parse the Yaml 1.1 spec.
• I'm wondering if we should look at using snakeyaml-engine, which is supposed to be Yaml 1.2 compliant, and what the Yaml 1.2 spec says about this case (and what other surprises await there).

grzm 2022-09-14T11:50:35.842349Z

(I guess that's four things, not just a couple)

borkdude 2022-09-14T11:53:45.670119Z

Another thing: • Do the custom thing where we preserve the pre-1.30 behavior

borkdude 2022-09-14T11:54:23.630299Z

I'd be fine with checking out snakeyaml-engine but there might be other breaking changes we'd introduce. Perhaps clj-yaml 2.0 then

grzm 2022-09-14T11:54:32.996029Z

If clj-yaml chooses to keep snakeyaml with its Yaml 1.1 "compliance", I'm also wondering whether clj-yaml should use a custom resolver to "patch" this regression. For my particular use case with babashka, I don't think I can provide a custom resolver in the script itself: those are Java classes, I believe.

borkdude 2022-09-14T11:54:54.842779Z

yes, we could make that an option

borkdude 2022-09-14T11:55:06.239609Z

and I personally would always use that option

💯 1
borkdude 2022-09-14T11:55:54.440599Z

I don't think anyone is really interested in yaml 1.1 and yaml 1.2: just use a subset and be done with the fucking yaml

grzm 2022-09-14T11:56:06.574449Z

Yeah, snakeyaml-engine would likely mean a new lib. And then, the babashka case: include both? Replacement would likely mean other behavioral differences.

borkdude 2022-09-14T11:56:35.977409Z

I think it would be worth investigating the 2.0 option and see how many breakages there would be in practice. I will only include 1 yaml library

borkdude 2022-09-14T11:56:50.480139Z

I won't spend any more megabytes on this bullshit

borkdude 2022-09-14T11:57:12.806939Z

As you might have noticed, YAML really pisses me off every time

grzm 2022-09-14T11:57:46.495299Z

Yeah. I'm surprised both by how these kinds of breaking changes are tolerated in many communities and by how violently I now react against them.

borkdude 2022-09-14T11:58:44.959069Z

we could make a snakeyaml 2.0 pod as well or have the other one as a pod

borkdude 2022-09-14T11:58:51.532979Z

this is always an option

grzm 2022-09-14T11:58:59.892819Z

Me, too. By far my primary use case for yaml is working with AWS CloudFormation templates. A subset of that is TypeScript CDK.

grzm 2022-09-14T12:02:28.803379Z

Re: Yaml 1.1, I think this is the controlling text (https://yaml.org/spec/1.1/#id865585):
> Tag resolution is specific to the application, hence a YAML processor should provide a mechanism allowing the application to specify the tag resolution rules. It is recommended that nodes having the “`!`” non-specific tag should be resolved as “`tag:yaml.org,2002:seq`”, “`tag:yaml.org,2002:map`” or “`tag:yaml.org,2002:str`” depending on the node's kind. This convention allows the author of a YAML character stream to exert some measure of control over the tag resolution process. By explicitly specifying a plain scalar has the “`!`” non-specific tag, the node is resolved as a string, as if it was quoted or written in a block style. Note, however, that each application may override this behavior. For example, an application may automatically detect the type of programming language used in source code presented as a non-plain scalar and resolve it accordingly.

borkdude 2022-09-14T12:03:38.318919Z

I'm not sure what is relevant in this blob of text?

grzm 2022-09-14T12:05:26.181389Z

I can read that as saying "whether or not it's quoted, if it doesn't include a tag, it should be interpreted as a seq, map, or string, depending on its structure"

grzm 2022-09-14T12:09:07.690649Z

There's a lot of wiggle room in there, too.

grzm 2022-09-14T12:20:04.751759Z

This is not how I wanted to spend my day.

pithyless 2022-09-14T12:22:19.867719Z

Hi there! I was just driving by on #babashka and like an accident on the highway I could not look away from the unfolding chaos. Just wanted to send some virtual hugs your way @grzm and @borkdude - I've dealt with yaml issues before and they have scarred me for life! 🤗

❤️ 2
grzm 2022-09-14T13:34:49.286359Z

FWIW, my particular use case does appear to be a difference between YAML 1.1 and YAML 1.2. I’m emitting YAML via babashka/clj-yaml/snakeyaml (YAML 1.1) and reading it with js-yaml v4 (https://github.com/nodeca/js-yaml), a YAML 1.2 processor whose migration guide notes the change in behavior from v3 to v4 (https://github.com/nodeca/js-yaml/blob/ab31bba6b41f58390f431123ffec5031b986edf5/migrate_v3_to_v4.md#loading-in-v4-documents-previously-dumped-in-v3).
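
A rough way to see the 1.1 vs 1.2 difference described above, sketched in plain Java with no YAML library involved. The two regexes are my transcription of the integer patterns in the YAML 1.1 int type and the YAML 1.2 core schema, so treat them as an approximation rather than what snakeyaml or js-yaml actually ship; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper: compares how an unquoted scalar would be classified
// by the integer rules of YAML 1.1 and of the YAML 1.2 core schema.
class IntResolution {
    // YAML 1.1 integers (assumed transcription of the spec's int type).
    static final Pattern YAML11_INT = Pattern.compile(
        "[-+]?0b[0-1_]+"                         // binary
        + "|[-+]?0[0-7_]+"                       // octal: leading 0, digits 0-7 only
        + "|[-+]?(0|[1-9][0-9_]*)"               // decimal: no leading zeros
        + "|[-+]?0x[0-9a-fA-F_]+"                // hexadecimal
        + "|[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+");  // sexagesimal

    // YAML 1.2 core schema integers (assumed transcription).
    static final Pattern YAML12_INT = Pattern.compile(
        "[-+]?[0-9]+"                            // decimal: leading zeros allowed
        + "|0o[0-7]+"                            // octal needs an explicit 0o prefix
        + "|0x[0-9a-fA-F]+");                    // hexadecimal

    static boolean isInt11(String s) { return YAML11_INT.matcher(s).matches(); }
    static boolean isInt12(String s) { return YAML12_INT.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"083", "073", "83"}) {
            System.out.println(s + ": 1.1 int? " + isInt11(s) + ", 1.2 int? " + isInt12(s));
        }
    }
}
```

Under these patterns an unquoted 083 is not an integer for a 1.1 reader (8 is not an octal digit, and decimals may not start with 0) but is a decimal integer for a 1.2 reader, which would be consistent with a 1.1 emitter leaving it unquoted and a 1.2 loader reading it back as a number.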

grzm 2022-09-14T13:36:34.332299Z

I haven’t found yet if v3 is specifically YAML 1.1.

grzm 2022-09-14T13:37:49.140259Z

Actually, it looks like v3 is supposed to be YAML 1.2 as well? https://github.com/nodeca/js-yaml/tree/v3

borkdude 2022-09-14T13:53:18.804929Z

have you tried json/generate-string? apparently it generates valid yaml 1.2 since json is a subset of yaml

grzm 2022-09-14T14:46:56.440509Z

Been thinking about that, and want to explore it more. While it is a subset of yaml, it does have different usability characteristics.

borkdude 2022-09-14T14:48:05.855809Z

true
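
To make the JSON suggestion above concrete: JSON always double-quotes strings, so a number-like string cannot lose its quotes on output. A minimal sketch of that quoting step in plain Java follows; it is an illustration only, not how json/generate-string is implemented, and the class and method names are hypothetical.

```java
// Hypothetical helper: writes a Java String as a JSON string literal.
class JsonString {
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    // Remaining control characters become four-digit hex escapes.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // The string stays quoted no matter how number-like it looks.
        System.out.println("{\"x\": " + quote("083") + "}");
    }
}
```

The trade-off is the usability difference mentioned above: the output is unambiguous for the reader on the other side, but it is flow-style JSON rather than block-style YAML.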

grzm 2022-09-14T17:40:36.660849Z

Throwing this incomplete thought out there (considering things might be on the table): my core use-cases for YAML are
• AWS CloudFormation templates (mostly reading). For reading AWS CloudFormation I’m currently using https://github.com/owainlewis/yaml in Clojure because of its yaml.reader/passthrough-constructor (https://github.com/owainlewis/yaml/blob/master/src/yaml/reader.clj#L11-L15), which allows me to easily handle AWS’s intrinsic functions (in particular, the short forms): https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/intrinsic-function-reference.html I’d love to see this in clj-commons/clj-yaml and babashka.
• Docker Dockerfile (though I currently mostly manage these manually rather than programmatically)
• JavaScript/TypeScript interop (haven’t found a good EDN parser in JavaScript (not ClojureScript), though I haven’t looked extensively). I should probably use JSON here instead. This is the case where I’m having issues right now with the snakeyaml 1.29 to 1.30 behavior changes.
• Kubernetes resource file generation (more in the future, and yes I’m aware there is some k8s file Clojure library out there)

borkdude 2022-09-14T18:50:14.302669Z

@grzm I think we should support passthrough constructor

borkdude 2022-09-14T18:50:20.069809Z

issue + PR welcome

borkdude 2022-09-14T18:51:13.972479Z

if you're doing JavaScript/TypeScript + YAML, you could also consider #nbb + a Node.js yaml lib

borkdude 2022-09-14T18:52:18.776079Z

I could also expose some of these YAML classes in bb. But I don't want to have both snakeyaml and snakeyaml-engine in bb because of the size

borkdude 2022-09-14T18:52:27.151509Z

I don't think those projects build on the same sources

borkdude 2022-09-14T18:52:56.463249Z

so continuing with the current lib is maybe best + some options to make it behave saner

grzm 2022-09-14T19:05:00.112899Z

> so continuing with the current lib is maybe best + some options to make it behave saner
That’s my gut reaction, too.

grzm 2022-09-14T19:05:19.717489Z

That, and continue to minimize my YAML exposure. It’s like lead or mercury, right? It accumulates and becomes increasingly toxic?

borkdude 2022-09-14T19:06:45.589689Z

We should invent noml. Which is the same as EDN, but just with a hyped marketing site

😆 1
grzm 2022-09-15T03:32:56.358849Z

Well, here's something for PassthroughConstructor: https://github.com/clj-commons/clj-yaml/pull/38

grzm 2022-09-15T03:35:15.535149Z

Haven't really wrapped my head around the quoting issue I first raised.

grzm 2022-09-13T21:26:24.832219Z

I opened https://github.com/clj-commons/clj-yaml/issues/35 to track.

grzm 2022-09-13T21:26:50.092769Z

I suspect it’s in the upstream snakeyaml library, but haven’t confirmed.

borkdude 2022-09-13T21:27:55.874279Z

you could try to bump that in the newest to confirm?

grzm 2022-09-13T21:28:15.476679Z

I’ll give it a shot.

🙏 1
grzm 2022-09-13T21:30:30.645759Z

Looks like the current version of clj-yaml (0.7.110) uses the latest version of snakeyaml (1.32)

borkdude 2022-09-13T21:33:35.008409Z

maybe post an issue here? https://bitbucket.org/snakeyaml/snakeyaml/issues?status=new&status=open

dpsutton 2022-09-13T21:34:07.189609Z

i’m seeing the same behavior on 110 and 109. And the quoted behavior on 108

grzm 2022-09-13T21:36:12.589779Z

@dpsutton Thanks for confirming.

grzm 2022-09-13T21:36:32.891639Z

Looks like the regression was introduced between 1.29 and 1.30

borkdude 2022-09-13T21:40:52.832229Z

exactly

borkdude 2022-09-13T21:43:34.173629Z

maybe it's JavaScript semantics or so? 083 in a Node REPL is just 83

borkdude 2022-09-13T21:43:46.185449Z

or YAML spec weirdness

grzm 2022-09-13T21:45:47.190269Z

YAML spec weirdness, I suspect.

borkdude 2022-09-13T21:46:05.341249Z

not sure, online yaml converters do not just automatically change strings into numbers

grzm 2022-09-13T21:46:23.510539Z

(and whatever is going on with the node REPL is just being bad)

borkdude 2022-09-13T21:46:44.096509Z

I think the leading 0 being octal is just Java

grzm 2022-09-13T21:47:17.900699Z

The leading 0 is just a red herring. It works (or rather doesn’t) with other numbers, too. Actually, maybe it’s not a red herring.

borkdude 2022-09-13T21:47:18.023779Z

I think it's worth posting an issue about in snakeyaml

grzm 2022-09-13T21:47:58.168539Z

I meant YAML spec weirdness in that it’s so lax that people are often surprised by its behavior and as a result screw up their parsers/generators.

grzm 2022-09-13T21:49:23.515999Z

My Java is so freakin’ weak. What’s the quickest way to make a repro for a Java library?

borkdude 2022-09-13T21:49:57.899589Z

I think your best bet is to look into the clj-commons library and just inline all the java interop into one blob

grzm 2022-09-13T21:50:57.777909Z

Oh, that I can do. What I’ll have trouble doing is building the durned thing 🙂

borkdude 2022-09-13T21:51:27.341539Z

building? oh right

borkdude 2022-09-13T21:51:46.016439Z

what I do:

javac --classpath $(clojure -Spath) Foo.java

borkdude 2022-09-13T21:52:01.694569Z

and then java --classpath $(clojure -Spath):. Foo

borkdude 2022-09-13T21:52:17.747389Z

nowadays java also supports running a .java file (since java 11)

grzm 2022-09-13T21:52:55.613689Z

Coolio. Yeah, that sounds good. I was thinking of adding a test to their suite.

borkdude 2022-09-13T22:05:50.940139Z

can you post a link in the clj-yaml issue when you've created one? going to sleep now, good luck!

👍 1
grzm 2022-09-13T22:17:25.344169Z

For reference:

% cat NumberLikeString.java
package com.example;

import org.yaml.snakeyaml.Yaml;

class NumberLikeString {
    public static void main(String[] args) {
        String data = args[0];
        Yaml yaml = new Yaml();
        String output = yaml.dump(data);
        System.out.print(output);
    }
}
% java -classpath $HOME/.m2/repository/org/yaml/snakeyaml/1.29/snakeyaml-1.29.jar NumberLikeString.java 083
'083'
% java -classpath $HOME/.m2/repository/org/yaml/snakeyaml/1.30/snakeyaml-1.30.jar NumberLikeString.java 083
083
% java -classpath $HOME/.m2/repository/org/yaml/snakeyaml/1.32/snakeyaml-1.32.jar NumberLikeString.java 083
083
That wasn’t terrible. Thanks for the reminder.