yaml

lread 2023-09-11T19:01:59.716179Z

Ok @ingy, it is time this channel had its first questions! We are working on an issue in clj-yaml that touches on circular references in YAML. For example:

recursive:
  name: "A node"
  child: &ref_node
    name: "Child node"
    child: *ref_node
This YAML describes a child, with a child, with a child, with a child... and so on forever... This seems maybe a conceptually interesting thing to describe, but how would a YAML parser ever fully parse this? Would this be considered invalid YAML by some parsers? Is it common for YAML to have infinite recursion like this? If so, where and why?

Ingy döt Net 2023-09-11T19:15:27.857649Z

A bunch of things to say here 🙂

Ingy döt Net 2023-09-11T19:16:20.088009Z

Let's start by avoiding useless quotes 🙂

recursive:
  name: A node
  child: &ref_node
    name: Child node
    child: *ref_node

Ingy döt Net 2023-09-11T19:18:53.165029Z

I think you meant "loader" instead of "parser" here. In YAML parlance, "load" means all the way from text to native. "parse" is one stage of "load": load = read->lex->parse->compose->resolve->construct

👍 1
Ingy döt Net 2023-09-11T19:19:59.561139Z

parsing is the most difficult part to implement and the part we test the most. this is your YAML parsed by 17 implementations: https://play.yaml.io/main/parser?input=cmVjdXJzaXZlOgogIG5hbWU6IEEgbm9kZQogIGNoaWxkOiAmcmVmX25vZGUKICAgIG5hbWU6IENoaWxkIG5vZGUKICAgIGNoaWxkOiAqcmVmX25vZGU=

🆒 1
Ingy döt Net 2023-09-11T19:21:39.217079Z

wrt recursive aliases, your example is completely valid and any loader should handle it (assuming the data model it loads to supports references and cyclic referencing)

Ingy döt Net 2023-09-11T19:22:25.809329Z

YAML is first and foremost a data serialization language. the opposite of load is "dump"

Ingy döt Net 2023-09-11T19:23:03.628029Z

you should be able to dump any clojure data structure and later load back to an equivalent

Ingy döt Net 2023-09-11T19:24:12.682359Z

this might involve configuring your dumper and loader to understand certain non-standard data structures.

Ingy döt Net 2023-09-11T19:24:38.093649Z

but yaml is intended for that purpose, in a programming language agnostic manner

Ingy döt Net 2023-09-11T19:26:56.447889Z

$ perl -MYAML -E '$a = []; push @$a, $a; say YAML::Dump $a'
--- &1
- *1
That is a vector whose only element is a reference to itself

Ingy döt Net 2023-09-11T19:28:03.988889Z

$ perl -MYAML -E '$a = []; push @$a, $a; say Dump Load Dump $a'
--- &1
- *1
and that's an example of reloading the dump and then dumping again 🙂
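
[Editor's sketch] The dump side shown above can be illustrated with a small hand-rolled dumper. This is not how YAML.pm or any real library is implemented; it is a minimal sketch that assumes the data is only nested lists of integers, and the name `dump_flow` is made up for this example. It scans once to find lists reached more than once, then emits an anchor the first time such a list is written and an alias every time after that, using flow style to keep the output on one line.

```python
def dump_flow(root):
    """Dump nested lists of ints as one line of flow-style YAML (sketch only)."""
    counts = {}  # id(list) -> number of times the list is reached

    def scan(node):
        if isinstance(node, list):
            counts[id(node)] = counts.get(id(node), 0) + 1
            if counts[id(node)] == 1:  # descend only on the first visit
                for item in node:
                    scan(item)

    names = {}  # id(list) -> anchor name, assigned on first emission

    def emit(node):
        if not isinstance(node, list):
            return str(node)
        if id(node) in names:
            return "*" + names[id(node)]  # written before: emit an alias
        prefix = ""
        if counts[id(node)] > 1:  # shared or cyclic: needs an anchor
            names[id(node)] = str(len(names) + 1)
            prefix = "&" + names[id(node)] + " "
        return prefix + "[" + ", ".join(emit(item) for item in node) + "]"

    scan(root)
    return emit(root)


a = []
a.append(a)          # same shape as the Perl example: a list containing itself
print(dump_flow(a))  # &1 [*1]
```

The first-visit check in `scan` and the `names` lookup in `emit` are what keep the walk from recursing forever on the cycle.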

Ingy döt Net 2023-09-11T19:28:23.241769Z

that's all I can think of to say

Ingy döt Net 2023-09-11T19:29:02.241179Z

@lee feel free to ask more questions. I'll even answer them in a thread 🙂

lread 2023-09-11T19:36:30.950019Z

Thanks @ingy! I'll take some time to digest that.

lread 2023-09-11T19:37:22.323249Z

It's your channel you can thread (or not) if you want to!

Ingy döt Net 2023-09-11T19:37:39.427299Z

I like threads, just forgot.

👍 1
lread 2023-09-11T20:30:26.467669Z

So, loading a circular YAML example like the one above should not, in theory, try to realize the circularity? And therefore does not lead to things like endless loops or stack overflows?

Ingy döt Net 2023-09-11T22:06:03.396169Z

Well I don't know. Can you make an immutable collection with a circular reference? Seems impossible to me...

Ingy döt Net 2023-09-11T22:07:13.849939Z

In most languages it is very easy for a loader to create a circular ref. It's basically a pointer.
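
[Editor's sketch] To make the "basically a pointer" point concrete, here is a minimal Python sketch of how a loader for a mutable data model can build the recursive example from the top of the channel without looping. It is not a real YAML loader: the helper `load_mapping` and the `fill_*` callbacks are hypothetical stand-ins for the construct stage. The idea is to create the empty mapping first, register it under its anchor, and only then fill in its entries, so an alias inside the mapping just receives the same object.

```python
anchors = {}  # anchor name -> object already created (possibly still empty)


def load_mapping(anchor, fill):
    """Create the dict, register its anchor, then fill it in (sketch only)."""
    node = {}
    if anchor is not None:
        anchors[anchor] = node  # registered before any child is constructed
    fill(node)
    return node


def fill_child(node):
    node["name"] = "Child node"
    node["child"] = anchors["ref_node"]  # the alias *ref_node


def fill_recursive(node):
    node["name"] = "A node"
    node["child"] = load_mapping("ref_node", fill_child)  # the anchor &ref_node


doc = {"recursive": load_mapping(None, fill_recursive)}
child = doc["recursive"]["child"]
print(child["child"] is child)  # True: one object pointing at itself
```

Nothing is expanded "forever" here; the cycle is a single reference back to an existing object. Code that later walks or prints the structure naively is where endless recursion can still show up.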

lread 2023-09-11T22:44:51.187639Z

Right, I did start my career in C... so ya, if I were saving some YAML that described some data structure...

Ingy döt Net 2023-09-11T22:46:09.885149Z

In theory you could write out the entire state of a clojure thread, save it to disk and then reload it later

Ingy döt Net 2023-09-11T22:47:17.412259Z

I've used yaml to serialize the entire perl symbol table. The dump was 14,000 lines or so

Ingy döt Net 2023-09-11T22:48:22.441059Z

Do you agree with me that Clojure data structures are not capable of circular references?

Ingy döt Net 2023-09-11T22:49:14.527619Z

I haven't really thought about how laziness plays into the game

lread 2023-09-11T22:53:19.547609Z

I think Clojure's immutable data structures don't support circular references... but I've never attempted such a thing. I'd have to think about that.

Ingy döt Net 2023-09-11T22:54:14.868039Z

Well create a hashmap and then assoc a new key with that hash map

Ingy döt Net 2023-09-11T22:54:25.537119Z

You would get a different hash map

Ingy döt Net 2023-09-11T22:54:36.125949Z

So it's not really circular at all

lread 2023-09-11T22:55:17.514309Z

So the lazy seq was used in clj-yaml (we adopted it, history unknown), I think, to avoid instantly blowing up on circular references. You could at least load some of the YAML.

lread 2023-09-11T22:56:06.141069Z

But you might be able to contrive something. Not sure.

Ingy döt Net 2023-09-11T22:56:09.930069Z

Do you use lazy hash maps as well?

Ingy döt Net 2023-09-11T22:56:32.564159Z

Circular aliasing applies to both sequences and hash maps which are both collections

lread 2023-09-11T23:00:27.116239Z

Laziness typically applies to sequences. But... clj-yaml does use https://github.com/clj-commons/ordered to preserve order of YAML... but I don't think there is laziness there.

lread 2023-09-11T23:09:15.725049Z

Yeah, I think standard Clojure data structures, by design, don't support circular references....

lread 2023-09-11T23:10:12.697529Z

But that doesn't mean you couldn't create an abstraction that describes the circular reference.

Ingy döt Net 2023-09-11T23:39:13.533269Z

like... yaml?!

Ingy döt Net 2023-09-11T23:39:23.372019Z

j/k

lread 2023-09-12T02:28:59.194849Z

Ha!

lread 2023-09-11T21:22:44.249539Z

And 2nd question: are there real-world use cases of YAML with circular references? It would help me to understand its practical usage.

lread 2023-09-11T21:47:44.630379Z

It still feels like a user error at this point in my journey, in practical usage.

Ingy döt Net 2023-09-11T22:08:27.103949Z

It seems you are thinking of YAML from the pov of something written by a human.

Ingy döt Net 2023-09-11T22:09:45.030879Z

Circular references (maybe not in clojure) are common, and yaml is prepared to dump them and load them.

Ingy döt Net 2023-09-11T22:11:14.825939Z

With the first serializer library I wrote (before YAML), I asked a friend to test it and it crashed his machine, because I didn't handle circular refs

Ingy döt Net 2023-09-11T22:11:59.039289Z

In terms of writing them by hand, it's probably rare but not unheard of

Ingy döt Net 2023-09-11T22:13:09.897869Z

If clojure's data model has no possibility of circular refs I guess the loader should error on them.
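
[Editor's sketch] One way a loader targeting immutable data could "error on them" instead of overflowing the stack is to track which anchors are still under construction. The sketch below is an assumption-laden illustration in Python, not clj-yaml or SnakeYAML code: the tuple-based node format, `construct`, and `CircularAliasError` are all invented for this example, and Python tuples stand in for immutable collections.

```python
class CircularAliasError(ValueError):
    """Raised when an alias points at a node that is still being built."""


def construct(node, done=None, in_progress=None):
    # Hypothetical node shapes:
    #   ("scalar", value) | ("alias", name) | ("seq", anchor_or_None, [children])
    done = {} if done is None else done
    in_progress = set() if in_progress is None else in_progress
    kind = node[0]
    if kind == "scalar":
        return node[1]
    if kind == "alias":
        name = node[1]
        if name in in_progress:
            raise CircularAliasError(f"alias *{name} refers to its own ancestor")
        return done[name]  # finished anchor: reuse the same immutable value
    _, anchor, children = node
    if anchor is not None:
        in_progress.add(anchor)
    value = tuple(construct(c, done, in_progress) for c in children)
    if anchor is not None:
        in_progress.discard(anchor)
        done[anchor] = value
    return value


circular = ("seq", "x", [("alias", "x")])  # &x [ *x ]
try:
    construct(circular)
    outcome = "loaded"
except CircularAliasError:
    outcome = "rejected"
print(outcome)  # rejected
```

Non-circular aliases still work under this scheme, since an anchor is only flagged while its own children are being built.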

Ingy döt Net 2023-09-11T22:14:10.004279Z

What's the clj-yaml issue you are working on?

lread 2023-09-11T22:36:44.646849Z

Clj-yaml currently returns YAML sequences as Clojure lazy seqs. This trips enough people up that we are considering optionally returning them as vectors instead. But this impacts any YAML that might include circular references, because in clj-yaml (and I assume SnakeYAML), loading these currently results in stack overflows. So then I started to wonder about these circular references, realized I did not understand their use case, and decided to pester you! 🙂 https://github.com/clj-commons/clj-yaml/issues/110

lread 2023-09-11T22:42:53.437259Z

My imagination is probably limited to the things I use clj-yaml for, which is typically config files. But the world is bigger than that, ya? 🙂

Ingy döt Net 2023-09-11T22:43:41.703949Z

That's what most people use it for but it was designed for bigger things

Ingy döt Net 2023-09-11T23:47:01.826299Z

@lee

user=> (clj-yaml.core/parse-string "&x { x: *x }")
Execution error (StackOverflowError) at clj-yaml.core/eval2176$fn$iter$fn (core.clj:50).
null

user=> (clj-yaml.core/parse-string "&x [ *x ]")
Error printing return value (StackOverflowError) at clojure.lang.PersistentHashMap$BitmapIndexedNode/ensureEditable (PersistentHashMap.java:812).
null

Ingy döt Net 2023-09-11T23:50:20.634149Z

different errors for YAML circular map and circular seq

lread 2023-09-12T02:30:50.194539Z

Hmmm...!

Ingy döt Net 2023-09-11T23:50:48.572059Z

user=> (def h {:x "y"})
#'user/h
user=> (assoc h :y h)
{:x "y", :y {:x "y"}}
definitely not circular 🙂