Ok @ingy, it is time this channel had its first questions! We are working on an issue in clj-yaml that touches on circular references in YAML. For example:
recursive:
  name: "A node"
  child: &ref_node
    name: "Child node"
    child: *ref_node
This YAML describes a child, with a child, with a child, with a child... and so on forever...
This seems maybe a conceptually interesting thing to describe, but how would a YAML parser ever fully parse this?
Would this be considered invalid YAML by some parsers?
Is it common for YAML to have infinite recursion like this? If so, where and why?
A bunch of things to say here 🙂
Let's start by avoiding useless quotes 🙂
recursive:
  name: A node
  child: &ref_node
    name: Child node
    child: *ref_node
I think you meant "loader" instead of "parser" here. In YAML parlance "load" means going all the way from text to native data. "parse" is one stage of "load": load = read->lex->parse->compose->resolve->construct
parsing is the most difficult part to implement and the part we test the most. this is your YAML parsed by 17 implementations: https://play.yaml.io/main/parser?input=cmVjdXJzaXZlOgogIG5hbWU6IEEgbm9kZQogIGNoaWxkOiAmcmVmX25vZGUKICAgIG5hbWU6IENoaWxkIG5vZGUKICAgIGNoaWxkOiAqcmVmX25vZGU=
wrt recursive aliases, your example is completely valid and any loader should handle it (assuming the data model it loads to supports references and cyclic referencing)
YAML is first and foremost a data serialization language. the opposite of "load" is "dump"
you should be able to dump any clojure data structure and later load back to an equivalent
this might involve configuring your dumper and loader to understand certain non-standard data structures.
but yaml is intended for that purpose, in a programming language agnostic manner
$ perl -MYAML -E '$a = []; push @$a, $a; say YAML::Dump $a'
--- &1
- *1
is a vector whose only element is a reference to itself
$ perl -MYAML -E '$a = []; push @$a, $a; say Dump Load Dump $a'
--- &1
- *1
and that's an example of reloading the dump and then dumping again 🙂
that's all I can think of to say
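A rough Python sketch of what a dumper has to do to produce output like the `&1` / `*1` pair above: find the collections that are reachable more than once, write an anchor the first time one is emitted, and write an alias on every later visit. The names `find_shared` and `dump_flow` are made up for illustration, the sketch only handles lists and scalars, and it emits flow style rather than the block style shown in the Perl output.

```python
def find_shared(value, seen, shared):
    """Record the ids of lists that are reachable more than once."""
    if isinstance(value, list):
        if id(value) in seen:
            shared.add(id(value))  # second visit: this list needs an anchor
            return
        seen.add(id(value))
        for item in value:
            find_shared(item, seen, shared)


def dump_flow(value):
    """Dump nested lists as YAML flow sequences, using anchors and aliases."""
    shared = set()
    find_shared(value, set(), shared)
    names = {}  # id(list) -> anchor name already written

    def emit(v):
        if not isinstance(v, list):
            return str(v)  # scalars: deliberately naive, no quoting rules
        if id(v) in names:
            return f"*{names[id(v)]}"  # already written: emit an alias
        prefix = ""
        if id(v) in shared:
            names[id(v)] = str(len(names) + 1)
            prefix = f"&{names[id(v)]} "  # first write: emit the anchor
        return prefix + "[" + ", ".join(emit(i) for i in v) + "]"

    return emit(value)


a = []
a.append(a)  # same shape as the Perl example: a list containing itself
print(dump_flow(a))
```

Without the `names` lookup the recursion in `emit` would never end on `a`, which is the failure mode a dumper has to guard against.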
@lee feel free to ask more questions. I'll even answer them in a thread 🙂
Thanks @ingy! I'll take some time to digest that.
It's your channel you can thread (or not) if you want to!
I like threads, just forgot.
So, loading a circular YAML example like the one above should not, in theory, try to realize the circularity? And therefore does not lead to things like endless loops or stack overflows?
Well I don't know. Can you make an immutable collection with a circular reference? Seems impossible to me...
In most languages it is very easy for a loader to create a circular ref. It's basically a pointer.
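A minimal Python sketch of the "it's basically a pointer" point: the loader creates the empty collection, records it under the anchor name, and only then fills it in, so an alias met while filling resolves to the very object being built. `SeqNode`, `AliasNode` and `construct` are hypothetical stand-ins for a composed node graph and a constructor stage, not any real library's API.

```python
class SeqNode:
    """Stand-in for a composed YAML sequence node, optionally anchored."""
    def __init__(self, anchor=None, items=None):
        self.anchor = anchor
        self.items = [] if items is None else items


class AliasNode:
    """Stand-in for a composed alias such as *x."""
    def __init__(self, name):
        self.name = name


def construct(node, anchors=None):
    """Turn a node graph into native Python values, preserving cycles."""
    anchors = {} if anchors is None else anchors
    if isinstance(node, AliasNode):
        return anchors[node.name]  # hand back the already-created object
    if isinstance(node, SeqNode):
        result = []
        if node.anchor is not None:
            anchors[node.anchor] = result  # register BEFORE filling
        for item in node.items:
            result.append(construct(item, anchors))
        return result
    return node  # anything else is treated as a scalar


# Node graph for the YAML document: &x [ *x ]
root = SeqNode(anchor="x")
root.items.append(AliasNode("x"))

loaded = construct(root)
print(loaded[0] is loaded)
```

The trick depends on being able to mutate `result` after it has been registered, which is exactly what a persistent, immutable collection does not allow.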
Right, I did start my career in C... so ya, if I were saving some YAML that described some data structure...
In theory you could write out the entire state of a clojure thread, save it to disk and then reload it later
I've used yaml to serialize the entire perl symbol table. The dump was 14,000 lines or so
Do you agree with me that Clojure data structures are not capable of circular references?
I haven't really thought about how laziness plays into the game
I think Clojure's immutable data structures don't support circular references... but I've never attempted such a thing. I'd have to think about that.
Well create a hashmap and then assoc a new key with that hash map
You would get a different hash map
So it's not really circular at all
So the lazy seq was used in clj-yaml (we adopted it, history unknown), I think, to avoid instantly blowing up on circular references. You could at least load some of the YAML.
But you might be able to contrive something. Not sure.
Do you use lazy hash maps as well?
Circular aliasing applies to both sequences and hash maps which are both collections
Laziness typically applies to sequences. But... clj-yaml does use https://github.com/clj-commons/ordered to preserve order of YAML... but I don't think there is laziness there.
Yeah, I think standard Clojure data structures, by design, don't support circular references....
But that doesn't mean you couldn't create an abstraction that describes the circular reference.
like... yaml?!
j/k
Ha!
And 2nd question: are there real-world use cases of YAML with circular references? It would help me to understand its practical usage.
It still feels like a user error at this point in my journey, in practical usage.
It seems you are thinking of YAML from the pov of something written by a human.
Circular references (maybe not in clojure) are common, and yaml is prepared to dump them and load them.
When I wrote my first serializer library (before YAML), I asked a friend to test it and it crashed his machine, because I didn't handle circular refs
In terms of writing them by hand, it's probably rare but not unheard of
If clojure's data model has no possibility of circular refs I guess the loader should error on them.
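One way a loader for an immutable data model could "error on them", sketched in Python with tuples standing in for immutable collections. An anchor is marked as in progress while its node is being built; an alias that hits that marker is circular and is rejected, while ordinary (non-circular) aliases still work. All names here (`SeqNode`, `AliasNode`, `CircularAliasError`, `construct_immutable`) are hypothetical, and this is not how clj-yaml or SnakeYAML is implemented.

```python
class SeqNode:
    """Stand-in for a composed YAML sequence node, optionally anchored."""
    def __init__(self, anchor=None, items=None):
        self.anchor = anchor
        self.items = [] if items is None else items


class AliasNode:
    """Stand-in for a composed alias such as *x."""
    def __init__(self, name):
        self.name = name


class CircularAliasError(ValueError):
    """Raised when an alias refers to a node that contains it."""


_IN_PROGRESS = object()  # marker: anchor seen, value not finished yet


def construct_immutable(node, anchors=None):
    """Build tuples from a node graph, rejecting circular aliases."""
    anchors = {} if anchors is None else anchors
    if isinstance(node, AliasNode):
        value = anchors[node.name]
        if value is _IN_PROGRESS:
            raise CircularAliasError(
                f"alias *{node.name} refers to a node that contains it")
        return value
    if isinstance(node, SeqNode):
        if node.anchor is not None:
            anchors[node.anchor] = _IN_PROGRESS
        value = tuple(construct_immutable(i, anchors) for i in node.items)
        if node.anchor is not None:
            anchors[node.anchor] = value  # finished: aliases may now use it
        return value
    return node  # anything else is treated as a scalar


# Node graph for: [ &a [1, 2], *a ]  (shared, but not circular)
shared = SeqNode(anchor="a", items=[1, 2])
ok = construct_immutable(SeqNode(items=[shared, AliasNode("a")]))

# Node graph for: &x [ *x ]  (circular)
cyclic = SeqNode(anchor="x")
cyclic.items.append(AliasNode("x"))
try:
    construct_immutable(cyclic)
    rejected = False
except CircularAliasError:
    rejected = True
print(ok, rejected)
```

A clear error of this kind is arguably friendlier than the stack overflow a naive recursive constructor would produce.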
What's the clj-yaml issue you are working on?
Clj-yaml currently returns YAML sequences as Clojure lazy seqs. This trips enough people up that we are considering optionally returning them as vectors instead. But this impacts any YAML that might include circular references, because in clj-yaml (and I assume SnakeYAML) these currently result in stack overflows when loaded. So then I started to wonder about these circular references, realized I did not understand their use case, and decided to pester you! 🙂 https://github.com/clj-commons/clj-yaml/issues/110
My imagination is probably limited to things I use clj-yaml for, which is typically config files. But the world is bigger than that, ya? 🙂
That's what most people use it for but it was designed for bigger things
user=> (clj-yaml.core/parse-string "&x { x: *x }")
Execution error (StackOverflowError) at clj-yaml.core/eval2176$fn$iter$fn (core.clj:50).
null
user=> (clj-yaml.core/parse-string "&x [ *x ]")
Error printing return value (StackOverflowError) at clojure.lang.PersistentHashMap$BitmapIndexedNode/ensureEditable (PersistentHashMap.java:812).
null
different errors for YAML circular map and circular seq
Hmmm...!
user=> (def h {:x "y"})
#'user/h
user=> (assoc h :y h)
{:x "y", :y {:x "y"}}
definitely not circular 🙂