Fork me on GitHub
#clj-yaml
<
2023-09-20
>
Ingy döt Net02:09:44

I thought this was a bug in clj-yaml:

user=> (-> {"a b" "c d"} clj-yaml.core/generate-string clj-yaml.core/parse-string)
{:a b "c d"}
user=> {:a b "c d"}
Syntax error reading source at (REPL:1:13).
Map literal must contain an even number of forms
but it's just that Clojure doesn't have a good way to print keywords with spaces in them.

lread03:09:41

Hmmm... yeah, a keyword with spaces won't do to well with round-tripping:

user=> (keyword "a b")
:a b
user=> (-> "a b" keyword pr-str)
":a b"
user=> (-> "a b" keyword pr-str read-string)
:a
When input yaml keys are known to have spaces, a user of clj-yaml would probably use :keywords false on parse.

Ingy döt Net03:09:06

user=> (-> {"a b" "c d"} clj-yaml.core/generate-string clj-yaml.core/parse-string clj-yaml.core/generate-string clj-yaml.core/parse-string)
{:a b "c d"}
yaml roundtrips them fine

Ingy döt Net03:09:55

I think it could likely be considered a bug in pr-str

Ingy döt Net03:09:15

it could make (keyword "a b") instead I should think

Ingy döt Net03:09:15

maybe it's not a good idea for clj-yaml to load keys with spaces or other weird things as keywords

lread03:09:10

The annoyance is that keyword can generate Clojure keywords that are technically invalid that can be printed but not read in. Questions around this come up often here on Slack. See keywords under https://clojure.org/reference/reader#_literals

lread03:09:24

> maybe it's not a good idea for clj-yaml to load keys with spaces or other weird things as keywords Huh, we documented this https://github.com/clj-commons/clj-yaml/blob/master/doc/01-user-guide.adoc#key-conversion: > When clj-yaml detects a key that cannot be converted to a Clojure keyword, it will leave it unconverted. Detection and conversion is not sophisticated and can result in keywords that are illegal and unreadable.

lread03:09:52

We could bump up the illegal keyword detection if this is making folks suffer.

Ingy döt Net03:09:26

I'm imagining a code exploit here...

user=> (-> {"x (println 123)" 1 "y (println 456)" 2} clj-yaml.core/generate-string clj-yaml.core/parse-string str read-string eval)
123
456
{:x nil, 1 :y, nil 2}

lread03:09:00

Ya, but you probably don't want to eval from untrusted data, yeah? And you'd want to use clojure.edn/read-string as well.

Ingy döt Net03:09:03

I'm not saying I'd write that code. I'm saying that people might find a way to use it in an exploitable way

lread03:09:10

Gotta hit the hay for today!

Ingy döt Net03:09:59

this reminds me of pyyaml documenting:

Loading YAML
Warning: It is not safe to call yaml.load with any data received from an untrusted source! yaml.load is as powerful as pickle.load and so may call any Python function. Check the yaml.safe_load function though.
from its very first release in 2006. People insisted that documenting it wasn't good enough and that yaml.load needed to call yaml.safe_load

lread03:09:05

I'll re-read your example tomorrow, I don't think I even read it right.... 😴

Ingy döt Net03:09:26

ok. have a good sleep

lread03:09:50

Yeah there is a SnakeYAML SafeConstructor we use by default. So we don't instantiate Java classes without the user opting in. But 💤

1
lread12:09:10

Awake again. So in your example, you are explicitly calling eval. There's nothing a library can do to stop folks from calling eval on things.

Ingy döt Net13:09:55

Well sure. I was just coming up with a quick example of how that yaml load could lead to something completely unexpected and possibly exploitable.

Ingy döt Net13:09:34

I think the main point here is that it's probably a bad idea to load all scalar keys into keywords.

Ingy döt Net13:09:01

But if you already have something the user can do to change that then I guess it's okay for now.

lread13:09:23

Agreed, we would not have made that choice a default, from the user guide: > The :keywords option defaults to true for historical reasons. It's a default behavior we inherited.

Ingy döt Net03:09:04

user=> (require 'clj-yaml.core) (def v [1 2 3]) (-> [v [1 2 3] v] clj-yaml.core/generate-string print)
nil
#'user/v
- [1, 2, 3]
- [1, 2, 3]
- [1, 2, 3]
nil
user=>  
I think clj-yaml really should alias identical values. See:
$ perl -MYAML::PP -e '$v = [1, 2, 3]; print YAML::PP::Dump [$v, [1, 2, 3], $v]'
---
- &1
  - 1
  - 2
  - 3
- - 1
  - 2
  - 3
- *1
or:
$ python -c 'import yaml; v = [1, 2, 3]; print(yaml.dump([v, [1, 2, 3], v]))'
- &id001
  - 1
  - 2
  - 3
- - 1
  - 2
  - 3
- *id001

lread12:09:30

Thanks for the input @U05H8N9V0HZ! Clj-yaml has many rough edges but is hopefully good enough for most folks to parse/generate the odd bit of YAML. Do you think clj-yaml needs to alias identical values? What problem does this solve for clj-yaml users? Can we just wait for some hypothetical future reference implementation to handle this type of thing? simple_smile

Ingy döt Net13:09:30

Yep sounds good.

Ingy döt Net13:09:54

I do think this is probably a fairly common case if people are using YAML as a data serialization language which is what it was originally made for.

Ingy döt Net13:09:59

And I don't think you can create cyclic data structures with Clojure immutable values. So that simply is not a use case.

Ingy döt Net13:09:34

But if nobody's clamoring for it it's probably up to me to make a PR 😄

lread13:09:24

And if nobody is clamoring, maybe you shouldn't create that PR!

lread13:09:28

For cyclic structures, maybe a strategy like https://github.com/aroemers/rmap could work. Worth maybe exploring. But nobody is clamoring for this for clj-yaml. So not going to look into it at this time.

👍 1