clj-yaml

Ingy döt Net 2023-09-20T02:32:44.344179Z

I thought this was a bug in clj-yaml:

user=> (-> {"a b" "c d"} clj-yaml.core/generate-string clj-yaml.core/parse-string)
{:a b "c d"}
user=> {:a b "c d"}
Syntax error reading source at (REPL:1:13).
Map literal must contain an even number of forms
but it's just that Clojure doesn't have a good way to print keywords with spaces in them.

lread 2023-09-20T12:43:10.687099Z

Awake again. So in your example, you are explicitly calling eval. There's nothing a library can do to stop folks from calling eval on things.

Ingy döt Net 2023-09-20T13:48:55.022429Z

Well sure. I was just coming up with a quick example of how that yaml load could lead to something completely unexpected and possibly exploitable.

Ingy döt Net 2023-09-20T13:49:34.893659Z

I think the main point here is that it's probably a bad idea to load all scalar keys into keywords.

Ingy döt Net 2023-09-20T13:50:01.895509Z

But if you already have something the user can do to change that then I guess it's okay for now.

lread 2023-09-20T13:52:23.084409Z

Agreed, we would not have made that choice a default, from the user guide: > The :keywords option defaults to true for historical reasons. It's a default behavior we inherited.

Ingy döt Net 2023-09-20T13:56:46.312729Z

Ah. Got it.

lread 2023-09-20T03:02:41.925649Z

Hmmm... yeah, a keyword with spaces won't do to well with round-tripping:

user=> (keyword "a b")
:a b
user=> (-> "a b" keyword pr-str)
":a b"
user=> (-> "a b" keyword pr-str read-string)
:a
When input yaml keys are known to have spaces, a user of clj-yaml would probably use :keywords false on parse.

Ingy döt Net 2023-09-20T03:05:06.235439Z

user=> (-> {"a b" "c d"} clj-yaml.core/generate-string clj-yaml.core/parse-string clj-yaml.core/generate-string clj-yaml.core/parse-string)
{:a b "c d"}
yaml roundtrips them fine

Ingy döt Net 2023-09-20T03:05:55.550729Z

I think it could likely be considered a bug in pr-str

Ingy döt Net 2023-09-20T03:10:15.584219Z

it could make (keyword "a b") instead I should think

Ingy döt Net 2023-09-20T03:32:15.992469Z

maybe it's not a good idea for clj-yaml to load keys with spaces or other weird things as keywords

lread 2023-09-20T03:33:10.621469Z

The annoyance is that keyword can generate Clojure keywords that are technically invalid that can be printed but not read in. Questions around this come up often here on Slack. See keywords under https://clojure.org/reference/reader#_literals

lread 2023-09-20T03:36:24.996619Z

> maybe it's not a good idea for clj-yaml to load keys with spaces or other weird things as keywords Huh, we documented this https://github.com/clj-commons/clj-yaml/blob/master/doc/01-user-guide.adoc#key-conversion: > When clj-yaml detects a key that cannot be converted to a Clojure keyword, it will leave it unconverted. Detection and conversion is not sophisticated and can result in keywords that are illegal and unreadable.

lread 2023-09-20T03:37:52.827579Z

We could bump up the illegal keyword detection if this is making folks suffer.

Ingy döt Net 2023-09-20T03:42:26.948479Z

I'm imagining a code exploit here...

user=> (-> {"x (println 123)" 1 "y (println 456)" 2} clj-yaml.core/generate-string clj-yaml.core/parse-string str read-string eval)
123
456
{:x nil, 1 :y, nil 2}

lread 2023-09-20T03:46:00.040519Z

Ya, but you probably don't want to eval from untrusted data, yeah? And you'd want to use clojure.edn/read-string as well.

Ingy döt Net 2023-09-20T03:47:03.431179Z

I'm not saying I'd write that code. I'm saying that people might find a way to use it in an exploitable way

lread 2023-09-20T03:47:10.114479Z

Gotta hit the hay for today!

Ingy döt Net 2023-09-20T03:47:17.112479Z

o/

Ingy döt Net 2023-09-20T03:48:59.497719Z

this reminds me of pyyaml documenting:

Loading YAML
Warning: It is not safe to call yaml.load with any data received from an untrusted source! yaml.load is as powerful as pickle.load and so may call any Python function. Check the yaml.safe_load function though.
from its very first release in 2006. People insisted that documenting it wasn't good enough and that yaml.load needed to call yaml.safe_load

lread 2023-09-20T03:49:05.242649Z

I'll re-read your example tomorrow, I don't think I even read it right.... 😴

Ingy döt Net 2023-09-20T03:49:26.804819Z

ok. have a good sleep

lread 2023-09-20T03:50:50.388589Z

Yeah there is a SnakeYAML SafeConstructor we use by default. So we don't instantiate Java classes without the user opting in. But 💤

✅ 1
Ingy döt Net 2023-09-20T03:57:04.193689Z

user=> (require 'clj-yaml.core) (def v [1 2 3]) (-> [v [1 2 3] v] clj-yaml.core/generate-string print)
nil
#'user/v
- [1, 2, 3]
- [1, 2, 3]
- [1, 2, 3]
nil
user=>  
I think clj-yaml really should alias identical values. See:
$ perl -MYAML::PP -e '$v = [1, 2, 3]; print YAML::PP::Dump [$v, [1, 2, 3], $v]'
---
- &1
  - 1
  - 2
  - 3
- - 1
  - 2
  - 3
- *1
or:
$ python -c 'import yaml; v = [1, 2, 3]; print(yaml.dump([v, [1, 2, 3], v]))'
- &id001
  - 1
  - 2
  - 3
- - 1
  - 2
  - 3
- *id001

lread 2023-09-20T12:37:30.469129Z

Thanks for the input @ingy! Clj-yaml has many rough edges but is hopefully good enough for most folks to parse/generate the odd bit of YAML. Do you think clj-yaml needs to alias identical values? What problem does this solve for clj-yaml users? Can we just wait for some hypothetical future reference implementation to handle this type of thing? simple_smile

Ingy döt Net 2023-09-20T13:51:30.707509Z

Yep sounds good.

Ingy döt Net 2023-09-20T13:52:54.220629Z

I do think this is probably a fairly common case if people are using YAML as a data serialization language which is what it was originally made for.

Ingy döt Net 2023-09-20T13:53:59.087879Z

And I don't think you can create cyclic data structures with Clojure immutable values. So that simply is not a use case.

Ingy döt Net 2023-09-20T13:55:34.146819Z

But if nobody's clamoring for it it's probably up to me to make a PR 😄

lread 2023-09-20T13:58:24.748329Z

And if nobody is clamoring, maybe you shouldn't create that PR!

lread 2023-09-20T13:59:28.417739Z

For cyclic structures, maybe a strategy like https://github.com/aroemers/rmap could work. Worth maybe exploring. But nobody is clamoring for this for clj-yaml. So not going to look into it at this time.

👍 1