#cljs-dev
2023-05-04
Adam Kalisz 08:05:18

Together with @mfikes and @dnolen we recently improved the random-uuid implementation in ClojureScript, making it considerably faster (https://clojure.atlassian.net/browse/CLJS-3369). However, it is still not ideal, because it did not fix the underlying problem: it relies on Math/random (via rand-int) instead of proper crypto functions. To address that, I thought about something like this:

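(defn get-array
  "Returns a fresh JS array of 2048 cryptographically random uint16 values."
  []
  (js/Array.from (.getRandomValues js/crypto (js/Uint16Array. 2048))))

;; Buffer of not yet consumed random values; nil until the first call.
(def crypto-array nil)

(defn get-uint16
  "Takes the next random uint16 from the buffer, refilling it when empty."
  []
  (let [uint16 (when crypto-array
                 (.shift crypto-array))]
    ;; .shift on an empty array yields undefined, which nil? also catches.
    (if (nil? uint16)
      (do (set! crypto-array (get-array))
          (get-uint16))
      uint16)))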

(defn crypt3-random-uuid
  "Improved UUIDv4 generation."
  []
  (letfn [(quad-hex []
            (let [unpadded-hex ^string (.toString (get-uint16) 16)]
              (case (count unpadded-hex)
                1 (str "000" unpadded-hex)
                2 (str "00" unpadded-hex)
                3 (str "0" unpadded-hex)
                unpadded-hex)))]
    (let [ver-triple-hex ^string (.toString (bit-or 0x4000 (bit-and 0x0fff (get-uint16))) 16)
          res-triple-hex ^string (.toString (bit-or 0x8000 (bit-and 0x3fff (get-uint16))) 16)]
      (uuid (str (quad-hex) (quad-hex) "-" (quad-hex) "-"
                 ver-triple-hex "-" res-triple-hex "-"
                 (quad-hex) (quad-hex) (quad-hex))))))
This just shows that fetching a bigger chunk of random data and then taking small pieces from that buffer is quite efficient. However, it is too concrete, hacky and very likely not thread-safe in my view. I would much rather open a discussion in the Clojure and ClojureScript community about how we would like to get cryptographically random data and whether everybody is fine with the current state of things.

I envision basically a /dev/urandom with configurable output: a byte stream, an endless well of characters or strings (possibly constrained to some alphabet), doubles or longs (possibly from some range), or whatever the literals and possibly even platform-specific types are. This would make it much simpler to implement random-uuid in ClojureScript properly, and it could probably also make it much simpler to implement portable libraries for cryptographic key generation or for generating truly random data, e.g. for testing. The idea is to focus on practical usability and good performance (it shouldn't be much slower than tapping directly into /dev/urandom).

My naive JavaScript implementation reached about 1/8th of the /dev/urandom bandwidth, based on how many UUIDv4s I can generate per second in Chrome. The biggest bottleneck seems to be the browser or the browser API. What do you think?
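
A rough sketch of what such a "random well" could look like, assuming a host that exposes a global crypto.getRandomValues (browsers, recent Node.js). The names make-byte-well and random-string are hypothetical and not an existing API; the buffer size of 4096 bytes is just the page size mentioned in this thread:

```clojure
(defn make-byte-well
  "Returns a zero-arg fn that yields one random byte (0-255) per call.
  The internal 4096-byte buffer is refilled via getRandomValues only
  when it is exhausted. Assumes js/crypto exists on the host."
  []
  (let [buf (js/Uint8Array. 4096)
        pos (volatile! (.-length buf))]   ; start exhausted: first call fills
    (fn []
      (when (>= @pos (.-length buf))
        (.getRandomValues js/crypto buf)
        (vreset! pos 0))
      (let [b (aget buf @pos)]
        (vswap! pos inc)
        b))))

(defn random-string
  "Builds an n-character string from alphabet. Uses rejection sampling,
  so every character of the alphabet is equally likely."
  [next-byte alphabet n]
  {:pre [(<= 1 (count alphabet) 256)]}
  (let [size  (count alphabet)
        limit (* size (quot 256 size))]   ; largest multiple of size <= 256
    (loop [acc []]
      (if (= n (count acc))
        (apply str acc)
        (let [b (next-byte)]
          ;; Bytes >= limit would bias the result, so they are skipped.
          (recur (if (< b limit)
                   (conj acc (nth alphabet (mod b size)))
                   acc)))))))
```

Usage would be along the lines of (random-string (make-byte-well) "0123456789abcdef" 32). Keeping a read position instead of calling .shift avoids mutating the array on every read, but the closure is still shared mutable state.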

hifumi123 09:05:11

I would personally like to find a way to resort to the Web Cryptography API, but it has poor support outside of the latest browsers; e.g. Node.js only supports randomUUID() in 19+, and I am not sure whether GCC has "good" polyfills for this.

Adam Kalisz 12:05:03

.getRandomValues works outside the secure context too and has been widely supported for many years. I would need to test more in depth on old versions of Node.js for compatibility.

niwinz 09:05:39

We have implemented an alternative in Penpot: https://github.com/penpot/penpot/blob/develop/common/src/app/common/uuid_impl.js Maybe it is useful as another example.

niwinz 09:05:27

Focused on performance and uses WebCrypto if it is available

Adam Kalisz 12:05:14

You will probably lose the performance battle when fetching small buffers of 16 bytes, especially when you need more than a single UUID (which is very often the case). The expensive part is the API call that gets the random data. Fetching it by the OS page size (usually 4096 bytes, i.e. 2048 "shorts") turns out to be quite efficient. Working with strings and single numbers that the JIT recognizes as fitting in 31 bits is quite efficient in JS, but your version could probably be slightly faster there than mine.
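
The arithmetic behind that batching, as a small sketch. The numbers come from the snippet posted earlier in this thread (a 2048-element Uint16Array and eight 16-bit values per UUID); the var names are made up for illustration:

```clojure
;; Hypothetical names; the numbers follow the buffer size discussed above.
(def page-bytes 4096)        ; typical OS page size
(def bytes-per-uint16 2)
(def uint16s-per-uuid 8)     ; 8 x 16 bits = 128 bits per UUID

(def uint16s-per-fill (quot page-bytes bytes-per-uint16))
(def uuids-per-fill (quot uint16s-per-fill uint16s-per-uuid))

;; One getRandomValues call serves uuids-per-fill UUIDs,
;; versus one call per UUID when fetching 16 bytes at a time.
(println uint16s-per-fill uuids-per-fill)   ; prints: 2048 256
```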

niwinz 20:05:33

Agree, but in this case it is already pretty fast (twice as fast as random-uuid), which is more than enough. Using BigInt math makes it a bit slower than the best achievable performance, but it reduces the code and complexity a lot, so this is a tradeoff. Our approach generates true v7/v8-type UUIDs fast enough for us.

niwinz 20:05:22

We have the same implementation for the JVM as well, so it is completely multi-platform: https://github.com/penpot/penpot/blob/develop/common/src/app/common/UUIDv8.java We can think about extracting it into a library if there is some interest in it, but for now it lives in the Penpot codebase.

niwinz 20:05:57

v8 (based on the same ideas as v7) outperforms all other fully random v4 implementations by a good margin, because it does not need to generate random values on each invocation.

borkdude 09:05:35

> would personally like to find a way to resort to the Web Cryptography API
I guess you could probe whether it exists via js/globalThis.crypto and then use it; if not, fall back on the older, less optimal impl.
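
A minimal sketch of that probing, assuming the host defines globalThis. The names web-crypto and random-uint16s are hypothetical and this is not how cljs.core does it:

```clojure
(defn web-crypto
  "Returns the host's crypto object when it offers getRandomValues, else nil."
  []
  (let [c (.-crypto js/globalThis)]
    (when (and (some? c) (fn? (.-getRandomValues c)))
      c)))

(defn random-uint16s
  "Returns a Uint16Array of n random values: from Web Crypto when it is
  available, otherwise from Math.random via rand-int (NOT
  cryptographically secure). getRandomValues accepts at most 65536
  bytes per call, so n should stay <= 32768."
  [n]
  (let [arr (js/Uint16Array. n)]
    (if-let [c (web-crypto)]
      (.getRandomValues c arr)            ; fills and returns arr
      (do (dotimes [i n]
            (aset arr i (rand-int 0x10000)))
          arr))))
```

The probe runs on every call here for simplicity; caching the result once at load time would be the obvious refinement.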

p-himik 09:05:53

TIL there's globalThis. Instead of adding self to Node or adding global to browsers, they added globalThis everywhere.

👍 1
borkdude 10:05:26

Funny detail is that in SCI environments like scittle or nbb, I map js to globalThis, so js/crypto is just looking up crypto from globalThis

borkdude 10:05:36

whereas in regular CLJS it's a special construct

borkdude 10:05:04

if you do:

npx nbb@latest -e '(= js js/globalThis)'
it returns true because js is just the global object, not a special syntax

p-himik 10:05:29

Reading this to understand the reasoning: https://github.com/tc39/proposal-global/blob/master/NAMING.md Do you have any idea what they mean by "realm"?

p-himik 10:05:17

Phew, it's not a new entity:
> The language reference uses abstract terms because JavaScript environments can vary widely. In the browser, a window (a frame, a window opened with window.open(), or just a plain browser tab) is a realm.

borkdude 10:05:05

I guess those people read too many fantasy books

p-himik 10:05:19

Oh, we also have this wonder, albeit at the proposal stage: https://github.com/tc39/proposal-compartments Every single time I read something about JS, I feel like the relative amount of my knowledge about it reduces dramatically.

borkdude 10:05:03

Maybe they should watch Rich Hickey's new talk. It seems they are starting with a solution rather than a problem statement.

p-himik 10:05:29

Yeah. :) A cavalcade of poor decisions.

hifumi123 10:05:42

One huge complication for Web Cryptography API usage is that browsers seem to only expose self.crypto in a “secure context”. So we’ll need more than just a polyfill if we intend to use it in a browser. Overall, I think the current implementation of random-uuid is fine as-is, but if we want to try making use of the host, then there is a lot of tinkering one can do in this area.

borkdude 10:05:23

As @U05224H0W suggested, maybe best to just use a library for this

👍 1
hifumi123 10:05:53

funny thought that just occurred to me: we can expose a random-uuid macro in case people want to compute UUIDs at compile-time and avoid as much run-time generation as possible 😄 in any case, i think my opinion is firmly “keep random-uuid as-is”

borkdude 10:05:10

if compile time random uuids are a solution to your problem, are you sure you needed random uuids at all?

Adam Kalisz 13:05:35

As noted before, I obviously thought about the API support a bit... .getRandomValues works outside the secure context too and has been widely supported for many years. I would need to test more in depth on old versions of Node.js for compatibility.

Adam Kalisz 13:05:23

random-uuid should use crypto by default, the same as the Clojure (Java) version. It was very unfortunate to use Math/random by default at all, especially since the getRandomValues API seems to be older than ClojureScript.

thheller 10:05:27

why is it important that this happens in cljs.core and not a library? just curious but what are you doing that the speed of random-uuid matters?

Adam Kalisz 13:05:11

If the slower implementation isn't an order of magnitude simpler, there is no good reason to give up speed and just burn the cycles. Getting random data in any form at a reasonable speed is always good. That way you can use the API in many situations without thinking twice, e.g. you can give every animation, every span a UUID and don't need to invent your own identifiers and think about all the potential problems. For the UUID case in particular, I thought the implementation in quads was somewhat more elegant and an obvious way to reduce the number of function calls. It turned out to bring the expected speedup. Now I would like to actually get the proper crypto too (which might reduce performance a bit, but it will probably still end up faster overall than the previous version).

Adam Kalisz 13:05:55

random-uuid is in core, so an endless well of cryptographically pseudo-random data that random-uuid could use without duplication of effort would be very welcome. I am not opposed to having the "random well" as a separate library; however, it would be quite strange if random-uuid tapped into it only when the library happened to be present.

thheller 13:05:55

I'm more concerned about the availability of js/crypto. If it's not available universally, then it can't be in core. Is it available in node, deno, bun, browsers, jsc and whatever other engines might exist?

Adam Kalisz 14:05:44

That needs to be tested. We can also keep the current implementation as a fallback, while e.g. setting some kind of flag to either note that the fallback was used or to throw/log an error when it is used. That would serve more constrained environments as well as environments with stricter requirements, which are now quite possibly misled about the quality of random-uuid.
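
One way such a flag could look, as a sketch only. secure-random? and fill-random! are hypothetical names, the :strict? option is an assumption about how "throw or log" might be exposed, and the fallback shown only suits a Uint8Array:

```clojure
(def secure-random?
  "True when the host exposes crypto.getRandomValues (checked once at load)."
  (let [c (.-crypto js/globalThis)]
    (boolean (and c (.-getRandomValues c)))))

(defn fill-random!
  "Fills the Uint8Array arr with random bytes and returns it.
  With :strict? true it throws instead of silently falling back to
  Math.random; otherwise the fallback logs a warning."
  [arr & {:keys [strict?]}]
  (cond
    secure-random?
    (.getRandomValues (.-crypto js/globalThis) arr)

    strict?
    (throw (ex-info "No cryptographic random source available" {}))

    :else
    (do (js/console.warn "fill-random!: falling back to Math.random")
        (dotimes [i (.-length arr)]
          (aset arr i (rand-int 256)))   ; NOT cryptographically secure
        arr)))
```

Callers with stricter requirements would pass :strict? true, and anyone can inspect secure-random? up front to see which source will be used.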

thheller 14:05:16

But if I see this correctly, in node it's not a global js/crypto but something you require/import?
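
For illustration, a sketch of the require-based route on Node.js. It assumes a CommonJS build where js/require is available and uses randomFillSync from Node's built-in crypto module; node-random-uint16s is a hypothetical name:

```clojure
(defn node-random-uint16s
  "Returns a Uint16Array of n cryptographically random values using
  Node's built-in crypto module. Only works where js/require exists."
  [n]
  (let [node-crypto (js/require "crypto")]
    ;; randomFillSync fills the typed array in place and returns it.
    (.randomFillSync node-crypto (js/Uint16Array. n))))
```

This does not run in a browser, so a portable implementation would still need a host check in front of it.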

Adam Kalisz 14:05:00

It may be different with very new Node.js, though. I would definitely need more guidance; I am not sure what minimal version ClojureScript targets. Perhaps with the next major version the requirement could be raised to only support slightly more recent environments that are not so constrained with regard to cryptography, for instance.

Adam Kalisz 10:05:01

It seems the non-randomness (in the cryptographic sense) of Math.random can be exploited rather practically: https://www.youtube.com/watch?v=_Iv6fBrcbAM It would help a lot to know what the minimal supported environment should look like. I guess we can keep the Math.random implementation as a fallback if there is no supported cryptographic implementation. In that case, the developer should be able to know up front whether cryptographic randomness will actually be used. My worry is that this is just so easy to overlook: someone might use UUIDs as e.g. session tokens with a Node.js backend, possibly making guessing a practical attack vector.