Fork me on GitHub
#cljs-dev
<
2018-11-17
>
shaunlebron01:11:05

cljs.core/test doesn’t seem to find the :test function in metadata:

$ clj
Clojure 1.9.0
user=> (defn ^{:test #(assert false)} foo [] nil)
#'user/foo
user=> (test #'foo)
AssertionError Assert failed: false  user/fn--145 (NO_SOURCE_FILE:1)
user=> ^D

$ clj -m cljs.main -r
ClojureScript 1.10.439
cljs.user=> (defn ^{:test #(assert false)} foo [] nil)
#'cljs.user/foo
cljs.user=> (test #'foo)
:no-test
cljs.user=> ^D

$ planck
ClojureScript 1.10.439
cljs.user=> (defn ^{:test #(assert false)} foo [] nil)
#object[Function]
cljs.user=> (test #'foo)
:no-test
cljs.user=>

mfikes02:11:24

@shaunlebron My guess would be that because it existed before the rudimentary support for vars was added, so instead you pass the function value

mfikes02:11:33

Maybe things could be revised

shaunlebron02:11:20

@mfikes ah, you’re right, thanks:

$ clj -m cljs.main -r
ClojureScript 1.10.439
cljs.user=> (defn ^{:test #(assert false)} foo [] nil)
#'cljs.user/foo
cljs.user=> (test foo)
Error: Assert failed: false
cljs.user=>

mfikes02:11:34

Maybe worth a JIRA to see if we can clean it up a bit

shaunlebron02:11:59

i’ll file and post here

martinklepsch11:11:43

I hope people don't mind another question about analyzer behavior. When analyzing specter the analyzer is throwing an Unable to resolve var: coll? in this context at line 1449. coll? being a built in I'm unsure how this is possible? (link to code in thread)

borkdude13:11:13

Found a discrepancy between clj + cljs:

(subs "" 1) ;; throws in clj
(subs "" 1) => "" ;; in cljs
Is this something that should be fixed, or is it “let the host platform deal with it”. There is code that depends on this behavior (at least re-seq does)

borkdude13:11:07

(subs "" :a :b)
also works in cljs. it seems you can pass whatever you want and it returns an empty string if it doesn’t make sense

mfikes13:11:32

I want to say that all are undefined behavior. (Also, any attempt to do anything about it messes up the “optimized for correct programs” theme.)

borkdude13:11:08

@mfikes I think I’ll need to make a special case for the subs spec in cljs then, because currently re-seq runs into trouble with it

mfikes13:11:31

Maybe re-seq could be revisited if it is relying on undefined behavior?

borkdude13:11:42

I’ll make an issue for it

mfikes13:11:35

Really, if we absolutely need to, re-seq could call .substring directly, which is a little gross. But, arguably if it is relying on JavaScript semantics for some reason, maybe it shouldn’t be calling subs

jaawerth18:11:54

Huh, looking at the source for re-seq I think I also see an opportunity for improving performance: it's searching the string twice by running .search to find the index after running re-find. re-find already uses re.exec to find the match, it's just discarding the index you get from .exec. It could skip having to search the string twice by calling .exec directly.

borkdude19:11:32

Interesting. I’m trying this in a Planck REPL:

(let [regex #"a", s "aaaa"] (while (let [res (.exec regex s)] (println (.-lastIndex regex)) res)))
but the loop never terminates.

borkdude19:11:46

@jesse.wertheim .exec only returns the match right, not the index?

jaawerth19:11:42

it sticks the index onto the array via an .index property

jaawerth19:11:20

which is admittedly kinda weird, but that's javascript for you

borkdude19:11:38

is this supported on all platforms, browser, node etc?

jaawerth19:11:56

yep, it's part of the spec

borkdude19:11:57

do you have a link to the spec? I’m reading Ecma 2018 and it says: > returns an Array object containing the results of the match, or null if string did not match.

jaawerth19:11:25

poking through it now 😉

jaawerth19:11:21

It's possible I'm conflating it being in spec with the lastIndex property of (stateful) global regular expressions - if I can't find it here I'll bug someone in TC39 about it

jaawerth20:11:49

@U04V15CAJ ahh it's more explicit going back to ECMA 5.1's spec version - there's more indirection in the latest spec since it refers to RegExpBuiltinExec. https://www.ecma-international.org/ecma-262/5.1/#sec-15.10.6.2

jaawerth20:11:08

(see item 15 in that list)

borkdude20:11:01

so this is 2011? how far back does cljs want to support standards?

jaawerth20:11:27

I dunno. I didn't refer back further - it could go back to ES 5 or even ES3

jaawerth20:11:02

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec talks about the index stuff and at the bottom there are various spec links - I just opened ES3 to see if I can find it there

jaawerth20:11:51

(that's MDN - the earlier devdocs link just scrapes it - I use devdocs since it's faster to search than MDN itself)

borkdude20:11:04

> 13. Return a new array with the following properties: • The index property is set to the position of the matched substring within the complete string

jaawerth20:11:24

so yeah it dates back to at least 1999

borkdude20:11:25

what’s the maximum ecmascript standard that cljs wants to support? specifically, is it ok to rely on the .-index property in the array resulting from a call with (.exec #"a" "a").

borkdude20:11:51

@jesse.wertheim points out it dates back to at least 1999 (link: https://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%203rd%20edition,%20December%201999.pdf). if we could use this in re-seq, it could be further optimized.

borkdude20:11:19

should be good then

borkdude21:11:44

@jesse.wertheim it seems to help a little bit:

cljs.user=> (simple-benchmark [re #"\d+" s "a1b22c333d"] (re-seq re s) 100000000)
[re #"\d+" s "a1b22c333d"], (re-seq re s), 100000000 runs, 50975 msecs
nil
cljs.user=> (simple-benchmark [re #"\d+" s "a1b22c333d"] (cljs.core/re-seq re s) 100000000)
[re #"\d+" s "a1b22c333d"], (cljs.core/re-seq re s), 100000000 runs, 53529 msecs

borkdude21:11:37

for longer strings, the difference may become more significant

jaawerth21:11:03

yeah - I imagine with direct benchmarking the JS jit engine might be able to optimize things a bit to reduce the difference, but that might vary in the context of a full application as it prioritizes what to optimize

jaawerth21:11:55

but to your point it could make for a big difference for longer strings, which are going to be a good usecase for re-seq

jaawerth21:11:49

I imagine that a global JS regex might be faster since you can just keep calling exec and look at the lastIndex property of the global regex, but it probably would lose out since you have to do something nasty under the hood like (RegExp. (.-source original-re) "g") and dynamically rebuilding the regex using the constructor would be expensive

jaawerth21:11:39

correction: converting to a JS global RegExp would prevent needing to use subs at all (unless you wanted to include it in the result), since each exec gives you the matches, but it would probably only really win for long strings because of the cost of using the RegExp constructor

borkdude21:11:40

could that still be implemented lazily?

borkdude21:11:33

anyway feel free to try. the latest version I had was:

(defn re-seq
  "Returns a lazy sequence of successive matches of re in s."
  [re s]
  (when-let [matches (.exec re s)]
    (let [match-idx (.-index matches)
          match-str (first matches)
          post-idx (+ match-idx (max 1 (count match-str)))
          matches (if (== (count matches) 1)
                    (first matches)
                    (vec matches))]
      (lazy-seq (cons matches
                      (when (<= post-idx (count s))
                        (re-seq re (subs s post-idx))))))))

jaawerth17:11:57

it can be, but it's kinda gross and I think it's probably best to leave it using non-global regular expressions (since they don't exist in clj/cljs anyway). But this is what I mean - you would convert the existing regex into a global one and then just keep calling .exec until it returns null. Should be fine since the mutable global regex only exists within the lazy-seq, but doing this conversion dynamically is expensive compared to using a regex literal:

(require 'clojure.string)

(defn- re-seq-g* [global-reg s]
  (lazy-seq
    (some->
      (.exec global-reg s)
      vec
      (cons (re-seq-g* global-reg s)))))

(defn re-seq-g [pattern s]
  (let [[src flags]
        (if (string? pattern)
          [pattern "g"]
          [(.-source pattern) (clojure.string/join (into #{"g"} (.-flags pattern)))])
        global-reg (js/RegExp. src flags)]
    (re-seq-g* global-reg s)))

jaawerth17:11:07

to be clear that's just a way it COULD be done - I'm not seriously suggesting this be the implementation 😉

borkdude17:11:37

I haven’t bothered. The most recent one is in the -2 patch: https://dev.clojure.org/jira/browse/CLJS-2979

borkdude17:11:59

1.6-1.8x speedup, seems nice

jaawerth17:11:03

nice! yeah I just wrote the above to show what i was talking about - I'm vaguely curious whether it would have better perf in some usecases but even if it did, I'm guessing it wouldn't be worth the dynamism of RegExp.

jaawerth17:11:03

oh and with the above I didn't turn it into a string rather than vector for the case of no subgroup matches

jaawerth17:11:23

so more like:

(require 'clojure.string)
(defn- re-seq-g* [global-reg s]
  (lazy-seq
    (some->
      (.exec global-reg s)
      (as-> match-data
        (if (= 1 (.-length match-data))
          (aget match-data 0)
          (vec match-data)))
      (cons (re-seq-g* global-reg s)))))

(defn re-seq-g [pattern s]
  (let [[src flags]
        (if (string? pattern)
          [pattern "g"]
          [(.-source pattern) (clojure.string/join (into #{"g"} (.-flags pattern)))])
        global-reg (js/RegExp. src flags)]
    (re-seq-g* global-reg s)))