#clojure
2021-07-11
borkdude12:07:46

Is there a way to instrument a var before it gets directly linked into other code?

Ben Sless13:07:07

You can hack defn 🙂

borkdude13:07:01

Can you explain what you have in mind? My use case is: a library uses function x in several places, but I would like to monkey-patch x to do something more. But with direct linking, I'd have to patch the "several places" as well.

Ben Sless13:07:51

defn returns a var. You can alter-var-root defn to wrap the original defn, which gives you the var, do whatever you want to it, then return it like defn would regularly

Ben Sless13:07:51

You might want to limit this hack's scope by creating something like instrumenting-require which would instrument defn, require the file, then "uninstrument" it
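A rough, untested sketch of the scoping part of that idea (the on-defn hook here only observes each definition; actually rewriting the definitions would take more work, and with direct linking already-compiled callers still won't see a later patch):

(defn instrumenting-require
  "Temporarily wraps clojure.core/defn while ns-sym loads, calling
  (on-defn fn-name) for every defn in the file, then restores defn."
  [ns-sym on-defn]
  (let [old-defn @#'clojure.core/defn]
    (alter-var-root #'clojure.core/defn
                    (constantly
                     (fn [form env fn-name & args]
                       (on-defn fn-name)
                       (apply old-defn form env fn-name args))))
    (try
      (require ns-sym :reload)
      (finally
        (alter-var-root #'clojure.core/defn (constantly old-defn))))))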

Ben Sless13:07:09

Why not just turn off direct linking?

borkdude13:07:36

So first patch defn. Then "watch" for the var you want to patch, and ignore the rest? And then load the actual lib?

borkdude13:07:59

Yes, disabling direct linking seems like a better "workaround".

borkdude13:07:06

This is for a library

Ben Sless13:07:24

can you expand a little bit regarding the use case?

borkdude13:07:46

I want to give users the ability to patch eval in SCI but I don't want to pay any extra performance hit in the normal case

borkdude13:07:02

In CLJS it works nicely since you can just patch the function reference directly, there is no "direct linking"

borkdude13:07:26

but perhaps the most important environment in which you want to do this is CLJS, since in JVM settings you can just spawn threads and kill stuff

borkdude13:07:08

But sometimes I need this feature myself as well, when I want to patch functions for libraries in babashka. And bb is compiled with direct linking

borkdude13:07:22

So when I do this, I have to dig for every usage of that function and also patch those functions

Ben Sless13:07:33

hm. A registry of interceptors for var names which defn always checks?
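Purely as a hypothetical sketch of that shape (definterceptable, the interceptors atom, and mylib/some-fn are all made-up names), the idea being that the wrapping happens at definition time, so whatever function ends up in the var at load time is what callers get:

(defonce interceptors
  ;; map of fully-qualified symbol -> wrapping fn, populated by the user
  ;; before the library namespace is loaded
  (atom {}))

(defmacro definterceptable
  "Like defn, but consults the interceptors registry at definition time."
  [name & fn-tail]
  (let [qualified (symbol (str *ns*) (str name))]
    `(def ~name
       (let [f# (fn ~name ~@fn-tail)]
         (if-let [wrap# (get @interceptors '~qualified)]
           (wrap# f#)
           f#)))))

;; a user would register an interceptor before requiring the library:
(swap! interceptors assoc 'mylib/some-fn
       (fn [f] (fn [& args] (println "intercepted!") (apply f args))))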

borkdude13:07:54

something like that could work

borkdude13:07:30

user=> (def old-defn @#'clojure.core/defn)
#'user/old-defn
user=> (alter-var-root #'clojure.core/defn (constantly (fn [form env fn-name & args] (prn :fn fn-name) (apply old-defn form env fn-name args))) )
#object[user$eval140$fn__141 0xc3177d5 "user$eval140$fn__141@c3177d5"]
user=> (defn foo [])
:fn foo
#'user/foo

Ben Sless13:07:40

This is such a hack 😄

😆 2
didibus20:07:04

You control eval no? Can't you do something like:

(def eval
  (if user-eval-fn
    user-eval-fn
    sci-eval))

didibus20:07:47

And then people can do (set! user-eval-fn) with user-eval-fn being a dynamic var?

borkdude20:07:31

the problem is that dynamic vars are slow and when you call eval 1M times, it will take significantly longer with a dynamic var

didibus20:07:29

But it won't, no? Because the if will be executed only once when the lib is loaded; afterwards eval will be direct-linked with the value of user-eval-fn, no?

borkdude20:07:07

ah like that, yes, that could work

didibus20:07:18

Or maybe you have to do:

(def eval
  (if user-eval-fn
    @user-eval-fn
    ...
To force getting the value out and bind eval not to the dynamic var but to its value

borkdude20:07:59

it's slightly awkward in the order of loading stuff though. the user probably wants the normal eval + something extra and by the time you get the normal eval, everything's already loaded

didibus20:07:06

Ya, so I'm thinking it's like a "compile" time dynamic var, the user can set! it before they require the lib

didibus20:07:59

Maybe another way, so it doesn't force the user to have a weird require or to call set! before the call to ns, is to take a JVM property/env variable. So maybe I can set a JVM property pointing to my custom eval, and your def can check if that property is set and, if so, use that one.

didibus20:07:15

Oh, I see what you mean. Like inside their user-eval they might want to use the sci-eval, and so that one would need to already be loaded... Hum..

didibus20:07:54

Okay, I just had another idea

didibus20:07:23

What if the user provided a factory for creating the eval function. Something like:

(def ^:dynamic eval-factory nil)

(defn get-eval-factory []
  (or eval-factory
      (some-> (System/getProperty "eval-factory") symbol requiring-resolve)
      (some-> (System/getenv "eval-factory") symbol requiring-resolve)))

(def eval
  (letfn [(eval [...] ...)] ;; This is the normal eval
    (if-let [ef (get-eval-factory)]
      (ef eval) ;; And you pass the normal eval to the eval factory
      eval)))
So the user provides a factory that takes the normal eval function as an argument and returns the eval function to use.
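For illustration only (assuming, for simplicity, an eval that takes a single form argument), a user-supplied factory following that contract might look roughly like:

(defn my-eval-factory
  "Receives the library's normal eval and returns the eval to install."
  [normal-eval]
  (fn [form]
    (println "about to eval:" form)
    (normal-eval form)))

;; registered via the dynamic var, or via a JVM property / env variable
;; pointing at the fully-qualified symbol of this factory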

didibus20:07:19

That way, by the time the user's eval is created, everything is loaded but the eval var isn't defined yet, and the user can still reference the normal eval function.

didibus20:07:42

I think that would work with direct-linking

borkdude20:07:01

yeah, that can work

kennytilton14:07:58

If Clojure had copied Common Lisp packages https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node111.html we would not have forty-line namespaces at the top of every source file. https://github.com/kennytilton/matrix/blob/6300132cc64635922c7b2b484cdb7a52d0c64107/cljs/rxtrak/src/rxtrak/build.cljs#L1 Too late for an RFE? 🤪

p-himik14:07:37

Wouldn't we have the same 40 lines of use and import, and many, many more lines that would have things like tiltontec.webmx.html/dom-ancestor-by-tag?

thheller14:07:13

would be much shorter if you :refer less 😛

👍 2
vemv14:07:33

I have ns forms auto-collapsed so idc about their length :)

vemv14:07:36

between that, consistent naming, and cljr-slash I feel pretty abstracted away from intricacies. I just use aliases directly in place, with no editing of the ns form

kennytilton17:07:30

@U2FRKM4TW No, with CL packages I would define in one xyz-package.cl source file a package :xyz setting up all the dependencies, aliases, shadowing, whatever and then just code (in-package :xyz) at the top of source files that rely on those dependencies. dom-ancestor-by-tag or any symbol would not require the package prefix tiltontec.webmx.html unless a symbol were not exported, and then it seems right and proper that I have to advertise my invasion of the package internals. Even then, CL's defpackage supports so-called nicknames, so that would be (wxhtm::internal-use-only 42).

@U05224H0W Agreed, it is time I started writing smaller programs.

"ns forms auto-collapsed", @U45T93RA6? I like it! But when churning out new code greenfield I find it irritating having to scoop up the ns requirements I need from some existing source.

Fun note: many in the CL community are fans of breaking an app up into packages appropriate to different sets of functions, such as for :ui, :i/o, etc. I did that post hoc once to 40kloc, and surfaced some useful refactoring. A month later I reverted to a monolithic package. Famous Lisp saying: "It is probably a package problem. It is always a package problem." I can confirm.

p-himik17:07:29

I see. If that's your cup of tea, you can already do that with e.g. https://github.com/clj-commons/potemkin with the power of its macros. But personally, I dislike it, at least because of the lack of tooling support. There were some other negative implications described in the relevant Google Group discussion, but my memory fails me here.

dominicm19:07:24

The way dynamics work is a bit funky with potemkin.

dominicm19:07:47

@U0PUGPSFR you could do all of this with a macro of course 🙂

(def ^:private packages (atom {}))
(defmacro in-package [pkg]
  `(do ~@(get @packages pkg)))
(defmacro make-package [pkg & requires]
  `(swap! packages assoc ~pkg '~requires))

;; 

(make-package :xyz
  (require '[foo.bar :as baz :refer [glom]]))
(I haven't tested this, but it should be fine 🙂)

dominicm19:07:25

I think it will even work in Cljs, thanks to the changes to support macros which expand to require.

didibus20:07:05

You can split your code across multiple files (at least in Clojure, not sure for Cljs) and use (in-ns 'xyz). So just create an xyz namespace where you require and alias everything as you want, and then in the other files just put (in-ns 'xyz) at the top and you'll be inside the context of that namespace.
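A minimal sketch of that layout (clojure.string here just stands in for whatever dependencies you actually want set up; xyz.clj has to be loaded before the other files):

;; xyz.clj -- declare the namespace and its requires once
(ns xyz
  (:require [clojure.string :as str]))

;; another_file.clj -- re-enter the same namespace instead of declaring a new one
(in-ns 'xyz)

(defn shout [s]
  (str/upper-case s))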

didibus20:07:19

Something else you can do is just create a macro:

(defmacro ns-xyz
  []
  `(ns ~'xyz
     (:require ...)))
And use ns-xyz as your ns declaration instead. You can customize this however you want.

didibus20:07:42

Pretty sure that would work with Cljs as well. In fact a nice way to do it would be:

(defmacro require-xyz
  []
  '(require '[foo.bar :as b :refer [baz]]))
since ClojureScript supports multiple requires at the top of a file. So now you can do:
(ns my-file
  (:require ...) ; Require more stuff
  (:require-macros ...)) ; Require our macro require-xyz
(require-xyz)
So I think this will work in ClojureScript

kennytilton21:07:05

Ha-ha, I considered using a macro but then thought, "Nahhh, Clojure and especially the tooling around it would never allow that." Doh!

didibus04:07:01

Ya, ClojureScript is the only one where I'm not 100% sure this would work. Otherwise it should be fine with Cider and Calva or other REPL-based tooling. With static analyzers there might be issues: you can teach clj-kondo about your macro, but with others like Cursive I don't think you can, so that one might have issues from not knowing about your requires.

didibus06:07:27

I asked on the ClojureScript channel, and people seem to think it should work on Cljs as well

zendevil.eth14:07:53

suppose I have fibonacci defined like so:

(def fib
  (memoize (fn [i]
             (case i
               0 0
               1 1
               (+ (fib (dec i)) (fib (dec (dec i))))))))
I need to take only the Fibonacci numbers that are less than y = 10,000. Is there a way that works for a general y, without knowing in advance for which x fib(x) first exceeds y?

Nazral15:07:36

given that you memoize fib, you could simply recur over it until its result is >= y
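For example (a small sketch building on the definition above), you can map fib over the naturals lazily and take results while they stay under y:

(defn fibs-below
  "All Fibonacci numbers less than y, without knowing the cutoff index in advance."
  [y]
  (take-while #(< % y) (map fib (range))))

(fibs-below 10000)
;; => (0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765)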

Andrew Lai17:07:08

Seeking some help understanding the defprotocol implementation in Clojure. If I create a protocol:

(defprotocol FileSystem
  "Methods for interacting with a FileSystem"
  (ls [_ path] "List contents of a directory at `path`"))
My understanding is that the emitted method ls implements AFunction, and has a .__methodImplCache property that contains a MethodImplCache mapping between known classes and known implementations of the protocol. If I then create a new implementation and try to execute the protocol method, my current understanding of the code is that it will add the new method to the MethodImplCache's .table property, caching the result and updating the ls function's cache of known ways to dispatch the method. When running this in the REPL, what I'm seeing is that the MethodImplCache isn't actually updated. However, if I re-evaluate the -cache-protocol-fn method, and step through using debug mode (I'm using Emacs/cider), I actually see the MethodImplCache being updated.
(defrecord S3 []
  FileSystem
  (ls [_ path]
    {:baz :qux}))

(ls (->S3) nil)

;; MethodImplCache is not updated
(seq (.table (.__methodImplCache ls)))
;; => nil



;; Instrument -cache-protocol-fn for debugging 
(ls (->S3) nil)


;; After instrumenting for debugging - MethodImplCache is now updated
(seq (.table (.__methodImplCache ls)))
;; => (my-ns.S3
;;     #object[clojure.lang.MethodImplCache$Entry 0x740df35d "clojure.lang.MethodImplCache$Entry@740df35d"]
;;     nil
;;     nil)
Could someone help me understand if (1) I should expect the MethodImplCache to be updated after calling the ls method for the first time using a new class (the S3 defrecord in this case)? (2) Why I get different behavior and the cache IS updated when I instrument the -cache-protocol-fn for debugging?

potetm17:07:15

First off, you are deeeep in the implementation weeds 😄 This might be a better question for #clojure-dev. My understanding is, like you said, when you dynamically add a type to a protocol (e.g. via extend-protocol), it updates the method cache. However the (defrecord … Protocol impls…) syntax is different from other syntaxes. When you use this syntax, you’re actually generating a class that implements a corresponding internal interface generated by defprotocol. Dispatch to those records is implemented by regular JVM dynamic dispatch.

✅ 3
potetm17:07:20

I might be wrong about that, but that’s my understanding.

Ben Sless17:07:16

Small example from malli: -schema? is a protocol method

(defprotocol Schemas
  (-schema? [this])
  (-into-schema? [this]))

(defn schema?
  "Checks if x is a Schema instance"
  [x] (-schema? x))
Decompiles to:
public final class core$schema_QMARK_ extends AFunction
{
    private static Class __cached_class__0;
    public static final Var const__0;

    public static Object invokeStatic(final Object x) {
        if (Util.classOf(x) != core$schema_QMARK_.__cached_class__0) {
            if (x instanceof Schemas) {
                return ((Schemas)x)._schema_QMARK_();
            }
            core$schema_QMARK_.__cached_class__0 = Util.classOf(x);
        }
        return ((IFn)core$schema_QMARK_.const__0.getRawRoot()).invoke(x);
    }

    @Override
    public Object invoke(final Object x) {
        return invokeStatic(x);
    }

    static {
        const__0 = RT.var("malli.core", "-schema?");
    }
}

Ben Sless17:07:06

Regarding what goes on in the protocol definition, macroexpansion tells most of it

potetm17:07:55

hmm… actually yeah. I see what you’re saying now.

potetm18:07:05

It looks like the method cache is only updated after the first invocation: https://github.com/clojure/clojure/blob/master/src/clj/clojure/core_deftype.clj#L587-L626

Ben Sless18:07:51

It's optimistic?

Ben Sless18:07:19

Also looks like any function which wraps a protocol method call caches the first class it's called on

potetm18:07:42

that can’t be true, right?

potetm18:07:22

I don’t read it like that, but I also don’t fully understand what’s going on in this code.

Ben Sless18:07:57

I think the caching is per AFn where the protocol method is called

ghadi18:07:26

AMA about this. The inline cache on Class -> impl uses a packed array, then falls back to a map when it grows beyond a certain size, or cannot pack an array unambiguously

ghadi18:07:54

clojure.core/maybe-min-hash is used to lay out the packed array

ghadi18:07:42

direct extenders of the protocol's backing interface do not go through any of this lookup process at all, just invoke interface

✅ 3
ghadi18:07:12

my talk at the JVM Language Summit covers some of this

Andrew Lai00:07:11

Thank you all for the comments. I'll check out the talk, the Malli example and read through your comments a couple times to see if it sinks in for me!

Andrew Lai02:07:07

After the Malli example and the Clojure Futures talk (both of which were super helpful!), here's my mental model based on the Malli example Ben gave above:

Invoking the schema? function (which wraps the -schema? method) will first check if the argument's class matches the single-item cache attached to the schema? function, __cached_class__0. If it matches, we proceed to get the root of the protocol method var (-schema?) and invoke that. If the argument was NOT the cached class, we have two potential cases. The first is that the argument actually implements the underlying interface (Schemas in this case). If that's the case, great! Just use the interface. If not, update the cached class and then invoke the protocol method var. If we're invoking the protocol method var instead of the underlying interface, the protocol method var comes with the MethodImplCache. As we continue to invoke the protocol method with objects that do not implement the underlying interface, we will continue to add to the MethodImplCache.

What I don't understand:
• Based on the decompiled code, it seems like the schema? function which wraps the -schema? method is responsible for determining whether the argument x implements the underlying Schemas interface or not. Does this imply that all potential consumers of a protocol method are responsible for figuring out whether to delegate to the underlying interface implementation or to invoke the protocol method var? So the var containing the protocol method doesn't actually know how to deal with cases where the object implements the underlying interface?
• Why is __cached_class__0 present on the core$schema_QMARK_ class? I'm struggling to see how that particular cache gets leveraged for performance. Is (x instanceof Schemas) extremely costly, so that avoiding it saves a lot of cycles?

ghadi14:07:37

There are two sets of implementations for a protocol: classes that extend the backing interface (fast path), and classes that don't, which are looked up in a table. The basic logic is:

if target implements interface
    call interface method on target
else
    look up target's class in the table

but: the calling mechanism remembers the last seen target class when it is a table class, and jumps directly to that impl instead of checking the interface
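To make the two paths concrete, here is a small sketch (hypothetical types, reusing the FileSystem protocol from earlier in the thread):

(defprotocol FileSystem
  (ls [this path]))

;; fast path: the record class directly implements the interface generated by
;; defprotocol, so calls reach it via ordinary JVM interface dispatch
(defrecord S3 []
  FileSystem
  (ls [_ path] {:source :s3}))

;; table path: the extension lives in the protocol's impl map and is looked up
;; (and cached) at call time
(extend-protocol FileSystem
  String
  (ls [this path] {:source this}))

(ls (->S3) "/tmp")   ;; interface call
(ls "local" "/tmp")  ;; table lookup, then cached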

Ben Sless15:07:54

Which would avoid a table lookup and can be more JIT-friendly. Very cool

Russell Mull19:07:01

Do any of the datalog dbs / libs (datomic, datalevin, datascript, crux, etc) support any kind of "virtual table" mechanism, as in postgres and sqlite? I would like to be able to incorporate information outside of the database into my datalog queries.

Russell Mull20:07:02

This appears to broadly do the same kind of thing, but (iiuc) in an eager way. I'm looking for some kind of functional interface, where the query engine is able to push down part of the query to the external datasource.

Russell Mull20:07:48

I wonder if I can approximate it using rules and some functions.

quoll21:07:05

Oh, so THAT’S what that’s called! 🙂

👍 2
quoll21:07:36

I implemented them in Asami, and they’re incredibly useful, but I didn’t know the name 🙂

quoll21:07:19

They also let you feed the results of one query into another query, giving you a “subquery” mechanism. But unlike some approaches to subqueries, they’re much more efficient, since they just join into the bindings as if they’re part of the current query already

Joshua Suskalo21:07:53

Oh that's cool. I want to figure out how to do that effectively.

quoll21:07:08

I’m also hoping to turn the idea of threading queries into a query syntax (it won’t be compatible with other databases, but :woman-shrugging:) https://github.com/threatgrid/asami/issues/147

Joshua Suskalo21:07:38

Thanks for the link!

👍 2
quoll21:07:28

Oh, I just realized that this is on #clojure. There’s a #datalog channel where this will be more appropriate

Joshua Suskalo21:07:35

It'd be really nice if there was a facility in the various other databases to allow "transactional reads" where more than one query is run at the same time.

Joshua Suskalo21:07:00

Well this is a thread that's not getting sent to the main channel, and the base message is about datalog, so this is probably fine.

Joshua Suskalo21:07:53

Or rather not that multiple queries are run at the same time, but that multiple queries are run without applying new transactions to the second query.

quoll21:07:54

Why can’t you run more than one query at a time? The database is a single immutable value (a snapshot in time).

quoll21:07:27

Sorry, I don’t follow what you’re saying here?

Joshua Suskalo21:07:00

Maybe I'm misunderstanding something, but in datahike for example (with the new read-only peers) the backend might be across the network, and both queries might require a round-trip, so unless I'm missing something about how calling the db function works, the two queries might each see a different set of transactions having been "resolved" to the datastore.

Joshua Suskalo21:07:39

Unless calling the db function (or doing a deref in datahike) "freezes" it to the current max transaction id.

quoll21:07:36

I would need to look at that API to know for sure, but yes, that’s what the db function is supposed to do

Joshua Suskalo21:07:02

So this should be fine then

quoll21:07:45

It returns the current “value” of the database at that point in time. If you do queries against it, then they will be consistent. You also won’t see any new data that is inserted until you ask for a new db

quoll21:07:52

I see lots of people doing queries like:

(q '[:find ..... ] (db my-connection))
Because they want the latest version of the database. That’s often going to be OK, especially if you’re the only process accessing it, but it’s a bad habit. If you do that with multiple queries in a row then you can get inconsistent data coming back between the queries

quoll21:07:48

I mention it in case that’s the sort of thing you may have seen

Joshua Suskalo22:07:29

Makes a lot of sense.

Joshua Suskalo22:07:35

I have seen that quite often.

Joshua Suskalo22:07:32

Seems like a reasonable usecase for something like (or *db* (db *connection*)) .

quoll22:07:53

My colleagues have been doing it lately, but I haven’t complained because they’re in ClojureScript (hence, a write can’t happen in between), but yes, I’m deeply uncomfortable with this

quoll22:07:46

Much better to use

(let [the-db (db my-connection)
      result1 (q '[:find ....] the-db)
      result2 (q '[:find ....] the-db)]
  ...)

Joshua Suskalo22:07:36

Right, the dynvar approach is only really useful in cases where you don't want to pass things between multiple layers, which is an API choice for sure.

alpox21:07:58

Hi all! Short question: does anyone know a good, simple alternative to Quartz-based libraries for cron scheduling?

JoshLemer22:07:08

I have enjoyed using this java library, it's extremely simple and just needs you to create 1 table in an RDBMS: https://github.com/kagkarlsson/db-scheduler

alpox22:07:45

@UEH6VEQQJ Sadly I have no database available in this application

JoshLemer22:07:38

No database at all? So we're just talking in-memory? What about java's native TimerTask et al?

JoshLemer22:07:11

@U45T93RA6 thanks for the link, interesting perspective

alpox22:07:46

It is an interesting perspective - however I believe that only applies to phrases like "Please, in xyz time from now, do something". I'm aiming more for "At midnight do this please". For now without much safety - later I'm considering storing execution state (success/error) in a safe kv-store such as etcd to ensure no execution gets lost. @UEH6VEQQJ Thanks for the link! TimerTask might help me get more simplicity out of this. Nevertheless I'm not sure it is helpful for "crontab"-like cases such as "At one o'clock every day", especially if the process (hopefully not) sometimes dies and has to restart.

👍 2
alpox22:07:00

I just found cron4j - I guess the duo of TimerTask and cron4j would cover me pretty well - TimerTask for short recurring timers for update checks etc. and cron4j for the "At midnight" case. Thanks for the inputs!
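For what it's worth, a minimal untested interop sketch, assuming cron4j's it.sauronsoftware.cron4j.Scheduler API (schedule takes a cron pattern plus a Runnable, and Clojure fns implement Runnable):

(import '(it.sauronsoftware.cron4j Scheduler))

(defonce midnight-scheduler
  (doto (Scheduler.)
    ;; "At midnight do this please"
    (.schedule "0 0 * * *" (fn [] (println "running the midnight job")))
    (.start)))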

rutledgepaulv11:07:25

I wrote this several years ago and would almost certainly change and decouple some things now but it has forward and backward infinite cron sequences: https://github.com/vodori/chronology