
I have a singleton fact that I insert for referencing that contains a large map (hundreds of KB), and it is part of the LHS of most of my rules, so that the RHS can use the map when inserting more facts. Are there any performance issues to consider here? Or ways to exploit the fact that it is a singleton and will never need to be compared between multiple instances? I know that with persistent structures it isn't inefficient to have the large map in multiple places, but I was wondering last night if perhaps there was a better way to reference it. The LHS query for it is dead simple: `[Tree (= ?root root)]` in all cases
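A minimal sketch of the pattern being described, using Clara Rules — the `Tree`/`root` names come from the question above, but `Request`, `Result`, and the rule name are hypothetical stand-ins:

```clojure
(ns example.rules
  (:require [clara.rules :refer [defrule insert!]]))

;; The singleton fact: a record wrapping the large map.
(defrecord Tree [root])

;; A hypothetical input fact whose handling consults the big map.
(defrecord Request [key])

;; A hypothetical derived fact produced on the RHS.
(defrecord Result [value])

(defrule lookup-in-tree
  "Binds the singleton Tree so the RHS can consult its map."
  [Tree (= ?root root)]
  [Request (= ?key key)]
  =>
  (insert! (->Result (get ?root ?key))))
```

Every rule that needs the map repeats the `[Tree (= ?root root)]` condition, which is why the same large fact ends up in many rules' tokens.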


@dadair I don’t know of anything particularly harmful about a big fact like that


It’s probably best to make sure it has a “type” that is distinct from anything else


beyond that, assuming it is immutable (as facts should be), a cached hashcode may become important


Clojure persistent maps cache hashcode
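A quick sketch of what that caching means in practice — Clojure's persistent map implementations keep the computed hash in a field (e.g. `_hasheq` on `clojure.lang.APersistentMap`), so only the first `hash` call traverses the map:

```clojure
;; A reasonably large map, standing in for the big Tree map.
(def big-map (zipmap (range 100000) (range 100000)))

;; The first call walks the whole map and caches the result;
;; subsequent calls just return the cached value.
(def h1 (hash big-map))
(def h2 (hash big-map))
;; h1 and h2 are the same value; the second call does no traversal.
```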


You could also consider wrapping the big map in a custom type that optimizes equality checking and hashing too if you had noticeable perf issues come up
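A hypothetical sketch of that wrapper idea: since the fact is a singleton that is never meaningfully compared to a different instance, a custom type can make equality and hashing O(1) by using object identity instead of walking the map. The `TreeWrapper` name and field are illustrative only:

```clojure
(deftype TreeWrapper [root]
  Object
  ;; Identity-based equality: only equal to the very same instance.
  (equals [this other] (identical? this other))
  (hashCode [this] (System/identityHashCode this))
  clojure.lang.IHashEq
  ;; Clojure's `hash` goes through hasheq; make it O(1) as well.
  (hasheq [this] (System/identityHashCode this)))

;; The underlying map is still reachable as a field: (.-root wrapper)
```

The trade-off is that two wrappers around structurally equal maps no longer compare equal, which is fine for a singleton but would be surprising anywhere else.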


Your example does look like it has a type on it called Tree though, so I am not sure what implications that type has


Tree is just a record wrapper around a map so it works with the default Fact type identification


For clarity: there are basically 2 situations that come to mind in this scenario:
1) How often is this big fact going to be compared to other, different facts?
2) This big fact will be part of a token that is stored in memory. This token is sometimes used as a hash-map sort of key. If it is hashed too often, the hash code may become a bottleneck.
For (1): if the fact has a unique type that no other facts have, it should only ever be compared to itself. The equals check should be fast there, since most equals impls will short-circuit on the identical? (or Java ==) case.
For (2): Clojure maps default to caching hash codes. Clojure records do not, prior to what looks like 1.9 (not yet fully released at least (see


So if anything, if your Tree wrapper is made via defrecord, you may see some time spent recalculating the hash code if this fact ends up being involved in a lot of rules’ LHS “tokens”


Tokens contain a sequence of all the facts that matched for a particular successful invocation of the rule


Then again, I wouldn’t get too worked up about this situation unless you can see measurable issues with perf


Thanks for the advice, I'm just trying to keep some performance considerations in my back pocket in case I need to improve performance. This Tree is somewhat like a database, so it will continue to grow over time. I'm already seeing processing times approach >= 100ms for the rule engine and I generally want things to be below ~50ms for some soft real-time constraints we have


I typically use VisualVM


start the “sampling CPU” profiler


to clarify: the tree does not change within a rule session, only between rule sessions


then run whatever takes the time I’m interested in


then you take a “snapshot” and view it by drilling down through a fairly large callstack


it has visuals to show how much time is being spent where in the callstacks


For this particular concern I had above, I’d just try to get to the “bottom” of the callstacks where the larger time is spent and see if I can find references to, in this example, the Tree object getting equiv or hash sorts of calls on it


great I'll give that a try, thanks


no problem. I’m always willing to take a look at a profiler snapshot, a portion of it, or just hear “I see lots of time in Clara function X, any ideas?” too, if you find those parts hard to sort through. Profiler snapshots can be hard to read if you aren’t familiar with the codebase and/or the way clj compiles out Java class names