#clara
2017-09-12
dadair14:09:01

I have a singleton fact that I insert for reference purposes. It contains a large map (hundreds of KB) and is part of the LHS of most of my rules, so that the RHS can use the map when inserting more facts. Are there any performance issues to consider here? Or ways to exploit the fact that it is a singleton and will never need to be compared between multiple instances? I know that with persistent structures it isn't inefficient to have the large map in multiple places, but I was wondering last night whether there is a better way to reference it. The LHS query for it is dead simple: [Tree (= ?root root)] in all cases
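(A minimal sketch of the shape being described, for context; the Result record and the :some-key lookup are hypothetical placeholders:)

(ns example.rules
  (:require [clara.rules :refer [defrule insert!]]))

;; Singleton reference fact wrapping the large map.
(defrecord Tree [root])

;; Hypothetical fact type inserted by the RHS.
(defrecord Result [value])

(defrule use-tree
  "Bind the big map from the singleton Tree fact and use it when inserting more facts."
  [Tree (= ?root root)]
  =>
  (insert! (->Result (get ?root :some-key))))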

mikerod16:09:54

@dadair I don’t know of anything particularly harmful about a big fact like that

mikerod16:09:10

It’s probably best to make sure it has a “type” that is distinct from anything else

mikerod16:09:54

beyond that, assuming it is immutable (as facts should be), a cached hashcode may become important

mikerod16:09:20

Clojure persistent maps cache hashcode

mikerod16:09:07

You could also consider wrapping the big map in a custom type that optimizes equality checking and hashing, if noticeable perf issues come up
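(A rough sketch of what such a wrapper might look like, assuming only a single instance of the fact ever exists so identity-based equality is safe; CachedTree is a hypothetical name:)

(deftype CachedTree [root hash-val]
  Object
  ;; Identity equality is safe only because exactly one instance is ever inserted.
  (equals [this other] (identical? this other))
  ;; Return the hash that was computed once at construction time.
  (hashCode [_] (int hash-val)))

(defn cached-tree
  "Wrap the big map, computing its hash a single time up front."
  [root-map]
  (->CachedTree root-map (hash root-map)))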

mikerod16:09:29

Your example does look like it has a type on it called Tree though, so I am not sure what implications that type has

dadair16:09:50

Tree is just a record wrapper around a map so it works with the default Fact type identification

mikerod16:09:19

For clarity: there are basically two situations that come to mind in this scenario:
(1) How often is this big fact going to be compared to other, different facts?
(2) This big fact will be part of a token that is stored in memory. That token is sometimes used as a hash-map key, so if it is hashed too often, the hash code may become a bottleneck.
For (1): if the fact has a unique type that no other facts have, it should only ever be compared to itself. The equals check should be fast there, since most equals implementations short-circuit on the identical? (Java ==) case.
For (2): Clojure maps cache hash codes by default. Clojure records do not, prior to what looks like 1.9 (not yet fully released at least; see https://dev.clojure.org/jira/browse/CLJ-1224)
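(A rough illustration of the short-circuit point; big-map here is just a hypothetical stand-in for the large map:)

(def big-map (into {} (map (fn [i] [i (str i)]) (range 100000))))
(def big-map-copy (into {} big-map))

(time (= big-map big-map))       ;; short-circuits on identical?, effectively instant
(time (= big-map big-map-copy))  ;; structural comparison walks all the entries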

mikerod16:09:07

So if anything, if your Tree wrapper is made via defrecord, you may see some time spent recalculating the hash code if this fact ends up being involved in a lot of rules’ LHS “tokens”

mikerod16:09:30

Tokens contain a sequence of all the facts that matched for a particular successful invocation of the rule

mikerod17:09:15

Then again, I wouldn’t get worked up about this situation unless you see measurable perf issues

dadair17:09:58

Thanks for the advice, I'm just trying to keep some performance considerations in my back pocket in case I need to improve performance. This Tree is somewhat like a database, so it will continue to grow over time. I'm already seeing processing times of 100ms or more for the rule engine, and I generally want things to be below ~50ms for some soft real-time constraints we have
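(One simple way to get a rough number before reaching for a profiler; the example.rules namespace and Tree record are carried over from the earlier sketch and are hypothetical:)

(require '[clara.rules :refer [mk-session insert fire-rules]]
         '[example.rules :refer [->Tree]])

;; Build a session from the hypothetical rules namespace and time one full run.
(def session (mk-session 'example.rules))

(time
  (-> session
      (insert (->Tree {:some-key 42}))
      (fire-rules)))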

mikerod17:09:28

I typically use VisualVM

mikerod17:09:33

start the “sampling CPU” profiler

dadair17:09:37

to clarify: the tree does not change within a rule session, only between rule sessions

mikerod17:09:40

then run whatever takes the time I’m interested in

mikerod17:09:00

then you take a “snapshot”, view the snapshot by drilling down through a fairly large callstack

mikerod17:09:09

it has visuals to show how much time is being spent where in the callstacks

mikerod17:09:00

For this particular concern I had above, I’d just try to get to the “bottom” of the callstacks taking the most time and see if I can find references to, in this example, the Tree object getting equiv or hash sorts of calls on it

dadair17:09:50

great I'll give that a try, thanks

mikerod17:09:20

no problem. I’m always willing to take a look at a profiler snapshot, a portion of one, or even just a question like “I see lots of time in Clara function X, any ideas?” if you find those parts hard to sort through. I’m not sure how easy profiler snapshots are to read if you’re not familiar with the codebase and/or the way clj compiles out Java class names