Fork me on GitHub
#clara
<
2018-10-31
>
eraserhd18:10:09

Hi! So we're debugging a performance issue, and it seems like, in our example, fast-token-compare taskes 142 seconds of self time, while PriorityQueue.remove() takes 139 seconds of self time.

eraserhd18:10:41

It's not clear from VisualVM's sampling, but it looks like this referring to the PriorityQueue.remove(Object) method, which is consistent with the fast-token-compare times.

eraserhd18:10:39

This method is linear for priority queue. It should be possible to wrap PriorityQueue to make it log(n) by checking when values are extracted. Is there some reason this won't work?

eraserhd18:10:47

Alternately, are we doing something wrong to get runtime dominated by PriorityQueue.remove(Object)?

mikerod19:10:01

@eraserhd > Alternately, are we doing something wrong to get runtime dominated by PriorityQueue.remove(Object)? Seems like there is a lot of thrash of rules getting activated, but then subsequently “deactivated”

mikerod19:10:27

> This method is linear for priority queue. It should be possible to wrap PriorityQueue to make it log(n) by checking when values are extracted. Is there some reason this won’t work? I don’t understand what this is and how it’d make it log(n).

mikerod19:10:02

and lastly, fast-token-compare may be slowing down due to large tokens in these rules that are struggling with if they are satisfied/not satisfied

mikerod19:10:35

perhaps from large data accumulated via something like (acc/all)’s or similar

eraserhd19:10:48

@mikerod if remove/1 stores objects in a hash set, then remove/0 and add/1 skip removed tokens, remove/1 becomes O(1) and I think remove/0 is still O(log N)

mikerod19:10:08

We mostly avoid hashing objects

eraserhd19:10:12

Dijkstra's algorithm is frequently implemented like this, because removing from the priority queue is expensive.

mikerod19:10:35

because then you hit perf barriers with slow hash code impl’s

mikerod19:10:49

and often that comes down to the facts being used in the session

mikerod19:10:16

in the particular case where the prio queue is used, the object to remove is a new instance too, so wouldn’t be able to have a reusable cached hashcode

mikerod19:10:42

not to say it isn’t worth consideration at any time, just saying hashing isn’t always as cheap as it seems

mikerod19:10:03

fast-token-compare avoids deep equals most of the time too

mikerod19:10:19

it uses identity-based equivalence as a first-pass, so via identical? in clj semantics

mikerod19:10:30

and typically that is all that is needed to find things

mikerod19:10:38

during the fire-rules loop at least

mikerod19:10:52

I’ve seen this problem of large tokens being slow though

mikerod19:10:29

the activation stuff is what uses the prio queue, it has to do with how when rules are satisfied, they are “staged” as available to have their RHS action executed at some future time

mikerod19:10:54

it is a prio queue since you can have precedence on which rules should be staged ahead of one another - e.g. :salience

mikerod19:10:06

I think there are some related issues out there (though not explained the same) https://github.com/cerner/clara-rules/issues/385

mikerod19:10:56

well, that’s the only one I see at the moment

mikerod19:10:14

I know I’ve messed around with this sort of problem before though, not sure I ever had a good answer

mikerod19:10:26

sometimes you could just avoid a rule setup that is causing it though

eraserhd19:10:36

Those stats look very much like ours.

mikerod19:10:46

your hashing thing could be attempted at least, just have to consider a lot of things if it ends up looking like a good path

mikerod19:10:06

the issue above talks about a “distinct accumulator”, I think that may be overly specific

mikerod19:10:10

probably a more general problem

mikerod19:10:15

and I’m fairly sure you have big tokens involved

mikerod19:10:47

a big token would come from a rule left-hand side (LHS) that had a lot of facts associated with a single activation of the rule

mikerod19:10:03

Take for example something like

(r/defrule big-activation-maker
  [?xs <- (acc/all) :from [X]]
  =>
  (do-stuff ?xs))

mikerod19:10:14

now if you can get into a situation where the set of all X changes a lot

mikerod19:10:26

and there are a lot of Xs in working memory overall

mikerod19:10:56

then this rule may keep activating/cancelling/re-activating, as the set of X grows - and each time there is a big token comparison to do these removals

mikerod19:10:07

in a rule as simple as above, it typically wouldn’t thrash though

mikerod19:10:36

Most of the time, the network’s “batched propagation” semantics would lead you to a situation where all the X are around in one go with only 1 rule activation resulting

mikerod19:10:46

but certain setups can cause thrashing when more conditions and things are involved

mikerod19:10:04

perhaps the distinct accumulator case was a situations like this one, I just don’t remember

eraserhd19:10:29

This sounds like there is one thing we can write as a custom accumulator to avoid retriggering a whole slew of rules.

ethanc19:10:26

or play with salience, though as a last resort

mikerod19:10:50

yeah, sometimes that’s useful

mikerod19:10:09

sometimes you can look at the rules and sort of see where an accumulator rule may be getting retriggered in multiple phases of insert

mikerod19:10:37

and try to prevent that, there is salience to help, as well as just potentially logical structuring