#clara
<
2019-01-10
>
mauricio.szabo15:01:40

Does clara support creating a session and firing rules in different threads? I'm having an exception trying to do it:

Exception in thread "async-dispatch-24" java.lang.IllegalArgumentException: find not supported on type: clara.rules.engine.Token$reify__26254

mikerod15:01:28

@mauricio.szabo I’d expect it to be ok. Can I see (1) more stack trace and (2) an (perhaps simplified) example of how you are setting up for this?

mauricio.szabo16:01:57

Ok, I'll run some code here and then give you the results

wparker17:01:33

Are you creating more than one session at a time? The compiler uses eval calls internally, and the Clojure compiler itself isn’t thread-safe.
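A common way around that (a hedged sketch; the namespace and fact-type setup here are made up, not from the thread) is to compile the session exactly once on a single thread, and only share the resulting immutable session value across threads:

```clojure
;; Sketch, assuming clara-rules is on the classpath; ns name is hypothetical.
(ns example.rules
  (:require [clara.rules :refer [mk-session insert-all fire-rules]]))

;; Compile the rulebase exactly once. Session compilation calls eval
;; internally, and the Clojure compiler is not thread-safe, so concurrent
;; mk-session calls can interfere with each other.
(def base-session
  (delay (mk-session 'example.rules :fact-type-fn :fact)))

;; Sessions are immutable values, so the shared @base-session can be used
;; from many threads; each insert-all/fire-rules returns a *new* session
;; and never mutates the shared one.
(defn run-for-facts [facts]
  (-> @base-session
      (insert-all facts)
      (fire-rules)))
```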

mikerod17:01:48

yeah, this is why I’m looking for some example of how it is being tried

mauricio.szabo18:01:04

@ I've tried creating more than one, and creating just one, and both gave me the same error

mikerod18:01:38

@mauricio.szabo can you give at least an outline of how you are doing this?

mikerod18:01:55

like how you create the session, and then how you distribute it to other threads

mikerod18:01:14

I’m guessing the other threads have the same classloaders too?

mikerod18:01:22

an example would make that easier to track through

mauricio.szabo18:01:04

Okay, just give me some minutes 🙂

mikerod18:01:49

oh, no problem. not time sensitive (for me at least) hah

mauricio.szabo18:01:40

@ I'm running the code in the following way:

(let [session (-> (mk-session 'events-reports.reports.deferred
                              'events-reports.reports.normalize-data
                              :fact-type-fn :fact))
      res (jdbc/query db "SELECT * FROM my_table WHERE ...")
      location-ids (map :id res)]

  (doseq [id location-ids]
    (async/go
     (println "Emitting for" id)
     (jdbc/with-db-connection [db @smart/db]
        (let [facts (search-facts db id)
              rows (-> session
                       (insert-all facts)
                       (fire-rules)
                       (query detailed-anual-report)
                       (->> (map make-detailed-anual-row)))]
         (println "Saving for" id)
         (save-rows rows))))))

mauricio.szabo18:01:53

(it's a simplified version. search-facts is simply a bunch of JDBC queries)

mauricio.szabo19:01:54

Exception in thread "async-dispatch-4" java.lang.IllegalArgumentException: find not supported on type: clara.rules.engine.Token$reify__26254
        at clojure.lang.RT.find(RT.java:863)
        at clojure.core$select_keys.invokeStatic(core.clj:1539)
        at clojure.core$select_keys.invoke(core.clj:1532)
        at clara.rules.engine$propagate_items_to_nodes$fn__26782.invoke(engine.cljc:154)
        at clara.rules.platform$group_by_seq$fn__26030.invoke(platform.cljc:39)
        at clojure.core.protocols$naive_seq_reduce.invokeStatic(protocols.clj:62)
        at clojure.core.protocols$interface_or_naive_reduce.invokeStatic(protocols.clj:72)
        at clojure.core.protocols$fn__7852.invokeStatic(protocols.clj:169)
        at clojure.core.protocols$fn__7852.invoke(protocols.clj:124)
        at clojure.core.protocols$fn__7807$G__7802__7816.invoke(protocols.clj:19)
        at clojure.core.protocols$seq_reduce.invokeStatic(protocols.clj:31)
        at clojure.core.protocols$fn__7835.invokeStatic(protocols.clj:75)
        at clojure.core.protocols$fn__7835.invoke(protocols.clj:75)
        at clojure.core.protocols$fn__7781$G__7776__7794.invoke(protocols.clj:13)
        at clojure.core$reduce.invokeStatic(core.clj:6748)
        at clojure.core$reduce.invoke(core.clj:6730)
        at clara.rules.platform$group_by_seq.invokeStatic(platform.cljc:38)
        at clara.rules.platform$group_by_seq.invoke(platform.cljc:27)
        at clara.rules.engine$propagate_items_to_nodes.invokeStatic(engine.cljc:154)
        at clara.rules.engine$propagate_items_to_nodes.invoke(engine.cljc:147)
        at clara.rules.engine.LocalTransport.send_elements(engine.cljc:176)
        at clara.rules.engine.AlphaNode.alpha_activate(engine.cljc:538)
        at clara.rules.engine$flush_updates$flush_all__26940.invoke(engine.cljc:269)
        at clara.rules.engine$flush_updates.invokeStatic(engine.cljc:277)
        at clara.rules.engine$flush_updates.invoke(engine.cljc:252)
        at clara.rules.engine$fire_rules_STAR_.invokeStatic(engine.cljc:1825)
        at clara.rules.engine$fire_rules_STAR_.invoke(engine.cljc:1720)
        at clara.rules.engine.LocalSession.fire_rules(engine.cljc:1909)
        at clara.rules$fire_rules.invokeStatic(rules.cljc:44)
        at clara.rules$fire_rules.invoke(rules.cljc:29)

mauricio.szabo19:01:03

(More stacktraces :))

mikerod19:01:27

thanks for the details

mikerod19:01:37

nothing immediately is striking me as odd there

mikerod19:01:18

looking some at what could be going on

mikerod19:01:21

This type clara.rules.engine.Token$reify__26254 is weird to me

mikerod19:01:30

I’m not aware of any reify going on in relation to tokens

mikerod19:01:54

which, I’d think that’s what produces a name like that

mikerod19:01:02

actually, must be thinking about that wrong, since you can’t reify a clara.rules.engine.Token, it’s a record

mikerod19:01:19

that is just relating to a reify happening from within the clj record impl, I see - disregard those comments then

mauricio.szabo19:01:11

Do you want me to open an issue?

mikerod19:01:30

I suppose we can for tracking

mikerod19:01:47

I think I’ve recreated it to some minimal degree as far as what could cause that type of exception

mikerod19:01:02

(find (.getLookupThunk (map->Token {}) :bindings) :x)

mikerod19:01:21

I believe the getLookupThunk can happen via clj keyword lookup compile-time inlining stuff

mikerod19:01:56

so this ends up coming from a call (:bindings <token-obj>) in the fn propagate-items-to-nodes

mikerod19:01:36

Also, can you give me your clj/java/core.async versions?

mauricio.szabo20:01:07

Clojure version: 1.9.0
JVM version: Eclipse OpenJ9 VM 1.8.0_181-b13
core.async version: 0.4.490

mauricio.szabo20:01:15

Also, I've opened an issue

mikerod20:01:07

thanks, I see it

mikerod20:01:57

this problem is weird. I asked a bit in #clojure-dev to see if there were any thoughts on how you could end up with a keyword callsite messing up like this. The only possible thing that comes to mind so far is making sure that you aren’t somehow requiring clara.rules.engine (or other ns’s) multiple times

mikerod20:01:15

so somehow getting different versions of the same classes loaded into successive clj dynamic class loaders

mikerod20:01:56

It sort of looks to me now that you may have clara.rules.engine being loaded more than once, causing a duplicate class of clara.rules.engine.Token to be created.
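One hedged way to check for that (the helper name here is illustrative, not from the thread): compare the class of a Token produced on a worker thread against the Token class the checking code resolves. If the namespace was loaded twice, the two classes print the same name but are loaded by different classloaders, so they are not the same class:

```clojure
;; Sketch: detect duplicate loading of a record class across classloaders.
(import '[clara.rules.engine Token])

(defn same-token-class?
  "True only if x was created from the same loaded Token class that this
   code resolves. A duplicate load of clara.rules.engine yields a class
   with the same printed name but a different classloader, so identical?
   on the classes returns false - matching the symptom where keyword
   lookup thunks compiled against one class fail on instances of the other."
  [x]
  (identical? (class x) Token))
```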

eraserhd15:01:23

Hmm, this seems to say that we have retractions, right? (We shouldn’t.):

ethanc15:01:22

I think I have seen this behavior before, but i need to verify my suspicions before I could say for sure.

eraserhd15:01:52

We are currently investigating. What should we look for?

ethanc15:01:57

If it is what I'm thinking, it's a scenario where facts are inserted but have no corresponding nodes to activate, meaning all work in prior rules was "unnecessary" and can be retracted. It sounds weird making this comment, but I think it relates to keeping the memory as low as possible... Again, I need to verify this and I am writing a simple set of rules to test it

eraserhd15:01:18

Ahhh, interesting. We probably do have a lot of facts that aren't used by rules.

eraserhd16:01:07

we've verified that we're not intentionally retracting any datoms in this batch.

eraserhd16:01:27

scratch that - we're seeing something weird

mikerod16:01:53

alpha-retract happens from the truth maintenance as well.

mikerod16:01:26

so using logical insert (the default) can/will lead to them as the TMS keeps the working state’s “truth” consistent
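For reference, a minimal sketch of that distinction (the fact and rule names are hypothetical, not from this thread):

```clojure
(ns example.tms
  (:require [clara.rules :refer [defrule insert! insert-unconditional!]]))

(defrecord A [f])
(defrecord C [f])

;; Logical insert (the default): the derived C is under truth maintenance,
;; so if the supporting A match is later retracted, the engine retracts C
;; as well - this is one source of alpha-retract calls during fire-rules.
(defrule derive-c
  [A (= ?f f)]
  =>
  (insert! (->C ?f)))

;; Unconditional insert: the derived C sticks around even if its support
;; goes away, bypassing the TMS entirely.
(defrule derive-c-unconditionally
  [A (= ?f f)]
  =>
  (insert-unconditional! (->C ?f)))
```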

mikerod16:01:32

Seeing your screenshot there though - relative to the other times I see - it looks like a lot of time is being spent in equality comparisons.

mikerod16:01:07

However, the memory updating functions comparing tokens and “remove first of each” do not seem to be as high.

mikerod16:01:33

I’d have to expand on the callstack trees of these a bit more perhaps to really know what’s going on there though

eraserhd16:01:13

@ we made a mistake in a parameter in the call we are timing. After fixing, we don't see a high alpha-retract time.

mauricio.szabo16:01:41

@ what are you using to profile Clara? I've tried to use VisualVM but my code hangs even with a small group of facts...

eraserhd16:01:16

I'm using the CPU sampler in VisualVM. I've never successfully used the profiler.

eraserhd16:01:45

The profiler always hangs for me no matter what I try.

mikerod16:01:25

Yeah, VisualVM CPU sampler is typically pretty useful

mikerod16:01:47

I don’t try to use the more invasive profiler there - I don’t even know that I’d call that “recommended”

mikerod16:01:25

@ the example you showed did not look to have a high time spent in alpha-retract, so I'm confused why you considered it a problem there

mikerod16:01:36

Or perhaps your screenshot was just not indicative of what you are looking at.

mikerod16:01:38

or maybe I was reading this wrong. I tend to look at the callstack based view - so nvm I think

ethanc16:01:08

yeah, @ thats seems to be a hotspot view

mikerod16:01:12

yep, just noticed. Harder for me to read that, but yeah, that means it is the “entry point” to a lot of the stuff taking the higher self-times below

eraserhd16:01:27

Let me post a corrected screenshot...

mikerod16:01:29

I prefer the other view to see a direct hierarchical/callstack visual on it

mikerod16:01:19

just easier to pinpoint the more specific path to trace through

ethanc17:01:55

To my original comment, the scenario i was remembering was something like:

(defrule simple-rule
  [A (= ?f f)]
  [?bs <- (acc/all) :from [B (= f ?f)]]
  =>
  (insert! (->C)))

This does cause a retract on the production node, but that's due to new facts being added. So not as I remembered.

mikerod16:01:09

@ It’s a bit cut off and always a challenge to really know. From this, it seems to suggest that you have a rule with a negation - :not - that is initially true due to no matches at some point in fire-rules. Later, it gets matches so becomes false. Due to the order of rule firing, when it becomes false, there were already staged activations of the rule from when it was true that need to be discarded now that that is no longer the case - via the truth maintenance

mikerod16:01:36

The activations being discarded may have large tokens as one possibility - perhaps there is a large data gathered via an accumulator in this rule?

mikerod16:01:14

Alternatively - I may have to see more stack to know (or can’t really know) - the rule may have had a lot of combinations of matches resulting in many activations that now all have to be discarded
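A hedged sketch of the rule shape being described (all fact and rule names here are hypothetical): an accumulator gathering a potentially large token, guarded by a :not that can flip from true to false partway through fire-rules:

```clojure
(ns example.negation
  (:require [clara.rules :refer [defrule insert!]]
            [clara.rules.accumulators :as acc]))

(defrecord Event [kind])
(defrecord Error [msg])
(defrecord Report [events])

;; If no Error exists when this rule first matches, an activation is staged
;; carrying the (possibly large) accumulated ?events token. If an Error is
;; inserted later in the same fire-rules, truth maintenance must discard
;; that staged activation - the retraction work being discussed above.
(defrule report-when-no-errors
  [?events <- (acc/all) :from [Event]]
  [:not [Error]]
  =>
  (insert! (->Report ?events)))
```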

eraserhd16:01:41

OK, my reading also. We're going to try to isolate which rule(s) cause it.

mikerod16:01:42

Maybe you can find one of that structure. The negation may be the actual last condition of the rule, although that isn’t strictly true in all cases I believe.

mikerod16:01:25

You may be able to do some listener tracing to find more details too. I’d have to play around to come up with a useful example of how to use it in this sort of case.

wparker17:01:49

Regarding tracing - I wrote some helpers at one point for this that I really need to clean up some and push to master - but basically you get an entry in a list for every interaction in the rules engine, most of which have a node ID attached. You can then use that ID to look up the node in the session and see what rules/queries it corresponds to. So if you see an ID come up a lot those rules are doing a lot of work. Since it is ordered, you can investigate in more complex ways, but often a “lots of stuff is happening in this rule” can be enough to narrow things down enough
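The tracing flow described above might look roughly like this with clara.tools.tracing (a hedged sketch; the exact helper names and trace-entry shape may differ between clara-rules versions):

```clojure
(ns example.trace
  (:require [clara.rules :refer [insert fire-rules]]
            [clara.tools.tracing :as trace]))

;; Attach a tracing listener, fire the rules, then pull the ordered list
;; of engine interactions. Each trace entry typically carries a :node-id
;; that can be mapped back to the rule or query it belongs to, so node IDs
;; that show up frequently point at the rules doing the most work.
(defn traced-fire [session facts]
  (-> session
      trace/with-tracing
      (as-> s (apply insert s facts))
      (fire-rules)
      trace/get-trace))
```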

mikerod17:01:30

would be good to get some doc examples of this sort of thing