#clara
2017-10-06
zylox13:10:09

.customer is a method, isn't it?

zylox13:10:29

have you tried (= 1 (.customer this))

zylox13:10:59

just tried it with your code and it worked for me
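A minimal sketch of the condition being discussed, assuming a hypothetical Purchase record (all names here are illustrative). Inside a Clara condition, this refers to the fact under test, so Java interop like (.customer this) reads the field; for records, the bare field symbol works as well:

(ns example.interop
  (:require [clara.rules :as r]))

(defrecord Purchase [customer])

(r/defrule customer-one
  ;; `this` is the Purchase fact being matched; since Purchase is a record,
  ;; (= 1 customer) would also work, as fields are bound as plain symbols.
  [Purchase (= 1 (.customer this))]
  =>
  (println "matched a purchase for customer 1"))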

huwigs13:10:07

I have a question about accumulators — is there a reason to do aggregation in a Clara accumulator vs. in the RHS of something that uses acc/all to get the values?

huwigs13:10:02

use case is a value computed from each fact that is then averaged

afurmanov14:10:05

Thanks @zylox - that is indeed working!

mikerod14:10:46

@steve313 Technically there may be performance advantages, in that you are allowing the engine to apply optimizations and avoid unnecessary work. However, some of that depends on how performant the accumulator implementation is.

mikerod14:10:04

Also, you keep things a bit more structured and declarative by pulling as much out of the RHS as you can

mikerod14:10:32

However, there are times when the RHS may be justified just because the logic is too complicated to fit into an accumulator. It’s somewhat of a judgement call at times.

mikerod14:10:56

It’s a good question for sure. I’ve seen it come up fairly often.
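As a concrete sketch of the accumulator route for this use case, with hypothetical Reading, Score, and AverageScore records: one rule materializes the per-fact computed value as its own fact, and acc/average then aggregates over that field, leaving each RHS as a plain insert:

(ns example.averaging
  (:require [clara.rules :as r]
            [clara.rules.accumulators :as acc]))

(defrecord Reading [raw])
(defrecord Score [value])
(defrecord AverageScore [value])

(r/defrule compute-score
  [Reading (= ?raw raw)]
  =>
  ;; stand-in for whatever the real per-fact computation is
  (r/insert! (->Score (* 2 ?raw))))

(r/defrule average-scores
  [?avg <- (acc/average :value) :from [Score]]
  =>
  (r/insert! (->AverageScore ?avg)))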

huwigs14:10:00

I am still early in the learning curve; I think I figured out how to do the LHS relatively simply

mikerod14:10:22

that’s good

mikerod14:10:04

One pattern that is easy to fall into when new to implementing logic via rules is putting a lot of code in the RHS

mikerod14:10:57

which sort of defeats the purpose. Accumulators are a bit of a more complex case, but in general it is often beneficial to try to keep the RHS as minimal as possible. Typically just “insert some facts”
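To illustrate with a hypothetical Person/Adult pair: the test lives in the LHS, where the engine can see and optimize it, and the RHS is nothing but an insert:

(ns example.minimal-rhs
  (:require [clara.rules :as r]))

(defrecord Person [name age])
(defrecord Adult [name])

(r/defrule mark-adults
  ;; the conditional logic is declared in the LHS ...
  [Person (= ?name name) (>= age 18)]
  =>
  ;; ... so the RHS is just "insert some facts"
  (r/insert! (->Adult ?name)))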

huwigs15:10:43

I have pretty large tables of values… do people find that it’s often necessary to do some “fact” lookups outside of the engine?

zylox16:10:30

Not really, no, but I'm not 100% sure what you mean.

huwigs16:10:08

Well, there are facts related to combinations of values (e.g. if a person is female, 18, and single, the value of some new fact should be X, but if they are male, 29, and married, the value of some new fact should be Y) — seems like I could use unification and declare each one of those mappings to be facts: (->MyValue :female 18 :single X), (->MyValue :male 29 :married Y)

huwigs16:10:37

or I could just look up those values in an RHS that takes a Person fact
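A sketch of the unification idea, assuming hypothetical Person, MyValue, and DerivedValue records: each table mapping is inserted as a MyValue fact, and an equality join picks out the value for a matching person:

(ns example.mapping
  (:require [clara.rules :as r]))

(defrecord Person [sex age marital-status])
(defrecord MyValue [sex age marital-status value])
(defrecord DerivedValue [person value])

(r/defrule lookup-value
  [?p <- Person (= ?sex sex) (= ?age age) (= ?status marital-status)]
  [MyValue (= ?sex sex) (= ?age age) (= ?status marital-status)
           (= ?value value)]
  =>
  (r/insert! (->DerivedValue ?p ?value)))

;; the mappings become ordinary facts at session setup, e.g.
;; (r/insert-all session [(->MyValue :female 18 :single X)
;;                        (->MyValue :male 29 :married Y)])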

dadair16:10:59

Couldn't you use a DSL to parse your table into rules?

dadair16:10:45

Or you could have condition facts

huwigs16:10:46

I’m sure I could generate rules from the table values but I’m not sure if that is a good approach

huwigs16:10:57

some tables have thousands or tens of thousands of rows

mikerod17:10:33

@steve313 I am trying to understand the situation still. It isn’t clear to me what a row of this table represents or what information it has

mikerod17:10:37

any dummy example?

huwigs17:10:21

The table itself is structured like (k1, k2, k3, v1, v2, v3, v4, v5)… imagine something like that

huwigs17:10:27

er, like this:

huwigs17:10:52

For many, many permutations of k1, k2, k3

huwigs17:10:43

not sure if that snippet came through? Slack is saying I ran out of space

huwigs17:10:54

k1  k2  k3  v1  v2  v3  v4  v5  v6  v7  v8
17  F   S   2.1647  3.8128  2.9743  1.4490  2.7630  0.3837  3.4900  1.1400  3.1700
18  F   S   2.8955  3.4350  1.4777  1.4490  2.4809  0.9656  2.9640  1.1058  2.6980
19  F   S   2.2341  2.7573  1.3664  1.4490  2.2500  0.8800  2.5947  1.1058  2.3622
20  F   S   1.8131  2.1821  1.9299  1.4490  1.7000  0.8900  2.2692  1.0944  2.0646
21  F   S   1.7448  2.0645  1.9294  1.4490  1.6500  0.8800  2.2176  1.1286  2.0196

huwigs17:10:41

now, in the rules engine, we have facts that can provide (k1, k2, k3) values, and we want to know what the v1 value for that combination is

huwigs17:10:21

right now I just look it up, in an RHS, in a map I read from the file

mikerod17:10:25

ok, that does make the idea clearer to me to think about

huwigs17:10:27

but in theory those could all be facts

mikerod17:10:36

You could put the table as a fact

mikerod17:10:49

It’s still a bit odd since the table would be big of course

mikerod17:10:38

However, I’m not too sure it would even be an issue to just put each row in as a fact

mikerod17:10:50

and then do a join between Person to one of those rows based on the k-criteria

mikerod17:10:57

From the looks of it, these would be =-based joins

mikerod17:10:07

Which are going to be the most efficient type in the engine too

huwigs17:10:33

Well, I am ignoring the <lowest and >highest for the first column

mikerod17:10:49

Clara will automatically hash on binding keys and use that hash-based lookup for equal-based joins

huwigs17:10:03

there are values that aren’t strict-equality based for that. I can probably hack it so the engine would only use equality-based joins

mikerod17:10:34

So if the majority are equal based you may be fine still. Not completely sure though.

huwigs17:10:46

yeah the comparison ones are edge cases

mikerod17:10:46

However, even if it is something like 10K facts doing < comparisons

mikerod17:10:55

I’m not convinced you’d see many perf issues anyway

mikerod17:10:32

It may be a good idea to use a different fact type

mikerod17:10:37

for the ones that aren’t equal based

mikerod17:10:45

may help partition up the network

mikerod17:10:12

some of this is just high-level talk too, so it may be too vague for you to go off of

mikerod18:10:06

but at a really high level

mikerod18:10:14

say you have 10K people - call that P

mikerod18:10:26

and 100K rows that are eq join capable - call that E

mikerod18:10:44

Then only 100 rows that are more expensive than eq - call it F

mikerod18:10:11

You’ll have the P-joining-to-E part all done with hash-based lookups/joins

mikerod18:10:26

so you’ll just have P x F comparisons on the non-eq, “more expensive” join tests

mikerod18:10:38

So the P, E factor is “constant”

mikerod18:10:06

But the non-eq-based table rows I’d put under a different fact type, since that would ensure partitioning into groups like this

mikerod18:10:58

ignore my “more expensive” phrase

huwigs18:10:03

so if we have a fact type for the E and a fact type for the F cases

mikerod18:10:06

not that relevant, it’s more about the multiplier

huwigs18:10:19

would there be two rules, one joining PxE and one joining PxF ?

huwigs18:10:32

or use :or in one rule?

mikerod18:10:38

I’d probably do 2 rules
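A sketch of that two-rule split, with hypothetical EqRow and RangeRow fact types standing in for E and F: the first rule joins purely on equality, so Clara can use hash-based joins throughout, and only the second pays for the < comparison:

(ns example.partition
  (:require [clara.rules :as r]))

(defrecord Person [age sex status])
(defrecord EqRow [age sex status v1])          ; E: joinable purely by =
(defrecord RangeRow [threshold sex status v1]) ; F: needs a < test
(defrecord RowMatch [person v1])

(r/defrule person-x-eq-row
  [?p <- Person (= ?age age) (= ?sex sex) (= ?status status)]
  [EqRow (= ?age age) (= ?sex sex) (= ?status status) (= ?v1 v1)]
  =>
  (r/insert! (->RowMatch ?p ?v1)))

(r/defrule person-x-range-row
  [?p <- Person (= ?age age) (= ?sex sex) (= ?status status)]
  ;; the = constraints still hash-join; only the < part is a join filter
  [RangeRow (= ?sex sex) (= ?status status) (< ?age threshold) (= ?v1 v1)]
  =>
  (r/insert! (->RowMatch ?p ?v1)))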

mikerod18:10:54

:or may be able to get you something equivalent, but it may just be less readable

mikerod18:10:01

or more confusing

mikerod18:10:09

:or is for the most part “syntax sugar”

mikerod18:10:17

it isn’t short-circuiting

mikerod18:10:28

it splits at that point and all the data flow goes down both “branches” of logic

mikerod18:10:09

the only reason I’d use :or in a case like this is something like

mikerod18:10:28

(r/defrule tester
  [A (= ?id id)]
  [B (= ?id id)]
  [:or
   <your P,E and P,F joins here using the result from A,B above>]
  =>
  <etc>)

mikerod18:10:39

Meaning if you had some more conditions needed by both joins

mikerod18:10:06

Using :or above makes it so the network will share the work done in the prior conditions in the rule before the :or. So “node sharing”, as it is called in Rete terminology.

mikerod18:10:35

If you do it in separate rules, there still may be node sharing though. Clara tries to detect duplicate conditions.

huwigs18:10:52

It probably reads better as multiple rules almost always?

mikerod18:10:53

Just pointing out times when :or may be nice. It also lets you not write stuff twice.

mikerod18:10:06

I tend to say avoid :or unless you have noticeable/annoying duplication

mikerod18:10:25

of previous conditions before the :or branching

mikerod18:10:22

and back on your topic

mikerod18:10:32

if you have 10-100K rows in a table, I’d think inserting all of these facts is fine

mikerod18:10:46

if it were like 1 million rows, then I’d maybe be a bit more concerned

mikerod18:10:26

and I’d still probably try it and profile, taking note of the runtime degradation, before trying a non-fact-based approach

mikerod18:10:00

Also, it’s good to be aware of your “hot spots”. A rule like

(r/defrule joining
  [P (= ?age age)]
  [F (< ?age (:threshold this))]
  =>
  <etc>)

is going to evaluate (< ?age (:threshold this)) (number of P) x (number of F) times

mikerod18:10:22

so if that is a big number, then it becomes a hot spot, and you may end up just being able to improve whatever test is done there

huwigs18:10:31

OK. So if many of these facts would be reused from case to case, is the right thing to do to load the standard facts (from the tables above), keep around a binding to the initial session, and then for each subsequent set of queries, reuse the initial session?

mikerod18:10:40

Or eliminate P facts that you know won’t match F (because they match something else with eq already perhaps).

huwigs18:10:52

assuming that works on account of persistent data structures etc.; I just don’t want to load stuff from files many times

mikerod18:10:10

I think what you said makes sense

mikerod18:10:15

as long as your rules aren’t changing

mikerod18:10:36

you can reuse an “initially loaded” working memory state for those rules for different runs of new facts

mikerod18:10:54

definitely a big win of Clara’s persistent, immutable working memory approach
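A sketch of that reuse pattern; load-table-rows and row-matches are hypothetical stand-ins, but mk-session, insert-all, and fire-rules all return new session values rather than mutating state, so a base session loaded once can be shared across runs:

(ns example.reuse
  (:require [clara.rules :as r]))

;; read and insert the big table facts exactly once
(def base-session
  (-> (r/mk-session 'example.rules)
      (r/insert-all (load-table-rows "table.tsv")) ; hypothetical loader
      (r/fire-rules)))

;; each run starts from the shared base; the base itself never changes
(defn run-for-person [person]
  (-> base-session
      (r/insert person)
      (r/fire-rules)
      (r/query row-matches))) ; hypothetical defquery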

souenzzo20:10:44

Is there some way to dedup on insert? For example, if I insert (insert! ^{:type :foo}{:foo 33}) and then (insert! ^{:type :foo}{:foo 33}), then querying all :foo gets just one?

souenzzo20:10:51

Yes, I know that (= ^{:type :foo}{} ^{:type :bar}{}) ;=> true

wparker21:10:00

@souenzzo if you’re using queries, you could put an accumulator or an :exists on your query conditions to remove duplication in your results
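For the query route, a sketch using the acc/distinct accumulator so duplicates collapse into a set in the results; Foo here is a hypothetical record standing in for the ^{:type :foo} maps:

(ns example.dedup-query
  (:require [clara.rules :as r]
            [clara.rules.accumulators :as acc]))

(defrecord Foo [foo])

(r/defquery all-foos
  []
  ;; ?foos is bound to the distinct set of Foo facts
  [?foos <- (acc/distinct) :from [Foo]])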

souenzzo21:10:36

It's not about queries... It's about the session. Duplicated facts are firing rules in a loop.

mikerod21:10:37

@souenzzo one approach I’ve done

mikerod21:10:43

introduce an “intermediate fact”

mikerod21:10:47

that can be duplicated

mikerod21:10:01

then have a rule that uses an accumulator acc/all on those possible duplicates

mikerod21:10:15

and inserts only one in the presence of any number of them

souenzzo21:10:45

Got it. I will try.

mikerod21:10:49

Assuming you want to go from X facts to B facts, but do not want duplicates of B “downstream”:

;; assumes (:require [clara.rules :as r]
;;                   [clara.rules.accumulators :as acc])

(r/defrule a-rule
  [X (= ?v v)]
  =>
  (r/insert! (->Intermediate ?v)))

(r/defrule aggregate-rule
  ;; acc/all has an initial value of [], so guard against the empty case
  [?all <- (acc/all) :from [Intermediate]]
  [:test (seq ?all)]
  =>
  ;; Assume everything in `?all` is `=`, otherwise do some merge, etc.
  (r/insert! (map->B (first ?all))))

(r/defrule b-rule
  [B]
  =>
  <etc>)

souenzzo12:10:44

This method is not working because I need to acc both Intermediate and B, to ensure that there are no repeated elements

mikerod21:10:07

I’d use better naming though 😛