#clara
2018-05-30
jdt12:05:26

I'm just building this little prototype rule-based scheduler in clara to see if it will handle things at scale. It won't have many rules, but it will have a pragmatic maximum of 1-2 million base facts before firing the rule engine. Am I wasting my time, or will that overwhelm the system? (The bulk of the facts are job requests, from which we then infer things like whether there's a worker with adequate resources, round robin behaviors, and so on.)

jdt13:05:22

(I'm building the new intermediate binding of WorkerViableJob and ActiveUserJobCount, I just wondered if there wasn't a way to correlate the two records in the :and subclause that considers them without the use of an intermediate fact.)

mikerod15:05:26

@dave.tenny I’d perhaps worry about that being a fairly slow way to do what you are saying given that you have so many facts in the session

jdt15:05:07

"that" being using a rules system?

mikerod15:05:29

I think if you have enough memory for that many objects, then the rules could perform alright. However, you will have to look out for bad algorithmic complexity in rules that are processed over a very large number of facts. For example, if you do wide-open joins between sets of facts of size N and size M, you’ll get NxM comparisons in the join criteria.
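
As a rough illustration of the NxM point, here is a sketch in plain Clojure. It is not Clara's implementation; the fact shapes (`:id`, `:name`, `:job-type`) and both function names are made up for the example. It contrasts a join that tests every pair with one that looks up matches through an index on the join key.

```clojure
;; Hypothetical facts, for illustration only.
(def jobs    [{:id 1 :job-type :render}
              {:id 2 :job-type :encode}
              {:id 3 :job-type :render}])
(def workers [{:name "w1" :job-type :render}
              {:name "w2" :job-type :encode}])

(defn nested-join
  "Wide-open join: tests every (job, worker) pair, so N*M comparisons."
  [jobs workers]
  (for [j jobs
        w workers
        :when (= (:job-type j) (:job-type w))]
    [(:id j) (:name w)]))

(defn indexed-join
  "Equality join through an index on the join key: one pass to build the
   index, one pass over the jobs, plus the size of the output."
  [jobs workers]
  (let [by-type (group-by :job-type workers)]
    (for [j jobs
          w (get by-type (:job-type j))]
      [(:id j) (:name w)])))
```

Both functions return the same pairs; only the amount of work differs, which is what matters once one side holds millions of facts.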

jdt15:05:58

yeah, trying to keep things tightly joined, and avoid derived facts that I don't need where possible

mikerod15:05:03

Oh, I mean the rule you gave above may be slow, if there are millions of WorkerViableJob and ActiveUserJobCount

jdt15:05:17

No, there are only millions of raw job requests.

mikerod15:05:23

I think that a rules engine, such as Clara, may be capable of having reasonable perf characteristics, but you’d have to be careful for the “hot spots”

jdt15:05:54

There are relatively few WorkerViableJobs (those are jobs for which we know we have worker resources available, and we limit those to a small number of oldest/most-applicable jobs in any fire-rules run).

mikerod15:05:54

It’s best to quickly filter down to a smaller set of facts before performing the more involved joins between sets of facts

mikerod15:05:28

I would not fear intermediate facts either

mikerod15:05:41

I think it’ll help your rule given above even

jdt15:05:04

Yeah, already fixed that by asserting facts representing bindings of the clause.

jdt15:05:17

I just wondered if there was a better way

jdt15:05:24

I also repeatedly fall into the trap of [:not A] => A in my rules, because I only want to make an A if some other particular thing isn't true. Finding my way around it, but feels like I fight the problem a lot, whether or not the inserted A is unconditional.

jdt15:05:48

Even though I don't have too many rules now, there's actually a lot we want to do in our job management, affinities, special cases for "new job submitters" to give them optimal user experience in interacting with the jobs whose results they want, etc.

jdt15:05:12

So that's why I'm spending a bit too much time to see if this is viable, I think rules could be a real win here.

mikerod15:05:30

So in your above example, I wonder if you could make use of an accumulator like:

[?lowest <- (acc/min :n-jobs) :from [ActiveUserJobCount (= ?job-type job-type)]]

mikerod15:05:43

however, I don’t know that I immediately get the full semantics (like what the RHS does)

mikerod15:05:58

but doing that would give you the min job count for a given :job-type in terms of the ActiveUserJobCount facts
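
A minimal sketch of how that accumulator condition could sit in a complete rule. Only `ActiveUserJobCount`, `job-type`, and `n-jobs` come from the conversation; the `user` field, the `LowestJobCount` record, the rule and query names, and the namespace are assumptions made up for the example, and the RHS is a guess at one possible use.

```clojure
(ns scheduler.sketch
  (:require [clara.rules :refer [defrule defquery insert insert!
                                 fire-rules mk-session query]]
            [clara.rules.accumulators :as acc]))

(defrecord ActiveUserJobCount [user job-type n-jobs])
;; Hypothetical derived fact holding the minimum per job type.
(defrecord LowestJobCount [job-type n-jobs])

(defrule lowest-count-per-job-type
  ;; Groups ActiveUserJobCount facts by job-type and binds the smallest
  ;; :n-jobs value in each group to ?lowest.
  [?lowest <- (acc/min :n-jobs) :from [ActiveUserJobCount (= ?job-type job-type)]]
  =>
  (insert! (->LowestJobCount ?job-type ?lowest)))

(defquery lowest-counts []
  [?fact <- LowestJobCount])

(def results
  (-> (mk-session 'scheduler.sketch)
      (insert (->ActiveUserJobCount "alice" :render 3)
              (->ActiveUserJobCount "bob"   :render 1)
              (->ActiveUserJobCount "carol" :encode 2))
      (fire-rules)
      (query lowest-counts)))
```

Each element of `results` is a map keyed by `:?fact`, one per job type seen in the session.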

mikerod15:05:24

For this one:
> I also repeatedly fall into the trap of [:not A] => A in my rules, because I only want to make an A if some other particular thing isn’t true

I don’t know of a fix-all. It’s a case by case thing. Not sure what sort of scenario keeps getting you into it.

jdt15:05:28

For one of my [:not A] => A scenarios I tried [acc/count ... and checked for a count less than the limit I wanted, but the problem is it won't fire if the count is zero, even if the accumulator initializes with zero.

jdt15:05:08

My scenarios are things like "this job is something we want to proceed with if some other job isn't eligible", a gross generalization, could be any fact, not just jobs. It often boils down to counting situations. E.g. only dispatch at most two at a time on a worker in one fire-rules loop.

jdt15:05:50

My approach now is to generate very minimal sets of candidates in a fire-rules session, then query the results, dispatch jobs, update relevant counters-as-facts, then run fire-rules again in a loop.

jdt15:05:45

The counters I need to maintain are mainly worker resource availability and active user job counts partitioned by type of job.
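
The loop described above might look roughly like the following. This is a sketch under heavy assumptions, not jdt's code: the `JobRequest` and `DispatchCandidate` records, the limit of 2, the rule and query names, and the `dispatch!` callback are all hypothetical, and the counter is kept per user only rather than partitioned by job type, to keep the example short.

```clojure
(ns scheduler.loop-sketch
  (:require [clara.rules :refer [defrule defquery insert insert! retract
                                 fire-rules mk-session query]]))

(defrecord JobRequest [job-id user])
(defrecord ActiveUserJobCount [user n-jobs])
(defrecord DispatchCandidate [job-id user])   ; hypothetical derived fact

(defrule candidate-when-under-limit
  [JobRequest (= ?job-id job-id) (= ?user user)]
  [ActiveUserJobCount (= ?user user) (< n-jobs 2)]   ; assumed limit of 2
  =>
  (insert! (->DispatchCandidate ?job-id ?user)))

(defquery candidates []
  [?c <- DispatchCandidate])

(defquery count-for [:?user]
  [?count <- ActiveUserJobCount (= ?user user)])

(defn dispatch-loop
  "Fires rules, dispatches one candidate, updates the counter fact, and
   repeats until no candidates remain. Returns the final session."
  [session dispatch!]
  (loop [session (fire-rules session)]
    (if-let [{:keys [job-id user]} (:?c (first (query session candidates)))]
      (let [old (:?count (first (query session count-for :?user user)))]
        (dispatch! job-id)
        (recur (-> session
                   ;; Drop the dispatched request and the stale counter.
                   (retract (->JobRequest job-id user) old)
                   (insert (update old :n-jobs inc))
                   (fire-rules))))
      session)))

(def dispatched (atom []))

(def final-session
  (dispatch-loop
   (-> (mk-session 'scheduler.loop-sketch)
       (insert (->ActiveUserJobCount "alice" 0)
               (->JobRequest 1 "alice")
               (->JobRequest 2 "alice")
               (->JobRequest 3 "alice")))
   (fn [job-id] (swap! dispatched conj job-id))))
```

Because the candidates are inserted with `insert!`, retracting the request and the old counter also removes the candidates derived from them, so each pass starts from a consistent set.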

jdt15:05:59

Okay, well, hopefully I'm nearing some kind of first load test, we'll see what happens.

jdt16:05:13

Advice always appreciated.

wparker17:05:46

@jdt The Clara count accumulator should fire with an initial value of 0. Do you have an example where it does not? Also keep in mind that you can create your own accumulators with arbitrary domain-specific logic. So say "choose the top two at most" could be done. My instinct here is that it sounds like the problems you describe might be addressable with accumulators without any insert-unconditional logic, although as always hard to say without knowing the problem space. It sounds like @mikerod was suggesting that as well.

wparker17:05:34

See the writing accumulators section at http://www.clara-rules.org/docs/accumulators/
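
For the "choose the top two at most" idea, a custom accumulator could be sketched along these lines with `acc/accum`. The `submitted-at`, `worker`, and `job-id` fields, the `DispatchBatch` record, and the `oldest-n` helper are assumptions invented for the example; only `WorkerViableJob` comes from the conversation.

```clojure
(ns scheduler.top-two
  (:require [clara.rules :refer [defrule defquery insert insert!
                                 fire-rules mk-session query]]
            [clara.rules.accumulators :as acc]))

(defrecord WorkerViableJob [worker job-id submitted-at])
(defrecord DispatchBatch [worker job-ids])   ; hypothetical result fact

(defn oldest-n
  "Accumulator that collects matching facts and returns at most n of
   them, oldest first by :submitted-at."
  [n]
  (acc/accum
   {:initial-value []
    :reduce-fn conj
    ;; Remove one occurrence of the retracted fact.
    :retract-fn (fn [jobs retracted]
                  (let [[before after] (split-with #(not= % retracted) jobs)]
                    (into (vec before) (rest after))))
    :convert-return-fn (fn [jobs]
                         (vec (take n (sort-by :submitted-at jobs))))}))

(defrule at-most-two-per-worker
  [?jobs <- (oldest-n 2) :from [WorkerViableJob (= ?worker worker)]]
  =>
  (insert! (->DispatchBatch ?worker (mapv :job-id ?jobs))))

(defquery batches []
  [?batch <- DispatchBatch])

(def results
  (-> (mk-session 'scheduler.top-two)
      (insert (->WorkerViableJob "w1" :a 30)
              (->WorkerViableJob "w1" :b 10)
              (->WorkerViableJob "w1" :c 20))
      (fire-rules)
      (query batches)))
```

The limit lives inside the accumulator, so the rule needs neither a [:not ...] condition nor an unconditional insert to cap the batch size.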

wparker17:05:58

Also regarding the cost of joins, that varies depending on the type of join, I'd suggest reading http://www.clara-rules.org/docs/hash_joins/ if you're working with millions of facts

jdt17:05:22

I thought the behavior I observed with the acc/count condition not firing seemed consistent with the documented behavior on this page: http://www.clara-rules.org/docs/accumulators/, however I suspect I read it wrong and they were talking about other accumulators not firing when there weren't facts matching the condition, instead of acc/count. I don't have the example in code any more so will have to revisit it later if necessary. Meanwhile I'll check out those other links you posted.

mikerod18:05:43

@dave.tenny if an accumulator has a “truthy” :initial-value, its condition in a rule will be considered satisfied even if no facts exist to match the accumulator’s fact match criteria

mikerod18:05:32

The default :initial-value is nil, so by default the condition would not be satisfied. However, acc/count initializes to 0, so a condition that uses it will be satisfied even when no facts match the condition.

mikerod18:05:12

e.g. [?count <- (acc/count) :from [NoMatchEver]] would bind ?count to 0 and the condition would be satisfied.
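
A small way to check that claim at the REPL is sketched below. `NoMatchEver` is taken from the message above; the `CountSeen` record, the rule and query names, and the namespace are made up for the example. The session is deliberately left empty before firing.

```clojure
(ns scheduler.count-zero
  (:require [clara.rules :refer [defrule defquery insert!
                                 fire-rules mk-session query]]
            [clara.rules.accumulators :as acc]))

(defrecord NoMatchEver [])
(defrecord CountSeen [n])   ; hypothetical marker fact recording the count

(defrule count-with-no-matches
  ;; No NoMatchEver facts are ever inserted, so this relies on the
  ;; accumulator's initial value of 0.
  [?count <- (acc/count) :from [NoMatchEver]]
  =>
  (insert! (->CountSeen ?count)))

(defquery seen-counts []
  [?seen <- CountSeen])

(def results
  (-> (mk-session 'scheduler.count-zero)
      (fire-rules)
      (query seen-counts)))
```

If the behaviour is as described, `results` should contain a single `CountSeen` whose `:n` is 0; an empty result would point to the kind of non-firing jdt reported.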

jdt18:05:15

I definitely had a [?x <- (acc/count ...)] that was not being successful, or at least a [:test (do (prn ...) true) ] following that accum condition was not printed, but perhaps they're not evaluated sequentially. My rule definitely wasn't firing, but again I no longer have the code to reason about it.

mikerod18:05:00

weird, I’d have to see it