#announcements
2023-08-04
vlad_poh01:08:41

https://github.com/kbosompem/bb-excel 0.0.5 Use https://www.babashka.org/ to extract data from Excel spreadsheets! • #2: Fixed a problem where, whenever a range contains a 0, parse-range fails to correctly locate the rows. Thanks @hjbolide for the bug fix.

🎉 32
❤️ 6
Kent Bull20:08:45

Sweet! This is handy.

šŸ‘ 2
Dotan Schreiber11:08:30

https://github.com/s-doti/ginfer A graph inference library. There are any number of disclaimers I might have added, but I won't. Instead, I'll just say - I'm looking for feedback, and would be more than happy to discuss the more in-depth concepts and ideas this is based on, if any of it happens to be your cup of tea 😉 ... I do mean it in the broader sense, this isn't really just about graph inference per se

🤘 13
šŸ‘ 4
šŸŒ¼ 2
Dotan Schreiber16:08:14

I actually don't think Ginfer qualifies as a rules engine, definitely not at the level that both of those technologies operate at. Clara/Odoyle invite you to express your logic via their specific lingo, and their focus would then be on minimizing computations and producing the fastest answer, I imagine. Ginfer doesn't require that level of commitment - you still express your logic as you otherwise would, as clojure fns, at the granularity of your business entities and their attributes. You then just declare the dependencies between those attributes, and Ginfer takes care of when your logic should be invoked, as well as pulling and pushing data as needed (either from the world, or to/from your storage solution). Ginfer is agnostic to your data model as well.

It is built with the following case in mind: your data consists of millions of business entities, of different types, with their intrinsic data attributes having elaborate dependencies on one another. An event comes in and changes a single value of a single such entity and attribute, leading to a ripple effect involving dozens/hundreds/thousands of further changes all around; or, the user would like to evaluate some attribute of a specific entity, which requires a similar order of magnitude of evaluations all around. Ginfer aims to help you pull/push the relevant data when needed, and run the relevant logic, to make all this happen. It is the glue between your logic, your data model, and the world.
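
To make the "plain fns plus declared dependencies" idea a bit more concrete, here's a tiny illustrative sketch (the names and shapes below are made up for the example - this is not Ginfer's actual API):

```clojure
;; Illustrative only - NOT Ginfer's API. Attribute logic is plain Clojure fns
;; over an entity map, and a separate map declares which derived attributes
;; depend on which inputs.

(defn full-name [{:keys [first-name last-name]}]
  (str first-name " " last-name))

(def deps
  {:first-name [[:full-name full-name]]
   :last-name  [[:full-name full-name]]})

;; naive propagation: apply an incoming change, then re-run whatever
;; the dependency map says is affected by it
(defn propagate [entity changed-attr new-value]
  (reduce (fn [e [attr f]] (assoc e attr (f e)))
          (assoc entity changed-attr new-value)
          (deps changed-attr)))

(propagate {:first-name "Ada" :last-name "Lovelace"} :last-name "Byron")
;; => {:first-name "Ada", :last-name "Byron", :full-name "Ada Byron"}
```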

Max17:08:16

Do you have some specific applications in mind where you think Ginfer might be valuable? I'm having a hard time determining when to apply it just from reading the readme.

Dotan Schreiber19:08:05

Well, Ginfer was originally conceived for an EASM product (external attack-surface mapping) in cyber-security. That requires mapping orgs and their relations, and then moving on to their exposed assets on the internet, e.g. IPs, domains, websites, certs and what have you. All of these are business entities, and there's a high level of interconnectivity between them, as far as the logic that calculates their attributes goes. Ginfer should be applicable just as well to businesses in many other domains, given requirements that are similar in nature. I imagine Ginfer as being closer, in philosophy, to knowledge graphs than to rules engines; but as an added layer, not as a standalone solution on the side. It wants to support the operations front, not just the development aspects. So it can be executed eagerly or lazily, async or blocking, it can pause/resume, it can be used to aggregate data from multiple data sources, it can support disaster recovery scenarios, etc.

šŸ‘ 4
Darrick Wiebe21:08:53

How does this compare to a propagator model? Have you looked at https://dspace.mit.edu/handle/1721.1/44215?

Max21:08:58

I also see a lot of similarity with reasoners, such as datascript with rules

Cora (she/her)01:08:26

it would help a lot if the first bits in the readme explained what you meant by a graph inference library. diagrams would be 🔥

šŸ‘ 4
Cora (she/her)01:08:42

this looks really interesting, fwiw

Cora (she/her)01:08:10

is graph inference a concept I just don't know about? that might be the case and why I'm suggesting that, so take that with a grain of salt I guess

Darrick Wiebe01:08:24

I've worked with graphs for years and haven't heard the term used this way.

Cora (she/her)01:08:19

ahh, ok, well, new terms for new things are great 😊

Darrick Wiebe01:08:17

I can't decide if it's a graph-based reporting engine, if it's updating the graph, if it's doing something like a BSP algo or something like a propagator. But I haven't tried to read the source. Hoping for clarity from the creator first...

Cora (she/her)01:08:50

it sounds spreadsheet-ish to me? but eventually consistent? or maybe I'm misunderstanding what it is

Dotan Schreiber08:08:35

Ha ha, language/communication has always been the biggest challenge by far to solve, with me at least 😉 It is so hard to be concrete and convey what I have in mind with the needed clarity.. I'll address all questions/comments as best I can!

Dotan Schreiber08:08:36

Re @U01D37REZHP's Propagator model - I think Ginfer incorporates a very similar concept, if I'm understanding the reference correctly. The user of Ginfer in effect creates their logic as composable logical units which take in multiple values and propagate their calculated values in turn. The cells mentioned in the reference, in Ginfer's case, are bits of data in the user data model, persisted on the user's external storage solution (in whatever format). It's hard to wrap my head around this propagator model, as it's new to me, but I think Ginfer actually has an abstract level consisting of 3 main propagators - one for each of the update/notify/eval steps - and these then become concrete at the user level, where the user declares their model attributes along with their logic and dependencies. Anyhow, even if the concepts are indeed similar, my implementation probably doesn't make that similarity very apparent at all.
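
For anyone curious what the cells-and-propagators idea looks like in miniature, here's a generic sketch (just the textbook concept wired up with atoms and watches - not how Ginfer is implemented):

```clojure
;; Generic cells-and-propagators in miniature - not Ginfer's implementation.
;; A cell is just an atom; a propagator recomputes its fn whenever any of
;; its input cells change, and pushes the result into its output cell.

(defn cell [] (atom nil))

(defn propagator [f inputs output]
  (doseq [in inputs]
    (add-watch in output
               (fn [_ _ _ _]
                 (let [vals (map deref inputs)]
                   (when (every? some? vals)
                     (reset! output (apply f vals))))))))

(def fahrenheit (cell))
(def celsius    (cell))

;; wire a conversion propagator between the two cells
(propagator (fn [f] (/ (* (- f 32) 5) 9.0)) [fahrenheit] celsius)

(reset! fahrenheit 212)
@celsius ;; => 100.0
```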

Dotan Schreiber08:08:43

@U01EB0V3H39, yes, I also see the similarity with datascript+rules, though Ginfer maintains its state out of mem. Datomic has built-in mechanics that allow for logic propagation; I think a Ginfer-based solution could possibly be built on Datomic just the same, but with different controls - e.g. Ginfer's propagation can be executed lazily, or paused/resumed, etc. One of the libraries Ginfer is built on - https://github.com/s-doti/sepl - converts Ginfer's update-notify-eval loop execution into data; that's where the level of control is gained.
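
Roughly, the "execution as data" idea looks something like the sketch below (made-up step shapes, not sepl's actual API): pending work is an ordinary data structure, so it can be run eagerly, stepped through lazily, or serialized and resumed later.

```clojure
;; Illustrative only - pending work is just a vector of step maps.

(defn run-step [state {:keys [type attr]}]
  ;; a real engine would dispatch on the step type (update/notify/eval)
  ;; and touch storage here; we just record that the step ran
  (update state :log conj [type attr]))

(defn step-once [{:keys [queue] :as state}]
  (if-let [step (first queue)]
    (-> (run-step state step)
        (assoc :queue (vec (rest queue))))
    state))

(defn run-all [state]
  (if (seq (:queue state)) (recur (step-once state)) state))

(def initial
  {:log   []
   :queue [{:type :update :attr :employees}
           {:type :notify :attr :employees}
           {:type :eval   :attr :head-count}]})

(run-all initial)
;; => {:log [[:update :employees] [:notify :employees] [:eval :head-count]],
;;     :queue []}
;; (step-once initial) advances by one step; the returned map is the
;; checkpoint - persist it anywhere and resume later
```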

Dotan Schreiber09:08:46

@U02N27RK69K lol, 'graph inference' is the term I've been holding in my head for quite some time, and I forget where I came up with it in the first place.. I'll update my readme along these lines: graph inference here hints at the process by which interconnected and interdependent bits of data are updated in sequence. I use 'graph' to acknowledge the interconnectivity between values, and 'inference' for them being interdependent, meaning a change to some value leads to a propagating ripple effect which ends in a change to another value (and potentially a whole lot of values in between).

Dotan Schreiber09:08:21

Also, not a graph-based reporting engine, and not based on a BSP algo (unless what I'm doing is in fact reducible to BSP? I wouldn't know). In essence, I've only incorporated a very basic and naive approach that keeps iterating through an update-notify-eval sequence for as long as updated values can be inferred, anywhere (data stability is assumed). There are no optimizations, no branch-skipping, no event deferring/collapsing, at this point. If you have the stamina, I've created baby versions that deliver just the gist in short - they're really tiny (and yet I've incorporated visual diagrams in the readme!): https://github.com/s-doti/baby-ginfer
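
That naive "keep going until nothing changes" loop, reduced to a toy (illustrative only - the real loop also pulls/pushes external data at each step):

```clojure
;; Re-evaluate every derived attribute, and repeat until a fixpoint is
;; reached (data stability assumed), i.e. until nothing changes anymore.

(defn eval-pass [attr-fns entity]
  (reduce (fn [e [attr f]] (assoc e attr (f e))) entity attr-fns))

(defn infer [attr-fns entity]
  (let [entity' (eval-pass attr-fns entity)]
    (if (= entity entity') entity (recur attr-fns entity'))))

;; chained dependencies: head-count depends on employees,
;; over-capacity? depends on head-count
(def attr-fns
  {:head-count     (fn [{:keys [employees]}] (count employees))
   :over-capacity? (fn [{:keys [head-count]}]
                     (when head-count (> head-count 2)))})

(infer attr-fns {:employees #{"ann" "bob" "cid"}})
;; => {:employees #{"ann" "bob" "cid"}, :head-count 3, :over-capacity? true}
```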

Dotan Schreiber09:08:45

I really appreciate you helping me examine all these concepts I try to squeeze out of my head 🙂

Darrick Wiebe16:08:29

I appreciate that you've made this flexible enough to use a variety of back ends, etc, but I think it'd be easier for people to understand if you kept the examples as simple and concrete as possible, and showed the entire setup in your examples. In the baby-ginfer repo, there is a code block that uses a bunch of symbols that are not defined, which appear to refer to a global database of some sort that is also not defined.

Darrick Wiebe16:08:48

In your examples, you're using :refer :all, which can be convenient but has the disadvantage that we can't know where anything comes from without searching.
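
For illustration (generic namespaces, nothing Ginfer-specific): an alias keeps every call site traceable to where it comes from.

```clojure
;; instead of
;;   (ns example.core (:require [clojure.string :refer :all]))
;; prefer an alias, so each call site names its source namespace:
(ns example.core
  (:require [clojure.string :as str]))

(str/upper-case "ginfer")   ;; clearly comes from clojure.string
```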

Dotan Schreiber17:08:33

Good feedback, I'll aim to incorporate all points right away. @U01D37REZHP what do you mean by 'entire setup'? Maybe call out the main API alongside the complete set of options explained in the readme?

Cora (she/her)17:08:46

I think Darrick means that it should start explicit and small so people can understand what this is, then build on that simple example to show different options and ways of doing things

šŸ‘ 3
Cora (she/her)17:08:13

to an audience that has no idea what this is, the flexibility and advanced options just obscure the heart of what you're trying to convey

Cora (she/her)17:08:04

you have the curse of knowledge: you know enough about the software that it's hard to know how to begin explaining it, and the things you find exciting about it and are proud of aren't the things a novice audience is interested in (initially)

Cora (she/her)18:08:23

it's like how clojure intros go on and on about simplicity and complecting, but novices just want to know how to start a project and do something they've done before, so they can learn what's different through use; then they can start understanding what's so simple and what's cool about repl-driven development. the things advanced users find compelling aren't the same as what novices do, and it pays to be aware of that

Dotan Schreiber18:08:39

Wow, yes, I see what you're saying guys. I'm gonna ponder a bit about how best to incorporate the needed changes asap. Thanks so much!

:clojure-spin: 1
Dotan Schreiber22:08:38

I just updated the README for added clarity, or so I hope. Still no visuals, I'm afraid. It feels quite long; is it too much? It would take quite some time to improve the documentation as a whole, but then, the library in general is a long way from mature, of course.

Darrick Wiebe01:08:08

Definitely much more comprehensible.

Darrick Wiebe01:08:50

Does this library store the inferences that it makes back into the data source?

Darrick Wiebe01:08:17

Using your example, what happens if I add an employee? Is all data invalidated, or some subset? Would the department head count need to be recalculated from scratch? How does the system know what needs to be recalculated? Same questions if an employee were removed.

Dotan Schreiber05:08:18

Ginfer assumes 0 memory. It requires only the data necessary for a single local calculation to reside in memory (sort of); everything gets persisted at each step. I should better document 'connectors' somewhere - these are the components responsible for translating data and persisting it in the user's external storage solution. In my examples, a default in-memory connector is used implicitly, for demo purposes. The performance lost to all that persisting is regained via the internal persistence library - https://github.com/s-doti/persistroids. It adds a read/write-through cache of weak refs, which only gets flushed when the gc actually needs to evict items from it (or by time/amount of accumulated mutations).
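
As a rough picture of what a connector is shaped like (hypothetical protocol for illustration - not the actual Ginfer or persistroids interface):

```clojure
;; Hypothetical 'connector' shape, just to illustrate
;; 'translate + persist to external storage'.

(defprotocol Connector
  (read-attr  [this entity-id attr])
  (write-attr [this entity-id attr value]))

;; the implicit demo flavor: everything lives in an atom in memory;
;; a real connector would talk to your datastore instead
(defrecord InMemoryConnector [store]
  Connector
  (read-attr [_ entity-id attr]
    (get-in @store [entity-id attr]))
  (write-attr [_ entity-id attr value]
    (swap! store assoc-in [entity-id attr] value)))

(def conn (->InMemoryConnector (atom {})))
(write-attr conn "dept-42" :head-count 17)
(read-attr  conn "dept-42" :head-count)   ;; => 17
```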

Dotan Schreiber05:08:13

In my example, adding/removing an employee generally triggers re-calculations from scratch. However, since all data is eagerly persisted, and values that have not changed do not need to be re-evaluated, there's no need to go far to acquire the input data necessary for a re-calculation of head-count, for instance. Does that make sense? I'm not sure I'm explaining myself well enough here.
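
Something like this, in toy form (made-up shapes again, not Ginfer's API): the incoming "add employee" event only re-evaluates the attributes declared as depending on :employees, while unrelated, already-persisted attributes are left untouched.

```clojure
;; Toy illustration: only attributes declared as depending on :employees
;; get re-evaluated; everything else keeps its already-persisted value.

(def deps     {:employees [:head-count]})
(def attr-fns {:head-count (fn [dept] (count (:employees dept)))})

(defn notify [dept changed-attr]
  (reduce (fn [d attr] (assoc d attr ((attr-fns attr) d)))
          dept
          (deps changed-attr)))

(def dept {:name       "R&D"              ;; untouched below
           :employees  #{"ann" "bob"}
           :head-count 2})

(-> dept
    (update :employees conj "cid")   ;; the incoming event
    (notify :employees))             ;; ripple: only :head-count recomputed
;; => {:name "R&D", :employees #{"ann" "bob" "cid"}, :head-count 3}
```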