#off-topic
2023-12-05
Rupert (Sevva/All Street)11:12:24

Just a heads up that Maelstrom (Clojure Library for distributed systems) is on the homepage of the orange site currently: Maelstrom: A workbench for learning distributed systems https://news.ycombinator.com/item?id=38525968 https://github.com/jepsen-io/maelstrom

idiomancy22:12:41

Can anyone help me put words and vocabulary to this immutability concept? There's a notion of a 'campaign' entity, and let's say that campaign has 2 attributes: configuration and performance. But in a sense, it's meaningless to discuss the performance of a campaign, because if it has a different configuration it will perform differently. So what I want to do is propose that we make campaign_configuration an immutable concept: conceptually (even if not in practice) something with its own ID, where every time you change the config of a campaign, you are actually creating a new campaign_configuration. Because conceptually, in order to make a system that recommends what configuration a campaign should have, you need to be able to look at the past successes of configurations, not the past successes of campaigns. There must be a word for this kind of line of thinking: taking a currently mutable notion of entity+attribute and promoting it to a first-class immutable noun.

idiomancy22:12:24

I guess in a sense I'm kind of "reifying" the notion of campaign configuration?

idiomancy22:12:01

I'm trying to assemble the vocabulary to discuss this concept so I can even begin to approach the discussion of "is it worth materializing this reification by implementing code, or is this something that has a current business meaning which should just become shared cultural vocabulary"?

p-himik22:12:27

I've seen the word "snapshot" being used in similar contexts.

👍 2
idiomancy22:12:07

"version" is another word that makes sense here. I'm effectively trying to say "there's an implicit versioning going on with campaign configuration and I want to make it explicit"

Cora (she/her)22:12:07

it feels more to me like a performance entity has a campaign entity and a configuration entity that are necessary parts of it

Cora (she/her)22:12:57

meaning the performance is the root entity, and it points at a version of a configuration and an individual campaign

p-himik22:12:23

A somewhat tangential thought - just throw XTDB at it and always refer to entities at particular times. :)

idiomancy22:12:23

so, hypothetically the most complete decomposition of this would be that a single performance event (like, a click on an ad) has a bunch of context to it. I guess what I'm getting at is that there's a semantically important entity that sits at the intermediate level of "a configuration of a campaign at a certain time"

Cora (she/her)22:12:25

right, yes, sorry, that was an assumption of what I said, but I saw the hierarchy was a little off on the relationships and got caught up

idiomancy22:12:16

yeah, that makes sense, and it's I think an important thing to call out

idiomancy22:12:05

there has to be a word for this notion of "reorganizing the composition of base level entities such that this particular business concept is a top level proper noun"

Cora (she/her)22:12:23

I think that's called naming? you named a concept or entity so that you could manipulate it abstractly. it exists apart from naming but naming is what opens the door to it being useful

idiomancy22:12:11

like, the most granular way of looking at things is log events. the current aggregation of things is to consider campaigns as an entity that has a mutable config and a certain level of performance attributed to it. my proposal is that the campaign entity isn't particularly meaningful and that the combination of campaign and configuration should be projected as its own entity

idiomancy22:12:09

once I can pin down what exactly this concept is called, then you could describe a lot of ways that it might concretely be accomplished -- naming conventions is one, assigning it an ID and a schema is another

Cora (she/her)22:12:07

zeitgeist? 😜

idiomancy22:12:32

yeah, that's perfect 😂

idiomancy22:12:23

like, significantly this concept of "campaign version" doesn't really exist except as an implicit consequence of the fact that campaigns can have their configuration changed, and right now one has to just join across tables to find out if an edit event happened during the time period in question, and filter out any that have
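
One way to make the implicit "campaign version" explicit is to treat every configuration change as the creation of a new immutable record with its own ID, rather than an in-place edit. The following is only a sketch under assumptions: `CampaignVersion`, `change_config`, the integer timestamps, and the counter-based IDs are all hypothetical names and choices for illustration, not anything from the system being discussed.

```python
from dataclasses import dataclass
from itertools import count

_ids = count(1)  # hypothetical ID source; a real system might use UUIDs


@dataclass(frozen=True)
class CampaignVersion:
    version_id: int
    campaign_id: str
    config: tuple    # sorted (key, value) pairs, kept as a tuple so it cannot be edited
    valid_from: int  # timestamp at which this configuration became current


def change_config(history, campaign_id, new_config, at):
    """Return a new history list with a fresh version appended; nothing is mutated."""
    version = CampaignVersion(
        next(_ids), campaign_id, tuple(sorted(new_config.items())), at
    )
    return history + [version]


history = []
history = change_config(history, "c1", {"budget": 100, "channel": "email"}, at=10)
history = change_config(history, "c1", {"budget": 250, "channel": "email"}, at=20)
```

With records like these, performance events can point at a `version_id` directly instead of being matched against edit events after the fact.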

Cora (she/her)22:12:08

like was said earlier in the thread, it feels like a snapshot is a good name for it. there are lots of words for where things come together, a junction/intersection/join/etc, and lots of words for a time when things happen, a moment/instant/etc, but not too many that combine the two I don't think

Cora (she/her)22:12:41

if you had other metadata about that snapshot that was meaningful you could use that name instead. you mentioned a performance, for example

Cora (she/her)22:12:14

but it feels like you'd still need to use the word "snapshot" to talk about what a performance refers to

Cora (she/her)22:12:58

you could just go german and mash some words together, make your own portmanteau perhaps 😜

idiomancy22:12:27

it's interesting that you're particularly motivated about the entity "performance". It sounds like it has a 1:1 relationship with campaign_configuration, right? for instance, you wouldn't have multiple performance entities for each day of the week, yeah?

Cora (she/her)22:12:32

it has a 1:1 with a campaign snapshot and a 1:1 with a configuration snapshot 👀

Cora (she/her)22:12:01

I'm guessing at your system

idiomancy22:12:12

I guess the difficulty I'm having is that performance is a metric which could be associated at any level of data dissection, so I don't think it can really be an entity. for instance you could discuss the performance of a campaign across all of its configurations as well

Cora (she/her)22:12:46

ahh, so it's more of a view of the data than an actual entity

idiomancy22:12:50

yes, I suppose that's true

idiomancy22:12:46

in general, what is the notion in software for assigning something an ID that didn't previously have an ID associated with it? "entity-ing"

idiomancy22:12:53

I'm entity-ing this bad boy

Cora (she/her)22:12:04

identifying?

🤯 1
Cora (she/her)22:12:43

that's borderline rich-hickey-giving-an-etymological-definition-at-the-beginning-of-a-talk-ish

Cora (she/her)22:12:16

for the right audience that's a positive 😉

idiomancy22:12:21

> borderline rich-hickey-giving-an-etymological-definition-at-the-beginning-of-a-talk-ish
it really is. And I came to clojurians with this question because I was assuming that was something someone had already done 😂

Cora (she/her)23:12:02

well, I hope we've been helpful

idiomancy23:12:26

you have! Appreciate it. For what it's worth, I think this is exactly what the word "reification" means

idiomancy23:12:46

from all my searches, that's the word for taking something and making it an entity.

🤔 2
idiomancy23:12:56

so I'm reifying campaign versions.

pithyless15:12:09

So I’ve seen this play out in other projects where the realization is that what you’re calling the campaign version/snapshot is actually the “campaign”. It’s the thing most directly related to performance metrics, events, etc. When you change the config (i.e. the parameters of the “campaign”), you get a brand new unrelated entity. Then what you were originally calling the campaign - which was probably driven by the end user/UX naming - is just some grouping for the potential user (who doesn’t care that “new configurations” are brand new things from a system runtime perspective). In other words, sometimes it’s good food for thought to consider the reified version/snapshot as the actual thing (i.e. the campaign) and reconsider how you may want to group related runs (this could be a separate entity, or just a property like label/name shared across multiple campaigns, or something else entirely). It also helps open the door to considering that there could be multiple interesting ways to group related campaigns (and not be stuck in just one strict tree hierarchy)

jjttjj15:12:48

^ I had that exact same thought on considering the snapshot the "real thing" and the other stuff just tags/metadata/human labels. This discussion also reminds me of this library https://github.com/replikativ/hasch which I use as a building block for similar things. You shove some arbitrary config data into it and get back a deterministic id.
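
The "shove config data in, get a deterministic id back" idea can be sketched with nothing but the standard library. This is not hasch's algorithm or API, just an illustration of content-based IDs in general: it assumes the config is JSON-serializable, and `config_id` is a hypothetical helper name.

```python
import hashlib
import json


def config_id(config):
    """Hash a canonical JSON encoding so that equal configs always get equal IDs."""
    # sort_keys makes the encoding independent of dict insertion order
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = config_id({"budget": 100, "channel": "email"})
b = config_id({"channel": "email", "budget": 100})  # same content, different key order
c = config_id({"budget": 250, "channel": "email"})  # different content
```

Here `a` and `b` are the same ID while `c` differs, so two campaigns that happen to share a configuration can be recognized as using the same one without any coordination.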

respatialized18:12:11

Yeah performance is clearly a derived attribute; "performance" of a campaign is just a function of individual events comprising the campaign (this email got X clicks, this post reached Y distinct users). Reifying performance is going to cause trouble as soon as someone asks "ok, well how are we doing if we look at just email?" and then you have to go back and recalculate everything by hand because you may not have snapshotted the necessary data to distinguish performance across subcategories
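
As a rough illustration of performance being a function over events rather than a stored entity, the sketch below keeps only raw events and computes click counts along whatever dimension is asked for. The event shape, the field names, and the `performance` helper are made up for the example.

```python
from collections import Counter

# Hypothetical raw events; each one records the context it happened in.
events = [
    {"version_id": 1, "channel": "email", "kind": "click"},
    {"version_id": 1, "channel": "post", "kind": "click"},
    {"version_id": 2, "channel": "email", "kind": "click"},
    {"version_id": 2, "channel": "email", "kind": "view"},
    {"version_id": 2, "channel": "email", "kind": "click"},
]


def performance(events, by):
    """Count click events grouped by the given event field."""
    return Counter(e[by] for e in events if e["kind"] == "click")


by_version = performance(events, "version_id")
by_channel = performance(events, "channel")
```

Because nothing is pre-aggregated, "how are we doing if we look at just email?" is just another grouping over the same events.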

respatialized18:12:54

Similarly, "config" might also be an aggregate of the config for individual events plus some higher level attributes that don't cleanly map on to individual events - the nice thing about snapshotting by timestamp is that you always have an identifier so you can answer "what was the config that was 'current' when this email went out?"
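
The "what was the config that was current when this email went out?" question can be answered with a sorted list of snapshots and a binary search. This is a minimal sketch that assumes integer timestamps and a list of `(valid_from, config)` pairs already sorted by time; `config_at` is a hypothetical helper.

```python
from bisect import bisect_right

# Hypothetical snapshots: (valid_from timestamp, config), sorted by timestamp.
versions = [
    (10, {"budget": 100}),
    (20, {"budget": 250}),
    (35, {"budget": 400}),
]


def config_at(versions, ts):
    """Return the config that was current at time ts, or None if none existed yet."""
    starts = [start for start, _ in versions]
    i = bisect_right(starts, ts)  # number of snapshots with valid_from <= ts
    if i == 0:
        return None
    return versions[i - 1][1]


current_for_email = config_at(versions, 25)
```

A temporal database such as XTDB, mentioned earlier in the thread, offers this kind of as-of lookup as a built-in feature rather than something written by hand.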

respatialized18:12:43

Because it seems like time is a key dimension of your data model here I think it's worth watching Rich Hickey's discussion of the Epochal Time Model in "Are We There Yet?" (https://www.infoq.com/presentations/Are-We-There-Yet-Rich-Hickey/) if you haven't already because it explicitly contrasts itself with "traditional" OO approaches to modeling change. I realized that I'm probably just badly paraphrasing ideas from that talk that are better stated in the original!