architecture

john 2025-04-12T12:44:19.982759Z

Not quite ready to ping #announcements or #releases with this - doesn't even have a version v0.0.1 yet. But I did want to get this in front of some folks for some early feedback. The docs need a lot of work and I'll be updating those over the coming weeks. But the API is coming together so I think it's time to get some feedback. There's two APIs: a high level one and a low level one. More details in the repo: https://github.com/johnmn3/poly-map

🤘 5
🤘🏼 1
walterl 2025-04-16T11:54:53.661219Z

Which real-world scenarios would this be a good fit for? It feels like there should be singular cases, but I'm struggling to think of any.

walterl 2025-04-16T11:55:53.476969Z

In the general case it seems like this kind of map should be avoided, as it implements OOP-style behaviour hiding by complecting data and behaviour.

john 2025-04-16T12:38:57.430989Z

Have you seen some of the examples files? https://github.com/johnmn3/poly-map/blob/main/dev/ex/examples_high_level_md.clj

john 2025-04-16T12:40:14.894149Z

Yeah, in the general case you should still be using Clojure maps, absolutely

john 2025-04-16T12:41:29.915519Z

It's generally an expert level thing, where you want a map but you need some other behavior from it. Like: https://pathom3.wsscode.com/docs/smart-maps/

john 2025-04-16T12:42:27.768279Z

If these poly-maps existed before smart-maps, it would have probably been far easier for them to build them

👍 2
john 2025-04-16T12:48:51.582229Z

WRT OOP - thing is, here we're not complecting behavior data with state, so I don't know if we still fall victim to the dangers you're hinting at. We're certainly not hiding behaviors behind the map. If anything, we're unhiding them. In terms of complecting data and behavior, that's the whole data/code dichotomy, right? Registering functions as values on map keys, which we do all the time, is complecting data and behavior - not necessarily a bad thing. You can rest assured though that no one will change the behavior of the particular immutable poly-map that you're using.

john 2025-04-16T12:50:07.403199Z

The original use-case I needed poly-maps for was https://github.com/johnmn3/ti-yong. That allows you to build functions out of maps.

john 2025-04-16T12:51:59.950709Z

And I needed that to build data-oriented front-end components

walterl 2025-04-16T13:09:28.763029Z

Yes, saw the examples, but they all seem like operations that should really rather be map-returning functions, or std lib function calls, so as not to hide what's happening. Going through each of them, the "normal" Clojure solution just seems clearer and simpler, so I thought perhaps some more gnarly, "real world" use cases could motivate better for this specific approach. I think the strongest example is case independent keys. What I mean by hiding behaviour: E.g. (assoc m :key "val") has a clear indication to a reader of a pure operation that can be mentally modeled as such. If it's suddenly side-effectful, that mental model (which is pretty central to Clojure) falls apart, and the code becomes much harder to reason about. This is similar to OOP accessors, which look like simple variable access/updating, but can (and too often do) run all manner of other, side-effectful code, and do so implicitly.

john 2025-04-16T13:23:07.458109Z

Yeah, I had gemini generate most of those examples - some of them aren't that great. But stuff like smart-maps are the real use case. Without poly-map, you gotta drop down into some pretty arcane depths to get something working. In Clojure, we tell people, "in general, don't hide an atom in the scope of a function, where the behavior of the function can be determined by the state of the atom." But, importantly, Clojure lets you do that and some very nice Clojure tools are built using those strategies. In reagent, stateful components are that, essentially. And people are advised to "only use them when you need them," but we don't disallow the ability. So no, don't go putting gratuitous side-effects in your maps. In that rare instance where you, as an expert, actually do want to go do that thing and you think it's a good thing, Clojure's answer is: deftype. And that has been fine but now that's just a bit easier. No matter what we do, these maps are going to be made. Smart-maps are a case in point. But look at wherever Potemkin's def-map-type is used. We can choose to make building them hard or easy. This just makes it easier. For the most extreme performance, you'll still want to use deftype. But poly-map lets you hack map types fast and find the shape of the thing you want faster.

👍 1
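
For readers unfamiliar with the "atom hidden in the scope of a function" pattern mentioned above, here is a minimal sketch in plain Clojure. It is not taken from poly-map or reagent; the name next-id is hypothetical and only illustrates how a function's result can depend on closed-over state.

```clojure
;; Hypothetical illustration: `next-id` closes over a private atom,
;; so each call returns a different value even though it takes no arguments.
(def next-id
  (let [counter (atom 0)]           ; state hidden inside the closure
    (fn []
      (swap! counter inc))))        ; result depends on that hidden state

(next-id)
(next-id)
```

Callers cannot see or reset the counter, which is exactly why the pattern is recommended only when it is really needed.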
john 2025-04-16T13:27:22.982809Z

There's some config file libs out there that could probably be easier to build with poly-map

john 2025-04-16T13:36:25.040149Z

Doing a search for def-map-type on github returns a few hundred code results: https://github.com/search?q=def-map-type&type=code The examples I have in the repo are mostly side-effect oriented, but nothing about it has to be side effects. It could just have special polynomial tradeoffs, for certain workloads. poly-maps just makes that easier than having to use def-map-type or deftype.

john 2025-04-16T13:46:45.199159Z

Also, I'll be adding a freeze method for folks that want to lock down the map and prevent future behavioral derivations off their frozen map

john 2025-04-16T14:02:04.180279Z

FYI I pushed up a new version and poly-map performance is now just about on par with hash-map performance on most of the benchmarks

john 2025-04-16T14:04:52.388539Z

Well, within 10% on most benchmarks, in CLJ

john 2025-04-16T14:19:21.126689Z

Also, @raspasov, poly-map now takes a map in its 1-arity version. So you can now quickly wrap a map with poly-map, add your behaviors, do your magic, and then call get-coll on it to get the vanilla map back out. So if you only need the behavior on maps for a specific part of your pipeline, you can wrap the maps in poly-maps for only that region of the pipeline, and because one is just wrapping the other there's no conversion cost between the maps.

👍 2
john 2025-04-16T16:24:53.427139Z

And wrt that use-case, another cool use case would be instrumenting data that is getting changed somewhere deep in a function pipeline you have no control over (or is too opaque to reason about). Normally we only instrument our systems with spec at the boundaries of pipelines. This solution allows you to attach a spec that follows the map wherever it goes, tattling on the call site that breaks the contract. Just instrument the map with spec, like so: https://github.com/johnmn3/poly-map/blob/a2caf456ed689b01900280cec1c5314fd4e847f3/dev/ex/examples_high_level_md.clj#L84... With some spec that throws when some bad condition is met. Then send the data down the pipeline like usual. The offending call site will attempt to make the bad data transformation and the exception will be thrown at that call site, so you can trace it to the code in the lib that is responsible. That's definitely not easy to do in Clojure without dropping down into deftype territory.

➕ 1
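
To make the "spec that follows the map" idea concrete, here is a rough sketch that does not use poly-map at all. It hand-rolls a tiny deftype (the very thing the library is meant to spare you) so the mechanism is visible. CheckedMap, ::age and ::person are hypothetical names, and only lookup and assoc are implemented; a real map type needs many more methods.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical specs for illustration.
(s/def ::age nat-int?)
(s/def ::person (s/keys :req-un [::age]))

;; Hypothetical, deliberately partial map-like type: every assoc re-checks the spec,
;; so the exception surfaces at the call site that produced the bad value.
(deftype CheckedMap [m spec]
  clojure.lang.ILookup
  (valAt [_ k] (get m k))
  (valAt [_ k not-found] (get m k not-found))

  clojure.lang.Associative
  (containsKey [_ k] (contains? m k))
  (entryAt [_ k] (find m k))
  (assoc [_ k v]
    (let [new-m (assoc m k v)]
      (when-not (s/valid? spec new-m)
        (throw (ex-info "spec violated on assoc"
                        {:key k :val v :explain (s/explain-str spec new-m)})))
      (CheckedMap. new-m spec))))   ; the spec travels with the result

(def p (CheckedMap. {:age 1} ::person))

(assoc p :age 0)   ; fine, still a CheckedMap
;; (assoc p :age -1) would throw an ExceptionInfo at this call site
```

Because clojure.core/update is built on get and assoc, something like (update p :age dec) deep inside a pipeline would go through the same check.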
john 2025-04-16T19:14:25.865139Z

GDI there's already https://github.com/andreiavrammsd/polymap in c++ and https://github.com/simplegeo/polymaps in js 😅

😂 1
john 2025-04-16T19:15:31.928559Z

Needs a better name

john 2025-04-16T19:43:16.205549Z

Okay, what do y'all prefer: 1. type-map 2. multi-map 3. uni-map 4. any-map 5. impl-map

john 2025-04-16T19:43:57.027809Z

I like type-map the most, but it has the highest web search collision, looks like

john 2025-04-16T19:44:13.281289Z

impl-map is cool, with basically no collision

2025-04-17T00:21:51.341489Z

Hum, I feel maybe something more like:
• extension-map
• custom-map
• hook-map
But I'm also not sure if map should come last and not first. Say:
• map-extender
• map-customizer
Or you can give it a fun name that kind of makes sense but also not, like say:
• mappy-hooky - Hook into Clojure maps.

john 2025-04-12T12:46:38.903229Z

@didibus you might be interested in this one. I'll be rebuilding transformers on top of these poly-maps

2025-04-12T14:02:57.672599Z

this is very very cool

john 2025-04-12T14:20:04.034129Z

Thanks. I've been itching to make it for a while. A half baked version was buried in another lib for months. But this seems useful so I've been trying to get a spike out

john 2025-04-22T14:41:40.282519Z

So, I'm debating what to do with the low-level api. I can either:
1. Leave it where it is
2. Go lower by passing this to the user (but I worry about memory leak footguns here)
3. Go higher by removing e altogether in the call sig, removing the requirement of (<- e ... construction altogether - automatically handling it like the transient version does - and then removing e from the call sig of both the persistent version and the transient version. Then the API would be more like the high level one, where you're just passed [m k v], etc.
Reason being, if the user wants access to e, they can get to it via the IWrapAssociative methods, with get-impls, etc. The only reason to have access to e in transient mode is to have access to extra data from e or metadata during execution, where you could leverage metadata during execution, for particularly dynamic operations. In persistent mode, returning e, you can be updating behaviors with existing behaviors dynamically in a hot loop. I just can't imagine what hack would require that level of dynamism. If you're willing to do your meta programming jazz outside that hot loop, instead using the protocol methods, then why add e to the method signature at all?

john 2025-04-22T14:42:53.703689Z

On the one hand, why not go lower? Pass the this? Why not give the user full power? Make the high level API be the "responsible" api, while letting low level users employ their own protections while giving them full freedom to come up with whatever they think is appropriate

john 2025-04-22T14:43:32.283129Z

On the other, Most users will never use e, even in low level mode, while in the context of the method function body

john 2025-04-22T14:44:45.918709Z

So the overall api could seem way simpler by getting rid of the e... it'd just seem like you're redefining an assoc function or whathaveyou

john 2025-04-22T14:47:21.703239Z

On the downside though, we're then forcing the user to always return a WrapMap+ for operations where we expect methods to return one. If a user wants to return something other than a WrapMap+ on assoc, like returning a string instead for whatever reason, you couldn't do that if you take away the user's responsibility over construction (`(<- e ...`).

john 2025-04-22T14:48:01.326059Z

Hmm, yeah I think it's important for the user to be able to override that behavior 🤔

john 2025-04-22T14:57:00.861179Z

Hmm, thinking that through yeah I don't think it's worth getting rid of e. So I guess the other question is, why not go lower? Should I pass this or would that lead quickly to memory leaks?

john 2025-04-22T14:57:37.996039Z

And does that matter? deftype puts this in the hands of the user, so why not let it ride?

2025-04-22T16:41:14.287619Z

seems like you should let usage drive this

john 2025-04-22T17:31:00.719099Z

Yeah, I'll just leave it as is and see what pain points arise

john 2025-04-21T17:06:41.811429Z

Okay, I changed the name to https://github.com/johnmn3/wrap-map

🤘 2
john 2025-04-21T17:07:22.375409Z

It better connotes how there's actually a real map that is being wrapped, and that can be unwrapped at any time

john 2025-04-21T17:07:37.556289Z

And connotes the performance characteristics better

2025-04-21T17:08:08.962779Z

Much better, I like it. Ya, it conveys the caveats and also what it does a lot better than poly-map did.

john 2025-04-21T17:08:20.538139Z

The AI is saying that the formalism is best described as a hybrid proxy/decorator pattern

john 2025-04-21T17:08:39.511869Z

So yeah, wrap conveys that better

2025-04-21T17:09:11.835319Z

Ya, proxy-map or decorator-map could have worked, but I don't like the connotation of Gang of Four OO Pattern they bring haha

2025-04-21T17:09:27.566239Z

wrap-map I think is great

john 2025-04-21T17:09:37.449259Z

nice

john 2025-04-21T17:10:00.403889Z

Here's a new workflow from the readme that highlights the wrap/unwrap pattern:

(-> {:a 1}
    (assoc :b 2)
    (w/assoc
      :T_assoc_k_v (fn [_ t-m k v]
                     (println "[Transient] assoc! key:" k "val:" v)
                     (assoc! t-m k v)))
    transient
    (assoc! :x 100)
    (assoc! :y 200)
    persistent!
    w/unwrap
    (dissoc :b)
    (w/assoc
      :assoc_k_v (fn [{:as e :keys [<-]} m k v]
                   (println "[Persistent] assoc key:" k "val:" v)
                   (<- e (assoc m k v)))) ;<- persistent ops require the `<-` constructor
    (assoc :z 300)
    w/unwrap
    (assoc :done 1))
; [Transient] assoc! key: :x val: 100
; [Transient] assoc! key: :y val: 200
; [Persistent] assoc key: :z val: 300
{:a 1, :x 100, :y 200, :z 300, :done 1}

john 2025-04-21T17:11:17.451729Z

w/assoc automatically wraps maps that aren't wrapped, so you don't even have to call wrap on the values in a pipeline to wrap them

john 2025-04-21T17:12:36.690909Z

So that's like for surgically updating the behavior of a map in a specific part of a (potentially opaque) pipeline

👍 1
2025-04-21T17:16:30.806309Z

lol, I know it's not idiomatic, but this would be great for a pseudo-ORM. Make a closed map, that doesn't let you set additional keys for example. And then, have it capture operations made to it in a way that you can build the SQL that applies the changes to the DB.

👀 1
john 2025-04-21T17:18:53.920359Z

Yeah, this would make that way easier to experiment with

john 2025-04-21T17:19:24.845079Z

I'm not too familiar with ORM though

john 2025-04-21T17:20:05.060329Z

Other than the distant stories lol

john 2025-04-21T17:21:31.408089Z

You might have been able to pull some of that off with records already

john 2025-04-21T17:21:53.385569Z

Not sure if you can make them closed

john 2025-04-21T17:23:21.136159Z

But yeah, I got rid of a lot of the ti-yong/transformers logic and left enough so that I could still build what I need for ti-yong on top of wrap maps

john 2025-04-21T17:23:37.396719Z

So that the basic abstraction is as performant as possible

2025-04-21T17:23:45.688309Z

They can get hairy. But the basic premise is you kind of build something like: defentity -> Creates a struct that maps to a table in your DB. If it has a relation to another defentity, it will appear as nested data in code, but it knows that this is a join in the DB. Support for lazy access, so when you try to get the nested data (is when it actually queries the other table, solving the N+1 problem) Now you can edit this "entity", and under the hood it converts those edits as SQL, as well as update the data in the entity. And finally you have a "commit" you can call on the entity that applies the SQL updates.

john 2025-04-21T17:25:12.352379Z

Ah, so you can have larger than memory objects and whatnot?

2025-04-21T17:27:48.886519Z

It's not really for larger than memory. It's to kind of fake the mismatch between relational data, and hierarchical data. A Clojure map normally doesn't have a relation to another map. It just nests the other map within itself, forming a nested hierarchy of data. But your DB will have a relation to another table. So if each Table Row is represented as a Clojure Map. What if that Table has a relation to another? Normally you'd just shove the :id in the map, but that does not let you follow it automatically. So what an ORM will do is it will basically nest the other entry, but in a way that it will query the DB for it only when accessed.

2025-04-21T17:31:35.947199Z

Think:

{
 :id 12
 :username "John"
 :items [345 645 6346] ;; IDs to the Item table
}
So if you do (get (:items m) 0) you just get an id 345. Now you have to manually do another query to fetch the actual item. Now the ORM will instead do:
{
 :id 12
 :username "John"
 :items [(lazy-query 345) (lazy-query 645) (lazy-query 6346)] ;; IDs to the Item table
}
So if you do: (get (:items m) 0) the ORM will secretly make another DB call to fetch the item 345 and will return a map of the actual item (well or another ORM mapped entity)
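
The lazy-fetch idea above can be approximated in plain Clojure with delay. This is only a sketch under stated assumptions: item-table, fetch-item and query-count are hypothetical stand-ins for a real database, and unlike an ORM the deref here is explicit rather than hidden behind the lookup.

```clojure
;; Hypothetical in-memory stand-in for the Item table.
(def item-table
  {345  {:id 345  :name "lamp"}
   645  {:id 645  :name "desk"}
   6346 {:id 6346 :name "chair"}})

;; Counts how many "queries" have actually run.
(def query-count (atom 0))

(defn fetch-item
  "Pretend DB call: looks up one item by id and records that a query happened."
  [id]
  (swap! query-count inc)
  (get item-table id))

(defn lazy-query
  "Returns a delay; the fetch runs once, on first deref, and is cached after that."
  [id]
  (delay (fetch-item id)))

(def user
  {:id 12
   :username "John"
   :items (mapv lazy-query [345 645 6346])})

;; Nothing has been fetched yet. Dereferencing the first item triggers one query:
@(get (:items user) 0)
```

An ORM (or a wrapped map or vector) would perform that deref inside the lookup itself, which is where the hidden side effects discussed in this thread come from.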

2025-04-21T17:32:14.201509Z

This is for a one to many, but if it was a one to one, then it would be the value of the key itself that is like "lazy fetched".

2025-04-21T17:32:35.597269Z

Haha, though I guess now we need wrap-vector library 😛

😂 1
john 2025-04-21T17:33:12.647589Z

For sure 🙂

john 2025-04-21T17:33:40.859289Z

Okay that makes sense

2025-04-21T17:34:18.658279Z

And it does something similar for writes and updates as well. Where as you update the entity in code, it doesn't just change the data in it, but also builds up an update query and then you can commit it where it'll apply the update to the DB as well. Including understanding how to update relations.

2025-04-21T17:35:50.246849Z

The reason they get "hairy" lol, is that, it can get all kind of wonky. Like you don't always want to lazy-fetch everything, because that can be slower, so you might want to be able to say what to lazy-fetch and what to bulk fetch. Then you might need to have transactions over things. Also it's like doing side-effects as you use the entity, etc.

2025-04-21T17:36:17.806979Z

Or the way they build the SQL automatically can have issues, or be suboptimal, etc.

john 2025-04-21T17:36:24.830789Z

it kinda violates my sensibilities around identity semantics though I think - unless you put it behind a swap/deref interface, like an in mem db abstraction

2025-04-21T17:37:44.018049Z

Hum, ya that's true. It probably doesn't work with value semantics haha. A deref might be cleaner. But ya, they have all these issues for sure. It becomes a very side-effecting magic black box object thing. Like it captures a DB connection as well and all that.

john 2025-04-21T17:38:29.571229Z

Collecting updates before a commit is very stateful

2025-04-21T17:38:39.550639Z

It's just kind of noob friendly for people who don't understand SQL or relational data. Or when you need to do a simple CRUD, and want to go super fast and don't want to deal with SQL, following relations manually, and all that.

john 2025-04-21T17:38:59.710529Z

Gotcha

raspasov 2025-04-21T17:39:01.443879Z

Hahah… yeah… the ORM black box world 😅

raspasov 2025-04-21T17:47:03.865379Z

You pretend to work with the database via an object “interface”, calling some version of set()/get()/update()/save() on an object (save being the thing that triggers IO/transaction)… It’s a “convenient” but quite leaky & messy abstraction. The code gets infested with pervasive IO/side effects. Hard to keep track of what is getting sent over the wire while one “hopes” the ORM is going to do a good job. Once I started doing Clojure I totally left that world and I don’t miss it.

💯 1
john 2025-04-21T18:03:51.767809Z

The beauty of the swap/deref interface is that it's a "transaction" abstraction that creates a scope where everything within that scope (that is a pure function) passes or fails in the context of that transaction. I'd much rather deal with a database in the form of swapping clojure objects into and out of a clojure atom like thing. But I def don't want "stateful db objects" that change in regular clojure-land, outside of some transactional context.
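
A minimal sketch of that swap/deref shape, using an ordinary atom as a stand-in for the "atom like thing". The names db and transfer are hypothetical; a real database-backed version would need its own transaction machinery behind the same interface.

```clojure
;; Hypothetical in-memory "database".
(def db (atom {:accounts {:a 100 :b 0}}))

(defn transfer
  "Pure function from old state to new state. Throws if the move is invalid."
  [state from to amount]
  (let [balance (get-in state [:accounts from])]
    (when (< balance amount)
      (throw (ex-info "insufficient funds" {:from from :balance balance})))
    (-> state
        (update-in [:accounts from] - amount)
        (update-in [:accounts to] + amount))))

;; The whole update either applies or does not: if `transfer` throws,
;; swap! propagates the exception and the atom keeps its previous value.
(swap! db transfer :a :b 30)

;; Reading is just a deref of an immutable value.
(:accounts @db)
```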

raspasov 2025-04-21T18:15:15.714959Z

Potentially there are ways to make it not too leaky. I too have done things sorta like that in Clojure, at various points. For example, passing a function f that takes a row and returns a new row to update a Postgres database. It’s an involved layered cake though, requiring some Postgres-specific functionality (row-level locking). Not compatible with every SQL database. The most straightforward way for a deref abstraction to work well is to have a truly immutable database (ala Datomic/Datascript).

raspasov 2025-04-13T09:08:33.779399Z

This is pretty neat 🙂 Great looking repo and examples. Interesting use cases https://github.com/johnmn3/poly-map/blob/main/examples.md; my only worry is that in the wrong hands someone might attach a bit too much … “behavior” to the map where it effectively becomes a dancing object 😂 Other than that, with moderation, and “proper” usage, it is very interesting.

raspasov 2025-04-13T09:10:04.472359Z

logging read access is also cool, and most likely harmless

john 2025-04-13T13:16:16.825779Z

Yeah, there's definitely tradeoffs. I'll probably add a "freeze" method that removes the ability to change its implementation, like a normal map. So you could prototype in "dancing" mode and then lock it in place once you have what you like.

john 2025-04-13T13:19:12.867349Z

Another thing, anything you can build with poly-maps, you can technically build it with regular deftypes, by hand. And it'll go faster than the poly-map implementation. So for that purpose you could use poly-maps for prototyping the behaviors you want to arrive at and once you have what you want then rebuild it by hand with deftypes.

john 2025-04-13T13:20:12.327239Z

Like, if you know you'll never want the behavior for a thing to change, why pay for the overhead?

john 2025-04-13T13:21:30.282799Z

But yeah the point is to have a dancing object. Just to make it easier to reshape things, even at runtime

john 2025-04-13T13:42:45.872589Z

I suppose it's useful here to rearticulate one of our values: "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures." - Alan Perlis So we've always valued the lack of proliferation of new types in clojure. So we've warned people away from deftype, saying "please leave that to the experts." Which I still think is good advice. And deftype lacks all kinds of creature comforts available in clojure functions, so it's just super hard to do, working with deftype. I just don't think something should be hard to use, just because it's a sharp tool. Sharp tools can be easy to use too.

➕ 1
john 2025-04-13T13:45:23.648259Z

We should still warn against creating a proliferation of incompatible maps. Where possible, don't break existing contracts, unless the purpose of the new behavior is explicitly meant to.

raspasov 2025-04-13T21:59:40.476949Z

anything you can build with poly-maps, you can technically build it with regular deftypes, by hand…
Yes, absolutely. I looked at the code briefly. Let me see if I understand the high level structure:
1. There’s (deftype PolyMap ...) definition.
2. The PolyMap definition takes a regular Clojure map m and an impls map which defines potential custom behavior
   ◦ m is a regular Clojure map, provided by the user
   ◦ impls is an internal/“private” map
3. PolyMap mostly everywhere calls through to the underlying Clojure map implementation
   ◦ Critically, before this call happens, it checks for any custom impls and executes/runs that functionality
Is that about right? 🙂
P.S. And I see there’s additional TransientPolyMap to support transients, etc

john 2025-04-13T22:27:43.836609Z

That's pretty much it, yup! I wouldn't say though that m is "provided by the user." There's a low level constructor where you can pass in the actual m map. For regular usage, you'd construct poly-maps like hash-maps, like (poly-map :a 1 :c 3 ... or (into empty-poly-map {:x 5 :y 6 ..., etc.

john 2025-04-13T22:28:09.467019Z

m is the "inner" map, which our methods delegate to
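
The structure described above (an inner map m, an impls map, and methods that check impls before delegating to m) can be sketched roughly like this. TinyWrap, unwrap and the :get/:assoc impl keys are hypothetical names for illustration, not poly-map's actual PolyMap or API, and only lookup and assoc are covered.

```clojure
;; Hypothetical sketch only: TinyWrap is an illustrative name, not poly-map's PolyMap.
(deftype TinyWrap [m impls]
  clojure.lang.ILookup
  (valAt [_ k]
    (if-let [f (:get impls)]
      (f m k)                       ; custom behavior, if one is registered
      (get m k)))                   ; otherwise delegate to the inner map
  (valAt [_ k not-found]
    (if (contains? m k)
      (if-let [f (:get impls)] (f m k) (get m k))
      not-found))

  clojure.lang.Associative
  (containsKey [_ k] (contains? m k))
  (entryAt [_ k] (find m k))
  (assoc [_ k v]
    (let [f (or (:assoc impls) assoc)]
      (TinyWrap. (f m k v) impls))))  ; result stays wrapped, impls carried along

(defn unwrap
  "Returns the inner map as-is; no pass over the entries is needed."
  [^TinyWrap w]
  (.-m w))

;; Usage: record which keys get read.
(def reads (atom []))
(def w (TinyWrap. {:a 1}
                  {:get (fn [m k] (swap! reads conj k) (get m k))}))

(:a w)                    ; delegates to the inner map and records the read
(unwrap (assoc w :b 2))   ; a plain Clojure map again
```

Since wrapping only stores a reference to the existing map, wrapping and unwrapping cost the same regardless of how many entries the map holds.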

raspasov 2025-04-13T22:28:34.247879Z

Ah interesting…

raspasov 2025-04-13T22:28:52.836659Z

So no way to say like (poly-map my-existing-huge-map) ?

john 2025-04-13T22:28:54.912349Z

But PolyMap has three maps inside it. m, impls and metadata

👍 1
raspasov 2025-04-13T22:29:14.549469Z

(thinking of the use case, of “my map is huge, I don’t want to seq over it”)

john 2025-04-13T22:29:54.452129Z

There's a private constructor that allows for that syntax but it's not recommended for usage downstream

john 2025-04-13T22:30:33.859389Z

Yeah, perhaps there's a use-case for that

raspasov 2025-04-13T22:30:50.778709Z

I understand there might be a PolyMap internal convenience to “see” every key… but it might be nice for certain cases to have a way to avoid that (again, if perf is a concern, and the user’s map is large)

john 2025-04-13T22:32:08.418179Z

impl/poly-map* will give you that escape hatch. But don't expect things like metadata to carry over, etc.

raspasov 2025-04-13T22:32:19.994129Z

I see, cool

raspasov 2025-04-13T22:32:50.691569Z

Most people might not mind though… I’ve recently been working with large in-memory maps, so my perspective might be a bit skewed

john 2025-04-13T22:34:56.421749Z

Well, it's basically free, since we're delegating to a map anyway, so why not make it an option? For general usage I'd want to lean into intuition/semantics around hash-map, so as to keep things simple, but since this is free, it should def be an option.

raspasov 2025-04-13T22:37:21.447889Z

Yes… I guess if you’re using PolyMap for everything, that wouldn’t be an issue 🙂

raspasov 2025-04-13T22:37:46.191639Z

But this illustrates the problem of conversion for large maps:

(do
  (time (into {} (zipmap (range 1000000) (range 1000000))))
  :done)

raspasov 2025-04-13T22:38:27.561869Z

(takes ~500 ms on my machine, and very likely more on most commodity cloud VMs…)

john 2025-04-13T22:39:04.369079Z

For sure. And being able to take a gigantic map, wrap it in a poly-map, do your fancy thing on it, then do get-coll on it to get the hash map back out of it - that'd be a useful thing

raspasov 2025-04-13T22:39:33.884439Z

PS The initial zipmap takes 250ms… so 250ms for the O(n) conversion

raspasov 2025-04-13T22:41:41.239789Z

In terms of “API”, it’s kinda like (transient {}) … takes a fully formed map and gives you another map (object?), with different characteristics… (poly {}) or something like that 🙂

john 2025-04-13T22:43:51.005649Z

I thought about making poly a mode, like transient, where when you're in poly mode, assoc and dissoc operate on impl instead of m. Call poly on it, add your methods, then call persistent! and it dumps m back out to you.

john 2025-04-13T22:44:24.045239Z

But it doesn't really need to be moded like that

raspasov 2025-04-13T22:46:31.239339Z

FWIW my naming suggestion was not fully thought out, I was just brainstorming 😉

john 2025-04-13T22:46:40.102039Z

Well, no, you'd add your methods, then use the wrapped m and then later dump out m with persistent! when you're done with the poly wrapper

john 2025-04-13T22:47:25.309749Z

Well I don't plan on making it a mode like transient

👍 1
john 2025-04-13T22:48:06.903669Z

But I'll def make that workflow smoother, so you can quickly wrap and unwrap large maps

➕ 1
john 2025-04-13T22:49:55.295649Z

I believe poly-map/hash-map require an even number of args. I could just make the one arg arity of poly-map expect a whole m as a parameter. So (poly-map big-map)

💡 1