Not quite ready to ping #announcements or #releases with this - doesn't even have a version v0.0.1 yet. But I did want to get this in front of some folks for some early feedback. The docs need a lot of work and I'll be updating those over the coming weeks. But the API is coming together so I think it's time to get some feedback. There's two APIs: a high level one and a low level one. More details in the repo: https://github.com/johnmn3/poly-map
Which real-world scenarios would this be a good fit for? It feels like there should be singular cases, but I'm struggling to think of any.
In the general case it seems like this kind of map should be avoided, as it implements OOP-style behaviour hiding by complecting data and behaviour.
Have you seen some of the examples files? https://github.com/johnmn3/poly-map/blob/main/dev/ex/examples_high_level_md.clj
Yeah, in the general case you should still be using Clojure maps, absolutely
It's generally an expert level thing, where you want a map but you need some other behavior from it. Like: https://pathom3.wsscode.com/docs/smart-maps/
If these poly-maps existed before smart-maps, it would have probably been far easier for them to build them
WRT OOP - thing is, here we're not complecting behavior data with state, so I don't know if we still fall victim to the dangers you're hinting at. We're certainly not hiding behaviors behind the map. If anything, we're unhiding them. In terms of complecting data and behavior, that's the whole data/code dichotomy, right? Registering functions as values on map keys, which we do all the time, is complecting data and behavior - not necessarily a bad thing. You can rest assured though that no one will change the behavior of the particular immutable poly-map that you're using.
The original use-case I needed poly-maps for was https://github.com/johnmn3/ti-yong. That allows you to build functions out of maps.
And I needed that to build data-oriented front-end components
Yes, saw the examples, but they all seem like operations that should really rather be map-returning functions, or std lib function calls, so as not to hide what's happening. Going through each of them, the "normal" Clojure solution just seems clearer and simpler, so I thought perhaps some more gnarly, "real world" use cases could motivate better for this specific approach. I think the strongest example is case independent keys.
What I mean by hiding behaviour: E.g. (assoc m :key "val") has a clear indication to a reader of a pure operation that can be mentally modeled as such. If it's suddenly side-effectful, that mental model (which is pretty central to Clojure) falls apart, and the code becomes much harder to reason about. This is similar to OOP accessors, which looks like simple variable access/updating, but can (and too often do) run all manner of other, side-effectful code, and does so implicitly.
Yeah, I had gemini generate most of those examples - some of them aren't that great. But stuff like smart-maps are the real use case. Without poly-map, you gotta drop down into some pretty arcane depths to get something working.
In Clojure, we tell people, "in general, don't hide an atom in the scope of a function, where the behavior of the function can be determined by the state of the atom."
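For anyone skimming, here's a minimal sketch of the "atom hidden in the scope of a function" pattern being warned about. The names make-counter and add-count are made up for illustration.

```clojure
;; A closure that hides an atom: the result depends on hidden state,
;; so the same argument can produce different results.
(defn make-counter []
  (let [calls (atom 0)]            ; hidden state, invisible to callers
    (fn [x]
      (+ x (swap! calls inc)))))   ; adds the number of calls made so far

(def add-count (make-counter))

(add-count 10) ; first call adds 1
(add-count 10) ; second call adds 2, despite the identical argument
```

Nothing in the call (add-count 10) tells the reader that state is involved, which is the same concern raised above about assoc on a map with attached behavior.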
But, importantly, Clojure lets you do that and some very nice Clojure tools are built using those strategies. In reagent, stateful components are that, essentially. And people are advised to "only use them when you need them," but we don't disallow the ability.
So no, don't go putting gratuitous side-effects in your maps. In that rare instance where you, as an expert, actually do want to go do that thing and you think it's a good thing, Clojure's answer is: deftype. And that has been fine but now that's just a bit easier.
No matter what we do, these maps are going to be made. Smart-maps are a case in point. But look at wherever Potemkin's def-map-type is used. We can choose to make building them hard or easy. This just makes it easier. For the most extreme performance, you'll still want to use deftype. But poly-map lets you hack map types fast and find the shape of the thing you want faster.
There's some config file libs out there that could probably be easier to build with poly-map
Doing a search for def-map-type on github returns a few hundred code results: https://github.com/search?q=def-map-type&type=code
The examples I have in the repo are mostly side-effect oriented, but nothing about it has to be side effects. It could just have special polynomial tradeoffs, for certain workloads. poly-maps just makes that easier than having to use def-map-type or deftype.
Also, I'll be adding a freeze method for folks that want to lock down the map and prevent future behavioral derivations off their frozen map
FYI I pushed up a new version and poly-map performance is now just about on par with hash-map performance on most of the benchmarks
Well, within 10% on most benchmarks, in CLJ
Also, @raspasov, poly-map now takes a map in its 1-arity version. So you can now quickly wrap a map with poly-map, add your behaviors, do your magic, and then call get-coll on it to get the vanilla map back out.
So if you only need the behavior on maps for a specific part of your pipeline, you can wrap the maps in poly-maps for only that region of the pipeline, and because one is just wrapping the other there's no conversion cost between the maps.
And wrt that use-case, another cool use case would be instrumenting data that is getting changed somewhere deep in a function pipeline you have no control over (or is too opaque to reason about). Normally we only instrument our systems with spec at the boundaries of pipelines. This solution allows you to attach a spec that follows the map wherever it goes, tattling on the call site that breaks the contract. Just instrument the map with spec, like so: https://github.com/johnmn3/poly-map/blob/a2caf456ed689b01900280cec1c5314fd4e847f3/dev/ex/examples_high_level_md.clj#L84... With some spec that throws when some bad condition is met. Then send the data down the pipeline like usual. The offending call site will attempt to make the bad data transformation and the exception will be thrown at that call site, so you can trace it to the code in the lib that is responsible. That's definitely not easy to do in Clojure without dropping down into deftype territory.
GDI there's already https://github.com/andreiavrammsd/polymap in c++ and https://github.com/simplegeo/polymaps in js 😅
Needs a better name
Okay, what do y'all prefer:
1. type-map
2. multi-map
3. uni-map
4. any-map
5. impl-map
I like type-map the most, but it has the highest web search collision, looks like
impl-map is cool, with basically no collision
Hum, I feel maybe something more like:
• extension-map
• custom-map
• hook-map
But I'm also not sure if map should come last and not first. Say:
• map-extender
• map-customizer
Or you can give it a fun name that kind of makes sense but also kind of doesn't, like say:
• mappy-hooky - Hook into Clojure maps.
@didibus you might be interested in this one. I'll be rebuilding transformers on top of these poly-maps
this is very very cool
Thanks. I've been itching to make it for a while. A half baked version was buried in another lib for months. But this seems useful so I've been trying to get a spike out
So, I'm debating what to do with the low-level api. I can either:
1. Leave it where it is
2. Go lower by passing this to the user (but I worry about memory leak footguns here)
3. Go higher by removing e altogether in the call sig, removing the requirement of (<- e ... construction altogether - automatically handling it like the transient version does - and then removing e from the call sig of both the persistent version and the transient version. Then the API would be more like the high level one, where you're just passed [m k v], etc. Reason being, if the user wants access to e, they can get to it via the IWrapAssociative methods, with get-impls, etc. The only reason to have access to e in transient mode is to have access to extra data from e or metadata during execution, where you could leverage metadata during execution, for particularly dynamic operations. In persistent mode, returning e, you can be updating behaviors with existing behaviors dynamically in a hot loop. I just can't imagine what hack would require that level of dynamism. If you're willing to do your meta programming jazz outside that hot loop, instead using the protocol methods, then why add e to the method signature at all?
On the one hand, why not go lower? Pass the this? Why not give the user full power? Make the high level API be the "responsible" api, while letting low level users employ their own protections while giving them full freedom to come up with whatever they think is appropriate
On the other, most users will never use e, even in low level mode, while in the context of the method function body
So the overall api could seem way simpler by getting rid of the e... it'd just seem like you're redefining an assoc function or whathaveyou
On the downside though, we're then forcing the user to always return a WrapMap+ for operations where we expect methods to return one. If a user wants to return something other than a WrapMap+ on assoc, like returning a string instead for whatever reason, you couldn't do that if you take away the user's responsibility over construction (`(<- e ...`).
Hmm, yeah I think it's important for the user to be able to override that behavior 🤔
Hmm, thinking that through yeah I don't think it's worth getting rid of e. So I guess the other question is, why not go lower? Should I pass this or would that lead quickly to memory leaks?
And does that matter? deftype puts this in the hands of the user, so why not let it ride?
seems like you should let usage drive this
Yeah, I'll just leave it as is and see what pain points arise
It better connotes how there's actually a real map that is being wrapped, and that can be unwrapped at any time
And connotes the performance characteristics better
Much better, I like it. Ya, it conveys the caveats and also what it does a lot better than poly-map did.
The AI is saying that the formalism is best described as a hybrid proxy/decorator pattern
So yeah, wrap conveys that better
Ya, proxy-map or decorator-map could have worked, but I don't like the connotation of Gang of Four OO Pattern they bring haha
wrap-map I think is great
nice
Here's a new workflow from the readme that highlights the wrap/unwrap pattern:
(-> {:a 1}
    (assoc :b 2)
    (w/assoc
     :T_assoc_k_v (fn [_ t-m k v]
                    (println "[Transient] assoc! key:" k "val:" v)
                    (assoc! t-m k v)))
    transient
    (assoc! :x 100)
    (assoc! :y 200)
    persistent!
    w/unwrap
    (dissoc :b)
    (w/assoc
     :assoc_k_v (fn [{:as e :keys [<-]} m k v]
                  (println "[Persistent] assoc key:" k "val:" v)
                  (<- e (assoc m k v)))) ;<- persistent ops require `<- constructor
    (assoc :z 300)
    w/unwrap
    (assoc :done 1))
; [Transient] assoc! key: :x val: 100
; [Transient] assoc! key: :y val: 200
; [Persistent] assoc key: :z val: 300
{:a 1, :x 100, :y 200, :z 300, :done 1}
w/assoc automatically wraps maps that aren't wrapped, so you don't even have to call wrap on the values in a pipeline to wrap them
So that's like for surgically updating the behavior of a map in a specific part of a (potentially opaque) pipeline
lol, I know it's not idiomatic, but this would be great for a pseudo-ORM. Make a closed map, that doesn't let you set additional keys for example. And then, have it capture operations made to it in a way that you can build the SQL that applies the changes to the DB.
Yeah, this would make that way easier to experiment with
I'm not too familiar with ORM though
Other than the distant stories lol
You might have been able to pull some of that off with records already
Not sure if you can make them closed
But yeah, I got rid of a lot of the ti-yong/transformers logic and left enough so that I could still build what I need for ti-yong on top of wrap maps
So that the basic abstraction is as performant as possible
They can get hairy. But the basic premise is you kind of build something like: defentity -> creates a struct that maps to a table in your DB. If it has a relation to another defentity, it will appear as nested data in code, but it knows that this is a join in the DB. There's support for lazy access, so it only actually queries the other table when you try to get the nested data (solving the N+1 problem). Now you can edit this "entity", and under the hood it converts those edits to SQL, as well as updating the data in the entity. And finally you have a "commit" you can call on the entity that applies the SQL updates.
Ah, so you can have larger than memory objects and whatnot?
It's not really for larger than memory. It's to kind of paper over the mismatch between relational data and hierarchical data. A Clojure map normally doesn't have a relation to another map. It just nests the other map within itself, forming a nested hierarchy of data. But your DB will have a relation to another table. So if each table row is represented as a Clojure map, what if that table has a relation to another? Normally you'd just shove the :id in the map, but that does not let you follow it automatically. So what an ORM will do is it will basically nest the other entry, but in a way that it will query the DB for it only when accessed.
Think:
{
 :id 12
 :username "John"
 :items [345 645 6346] ;; IDs into the Item table
}
So if you do (get (:items m) 0) you just get an id 345. Now you have to manually do another query to fetch the actual item.
Now the ORM will instead do:
{
 :id 12
 :username "John"
 :items [(lazy-query 345) (lazy-query 645) (lazy-query 6346)] ;; IDs into the Item table
}
So if you do: (get (:items m) 0) the ORM will secretly make another DB call to fetch the item 345 and will return a map of the actual item (well, or another ORM mapped entity).
This is for a one to many, but if it was a one to one, then it would be the value of the key itself that is like "lazy fetched".
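A rough sketch of the lazy-fetch idea in plain Clojure, using delay. Everything here is made up for illustration: fake-items stands in for the Item table, query-count counts simulated round trips, and lazy-query is just a hypothetical helper, not anything from an actual ORM.

```clojure
;; Hypothetical stand-in for the Item table: item id -> row.
(def fake-items {345 {:id 345 :name "Lamp"}
                 645 {:id 645 :name "Desk"}})

(def query-count (atom 0)) ; counts simulated DB round trips

(defn lazy-query
  "Returns a delay that 'queries' fake-items only when first dereferenced."
  [id]
  (delay
    (swap! query-count inc)
    (get fake-items id)))

(def user {:id 12
           :username "John"
           :items [(lazy-query 345) (lazy-query 645)]})

@query-count            ;=> 0, nothing fetched yet
@(get (:items user) 0)  ;=> {:id 345, :name "Lamp"}
@query-count            ;=> 1, only the accessed item was fetched
```

The difference from an ORM is that the deref here is explicit. Hiding it behind a plain get is the part that would need a wrapped vector or map.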
Haha, though I guess now we need wrap-vector library 😛
For sure 🙂
Okay that makes sense
And it does something similar for writes and updates as well. Where as you update the entity in code, it doesn't just change the data in it, but also builds up an update query and then you can commit it where it'll apply the update to the DB as well. Including understanding how to update relations.
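To make the write side concrete, here's a minimal sketch that keeps value semantics: changed keys are recorded in metadata and turned into a parameterized UPDATE on demand. track-assoc, pending-update and the ::dirty key are hypothetical names, and the "users" table is assumed. A real ORM also handles relations, which this ignores.

```clojure
(require '[clojure.string :as str])

(defn track-assoc
  "Like assoc, but also records the changed key in metadata under ::dirty."
  [entity k v]
  (-> (assoc entity k v)
      (vary-meta update ::dirty (fnil conj []) k)))

(defn pending-update
  "Builds [sql & params] from the recorded changes, or nil if nothing changed."
  [table entity]
  (when-let [ks (seq (distinct (::dirty (meta entity))))]
    (into [(str "UPDATE " table
                " SET " (str/join ", " (map #(str (name %) " = ?") ks))
                " WHERE id = ?")]
          (concat (map entity ks) [(:id entity)]))))

(let [user (-> {:id 12 :username "John"}
               (track-assoc :username "Jon")
               (track-assoc :email "jon@example.com"))]
  (pending-update "users" user))
;=> ["UPDATE users SET username = ?, email = ? WHERE id = ?" "Jon" "jon@example.com" 12]
```

With a wrapped map, the recording step could presumably live inside the assoc behavior instead of a separate function, so callers would keep using plain assoc.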
The reason they get "hairy" lol, is that, it can get all kind of wonky. Like you don't always want to lazy-fetch everything, because that can be slower, so you might want to be able to say what to lazy-fetch and what to bulk fetch. Then you might need to have transactions over things. Also it's like doing side-effects as you use the entity, etc.
Or the way they build the SQL automatically can have issues, or be suboptimal, etc.
it kinda violates my sensibilities around identity semantics though I think - unless you put it behind a swap/deref interface, like an in mem db abstraction
Hum, ya that's true. It probably doesn't work with value semantics haha. A deref might be cleaner. But ya, they have all these issues for sure. It becomes a very side-effecting magic black box object thing. Like it captures a DB connection as well and all that.
Collecting updates before a commit is very stateful
It's just kind of noob friendly for people who don't understand SQL or relational data. Or when you need to do a simple CRUD, and want to go super fast and don't want to deal with SQL, following relations manually, and all that.
Gotcha
Hahah… yeah… the ORM black box world 😅
You pretend to work with the database via an object “interface”, calling some version of set()/get()/update()/save() on an object (save being the thing that triggers IO/transaction)… It’s a “convenient” but quite leaky & messy abstraction. The code gets infested with pervasive IO/side effects. Hard to keep track of what is getting sent over the wire while one “hopes” the ORM is going to do a good job. Once I started doing Clojure I totally left that world and I don’t miss it.
The beauty of the swap/deref interface is that it's a "transaction" abstraction that creates a scope where everything within that scope (that is a pure function) passes or fails in the context of that transaction. I'd much rather deal with a database in the form of swapping Clojure objects into and out of a Clojure atom-like thing. But I def don't want "stateful db objects" that change in regular Clojure-land, outside of some transactional context.
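A small sketch of that swap/deref shape with an ordinary atom standing in for the database. The db atom, the withdraw function and the data are made up for illustration.

```clojure
(def db (atom {:users {12 {:id 12 :username "John" :balance 50}}}))

(defn withdraw
  "Pure function from db value to db value; throws to abort the 'transaction'."
  [db-val user-id amount]
  (let [balance (get-in db-val [:users user-id :balance])]
    (when (< balance amount)
      (throw (ex-info "Insufficient funds" {:balance balance :amount amount})))
    (update-in db-val [:users user-id :balance] - amount)))

(swap! db withdraw 12 20)    ; succeeds: the new value is installed atomically

(try
  (swap! db withdraw 12 100) ; throws, so the atom keeps its previous value
  (catch clojure.lang.ExceptionInfo _ :aborted))

(get-in @db [:users 12 :balance]) ;=> 30
```

Everything inside withdraw works on plain immutable values. The only stateful step is the swap! itself, which either installs the whole new value or leaves the old one in place.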
Potentially there are ways to make it not too leaky.
I too have done things sorta like that in Clojure, at various points. For example, passing an f that takes a row and returns a new row to update a Postgres database. It’s an involved layer cake though, requiring some Postgres-specific functionality (row-level locking). Not compatible with every SQL database.
The most straightforward way for a deref abstraction to work well is to have a truly immutable database (ala Datomic/Datascript).
This is pretty neat 🙂 Great looking repo and examples. Interesting use cases https://github.com/johnmn3/poly-map/blob/main/examples.md; my only worry is that in the wrong hands someone might attach a bit too much … “behavior” to the map where it effectively becomes a dancing object 😂 Other than that, with moderation, and “proper” usage, it is very interesting.
logging read access is also cool, and most likely harmless
Yeah, there's definitely tradeoffs. I'll probably add a "freeze" method that removes the ability to change its implementation, like a normal map. So you could prototype in "dancing" mode and then lock it in place once you have what you like.
Another thing, anything you can build with poly-maps, you can technically build it with regular deftypes, by hand. And it'll go faster than the poly-map implementation. So for that purpose you could use poly-maps for prototyping the behaviors you want to arrive at and once you have what you want then rebuild it by hand with deftypes.
Like, if you know you'll never want the behavior for a thing to change, why pay for the overhead?
But yeah the point is to have a dancing object. Just to make it easier to reshape things, even at runtime
I suppose it's useful here to rearticulate one of our values: "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures." - Alan Perlis
So we've always valued the lack of proliferation of new types in clojure. So we've warned people away from deftype, saying "please leave that to the experts." Which I still think is good advice. And deftype lacks all kinds of creature comforts available in clojure functions, so it's just super hard to do, working with deftype. I just don't think something should be hard to use, just because it's a sharp tool. Sharp tools can be easy to use too.
We should still warn against creating a proliferation of incompatible maps. Where possible, don't break existing contracts, unless the purpose of the new behavior is explicitly meant to.
"anything you can build with poly-maps, you can technically build it with regular deftypes, by hand…"
Yes, absolutely.
I looked at the code briefly.
Let me see if I understand the high level structure:
1. There’s (deftype PolyMap ...) definition.
2. The PolyMap definition takes a regular Clojure map m and an impls map which defines potential custom behavior
◦ m is a regular Clojure map, provided by the user
◦ impls is an internal/“private” map
3. PolyMap mostly everywhere calls through to the underlying Clojure map implementation
◦ Critically, before this call happens, it checks for any custom impls and executes/runs that functionality
Is that about right? 🙂
P.S. And I see there’s additional TransientPolyMap to support transients, etc
That's pretty much it, yup!
I wouldn't say though that m is "provided by the user." There's a low level constructor where you can pass in the actual m map. For regular usage, you'd construct poly-maps like hash-maps, like (poly-map :a 1 :c 3 ... or (into empty-poly-map {:x 5 :y 6 ..., etc.
m is the "inner" map, which our methods delegate to
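The structure described above can be sketched with a tiny deftype. This is not the library's code, just a minimal illustration of the delegation idea: SketchMap, the :get and :assoc keys in impls, and the case-insensitive example are all made up here, and only lookup and assoc are implemented (no count, seq, equality, transients or metadata).

```clojure
(require '[clojure.string :as str])

(deftype SketchMap [m impls]
  clojure.lang.ILookup
  (valAt [_ k]
    (if-let [f (:get impls)] (f m k nil) (get m k)))
  (valAt [_ k not-found]
    (if-let [f (:get impls)] (f m k not-found) (get m k not-found)))

  clojure.lang.Associative
  (containsKey [_ k] (contains? m k))
  (entryAt [_ k] (find m k))
  (assoc [_ k v]
    (if-let [f (:assoc impls)]
      (f m impls k v)                        ; custom behavior builds the result
      (SketchMap. (assoc m k v) impls))))    ; default: delegate to the inner map

;; Example behavior: case-insensitive string keys.
(def case-insensitive
  {:get   (fn [m k not-found] (get m (str/lower-case k) not-found))
   :assoc (fn [m impls k v]
            (->SketchMap (assoc m (str/lower-case k) v) impls))})

(def sm (assoc (->SketchMap {} case-insensitive) "Content-Type" "text/html"))

(get sm "CONTENT-TYPE") ;=> "text/html"
(.-m sm)                ;=> {"content-type" "text/html"}
```

Since the inner map is just a field, wrapping an existing map and pulling it back out costs nothing beyond allocating the wrapper, which is the property the large-map discussion relies on.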
Ah interesting…
So no way to say like (poly-map my-existing-huge-map) ?
But PolyMap has three maps inside it. m, impls and metadata
(thinking of the use case, of “my map is huge, I don’t want to seq over it”)
There's a private constructor that allows for that syntax but it's not recommended for usage downstream
Yeah, perhaps there's a use-case for that
I understand there might be a PolyMap internal convenience to “see” every key… but it might be nice for certain cases to have a way to avoid that (again, if perf is a concern, and the user’s map is large)
impl/poly-map* will give you that escape hatch. But don't expect things like metadata to carry over, etc.
I see, cool
Most people might not mind though… I’ve recently been working with large in-memory maps, so my perspective might be a bit skewed
Well, it's basically free, since we're delegating to a map anyway, so why not make it an option? For general usage I'd want to lean into intuition/semantics around hash-map, so as to keep things simple, but since this is free, it should def be an option.
Yes… I guess if you’re using PolyMap for everything, that wouldn’t be an issue 🙂
But this illustrates the problem of conversion for large maps:
(do
(time (into {} (zipmap (range 1000000) (range 1000000))))
:done)
(takes ~500 ms on my machine, and very likely more on most commodity cloud VMs…)
For sure. And being able to take a gigantic map, wrap it in a poly-map, do your fancy thing on it, then do get-coll on it to get the hash map back out of it - that'd be a useful thing
PS The initial zipmap takes 250ms… so 250ms for the O(n) conversion
In terms of “API”, it’s kinda like (transient {}) … takes a fully formed map and gives you another map (object?), with different characteristics…
(poly {}) or something like that 🙂
I thought about making poly a mode, like transient, where when you're in poly mode, assoc and dissoc operate on impl instead of m. Call poly on it, add your methods, then call persistent! and it dumps m back out to you.
But it doesn't really need to be moded like that
FWIW my naming suggestion was not fully thought out, I was just brainstorming 😉
Well, no, you'd add your methods, then use the wrapped m, and then dump out m with persistent! later when you're done with the poly wrapper
Well I don't plan on making it a mode like transient
But I'll def make that workflow smoother, so you can quickly wrap and unwrap large maps
I believe poly-map/hash-map require an even number of args. I could just make the one arg arity of poly-map expect a whole m as a parameter. So (poly-map big-map)