2023-01-24
Open-ended question here: I'm developing a JSON API powered by Datomic. In typical REST design, your PUT method overwrites the document as a whole, e.g. after PUT /entity/123 {:x …, y: …., z: … }, a subsequent PUT /entity/123 {:x …., y: …. } would imply a retraction of the z attribute. Likewise, if you had JSON arrays, in REST you'd typically overwrite the array as a whole. This leads to a bit of tension with Datomic's information-accrual model. I can make it all work, of course, but I was wondering if anyone has hit a similar tension and took a different path, e.g. away from traditional PUT REST design. Maybe Datomic's model is trying to push me in a better direction; I'm not sure, so I'm taking a step back to reconsider the API. The problem domain I'm working with is an asset management system.
The semantics of PUT are “replace the whole resource, make the thing look like this,” which implies removing things that are absent. If you care about http semantics, this is what you gotta do, doesn’t matter if it’s datomic or sql or whatever else.
This only feels “natural” in a document store, because http was designed for documents
it is likely not a good semantic for a json-rpc oriented system where you want finer-grained updates
nor for any clojure-inspired, open-map, let-through-what-I-don’t-understand architecture
For finer-grained updates I can use PATCH, but it would be odd for our clients to support PATCH but not PUT
I’m just pointing to the tension between PUT (which only operates on resources with a full representation) and what most people actually want to do with a db.
There’s clearly a tension, and I want to avoid it if there’s another way; otherwise I’m having to calculate all these retract statements for omitted attributes
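(A minimal sketch of that retraction bookkeeping, assuming the JSON body has already been coerced to a map of Datomic attribute keywords; put-tx and managed-attrs are illustrative names, not from this thread:)

;; Illustrative only: tx-data that makes entity eid look like new-doc,
;; retracting managed attributes the client omitted.
;; Assumes (require '[datomic.api :as d]); managed-attrs is the set of
;; attribute idents this API owns.
(defn put-tx
  [db eid managed-attrs new-doc]
  (let [current (select-keys (d/pull db '[*] eid) managed-attrs)
        omitted (remove (set (keys new-doc)) (keys current))]
    (concat
     [(merge {:db/id eid} new-doc)]   ; assert what the client sent
     (for [attr omitted]
       ;; newer Datomic versions support :db/retract without a value
       ;; (retracts all values of the attribute); older ones need the
       ;; current value passed explicitly
       [:db/retract eid attr]))))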
To avoid that tension, you’re saying I would have to exit REST entirely then, if I understand correctly
PUT is only useful to create entities when you want the “caller” to control the resource’s url
I like that, don’t know if our clients will, but that seems logical from a datomic POV
So you arrived at this decision on your own, without a datomic backend: use POST, PATCH, DELETE, and/or PUT (create-only), PATCH, DELETE. That’s interesting
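(For reference, the method split being discussed, as a plain data sketch; the paths and action names are placeholders, not from this thread:)

;; Illustrative only: one way to lay out the verbs once PUT is create-only.
(def api-design
  [{:method :post   :path "/entities"     :action :create         :note "server picks the id"}
   {:method :put    :path "/entities/:id" :action :create-only    :note "caller picks the url; reject if it already exists"}
   {:method :patch  :path "/entities/:id" :action :partial-update :note "e.g. RFC 7386 merge patch"}
   {:method :delete :path "/entities/:id" :action :retract-entity}])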
I used to be a much bigger fan of “do it the restful HATEOAS way”, but I’ve encountered so many situations where it just felt like make-work and it wasn’t adding anything useful
so recognize when you’re doing essentially json RPC and be explicit about not using advanced features of HTTP unless they have practical value
PUT create-only isn’t idempotent, which is a nice-to-have attribute, but oh well
I wouldn’t use “real” http without something like liberator’s decision tree to manage the complexity of all the negotiation
conditional headers are allowed on unsafe methods, so you can definitely conditional-put
Alright, well this certainly helps, thanks. I’ll keep mulling this over before I write some code
@U09R86PA4 any suggestions on what PATCH payload format to use? I’ve been looking at https://www.rfc-editor.org/rfc/rfc7386. Just point me in the right direction if you know better things, happy to read
it turns out the PATCH RFC explicitly says you can create new resources too, so if I really want, I don’t need any of POST, PUT, etc.
I can’t believe there’s an rfc for merging two json maps. And it has errata. And it still doesn’t handle merging array values.
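(For concreteness, a sketch of the RFC 7386 merge-patch semantics in Clojure, with nil standing in for JSON null; note that arrays, like any non-map value, simply replace the target, which is exactly the complaint above:)

;; Illustrative sketch of RFC 7386 JSON Merge Patch: nil removes a key,
;; nested maps merge recursively, anything else replaces wholesale.
(defn merge-patch
  [target patch]
  (if (map? patch)
    (reduce-kv (fn [acc k v]
                 (if (nil? v)
                   (dissoc acc k)
                   (assoc acc k (merge-patch (get acc k) v))))
               (if (map? target) target {})
               patch)
    patch))

;; e.g. (merge-patch {:x 1 :y 2 :tags [:a :b]} {:y nil :tags [:c]})
;;      => {:x 1 :tags [:c]}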
the Elasticsearch API said to hell with PATCH, and they have their own scripting language to define diffs :/
The two problems with http: there aren’t that many cases where resources actually have more than one representation that are all complete and substitutable enough to be the same resource instead of different ones, and trying to do that is a large portion of http’s complexity; and there’s often a fundamental asymmetry between the shape of read and write operations, and trying to make a universal, representation-independent mechanism for that is a big task that practitioners of rest have, in practice, given up on. As you are seeing with the unsatisfying gestures towards PATCH.
There’s nothing wrong with “json is the only representation, and we have a POST which you just have to read our docs to know how to use”
ya, on the mutation/patch side I guess there’s really nothing wrong with treating it like json rpc
There are two hateoas principles I think are nice and worth saving if you can: using URIs to bookmark state (especially if you can leverage the caching or conditional-request features of http), and returning representations which have discoverable, annotated URI links in them for actions, vs making the client know the URI structure and construct URIs itself (but you will probably have to define that representation yourself, because html is the only one in the wild that is widely understood, and it’s really meant for human “clients”)
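(A tiny illustration of the second principle, with made-up field names; the idea is that actions come back as links rather than URIs the client constructs itself:)

;; Illustrative response body only; the :asset/* and :links keys are placeholders.
{:asset/id   123
 :asset/name "pump-7"
 :links      {:self   {:href "/assets/123"}
              :update {:href "/assets/123" :method "PATCH"}
              :owner  {:href "/assets/123/owner"}}}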
The status quo state of the art nowadays is definitely something like swagger and generated or custom clients, not really the hateoas/http vision, and people are getting work done that way
I guess just be internally consistent and coherent, and don’t worry about an abstract ideal if it brings painful and confusing impedance mismatches
the scripting language idea isn’t terrible either, given the low state of the art, and it will be much easier to roll my own lisp interpreter in clojure than in C
we’re already on swagger, so hypermedia links are probably overkill; I’ll check out some other hateoas concepts
Maybe even make it generate, or use it as an interpreter for, large parts of the server boilerplate
I'm working on a batch process that walks the tx-log and barely benefits from peer caching. To reduce GC pressure, I've lowered the object cache to the minimum, -Ddatomic.objectCacheMax=32M. Since then, I've noticed that larger queries (apparently) never finish. For example, querying for 10,000 eids, or calling pull-many for a large/deep tree. Should I assume that the peer will have issues if it can't install query results in the small cache?
(Also, no need to comment on using a Peer Server; that's not in the cards for us at the moment.)
The object cache is in a sense the database. It is an object graph whose nodes are lazily loaded from storage and decoded to objects as code accesses object properties, and objects are evicted as new loads need room.
There are various peer metrics which can diagnose this for you, such as object cache hit rate and storage gets
> It sounds like your OC is small enough that IO churn is dominating your query time
It's true that I/O is dominating my query time, but the queries in question do actually finish if I up the size of the OC, e.g. to the default OC value.
@U09QBCNBY the point I think is that your object cache is so small that you may have segments being read/written to ocache and immediately cleared, unable to then put the result in cache or, as you say, install the result in the cache. Do you see any errors in the logs? I'd also be interested in seeing the io-stats (https://docs.datomic.com/on-prem/api/io-stats.html) under this configuration and under the higher ocache config where the query returns. What size heap are you using on the peer, and is the peer connected to multiple databases? I would also want to know what GC pressure you were experiencing that prompted you to lower ObjectCacheMax to 32M, but I guess that's a separate issue.
Regarding GC pressure, when profiling I noticed I was having long GC pauses and lots of CPU dedicated to GC. Most of my memory was old-generation objects, and there was very little room for young-generation objects to grow before GCs occurred. I was able to ameliorate this by lowering the OC -- my reasoning being that this process benefits very little from application caching (aside from tx-log segments). After putting the cache at the floor, my GC churn is way down.
Also it's a process that always runs with a cold start and then shuts down. I know that we should be using a Peer Server for this sort of thing, but we just don't have the bandwidth to set that up at the moment.
I compared the peer debug logs when running with default and floor OC settings. All I see is a bunch of :kv-cluster/get-val events in both cases. The only difference is that with default cache this finishes after 15 seconds or so, and with the cache at 32M, it keeps going and eventually my fans start kicking out steam until I kill the peer
Are you suggesting grouping get-val events by :val-key, or just the average number of get-val events during a window of time?
in terms of simply counting the number of get-val events, that's going to depend on when I kill the process. It will apparently continue indefinitely if I don't kill it (when OC=32M)
sure, the first query finishes in 13 seconds. so I'll truncate the other log at the same time
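(A rough sketch of the grouping idea above, assuming the peer debug log lines contain :kv-cluster/get-val events with a quoted :val-key field; adjust the regex to the actual log layout. Repeated keys would indicate segments being re-read, i.e. object-cache churn:)

;; Illustrative only: count :kv-cluster/get-val events per :val-key.
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn get-val-counts
  [log-file]
  (with-open [rdr (io/reader log-file)]
    (->> (line-seq rdr)
         (filter #(str/includes? % ":kv-cluster/get-val"))
         (keep #(second (re-find #":val-key \"([^\"]+)\"" %)))
         frequencies
         (sort-by val >))))

;; e.g. (take 10 (get-val-counts "peer-debug.log"))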
This may be educational, but I don’t think there’s a deep mystery here: your OC is too small
> I’m working on a batch process that walks the tx-log and barely benefits from peer caching.
“barely benefits from peer caching”--seemingly not true, because a bigger OC makes it go faster?
that’s just the extreme limit of slower 🙂 I’m sure if you waited long enough (hours? days?) it would finish.
In my initial testing, where I was not using batch queries, I was able to floor the OC and see it finish, with less CPU usage and more quickly
it was only when I tried to lower I/O thru batching queries that I ran into issues with the low cache setting
Let me give an example, as there are a couple of situations. Instead of testing N eids one by one to see if they are a root (using a membership-id):
;; assumes (require '[datomic.api :as d])
(defn is-root?
  "Return truthy if eid has the membership-attr."
  [db membership-attr eid]
  (first (d/datoms db :eavt eid membership-attr)))
I can instead test N eids at once with something like
;; returns the subset of eid-batch that has the membership attribute
(d/q '[:find [?e ...]
       :in $ [?e ...] ?a
       :where
       [?e ?a]]
     db eid-batch membership-attr)
All things being equal, I would expect the latter to need more segments loaded at once to work
the process trawls the tx-log and looks to extract a single entity type for a given time span. So it ends up computing roots relative to some membership-id
Right, that makes sense. But that sounds like it's a requirement just to get it to work, not it benefiting from caching. With this process, it's almost guaranteed that I will not need to fetch the same thing twice.
in my existing implementation, which worked one-by-one, lowering the cache removed long GC pauses and excessive CPU time spent on gc
In the peer model, the OC is functionally the data all queries read, as they cannot read things unless they are loaded there. It’s not like a traditional disk-backed db, where you may be doing zero-copy reads and the cache is just to avoid some disk. So whatever a query needs to read has to be in memory, or it churns in and out forever.
a d/q is doing work in parallel, reading multiple segments at a time, etc. and d/datoms doesn’t
In fairness, I imagine I could have gotten similar results by doubling the heap, but then I'll have to bug infrastructure 😂
Got it, my mental model was in wishful thinking mode -- hoping that it could read results without retrieving them from OC
> Your GC + cpu problems are curious though, and possibly unrelated
👍 I have been trying to get to the bottom of this with profiling and testing alternative implementations!
if lowering the OC improved things, and you were using very modest result-size interfaces like d/datoms, that suggests your application code itself is a source of allocation pressure
😂 So my real goal is to lower execution time, and I've been using trial and error to figure out what my bottleneck is. Since I've been iterating, I have been stacking approaches that were beneficial. And since the low cache setting was only beneficial with the one-by-one approach, I have no attachment to it -- I only noticed it because I left it in place when I moved to test/profile batching (larger queries)
there's definitely a lot of young-generation objects allocated by my application code. (old gen is steady and fixed after the initial build-up)