#datomic
2023-01-24
jasonjckn22:01:26

Open-ended question here: I’m developing a JSON API powered by Datomic. In typical REST design, your PUT method overwrites the document as a whole, e.g. PUT /entity/123 {:x …, :y …, :z …} followed by PUT /entity/123 {:x …, :y …} would imply a retraction of the :z attribute. Likewise, if you had JSON arrays, typically in REST you’d overwrite the array as a whole. This leads to a *bit* of tension with Datomic’s information-accrual model. I can make it all work, of course, but I was wondering if anyone has run into a similar tension and taken a different path, away from traditional PUT REST design, etc. Maybe Datomic’s model is trying to push me in a better direction; I’m not sure, so I’m taking a step back to reconsider the API. The problem domain I’m working with is an asset management system.

favila23:01:11

The semantics of PUT are “replace the whole resource, make the thing look like this,” which implies removing things that are absent. If you care about http semantics, this is what you gotta do, doesn’t matter if it’s datomic or sql or whatever else.

favila23:01:34

This only feels “natural” in a document store, because http was designed for documents

favila23:01:15

it is likely not a good semantic for a json-rpc oriented system where you want finer-grained updates

favila23:01:42

nor for any clojure-inspired, open-map, let-through-what-I-don’t-understand architecture

jasonjckn23:01:04

finer-grained updates I can use PATCH for, but it would be odd for our clients to support PATCH but not PUT

favila23:01:49

well, what do you care about more, HATEOAS or JSON-RPC?

jasonjckn23:01:09

HATEOAS? I’ll have to google that, one min

favila23:01:09

are you manipulating document resources or database state?

jasonjckn23:01:38

i’m manipulating entities with attributes

jasonjckn23:01:51

eg. machine with ip addresses

jasonjckn23:01:13

and list of installed software

jasonjckn23:01:16

it’s asset management

favila23:01:15

I’m just pointing to the tension between PUT (which only operates on resources with a full representation) and what most people actually want to do with a db.

favila23:01:54

If you’re not sure, I would use POST or PATCH with specific operations

jasonjckn23:01:14

There’s clearly a tension, and I want to avoid it if there’s another way, otherwise i’m having to calculate all these retract statements for omitted attributes
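
The bookkeeping being described, turning a whole-document PUT into assertions plus retractions for omitted attributes, can be sketched as a pure diff. A hypothetical sketch (cardinality-one attributes only; in practice `current` would come from something like d/pull):

```clojure
(defn replacement-ops
  "Tx-data that makes a stored entity look exactly like `doc`.
  `current` is the entity's present attribute map (e.g. from d/pull);
  `doc` is the incoming PUT body. Cardinality-one attrs only."
  [eid current doc]
  (concat
   ;; assert values that are new or changed
   (for [[a v] doc :when (not= v (get current a))]
     [:db/add eid a v])
   ;; retract attributes the new document omits
   (for [[a v] current :when (not (contains? doc a))]
     [:db/retract eid a v])))
```

For example, `(replacement-ops 123 {:x 1 :z 3} {:x 1 :y 5})` yields an add for `:y` and a retract for `:z`; cardinality-many attributes would need per-value diffing on top of this.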

favila23:01:19

The rigid meaning of PUT is why no one uses it much

jasonjckn23:01:27

To avoid that tension, you’re saying I would have to exit REST entirely then, if I understand correctly

jasonjckn23:01:35

any suggestion for the replacement?

favila23:01:39

You gotta bend it where it makes sense to bend it

favila23:01:56

POST is restful too, “perform this operation on this resource”

jasonjckn23:01:22

well PATCH is datomic friendly for the most part

jasonjckn23:01:25

but I need more than just that..

jasonjckn23:01:44

in REST RFC I don’t believe PATCH can create entities

favila23:01:54

yeah, POST does

jasonjckn23:01:19

so you think POST, PATCH, DELETE, is the way to go

favila23:01:22

PUT is only useful to create entities when you want the “caller” to control the resource’s url

favila23:01:34

that’s pretty darn rare

jasonjckn23:01:52

that actually is our use case for some of the assets

favila23:01:17

ok, then perhaps you can use it for that, and reject for updates

jasonjckn23:01:19

but I can just have POST take a parameter for the resource URL / request URI

jasonjckn23:01:31

that’s a possibility too

jasonjckn23:01:50

I like that, don’t know if our clients will, but that seems logical from a datomic POV

favila23:01:06

I’m not sure how datomic changes the picture here

favila23:01:28

These musings of mine are a decade old, long before datomic

jasonjckn23:01:19

So you arrived at this decision on your own, without a Datomic backend: use POST, PATCH, DELETE, and/or PUT create-only, PATCH, DELETE. That’s interesting.

favila23:01:20

I used to be a much bigger fan of “do it the restful HATEOAS way”, but I’ve encountered so many situations where it just felt like make-work and wasn’t adding anything useful

favila23:01:51

I arrived at the decision that not every problem is modeled well by resources

👍 2
favila23:01:56

in the http model

favila23:01:08

but you gotta use http as a transport layer no matter what

favila23:01:52

so recognize when you’re doing essentially json RPC and be explicit about not using advanced features of HTTP unless they have practical value

jasonjckn23:01:02

PUT create-only isn’t idempotent, which is a nice-to-have property, but oh well

favila23:01:22

if-match/if-none-match can fix that

favila23:01:09

I wouldn’t use “real” http without something like liberator’s decision tree to manage the complexity of all the negotiation

jasonjckn23:01:10

that’s used on GET only I thought

jasonjckn23:01:12

maybe i’m rusty

favila23:01:21

conditional headers are allowed on unsafe methods, so you can definitely conditional-put
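
The conditional-create idea can be sketched with an in-memory store standing in for the real backend: with `If-None-Match: *`, a replayed create fails with 412 instead of silently overwriting, so create-only PUT becomes safely replayable. The handler shape is Ring-style and the atom store is purely illustrative:

```clojure
(def store (atom {}))   ; stand-in for the database

(defn conditional-put
  "Create-only PUT. If-None-Match: * means 'only create': a replay
  that finds the resource already there gets 412 rather than an
  overwrite, so the client can treat the request as idempotent."
  [{:keys [uri headers body]}]
  (let [if-none-match (get headers "if-none-match")
        exists?       (contains? @store uri)]
    (cond
      (and (= if-none-match "*") exists?) {:status 412}
      exists?                             {:status 409}   ; create-only API
      :else (do (swap! store assoc uri body)
                {:status 201}))))
```

First request creates (201); replaying the same request returns 412, which the client can interpret as “already done”.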

favila23:01:44

you can even conditional-post

jasonjckn23:01:27

Alright, well this certainly helps, thanks, i’ll keep mulling this over before I write some code

jasonjckn23:01:02

gives me confidence to push back on the whole world going REST

jasonjckn05:01:08

@U09R86PA4 any suggestions on what PATCH payload format to use, i’ve been looking at https://www.rfc-editor.org/rfc/rfc7386 just point me in the right direction if you know better things, happy to read
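
RFC 7386’s algorithm itself is tiny. A sketch in Clojure, modeling JSON null as `:null` (a real implementation would get the null-vs-absent distinction from its JSON parser); note that, per the RFC, arrays are replaced wholesale:

```clojure
(defn merge-patch
  "Apply an RFC 7386 JSON Merge Patch to `target`. JSON null is
  modeled as :null: a null value removes the key, a nested map
  recurses, and anything else (including arrays) replaces outright."
  [target patch]
  (if (map? patch)
    (reduce-kv (fn [t k v]
                 (if (= v :null)
                   (dissoc t k)
                   (assoc t k (merge-patch (get t k) v))))
               (if (map? target) target {})
               patch)
    patch))
```

For example, `(merge-patch {:a 1 :b {:c 2 :d 3} :tags ["x" "y"]} {:b {:c :null} :tags ["z"]})` gives `{:a 1 :b {:d 3} :tags ["z"]}`: the nested `:c` is removed, and the array is replaced, not merged.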

jasonjckn05:01:58

it turns out the PATCH RFC explicitly says you can create new resources too, so if I really want, I don’t need POST, PUT, etc. at all

favila05:01:01

I can’t believe there’s an rfc for merging two json maps. And it has errata. And it still doesn’t handle merging array values.

😆 4
jasonjckn05:01:47

merging array values part was what got me 😫

jasonjckn05:01:13

Elasticsearch’s API said to hell with PATCH, and they have their own scripting language to define diffs :/

jasonjckn05:01:45

everything i see out in the wild and RFC seems ‘off’

favila06:01:29

The two problems with http: there aren’t that many cases where resources actually have more than one representation that is complete and substitutable enough to count as the same resource rather than different ones, and trying to support that is a large portion of http’s complexity; and there’s often a fundamental asymmetry between the shape of read and write operations, and making a universal, representation-independent mechanism for that is a big task that practitioners of REST have in practice given up on. As you are seeing with the unsatisfying gestures towards PATCH.

favila06:01:01

There’s nothing wrong with “json is the only representation, and we have a POST which you just have to read our docs to know how to use”

👍 2
jasonjckn06:01:51

ya on the mutation/patch side i guess there’s really nothing wrong about treating it like json rpc

favila06:01:09

There are two HATEOAS principles I think are nice and worth saving if you can: using URIs to bookmark state (especially if you can leverage caching or conditional-request features of http), and returning representations which have discoverable, annotated URI links in them for actions, vs making the client know URI structure and construct URIs itself (but you will probably have to define that representation yourself, because html is the only one in the wild that is widely understood, and it’s really meant for human “clients”)

favila06:01:05

But at the end of the day, do whatever gets the job done I think

👍 2
favila06:01:07

The status-quo state of the art nowadays is definitely something like Swagger and generated or custom clients, not really the HATEOAS/http vision, and people are getting work done that way

👍 2
favila06:01:27

I guess just be internally consistent and coherent, and don’t worry about an abstract ideal if it brings painful and confusing impedance mismatches

favila06:01:35

Is my midnight rambling take

🌙 2
favila06:01:18

And maybe a slightly sad one because I do really miss the old semantic web vision

jasonjckn06:01:45

the scripting language idea isn’t terrible either, given the low state of the art, and it will be much easier to roll my own lisp interpreter in clojure than in C

jasonjckn06:01:57

thanks for all the ideas

jasonjckn06:01:24

we’re already on Swagger, so hypermedia links are probably overkill; I’ll check out some other HATEOAS concepts

favila06:01:27

Yeah if you’re on swagger I would just lean in to it as much as possible

jasonjckn06:01:59

ya, it certainly likes you having three dozen POST APIs; handles it like a champ

favila06:01:07

Maybe even make it generate or use it as an interpreter for large parts of the server boilerplate

👍 2
favila06:01:34

Eg validation of inputs, url structure etc

uwo23:01:00

I'm working on a batch process that walks the tx-log and barely benefits from peer caching. To reduce GC pressure, I've lowered the object cache to the minimum, -Ddatomic.objectCacheMax=32M. Since then, I've noticed that larger queries (apparently) never finish. For example, querying for 10,000 eids, or calling pull-many for a large/deep tree. Should I assume that the peer will have issues if it can't install query results in the small cache? (Also, no need to comment on using a Peer Server; that's not in the cards for us at the moment.)
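
For concreteness, the peer setup being described amounts to JVM flags like the following (the jar and main class are placeholders; the default objectCacheMax, if unset, is roughly half the max heap):

```shell
# Peer JVM settings under discussion. 32m is the floor being
# experimented with here; jar name and main class are illustrative.
java -Xmx4g -Xms4g \
     -Ddatomic.objectCacheMax=32m \
     -cp my-batch.jar my.batch.main
```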

favila00:01:42

The object cache is in a sense the database. It is an object graph whose nodes are lazily loaded from storage and decoded to objects as code accesses object properties, and objects are evicted as new loads need room.

favila00:01:24

It sounds like your OC is small enough that IO churn is dominating your query time

favila00:01:05

There are various peer metrics which can diagnose this for you, such as object cache hit rate and storage gets
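
On-prem peers can report these metrics to a custom handler named by the `datomic.metricsCallback` system property. A sketch, assuming the handler lives in a ns called `my.metrics`; the metric keys shown (`:ObjectCache`, `:StorageGetMsec`) are examples to verify against what your peer actually reports:

```clojure
;; Enable on the peer with: -Ddatomic.metricsCallback=my.metrics/handler
;; The handler is called each reporting interval with a map of
;; metric name -> value. Key names here are assumptions; check them
;; against your peer's real output before relying on them.
(defn handler [metrics]
  (when-let [oc (:ObjectCache metrics)]
    (println "object-cache hit ratio:" oc))
  (when-let [sg (:StorageGetMsec metrics)]
    (println "storage gets (msec summary):" sg)))
```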

favila00:01:56

Note that the tx log segments occupy object cache too as you read them

👍 2
uwo01:01:02

> It sounds like your OC is small enough that IO churn is dominating your query time
It's true that I/O is dominating my query time, but the queries in question do actually finish if I up the size of the OC, e.g. to the default OC value.

uwo01:01:12

(Also, thank you for the reply!)

jaret01:01:37

@U09QBCNBY the point, I think, is that your object cache is so small that segments may be read into the ocache and immediately cleared, leaving the peer unable to put the result in cache (or, as you say, install the result in the cache). Do you see any errors in the logs? I'd also be interested in seeing the https://docs.datomic.com/on-prem/api/io-stats.html under this configuration and under the higher ocache config where the query returns. What size heap are you using on the peer, and is the peer connected to multiple databases? I would also want to know what GC pressure you were experiencing that prompted you to lower ObjectCacheMax to 32M, but I guess that's a separate issue.

uwo01:01:28

Sorry, I don't know why I said "installed" to cache.

uwo02:01:14

the heap is Xmx=4g and Xms=4g

jaret02:01:15

No worries. I think I know what you meant... in terms of written to it.

👍 2
uwo02:01:29

I will look into errors and io-stats first thing tomorrow!

jaret02:01:08

I think I might also be able to repro and will try tonight/tomorrow.

jaret02:01:51

If there are multiple DBs on the peer though, they compete for ObjectCache.

uwo02:01:34

Regarding GC pressure, when profiling I noticed I was having long GC pauses and lots of CPU dedicated to GC. Most of my memory was old generation objects and there was very little room for younger generational objects to grow before GCs occurred. I was able to ameliorate this by lowering the OC -- my reasoning being that this process benefits very little from application caching (aside from tx-log segments). After putting the cache to the floor my GC churn is way down.

uwo02:01:28

Only one DB on this peer.

uwo02:01:22

Also it's a process that always runs with a cold start and then shuts down. I know that we should be using a Peer Server for this sort of thing, but we just don't have the bandwidth to set that up at the moment.

uwo20:01:34

I compared the peer debug logs when running with default and floor OC settings. All I see is a bunch of :kv-cluster/get-val events in both cases. The only difference is that with default cache this finishes after 15 seconds or so, and with the cache at 32M, it keeps going and eventually my fans start kicking out steam until I kill the peer

favila20:01:13

What is the relative quantity of get-val events?

favila20:01:28

I suspect the slower one is getting the same vals over and over

2
favila20:01:35

so you have many, many more of them

uwo20:01:31

Are you suggesting grouping get-val events by :val-key, or just the average number of get-val events during a window of time?

favila20:01:39

grep -F ':kv-cluster/get-val' | wc -l is a first approximation

🙏 2
favila20:01:54

or just look at the storageget/memcacheget/valcacheget metrics

uwo20:01:10

in terms of simply counting the number of get-val events, that's going to depend on when I kill the process. It will apparently continue indefinitely if I don't kill (when OC=32)

favila20:01:24

for the same span of time

favila20:01:33

15 minutes both cases

uwo20:01:33

doh 👍

uwo20:01:59

sure, the first query finishes in 13 seconds. so I'll truncate the other log at the same time

favila20:01:11

This may be educational, but I don’t think there’s a deep mystery here: your OC is too small

favila20:01:46

> I’m working on a batch process that walks the tx-log and barely benefits from peer caching.

favila20:01:15

“barely benefits from peer caching”--seemingly not true, because a bigger OC makes it go faster?

uwo20:01:31

not true

uwo20:01:05

it's a matter of finishing or not finishing. not faster or slower

favila20:01:56

that’s just the extreme limit of slower 🙂 I’m sure if you waited long enough (hours? days?) it would finish.

uwo20:01:58

In my initial testing, where I was not using batch queries, I am able to floor the OC and see it finish, with less CPU usage and more quickly

uwo20:01:26

it was only when I tried to lower I/O thru batching queries that I ran into issues with the low cache setting

favila20:01:45

what do you mean by “batch queries”?

uwo20:01:55

let me give an example, as there are a couple of situations. Instead of testing N eids one-by-one for whether they are a root (using a membership-id):
(defn is-root?
  "Return truthy if eid has the membership-attr."
  [db membership-attr eid]
  (first (d/datoms db :eavt eid membership-attr)))
I can instead test N eids at once with something like
(d/q '[:find [?e ...]
       :in $ [?e ...] ?a
       :where
       [?e ?a]]
     db eid-batch membership-attr)

uwo20:01:37

there are other cases, like computing parents as well

favila20:01:36

All things being equal, I would expect the latter to need more segments loaded at once to work

2
uwo20:01:37

the process trawls the tx-log and looks to extract a single entity type for a given time span. So it ends up computing roots relative to some membership-id

favila20:01:54

thus larger OC

uwo20:01:01

Right, that makes sense. But that sounds like it's a requirement just to get it to work, not it benefiting from caching. With this process, it's almost guaranteed that I will not need to fetch the same thing twice.

uwo20:01:54

in my existing implementation, which worked one-by-one, lowering the cache removed long GC pauses and excessive CPU time spent on gc

favila20:01:48

In the peer model, the OC is functionally the data all queries read, as they cannot read things unless they are loaded there. It’s not like a traditional disk-backed db, where you may be doing zero-copy reads and the cache is just to avoid some disk. So whatever a query needs to read has to be in memory, or it churns in and out forever.

👍 2
favila20:01:05

a d/q is doing work in parallel, reading multiple segments at a time, etc. and d/datoms doesn’t

uwo20:01:08

In fairness, I imagine I could have gotten similar results by doubling the heap, but then I'll have to bug infrastructure 😂

favila20:01:21

Your GC + cpu problems are curious though, and possibly unrelated

uwo20:01:39

Got it, my mental model was in wishful thinking mode -- hoping that it could read results without retrieving them from OC

favila20:01:09

OC does not cache query results, it caches what queries read to get results.

🙏 2
favila20:01:22

there is no query result caching in datomic

uwo20:01:44

> Your GC + cpu problems are curious though, and possibly unrelated
👍 I have been trying to get to the bottom of this with profiling and testing alternative implementations!

favila20:01:29

so the irony is in trying to reduce IO you’ve actually increased it

favila20:01:59

if lowering the OC improved things, and you were using very modest result-size interfaces like d/datoms, that suggests your application code itself is a source of allocation pressure

favila20:01:11

how big is your heap?

uwo20:01:14

😂 So my real goal is to lower execution time, and I've been using trial and error to figure out what my bottleneck is. Since I've been iterating, I have been stacking approaches that were beneficial. And since the low-cache setting was only beneficial with the one-by-one approach, I have no attachment to it; I only noticed it because I left it in place when I moved to test/profile batching (larger queries).

uwo20:01:08

the heap is Xmx=4g and Xms=4g

uwo20:01:58

there's definitely a lot of young generational objects allocated by my application code. (old gen is steady and fixed after initial build up)

uwo20:01:34

I've reimplemented a number of functions using transducers + reifying IReduceInit, but I haven't found any major improvements yet.

uwo20:01:58

(with the hope that I could lessen allocation)
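
The shape being described, a reducible source plus transducers so the per-item work fuses into one pass with no intermediate lazy seqs, looks something like this sketch (the datom maps and `:e` key are stand-ins for real tx-log data):

```clojure
(defn datom-source
  "Expose `datoms` as an IReduceInit so consumers can reduce over it
  directly, without allocating lazy-seq cells."
  [datoms]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (reduce f init datoms))))

;; transduce fuses the transform (map + distinct) with the reduction:
;; a single pass, no intermediate collections.
(defn distinct-eids [datoms]
  (transduce (comp (map :e) (distinct))
             conj #{} (datom-source datoms)))
```

The win is mostly about allocation: the same pipeline written as `(->> datoms (map :e) distinct set)` builds intermediate seqs that all become young-generation garbage.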