#off-topic
2020-07-20
dpsutton00:07:37

well i think you'll read this one. it's about the api of core.cache 🙂

Alex Miller (Clojure team)00:07:53

I've sworn off all blogging systems where the content is not sitting in a text file in a github repo

dpsutton00:07:25

ideally i could upload an org file and it would do all the stuff for me. but i don't want to set all that up now so i'm looking for a blog as a service

Alex Miller (Clojure team)00:07:42

well, jekyll / gh pages

seancorfield00:07:49

Yeah, I do all my blogging via http://github.io (and a custom domain). I write markdown, run a process to create HTML, and push to GH.

Alex Miller (Clojure team)00:07:51

or whatever to gh pages

dpsutton00:07:40

ha, mine's still an outline of torts from law school

seancorfield00:07:01

I use Octopress (a variant of Jekyll)

borkdude07:07:33

I'm also still using this but still on Octopress 2. Getting a lot of deprecation warnings from Ruby last time I ran it...

seancorfield16:07:35

@U04V15CAJ Yeah, last time I tried to update Ruby to get rid of those warnings, I ended up breaking Octopress and it took me days to get it all working again 😞

borkdude16:07:21

Still on Octopress 2 as well?

borkdude16:07:06

We're in the same boat then. Creating a Docker container crossed my mind, so at least it keeps working consistently

seancorfield16:07:41

Luckily, I don't use Ruby for anything so I can mostly just ignore it and leave it all "old"...

borkdude16:07:33

same here. I don't use any fancy stuff so if needed I can port it to my own Clojure-based solution (maybe even using babashka) but that's work

seancorfield16:07:33

There are a couple of Clojure-based site generators already... I keep meaning to investigate them and maybe port my blog to one of them... but, hey, Octopress works and I have more important stuff to work on (and I don't blog as often as I used to).

borkdude18:07:06

Yep. There's been a few people taking up babashka as a blogging tool as well: https://www.mxmmz.nl/blog/building-a-website-with-babashka.html It's a bit more DIY, but at least you can understand every part easily.

phronmophobic01:07:06

this is great :thumbsup:

phronmophobic01:07:06

> Operating on just a cache value
I would say the main issue with the ttl cache is that it's technically not a value

phronmophobic01:07:12

even if it's not in an atom

phronmophobic01:07:19

> When we used the through-cache function we computed our value 19 million times, but had we not cached we would only have computed it (* 20 20000) = 400,000 times!
I would also point out that both lookup-or-miss and core.memoize wrap the computation in a delay so that you don't compute the value more times than you request the value from the cache

seancorfield01:07:58

That's a great article -- I'll probably update the README on core.cache to point to it. Yes, absolutely, the clojure.core.cache.wrapped/lookup-or-miss function is the "best" API -- and it's how we use all the caches at work, but it didn't exist when I took the library over from Fogus.

seancorfield01:07:59

@U7RJTCH6J Yup. That's a vector for a DoS attack on core.cache, unfortunately. Someone specifically raised that JIRA issue, which is where the wrapped API came from.

seancorfield01:07:50

The README has a new section "The core.cache API is hard to use correctly. That's why clojure.core.cache.wrapped/lookup-or-miss exists: it encapsulates all the best practices around using the API and wraps it in an atom from the get-go. Read this article about the core.cache API by Dan Sutton for why it is important to use the API correctly!"

:parens: 6
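
For reference, a minimal sketch of that function in use (the 60-second TTL and the value function here are illustrative, not from the README):

(require '[clojure.core.cache.wrapped :as cw])

;; the wrapped factory functions return the cache already in an atom
(def app-cache (cw/ttl-cache-factory {} :ttl 60000))

;; on a miss, the value function is called with the key (at most once,
;; via an internal delay), and the result is cached and returned
(cw/lookup-or-miss app-cache :answer (fn [_] 42))
;; => 42
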
David Pham07:07:54

Are these caveats still valid for core.memoize?

seancorfield16:07:54

@UEQGQ6XH7 core.memoize was designed to use the core.cache API correctly, although it did have a nasty bug with TTL cache handling (which I fixed last year).

David Pham16:07:32

So we could say that memoize is a higher-level and safer API?

seancorfield17:07:26

core.memoize is for memoizing functions. It uses core.cache behind the scenes, but just as an implementation detail.

p-himik06:07:25

Perhaps a stupid question about clojure.core.cache.wrapped/lookup-or-miss and caching in general. Why does it attempt to put a newly computed value in the cache 10 times and why does it return nil if it fails? Why not just compute the value once, "try to cache and forget", and return the value?

seancorfield15:07:16

@U2FRKM4TW In the normal case, it would only attempt that once. There are some caches that have pathological edge cases where this repeated attempt is necessary. TTL caches, for example, have an edge case where they can return true from has? and then immediately fail on lookup -- and return a nil value even when you supply a "not found" value.
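
A sketch of the edge case being described, as a plain check-then-read on a TTL cache value (the helper name and timings are illustrative):

(require '[clojure.core.cache :as cache])

(def c (cache/ttl-cache-factory {} :ttl 100))

(defn racy-get [c k]
  (if (cache/has? c k)  ; the entry can expire right after this check...
    (cache/lookup c k)  ; ...so the lookup can still come back nil
    ::miss))

This is also why lookup-or-miss loops: a swap! that inserts the value can be followed by a lookup that finds the entry already expired.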

seancorfield15:07:42

This is a problem that core.memoize works around in the same way.

seancorfield15:07:21

You do not want the computation done at all if it is already in the cache.

p-himik16:07:17

Thanks. I'm afraid I still don't understand how having retries with value reevaluation could help TTL caches. But after reading some docstrings in clojure.core.cache, I realized that they make me even more confused. So maybe some other time. :)

seancorfield16:07:12

@U2FRKM4TW It is not reevaluating the value. It uses a delay to only evaluate it once. But, yes, core.cache code is very complicated 😐

p-himik16:07:15

Ah, jeez, you're right. s/reevaluation/reinsertion/.

chrisblom15:07:03

i avoid core.cache and wrap one of the many caching libs for java. Keeping the whole cache in an atom can give perf issues when contention is high due to retries caused by swap!. I've been bitten badly by this a couple of times.

dharrigan15:07:13

which one do you wrap? Caffeine is quite popular. I was considering that one.

chrisblom17:07:31

yeah i've wrapped Caffeine

chrisblom17:07:38

also used this in java projects a lot

chrisblom17:07:49

solid library, should do the trick

dharrigan18:07:58

Yeah, I use Caffeine for my Kotlin projects. I like it a lot.
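
For reference, wrapping Caffeine from Clojure is only a few lines of interop. A sketch, assuming the com.github.ben-manes.caffeine/caffeine dependency is on the classpath (lookup-or-load is a hypothetical name):

(import '(com.github.benmanes.caffeine.cache Caffeine Cache)
        '(java.util.concurrent TimeUnit)
        '(java.util.function Function))

(def ^Cache caffeine
  (-> (Caffeine/newBuilder)
      (.maximumSize 10000)
      (.expireAfterWrite 5 TimeUnit/MINUTES)
      (.build)))

(defn lookup-or-load
  "Compute (f k) at most once per key, even under contention."
  [^Cache c k f]
  (.get c k (reify Function (apply [_ k'] (f k')))))
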

dpsutton15:07:17

i go through that in the article. ended up computing something 20 million times when caching it over 400,000 calls. There is a lovely function (which i conclude is the only api you need for core.cache) in clojure.core.cache.wrapped/lookup-or-miss which will compute at most once and try to set the cache at most 10 times before abandoning

dpsutton15:07:37

not saying don't use your strategy, but the solution i found there should be the starting point when evaluating core.cache

chrisblom15:07:30

i don't see how lookup-or-miss avoids retries caused by swap!

chrisblom15:07:04

when many threads are concurrently updating the cache

chrisblom15:07:23

i don't really see the point of keeping the cache in an atom, caches are mutable anyway

chrisblom15:07:12

so you might as well use something backed by ConcurrentHashMap, or one of the java libs, which scale much better w.r.t. the number of threads, and are easier to use correctly

chrisblom15:07:43

i guess the composition of different cache types is a nice feature of core.cache, but i never needed it in practice

dpsutton15:07:10

ah, one thing i'm noticing now that you mention it is that if this function retries 10 times and never gets to pull the value out of the cache it returns nil rather than the delayed computed value. which seems off to me. i had misread it earlier

chrisblom15:07:46

hmm, that is another issue

dpsutton15:07:32

yeah. i'd prefer a function that did the single swap with through-cache, and if the lookup succeeds, return that value, else return the computed value without looping

dpsutton15:07:56

kind of a fire and forget once into the cache. if there's so much churn on it, don't care if your value has already been evicted.

dpsutton15:07:04

seems like an artificial fight at that point
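
A sketch of that fire-and-forget idea (hypothetical helper; it still uses the delay trick so the computation runs at most once even if swap! retries):

(require '[clojure.core.cache :as cache])

(defn lookup-or-miss-once
  "Single swap!, no retry loop: return the cached value if the lookup
  succeeds, else fall back to the value we just computed."
  [cache-atom k value-fn]
  (let [d (delay (value-fn k))
        c (swap! cache-atom cache/through-cache k (fn [_] @d))
        v (cache/lookup c k ::miss)]
    (if (= ::miss v) @d v)))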

chrisblom15:07:55

well, that's the problem of using an atom

dpsutton15:07:13

i mean the swap! in a loop of 10 attempts

dpsutton15:07:26

but i see your point about the swap

dpsutton15:07:38

there's no bounded swap either, is there

phronmophobic16:07:29

you can use compare-and-set! to manually bound the number of attempts to update an atom
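
e.g. a sketch of a bounded swap! built on compare-and-set! (hypothetical helper, not part of clojure.core):

(defn bounded-swap!
  "Like swap!, but give up after n failed CAS attempts and return ::abandoned."
  [a n f & args]
  (loop [i 0]
    (if (= i n)
      ::abandoned
      (let [oldv @a
            newv (apply f oldv args)]
        (if (compare-and-set! a oldv newv)
          newv
          (recur (inc i)))))))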

chrisblom15:07:01

no, it retries until the compare and set succeeds

chrisblom15:07:57

i guess you could hack it in by throwing an exception after n invocations of the function

chrisblom15:07:18

but that's overly complex IMO

chrisblom15:07:32

you're just working against the system at that point

dpsutton15:07:51

yeah. at that point you just abandon the atom

chrisblom15:07:05

for production usage, i also want the hit/miss rate exposed for metrics, last time i used core.cache I had to add this myself

dominicm16:07:48

Seems like an agent would do a decent job of limiting contention.

chrisblom17:07:08

these are all workarounds IMO

chrisblom17:07:35

agents run async, which is not suitable for all use cases

chrisblom17:07:40

another problem with using swap! to update the cache is that you should not use functions with side-effects

andy.fingerhut17:07:50

I believe the technique adopted by core.cache is to represent the cache as an immutable collection, and updating the cache is thus replacing the value that is the contents of the atom with a new immutable collection.

andy.fingerhut17:07:03

The only side effect is the one that swap! is designed for

chrisblom17:07:19

This technique has some problems

andy.fingerhut17:07:57

I am not saying that there is no other way to design a cache -- just that it does not have the problem of using functions passed to swap! that have side effects.

chrisblom17:07:09

that's not true

chrisblom17:07:32

for example, if i want to cache a function that reads something from a database

phronmophobic17:07:33

most implementations that use core.cache with an atom wrap the calculation of the new value in a delay to ensure that the calculation is done at most once

phronmophobic17:07:01

or at least, they should

hiredman17:07:06

I believe what core.memoize does (layered on top of core.cache) is wrap everything in a delay; the delay may be swapped in multiple times, but when the delay is dereffed the computation runs only once

👍 6
chrisblom17:07:26

yes, that is a way to avoid it

chrisblom17:07:18

but it's not that obvious

chrisblom17:07:57

and the README for the project has several examples that do not do this

chrisblom17:07:14

(cache/through-cache C1 my-key (partial jdbc/get-by-id db-spec :storage))

chrisblom17:07:51

in the readme may actually do more harm than good under enough contention

chrisblom17:07:22

causing more reads to the database than calls to this expression, due to retries

chrisblom17:07:13

anyway, i don't want to be too negative, there are plenty of valid use cases for core.cache. My point is that under contention core.cache can behave in unexpected ways, so be careful.
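
As hiredman describes above, the fix is the delay trick. A sketch of that README example repaired (assuming C1 is the cache held in an atom; consumers must deref what they look up):

(require '[clojure.core.cache :as cache])

(let [d (delay (jdbc/get-by-id db-spec :storage my-key))]
  (swap! C1 cache/through-cache my-key (fn [_] d)))

;; retries may swap the same delay in more than once, but derefing it
;; runs the database read at most once:
@(cache/lookup @C1 my-key)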

dpsutton17:07:56

(cache/through-cache C1 my-key (partial jdbc/get-by-id db-spec :storage))
and this returns a cache that might not even have your value. so you need to call the function again

dpsutton17:07:16

i think only the ttl cache is susceptible to this though

andy.fingerhut18:07:58

It is interesting that clojure.core/memoize mentions that it only supports referentially transparent functions, but the core.memoize library seems to make no mention of that restriction.

andy.fingerhut18:07:56

Yeah, no occurrences anywhere in the core.memoize repo files for "referential" "pure" "side" or "effect". And examples in docs/Using.md show wrapping of functions whose names strongly imply side effects, like fetch-user-from-db

chrisblom19:07:58

in core.memoize it's not really a problem as the function is wrapped in a delay, so it will run only once despite retries

emccue19:07:02

Does anyone know if there is an exemption for US citizens travelling to emigrate or take work in a European country?

p-himik19:07:25

I believe rules vary by country.

dominicm19:07:46

#jobs-discuss will likely get more hits

seancorfield19:07:47

@chrisblom That's why clojure.core.cache.wrapped exists with lookup-or-miss, to ensure the value-fetching function is only called once (at most).

seancorfield19:07:14

Using clojure.core.cache correctly is hard. Using clojure.core.cache.wrapped is a lot easier. @dpsutton and I were just discussing composition of caches being broken in several combinations due to the way most caches rely on the hit method for tracking 😐

chrisblom19:07:42

Ah ok, it also uses the wrap-with-delay trick

seancorfield19:07:44

If you just use a single (non-composed) cache and you pretty much only use wrapped/lookup-or-miss then you're pretty safe.

seancorfield19:07:39

Yeah, someone raised a JIRA issue ages ago about the through function allowing for cache stampede -- and there's no way to fix it inside the cache itself, only in a well-behaved caller -- which is what that wrapped function is for.

seancorfield19:07:38

All the other stuff in the wrapped ns is just to allow for it to be a drop-in replacement for the original API, to make it easier to switch over.

seancorfield19:07:33

I very much doubt there is extensive use of core.cache outside the most basic cache types (basic, TTL -- which has its own "interesting" gotchas since has? can return true and then lookup can return nil / not-found).

seancorfield19:07:54

We use core.cache.wrapped very heavily at work, but only with basic and TTL caches.

chrisblom19:07:18

the wrapped namespace is great, i don't think it was there yet the last time I used core.cache

chrisblom19:07:17

i also used the delay trick, but had problems with edge cases where the function would throw

chrisblom19:07:22

i find the retryable delay in core.memoize an elegant solution actually

chrisblom19:07:37

what does the lookup-or-miss in core.cache.wrapped do when the wrapped function throws an exception?

chrisblom19:07:05

firing up the repl to find out...

chrisblom19:07:08

it does the right thing, nothing is cached if the fn throws

seancorfield19:07:50

You can provide a wrapper function (in the 4-arity version of lookup-or-miss) to control how that behaves.

seancorfield19:07:28

(and, of course, you're already passing the value function which can also control that)
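
A sketch of those two levels of control (assuming the wrap-fn is called with the value function and the key; fetch-user and log-failure! are hypothetical):

(require '[clojure.core.cache.wrapped :as cw])

(def users (cw/ttl-cache-factory {} :ttl 60000))

;; 3-arity: just the value function
(cw/lookup-or-miss users 42 fetch-user)

;; 4-arity: a wrapper around the value function's invocation
(cw/lookup-or-miss users 42
                   (fn [value-fn k]
                     (try (value-fn k)
                          (catch Exception e
                            (log-failure! k e)
                            (throw e))))
                   fetch-user)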

seancorfield19:07:45

So there are multiple levels of control.

seancorfield19:07:04

(although I've never found a use for the wrapper function, as I recall)

seancorfield19:07:56

In the general case, I would expect folks would not want a value added to the cache if the value fn throws an exception tho'...?

chrisblom19:07:22

that was the mistake i made back then

seancorfield19:07:02

There's no doubt that core.cache is very hard to use correctly 🙂

seancorfield19:07:06

(we stopped using the main API completely at work and switched to the wrapped API instead)

phronmophobic19:07:10

i think this highlights the fact that even though you want tools and constructs made out of simple stuff, you don't always want to directly use the simple stuff

andy.fingerhut20:07:42

I am not sure, but some of the hard-to-use API calls exposed from the core.cache and core.memoize libraries seem like they might best be categorized as "not simple", i.e. "complex", because using them in a correct fashion requires a sequence of calls, and/or calls made in the correct context (e.g. inside of a swap! function, or not), that intertwines them quite a bit.

andy.fingerhut20:07:10

I don't think "simple" means the same thing as "short implementation, by count of lines of code"

dpsutton20:07:12

and what drove me to look into it was that the api didn't make it obvious how it should be used. and thus there are lots of different styles of using it in the wild, with many being wrong

phronmophobic20:07:16

several of the pitfalls emerge from using caches in conjunction with atoms and the fact that there are cache implementations that aren't idempotent (ie. the ttl cache)

phronmophobic20:07:38

one element of being "complex" is that it can be broken down into simpler pieces. I'm not sure how you would further simplify the cache protocol: lookup, has?, hit, miss, evict, and seed. I would be curious if someone has some thoughts here.
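
For reference, the protocol under discussion is roughly this (paraphrased from clojure.core.cache; the comments summarize the docstrings):

(defprotocol CacheProtocol
  (lookup [cache e] [cache e not-found]) ; retrieve the cached value, if any
  (has?   [cache e])                     ; is there a cached, unexpired value?
  (hit    [cache e])                     ; notify the cache of a successful lookup
  (miss   [cache e ret])                 ; add ret to the cache under e
  (evict  [cache e])                     ; remove e from the cache
  (seed   [cache base]))                 ; (re-)initialize the cache from a base map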

dpsutton20:07:10

my thesis is that so many pieces make it far more likely to be misused. and using those is almost always wrong

dpsutton20:07:47

to me the consumer api is the single lookup-or-miss. and even that can return nil. but everything else is a footgun

phronmophobic20:07:52

personally, I want tools and constructs made out of simpler pieces (which usually means more pieces) even if I don't use the simpler pieces directly

phronmophobic20:07:00

I think it's good that swap! packages the CAS model in a way that makes it easier to use, but I'm also happy that it is built from simpler parts (e.g. I can use compare-and-set! when it makes sense).

dpsutton20:07:06

i use a countdown latch in my testing. and i would never want its constituent pieces exposed. i want a single dumb api exposed of decrement and wait. same thing with a cache. i don't want to be in charge of hitting on successful lookups

andy.fingerhut20:07:26

There is also a notion of trying to break thing up into pieces that are orthogonal, or independent, and can be composed in fairly arbitrary ways. Breaking something up into pieces that are small, but can only be correctly combined in about 1% of the possible ways that one can imagine combining them in, definitely seems low on a measure of independence/orthogonality

phronmophobic20:07:52

it seems like most (all?) of the issues directed at the cache protocol are really incorrect usages of atoms

phronmophobic20:07:21

e.g. using mutable cache implementations like ttl, and sending side-effecting functions to swap!

seancorfield20:07:45

I love the idea of pluggable, composable caching strategies, but I don't know how much people really need that much flexibility. We use just TTL and basic at work (mostly TTL) and we mostly just use wrapped/lookup-or-miss everywhere (although some places still use an explicit wrapped/lookup and wrapped/miss separately -- and a few places use wrapped/seed, mostly in testing contexts). We could live with a much, much simpler caching library.
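
e.g. seeding a wrapped cache in a test might look like this (a sketch; the data is illustrative):

(require '[clojure.core.cache.wrapped :as cw])

(def user-cache (cw/ttl-cache-factory {} :ttl 60000))

;; pre-populate the cache instead of hitting the real backend
(cw/seed user-cache {42 {:id 42 :name "test-user"}})
(cw/lookup user-cache 42)
;; => {:id 42 :name "test-user"}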

andy.fingerhut20:07:59

I should say that I've done my fair share of defining and using APIs that have significant dependencies on in what order they must be called in, or else things can go wrong. For some kinds of functionality one wants to implement, it can be challenging to think of a way to create an API that does not have those properties.

dpsutton20:07:16

i think of ghadi's admonition to use a crypto library that doesn't let you do it wrong

ghadi20:07:14

I have the same feelings as @seancorfield about the caching stuff

phronmophobic20:07:26

a guiding principle that I return to often is "make the common case easy and the complex case possible"

ghadi20:07:41

maybe caching APIs should focus on scenarios, not the underlying SPI

ghadi20:07:06

"I need to refresh something every 15 minutes"

phronmophobic20:07:10

agreed. there are many different reasons to use caches that have different requirements

andy.fingerhut20:07:04

I am guessing SPI here does not mean "Serial Peripheral Interface", but Google isn't helping me find a better expansion right now

ghadi20:07:12

service provider interface

ghadi20:07:16

basically a protocol

Alex Miller (Clojure team)20:07:23

API is a top-end interface. SPI is a bottom-end interface. or at least that's how I think about it

:notbad: 6