#off-topic
2021-12-03
Stuart 12:12:49

Something I just got hit with: I made a bad assumption. What would you expect the result of calling a function Average() on an empty list of integers to be? I mistakenly assumed it was 0. In reality, it throws an exception. My colleague said he would have expected the exception. What do you think: 0 or exception?

Colin P. Hill 12:12:36

I would expect an exception, or nil, or some other error result. The average function doesn't have an identity value.

Joe 12:12:02

Exception! If average is sum(xs)/count(xs), then it's in the same category as dividing by zero.
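
Joe's division-by-zero framing is easy to confirm. The sketch below uses Python purely for illustration (`naive_average` and `describe_empty_average` are hypothetical names): a hand-rolled sum/count average and the standard library's `statistics.mean` both refuse an empty list instead of returning 0.

```python
import statistics


def naive_average(xs):
    # sum/count: for an empty list this is 0 / 0, which raises ZeroDivisionError
    return sum(xs) / len(xs)


def describe_empty_average():
    """Record which exception each approach raises for an empty list."""
    outcomes = {}
    try:
        naive_average([])
    except ZeroDivisionError:
        outcomes["naive"] = "ZeroDivisionError"
    try:
        statistics.mean([])
    except statistics.StatisticsError:
        outcomes["statistics.mean"] = "StatisticsError"
    return outcomes


print(describe_empty_average())
# {'naive': 'ZeroDivisionError', 'statistics.mean': 'StatisticsError'}
```

In both cases it is the caller, not the averaging function, that has to decide what an empty input should mean.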

Stuart 12:12:34

Argh! Of course! That makes so much sense. I hadn't thought of it like that.

mlimotte 13:12:34

Exception for sure. It would be up to the UI in a particular domain to decide to display 0 or “n/a” or other visualization.

Sam Ritchie 14:12:18

In a language like Scala, you might use an “Option” here, and likewise for any aggregate that is not defined for every input (such as an empty collection).

Sam Ritchie 14:12:32

Or you make a special “nothing” object, which is basically the same idea.

Sam Ritchie 14:12:57

For it to be defined, you need to be able to combine the “nothing” average with an existing average and get the existing thing back out.
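
One way to make Sam's “nothing” average concrete, sketched in Python under the assumption that you can carry a running total and count instead of the finished average (`Avg`, `avg_of`, and `NOTHING` are hypothetical names): the pair (0, 0) combines with any other pair and returns it unchanged, and the division happens only when a value is finally requested.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Avg:
    """A partial average kept as (total, count) so partial results can be merged."""
    total: float = 0.0
    count: int = 0

    def combine(self, other: "Avg") -> "Avg":
        # Adding totals and counts is associative, and Avg() is the identity.
        return Avg(self.total + other.total, self.count + other.count)

    def value(self) -> Optional[float]:
        # Divide only at the end; the empty average has no value.
        return self.total / self.count if self.count else None


def avg_of(xs) -> Avg:
    return Avg(float(sum(xs)), len(xs))


NOTHING = Avg()  # the "nothing" average: combining with it changes nothing

existing = avg_of([2, 4, 6])
assert NOTHING.combine(existing) == existing
assert existing.combine(NOTHING) == existing
print(existing.value())  # 4.0
print(NOTHING.value())   # None
```

Because merging just adds totals and counts, partial results from different shards or time windows can be combined in any grouping without losing accuracy.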

Phil Shapiro 15:12:27

I could see NaN here instead of a divide by zero exception, since 0/0 is NaN.
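
Phil's NaN option, sketched in Python (`average_or_nan` is a hypothetical name). IEEE 754 floating point defines 0/0 as NaN, but Python's float division raises ZeroDivisionError instead, so the NaN has to be returned explicitly. The printed lines show the trade-off: NaN is easy to detect with `math.isnan`, but it never compares equal to anything and silently propagates through arithmetic.

```python
import math


def average_or_nan(xs):
    # Return NaN explicitly for empty input instead of dividing 0 by 0.
    return math.fsum(xs) / len(xs) if xs else math.nan


result = average_or_nan([])
print(math.isnan(result))  # True
print(result == result)    # False: NaN is not equal even to itself
print(result + 1.0)        # nan: it propagates through later arithmetic
```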

vemv 13:12:55

I'd treat the empty list as an impurity and forbid it at the edges. Technically, Average() could throw something like an AssertionError in dev/test and leave it as undefined behavior in prod.

Ben Sless 13:12:32

Can you set it up such that the empty list won't even reach the average function?

Stuart 13:12:29

My problem is that I'm querying a DB table on a remote machine that logs requests and responses for a system, and I have to send back a value for the average duration between a number of requests and their responses. This is to help identify when the service is running slow. I take the average and it gets floored, so an average of zero is good: it means the request-to-response time is under 1 second. However, if no requests are found at all, the list will be empty, and an average of zero there is bad, because something is wrong if there are no requests. So I think I need to send back something like -1 for an empty list to indicate an error, but I'm wondering: in the future, might we ever want an average of averages? A -1 would make that number nonsense. I could also send back an average and a count; then they can check whether the average is zero and the count is zero, and trigger an error.
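
Stuart's last idea (send the count alongside the average) avoids a magic -1 entirely. A minimal Python sketch, assuming the request-to-response durations have already been pulled from the log table in seconds; `LatencyReport`, `latency_report`, `is_healthy`, and the 5-second `slow_threshold_seconds` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LatencyReport:
    # The count travels with the average, and the average is None
    # (rather than a sentinel like 0 or -1) when there is no data.
    count: int
    average_seconds: Optional[int]


def latency_report(durations_seconds: List[float]) -> LatencyReport:
    """Floored average request-to-response duration, plus the sample count."""
    if not durations_seconds:
        return LatencyReport(count=0, average_seconds=None)
    avg = sum(durations_seconds) / len(durations_seconds)
    return LatencyReport(count=len(durations_seconds), average_seconds=math.floor(avg))


def is_healthy(report: LatencyReport, slow_threshold_seconds: int = 5) -> bool:
    if report.count == 0:
        return False  # no requests logged at all: something is wrong
    return report.average_seconds < slow_threshold_seconds


print(latency_report([0.2, 0.7, 0.4]))  # LatencyReport(count=3, average_seconds=0)
print(latency_report([]))               # LatencyReport(count=0, average_seconds=None)
```

The receiver can now tell "fast" (count > 0, average 0) apart from "no data" (count 0) without any reserved numeric value.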

Stuart 13:12:30

You would never really want to know an average of averages, would you?

cdeszaq 16:12:04

You might, but often this sort of aggregation is done in error rather than intentionally.

cdeszaq 16:12:35

e.g., a “7-day average of daily average latencies” to give you a smoothing function

Ben Sless 13:12:34

True, but I wouldn't want averages anyway; I'd want percentiles. If you want to take the Clojure-y approach, then the average of nothing is nothing, i.e. return nil and let the recipient figure it out.

cdeszaq 16:12:30

Better than percentiles, you actually want the histogram. Just looking at key percentiles throws away a LOT of data. (The granularity of the histogram is an interesting topic, but the idea is the same: retain your data when aggregating.)

Ben Sless 17:12:21

Agreed, raw data is best, but given a choice between an average and a list of percentiles, I prefer the latter.

cdeszaq 18:12:14

Oh, certainly agreed; a list of percentiles (i.e., a histogram 😉) is very useful for understanding the shape of a data set (i.e., the distribution). Average does have its uses, though, in that it doesn’t throw away any data, whereas a single percentile throws away most of it. For latency tracking, a “trimmed mean” is often a really good statistic: it has the benefits of an average (a single number to interpret, not throwing away most of the data) but also deals with the most common problem with averages: outliers.
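
A trimmed mean, as described above, sorts the data and discards a fixed fraction from each end before averaging. A minimal Python sketch; the name `trimmed_mean`, the 10% default, and the sample latencies are illustrative assumptions. SciPy offers a comparable `scipy.stats.trim_mean` if that dependency is acceptable.

```python
def trimmed_mean(xs, proportion=0.1):
    """Mean of the data after dropping `proportion` of the values from each end."""
    if not xs:
        raise ValueError("trimmed_mean requires at least one value")
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(xs)
    k = int(len(ordered) * proportion)  # values cut from each end (rounded down)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


latencies_ms = [12, 15, 11, 14, 13, 12, 16, 14, 13, 950]  # one slow outlier
print(sum(latencies_ms) / len(latencies_ms))  # 107.0
print(trimmed_mean(latencies_ms))             # 13.625
```

The single 950 ms outlier drags the plain mean far above every typical request, while the trimmed mean stays close to them and still uses most of the data.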

Ben Sless 18:12:56

Usually, for latency, I'd be satisfied with the 50th, 90th, and 99th percentiles, up to six nines.
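
For reference, a minimal nearest-rank percentile in Python (`percentile` and the 1..100 ms sample are hypothetical). Nearest rank is only one of several percentile definitions; Python's own `statistics.quantiles` interpolates between data points instead. Note that with nearest rank, an extreme tail like six nines only differs from the maximum once you have on the order of a million samples.

```python
import math


def percentile(xs, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    if not xs:
        raise ValueError("percentile requires at least one value")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(xs)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted data
    return ordered[rank - 1]


latencies_ms = list(range(1, 101))  # hypothetical: 100 requests taking 1..100 ms
print([percentile(latencies_ms, p) for p in (50, 90, 99)])  # [50, 90, 99]
```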

cdeszaq 18:12:30

Yes, it is often the case that these are what get looked at, though I think that is more of a tooling artifact than being driven by our goals. Usually we want to go faster for everyone, not just those that happen to feel these percentiles. From my observations, looking at specific percentiles tends to cause the latency histogram to have “bumps” and consolidate just below these signposts, primarily because latency improvements for requests in the faster 50% of the population don’t get “counted” when we look at what changes moved the needle. Don’t get me wrong, any attention to latency is good, but I think we can do better than just a handful of percentiles

Ben Sless 19:12:26

"It depends." Maybe I'm spoiled by Gil Tene's talks on latency, but at certain scales those percentiles are a real pain. In plenty of circumstances I care about the high percentiles, to the extent that I care more about the tail than about the bumps.

Colin P. Hill 13:12:18

An average of averages is a weighted average
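
To illustrate the point: averaging per-group averages reproduces the overall average only when each group average is weighted by its size. A minimal Python sketch with made-up numbers (`weighted_mean_of_means`, `day1`, and `day2` are hypothetical).

```python
def mean(xs):
    return sum(xs) / len(xs)


def weighted_mean_of_means(groups):
    """Combine per-group averages, weighting each by its group size."""
    total = sum(mean(g) * len(g) for g in groups)
    count = sum(len(g) for g in groups)
    return total / count


day1 = [100, 100, 100, 100]  # hypothetical latencies (ms): 4 requests
day2 = [10]                  # 1 request

naive = mean([mean(day1), mean(day2)])  # (100 + 10) / 2
weighted = weighted_mean_of_means([day1, day2])

print(naive, weighted, mean(day1 + day2))  # 55.0 82.0 82.0
```

The unweighted version is not wrong as such (the smoothing example earlier in the thread uses it deliberately), but it answers a different question than the overall average does.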

Colin P. Hill 14:12:24

I would say that if you have no data, then you have no average. It's simply not defined, and using a special value within the space of numbers to indicate that is a bit of a kludge. So, yeah, nil seems the way to go. Or some kind of special non-numeric sentinel value, or an empty option, or an omitted field, or whatever else makes sense for the communication language.

Colin P. Hill 14:12:51

On the idea of using -1 or 0, consider what will happen if you start wanting to report a metric where negative numbers or zero are actually valid. Well crap, now you've got to figure out some other approach for that metric, and then your system will have inconsistent and confusing semantics

Ben Sless 15:12:50

Do any Nix users want to share experience and tips on setting it up for Clojure development? I'm thinking in particular about pinning Java versions per repository, etc., instead of using jenv.

Thomas 16:12:46

I’m only getting started with Nix myself, but with my limited knowledge I’d use a shell.nix in the project’s directory. I’m currently trying https://github.com/nix-community/lorri to keep the current shell up to date with respect to shell.nix.

Brett Rowberry 16:12:39

I created a new repo on GitHub and selected Clojure for the .gitignore. It used this file https://github.com/github/gitignore/blob/218a941be92679ce67d0484547e3e142b2f5f6f0/Leiningen.gitignore

Brett Rowberry 16:12:00

My editor is Calva and I'm doing a deps.edn project.

Brett Rowberry 16:12:54

This file https://github.com/github/gitignore/blob/218a941be92679ce67d0484547e3e142b2f5f6f0/Clojure.gitignore suggests that I should make a new file and reference it for deps.edn projects.

Brett Rowberry 16:12:03

Anybody have a different/better idea?

dpsutton 16:12:56

I always just fill in the .gitignore when I see files I don’t want tracked.

dpsutton 16:12:31

I don’t think any of those entries except .cpcache are useful for me.

Brett Rowberry 16:12:42

Same, but if there's a template, I'd like it to be generally useful.

Brett Rowberry 16:12:03

I added:

# Calva
.calva/
.clj-kondo/
.lsp/

dpsutton 16:12:04

Sure. That template looks geared toward the stuff that lein makes, so it doesn’t seem that useful to me.

dpsutton 16:12:10

I think you want to ignore .clj-kondo/.cache and .lsp/.cache. Both of those directories can have useful stuff you want checked in.

dpsutton 16:12:15

No idea about .calva.

dpsutton 16:12:37

That was such a confusing message. You want to ignore .clj-kondo/.cache, not the root .clj-kondo directory, since there are useful config files in there.

Brett Rowberry 16:12:51

Here are my additions at this point:

# Calva
.calva/output-window/
.clj-kondo/.cache/
.lsp/.cache/

Brett Rowberry 16:12:07

Thanks for the tips!

Fredrik 16:12:36

If you haven't already, you should check out Gibo https://github.com/simonwhitaker/gibo for .gitignore boilerplates. When starting up a project, I run gibo dump clojure > .gitignore and get:

pom.xml
pom.xml.asc
*.jar
*.class
/lib/
/classes/
/target/
/checkouts/
.lein-deps-sum
.lein-repl-history
.lein-plugins/
.lein-failures
.nrepl-port
.cpcache/

Gabriel Luchtenberg 18:12:26

I often use the .gitignore file generated by this site: https://gitignore.io/

seancorfield 22:12:12

If you create a Clojure project with deps-new, you get this https://github.com/seancorfield/deps-new/blob/develop/resources/org/corfield/new/app/root/.gitignore (it's the same for a lib)

borkdude 21:12:54

Trying to find a reference from @alexmiller about multiple :require clauses in one ns form not being supported/idiomatic. Should be easy now that we have history here on Slack... I think it came up in relation to the new :as-alias. I'm trying to find "evidence" for this issue: https://github.com/clj-kondo/clj-kondo/issues/792

p-himik 21:12:56

Interestingly enough, it's straight up an error in CLJS: https://clojure.atlassian.net/browse/CLJS-254

p-himik 21:12:52

Oh, and the Q&A where I found the link ties back to issue 792, heh.

p-himik 22:12:05

https://groups.google.com/g/clojure-dev/c/6EEfdPhrWgk - no definitive answers, but some opinions from well-known people.

p-himik 22:12:03

Ah, found it, by Alex Miller: https://clojurians-log.clojureverse.org/clojure/2020-07-27/1595872486.471000
The linked article by Stuart Sierra says:
> Use each clause at most once.

borkdude 22:12:35

Yeah, but he said it quite literally recently somewhere.

borkdude 22:12:07

Thanks for these references, btw.