https://github.com/tonsky/clj-simple-stats, simple statistics for Clojure+Ring webapps, has reached v1.2.0. This update removes all the potential GDPR issues related to cookies and IP addresses.
We use duckdb at work and have the same issue: the jar holds binaries for all architectures. We made our own internal package that keeps only linux/aarch64.
btw: sqlite-jdbc is only 13 megs or something
DuckDB does compress the DB file, though. For example, stats for grumpy.website weigh 600 MB in DuckDB and 3 GB in SQLite. By paying more upfront, you are actually saving in the long run! Also, queries are way more efficient in DuckDB. Didn't really have a choice.
Hi, Niki! Thanks for sharing this. My jar grew from 31M to 108M. Statistics is a heavy thing! 🤣
that's duckdb
I manually removed the Windows and macOS binaries from the uberjar and now it is 49M.
Now that I think about it, I don't like it either, but… duckdb is really good at what it does
pretend it's a very compact electron app :)
---
clj-uuid 0.2.5 released
https://github.com/danlentz/clj-uuid
Just announcing a new release of clj-uuid -- Thanks for 3,000,000 downloads!
What’s new:
• New UUID versions: v6 (reordered time-based, sortable), v7 (Unix time + crypto random, sortable), and v8 (custom/user-defined). Also introducing v7nc, for when you need v7's structure, you need it fast, and you don't need cryptographic randomness.
• Performance work: Rewrote the internals around ByteBuffer primitives and JVM intrinsics. Serialization-heavy workloads see a 3-19x improvement. Generation is now competitive with JUG and uuid-creator across the board, with v7nc faster and v5 (SHA-1) at parity with JUG.
• v7nc: At ~39ns, this may be the fastest UUID generator on the JVM, 1.26x faster than JUG's TimeBasedEpochGenerator. Useful when you want sortable time-based UUIDs without the overhead of SecureRandom.
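For readers who haven't looked inside a v7 UUID, here is a minimal Java sketch of the RFC 9562 version-7 bit layout: a 48-bit Unix millisecond timestamp, the version nibble, then random bits with the variant marker. The class and method names are hypothetical and this is not clj-uuid's implementation. The "non-crypto" method assumes, based only on the announcement's wording, that v7nc's trade-off is roughly swapping SecureRandom for a fast non-cryptographic generator such as ThreadLocalRandom.

```java
import java.security.SecureRandom;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of the RFC 9562 version-7 layout; NOT clj-uuid's implementation.
final class UuidV7Sketch {
    private UuidV7Sketch() {}

    private static final SecureRandom SECURE = new SecureRandom();

    // Packs a UUIDv7:
    //   high 64 bits: 48-bit Unix ms timestamp | version nibble 0111 | 12 bits rand_a
    //   low  64 bits: variant bits 10 | 62 bits rand_b
    static UUID v7(long unixMillis, long randA, long randB) {
        long msb = ((unixMillis & 0xFFFF_FFFF_FFFFL) << 16)
                | (0x7L << 12)
                | (randA & 0xFFFL);
        long lsb = (randB & 0x3FFF_FFFF_FFFF_FFFFL) | 0x8000_0000_0000_0000L;
        return new UUID(msb, lsb);
    }

    // Random bits from SecureRandom: cryptographically strong, but slower.
    static UUID v7Secure() {
        return v7(System.currentTimeMillis(), SECURE.nextLong(), SECURE.nextLong());
    }

    // Random bits from ThreadLocalRandom: fast, NOT cryptographically secure.
    // Assumption: roughly the trade-off the announcement describes for v7nc.
    static UUID v7NonCrypto() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        return v7(System.currentTimeMillis(), rnd.nextLong(), rnd.nextLong());
    }
}
```

Because the timestamp occupies the most significant bits, UUIDs minted in different milliseconds sort by time. This sketch makes no ordering promise for two UUIDs created within the same millisecond; RFC 9562 describes optional counter schemes for that.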
[danlentz/clj-uuid "0.2.5"]
If you’re curious about the optimization approach, I wrote up some notes:
- https://github.com/danlentz/clj-uuid/blob/master/doc/perf-analysis.md — ByteBuffer vs shift/mask loops, JVM intrinsics
- https://github.com/danlentz/clj-uuid/blob/master/doc/uuid-generation-benchmarks.md — per-version timings
- https://github.com/danlentz/clj-uuid/blob/master/doc/apples.md — Comparisons with other libraries
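The first set of notes contrasts ByteBuffer access with shift/mask loops. As a hypothetical illustration (made-up names, not code from clj-uuid), here are both styles serializing a UUID to its 16-byte big-endian form in Java. They produce identical bytes; which one is faster depends on what the JIT does with each, which is what the linked notes measure.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.UUID;

// Hypothetical illustration of two serialization styles; not clj-uuid code.
final class UuidBytesSketch {
    private UuidBytesSketch() {}

    // Shift/mask loop: peel off each byte of the two 64-bit halves by hand (big-endian).
    static byte[] toBytesShiftMask(UUID u) {
        byte[] out = new byte[16];
        long msb = u.getMostSignificantBits();
        long lsb = u.getLeastSignificantBits();
        for (int i = 0; i < 8; i++) {
            out[i] = (byte) (msb >>> (56 - 8 * i));
            out[8 + i] = (byte) (lsb >>> (56 - 8 * i));
        }
        return out;
    }

    // ByteBuffer: two putLong calls; big-endian is ByteBuffer's default byte order.
    static byte[] toBytesByteBuffer(UUID u) {
        return ByteBuffer.allocate(16)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    // Inverse: Java evaluates arguments left to right, so msb is read before lsb.
    static UUID fromBytes(byte[] b) {
        ByteBuffer bb = ByteBuffer.wrap(b);
        return new UUID(bb.getLong(), bb.getLong());
    }

    public static void main(String[] args) {
        UUID u = UUID.randomUUID();
        System.out.println(Arrays.equals(toBytesShiftMask(u), toBytesByteBuffer(u)));
    }
}
```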
Feedback welcome!
---
i have a new “ordered collections” library i hope you'll also look at. just waiting on approval from mgmt to release it. it's glacial
that performance increase is incredible
39ns is no joke. Great work
🎉
Uh oh. Your optimization work is stellar, but your benchmarking approach is unfortunately invalid. You MUST use a proper benchmarking framework like JMH for micro-optimizations like this one. You MUST consume computation results; otherwise you may just be reading DCE (dead-code elimination) noise. I'm pretty sure you will still come out ahead when you redo the benchmarks, but it's still important to do. You can start with Criterium, at least.
i like criterium and use it a lot. i am not sure how to publish benchmarks correctly (it's all bs at some level); my goal was just to give a sense of the improvements. but i'm definitely open to increasing the rigor of the benchmarks if you want to open a new PR?
I don't mean that you have to include all the quartiles and stdevs from Criterium/JMH, it's fluff, average is fine. I mean that you can't dotimes an operation that takes 40ns to run without somehow consuming the output and trust the resulting number. Both Criterium and JMH do that (consume the output and thus prevent dead code elimination). 40ns may become 100ns when measured properly, so you can't get the sense of improvement in this way.
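The same JIT concern applies to any JVM loop, not just Clojure's `dotimes`. Here is a hypothetical Java sketch of the difference: the naive loop discards each result, so the JIT is free to drop the work, while the consuming loop folds every result into an accumulator and publishes it through a volatile field, a poor man's version of JMH's Blackhole. The `mix` function is an assumed stand-in for "a cheap operation", not anything from clj-uuid. This is still not a real benchmark (no forking, crude warm-up, no statistics), so use Criterium or JMH for numbers you intend to publish.

```java
// Hypothetical sketch of "consume the result"; NOT a substitute for JMH or Criterium.
final class SinkSketch {
    private SinkSketch() {}

    // The consuming loop writes its accumulator here. Since every iteration feeds
    // the accumulator, the JIT cannot treat the loop body as dead code.
    static volatile long sink;

    // Stand-in for a cheap, side-effect-free operation (SplitMix64 finalizer).
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Naive: the result is thrown away, so the work may be eliminated entirely.
    static long naiveNanosPerOp(int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            mix(i); // discarded result: a candidate for dead-code elimination
        }
        return (System.nanoTime() - start) / n;
    }

    // Consuming: every result contributes to acc, which is published via the volatile sink.
    static long consumingNanosPerOp(int n) {
        long acc = 0L;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            acc ^= mix(i);
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed / n;
    }

    public static void main(String[] args) {
        for (int round = 0; round < 5; round++) { // crude warm-up: later rounds run JIT-compiled code
            System.out.printf("naive: %d ns/op, consuming: %d ns/op%n",
                    naiveNanosPerOp(10_000_000), consumingNanosPerOp(10_000_000));
        }
    }
}
```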
that's a reasonable objection
i'm not sure i can address it immediately. but my feeling is that we're in the ballpark. i'm very happy to work with you on improving the benchmark (and who knows, maybe overall performance)
Again, I'm sure that re-benchmarking will demonstrate that you still achieved significant improvement, but given how much effort you already put into this, and that you have a dedicated benchmarking scripts/docs, it is a waste to have the "wrong" methodology there.
You could at least double-check with Criterium that it gives you the same ballpark timings as your dotimes scripts. If it does, then at least the results are correct, even if the measuring approach isn't.
make a PR?
y, that's fair
Sometime in next life, maybe, sorry 😅 Swamped
lol i get it. but i appreciate this feedback
i will address this. you’re right. gah. but it will have to wait until the next release / my next vacation lol 🫤
That's why I never get invited to parties
that said, my measuring approach was consistent across all libraries
so, in relative terms, might be reasonable
lol you're invited my friend, you just need to talk me into hosting one
I have some examples of what can go wrong here: https://clojure-goes-fast.com/blog/using-jmh-with-clojure-part1/
oh wow thanks for this link
gold
off topic, but dunno where to go with it: @alexyakushev awesome writeup. I'm curious if you found any problems with jmh-clojure, or if you just found it unintuitive/unwieldy?
Excellent question. I encountered jmh-clojure back in those days and experimented with it for a little bit. While it indeed seems like a thought-out project with very solid engineering behind it (its author also developed https://github.com/jgpc42/insn), I couldn't immediately build a mental picture of what's going on there, so I didn't trust the results I obtained with it. Micro/nano-benchmarking is such a sensitive act that you want to understand what you are doing, because it is so easy to get wrong results. I keep jmh-clojure in the back of my mind to return to someday, properly figure it out, and add it to my toolset, but I haven't yet in the 7 years I've known about it 😅.
Welp, there goes my weekend 🤷
yeah my hand-wavy assumption is that it gens classes with the proper annotations and runs JMH, but, yeah, if I were staking my career on it, I would need more info 😄
@joe.lane let me know what you find.
This is the same reason I gave up on home-grown attempts to drive JMH from the REPL. I just feel better knowing that I'm launching JMH as a standalone process, as Shipilev intended, and not injecting entropy into measurements.
As Shipilev Intended™
oh god, now i need to google shipilev. i felt so good about this release earlier
This topic is a certified org.openjdk.jmh.infra.Blackhole, enjoy your trip!
@danlentz I pray you actually don't, in your case I'm 99.9% sure Criterium will be enough to verify that your dotimes measurements are in fact fine.
I honestly cannot claim that I enjoyed my dive into jmh either. But what's done is done.
yeah criterium is almost certainly enough for a single fn call
But JMH is a must when you do disassembly diving, case in point: https://clojure-goes-fast.com/blog/performance-tidbit-instanceof/
yeah I'm not actually sure it was necessary in my case, but I wanted to see the impact of thread contention on certain operations. not sure whether that's in criterium's wheelhouse, but I saw other resilience libs setting up jmh harnesses, so that's what I went with.
https://github.com/potetm/fusebox/blob/c9172629538bd982e5e043a864065c3839afb0b0/dev/dev/jmh.clj#L131
Data-driven configuration that jmh-clojure offers is heaps more pleasant than JMH's annotation vomit, for sure.
yeah I'ma just cross my fingers that Joe comes back with the all clear on Monday
I started this in 2013 as an exercise to learn Clojure … and it continues to be.
I've always had a project on my personal backlog to store jmh-clojure's data structures in Datomic to create a performance testing and regression prevention tool. Always just out of reach...