data-science

lvh 2024-09-22T14:40:52.908239Z

Hm, how is tablecloth.api/percentiles supposed to work? I tried something like:

(tc/percentiles all-data :infra-cost-pct :infra-cost [50 95 99 100])
But that blew up because it wanted the same cardinality as the ds itself (which, yeah, makes sense), but I don't know how you're ever supposed to use it. (Note: I don't mean tcc/percentiles; that one I was able to use successfully.) Error:
1. Unhandled java.lang.Exception
   Column size (4) should be exactly the same as dataset row count (3111).
   Consider `:cycle` or `:na` strategy.
Is this function just accidentally autogenerated or something and it doesn't actually make sense to use?

Daniel Slutsky 2024-09-25T20:38:51.025499Z

Created an issue: https://github.com/scicloj/tablecloth/issues/169 Thanks for this report, @lvh!

Daniel Slutsky 2024-09-22T14:58:08.937099Z

Hi, sorry, maybe I don't understand. What blew up? Indeed, the tcc namespace is mostly lifted from another library (`dtype-next`), and many of these functions still need better documentation.

lvh 2024-09-22T18:01:01.473959Z

Oops, sorry, I pasted the one that does work

lvh 2024-09-22T18:02:31.760819Z

I fixed the example and added the error

lvh 2024-09-22T18:03:17.957209Z

I think I understand what the function is supposed to do (add a column with percentiles), and I understand the error (that's not the size of the ds), but I don't understand how that was ever supposed to work.

lvh 2024-09-22T18:04:03.444729Z

I would understand if tc/percentiles added a row with the percentile location of each corresponding value for example, similar to rank except normalized to a percentile range, but then the signature is wrong
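
To make the "percentile location of each value" idea concrete, here is a minimal sketch in plain Clojure. `percentile-ranks` is a hypothetical helper, not a tablecloth or dtype-next function, and it assumes the "share of values less than or equal to this one" definition of a percentile rank; other definitions exist.

```clojure
(defn percentile-ranks
  "Hypothetical helper: for each value in xs, returns the percentage of
  values in xs that are less than or equal to it. The result has one
  entry per input value, so it has the same cardinality as the input."
  [xs]
  (let [n (count xs)]
    (mapv (fn [x]
            ;; count how many values are <= x, then scale to 0..100
            (* 100.0 (/ (count (filter #(<= % x) xs)) n)))
          xs)))

(percentile-ranks [30 10 20 40])
;; => [75.0 25.0 50.0 100.0]
```

A result shaped like this could be attached as a column, since it lines up with the dataset's rows; it would not take a list of percentages as an argument, which is the signature mismatch described above.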

Daniel Slutsky 2024-09-22T21:34:33.600089Z

You're right, this looks like a mistake to me.
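
For reference, a minimal sketch of the column-level route that was reported to work in this thread. It assumes `tablecloth.column.api/percentiles` takes the data first and the percentages second, and uses a made-up five-row dataset in place of `all-data`; check the docstring in your installed version before relying on it.

```clojure
(require '[tablecloth.api :as tc]
         '[tablecloth.column.api :as tcc])

;; Made-up stand-in for the real dataset from the question.
(def ds (tc/dataset {:infra-cost [10.0 20.0 30.0 40.0 50.0]}))

;; Column-level call: yields one value per requested percentage,
;; regardless of how many rows the dataset has.
(def pcts (tcc/percentiles (:infra-cost ds) [50 95 99 100]))

(count pcts)
```

Because the result has one entry per percentage rather than one per row, it is a summary of the column and not something that fits back into the same dataset as a new column, which matches the error reported above.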

Daniel Slutsky 2024-09-22T21:36:20.428059Z

The best place to discuss such topics is the Zulip chat, where the library authors are more present and we have better knowledge management of such threads: https://scicloj.github.io/docs/community/chat/ (or the GitHub Issues of the library). I'd encourage you to write there, since I imagine you are about to have more and more insights of this kind. But if that is inconvenient, then I will write there.

Daniel Slutsky 2024-09-22T21:39:21.376259Z

Another library with many such statistical functions is https://github.com/generateme/fastmath. It is now in transition (version 3-alpha), and the fastmath.stats namespace is fabulous.
• https://generateme.github.io/fastmath/clay/stats.html
• https://generateme.github.io/fastmath/notebooks/notebooks/stats/ (currently more detailed, mostly still relevant)