This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-05-25
A common task at my organization is “numericalization”: For each non-numeric column in a CSV, replace each unique value in the column with a corresponding unique integer. Output the updated CSV and a table mapping column values to their corresponding integers. For example, input:
|---------|
| species |
|---------|
| cat     |
| dog     |
| bird    |
| bird    |
| cat     |
Output:
|---------|
| species |
|---------|
| 0       |
| 1       |
| 2       |
| 2       |
| 0       |
{"species" {"cat"  0
            "dog"  1
            "bird" 2}}
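The transformation described above can be sketched in plain Clojure, without a database. This is only a minimal illustration, not the script mentioned in the thread: it assumes the CSV has already been parsed into a vector of rows of strings with the header first (the shape `clojure.data.csv/read-csv` produces), that a column counts as numeric when every value looks like a plain decimal number, and that integers are assigned in order of first appearance. The names `numeric-string?`, `column-mapping`, and `numericalize` are hypothetical.

```clojure
(defn numeric-string?
  "True when s looks like a plain integer or decimal number (assumed definition of 'numeric')."
  [s]
  (boolean (re-matches #"[-+]?\d+(\.\d+)?" s)))

(defn column-mapping
  "Maps each distinct value to an integer, in order of first appearance."
  [values]
  (into {} (map-indexed (fn [i v] [v i])) (distinct values)))

(defn numericalize
  "Takes rows (header first) and returns the encoded rows plus the
  per-column value->integer mappings for every non-numeric column."
  [[header & rows]]
  (let [mappings (into {}
                       (keep-indexed
                        (fn [i col-name]
                          (let [col (map #(nth % i) rows)]
                            ;; Only columns containing a non-numeric value get a mapping.
                            (when-not (every? numeric-string? col)
                              [col-name (column-mapping col)]))))
                       header)
        encode   (fn [col-name v]
                   (if-let [m (get mappings col-name)]
                     (str (get m v))
                     v))]
    {:rows     (into [header] (map (fn [row] (mapv encode header row))) rows)
     :mappings mappings}))

(numericalize [["species"] ["cat"] ["dog"] ["bird"] ["bird"] ["cat"]])
;; => {:rows     [["species"] ["0"] ["1"] ["2"] ["2"] ["0"]]
;;     :mappings {"species" {"cat" 0, "dog" 1, "bird" 2}}}
```

The `:rows` value can be handed straight to `clojure.data.csv/write-csv`, and `:mappings` has the same shape as the table in the example.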
Currently this is achieved via a script that leverages org.clojure/data.csv, but it’s a bit verbose. I’m wondering if there’s a simpler way, perhaps one that takes advantage of next.jdbc.
Are you currently using a database? I'm a bit unclear on what your process is...
Might be a better task for something like https://github.com/scicloj/tablecloth, but I figured I’d check.
That sort of get-or-create logic can be a bit messy in SQL/JDBC I think, assuming you need to make it thread-safe etc...
We have quite a few instances of that logic, in the form of slightly different get-or-create-<something>-id functions, specifically for the use case you outline: turning strings into unique ID values.
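For comparison, here is a minimal in-memory sketch of thread-safe get-or-create logic. It is not the SQL/JDBC version discussed above: it swaps in a Clojure atom as the store, which sidesteps the database round trip, and it assumes IDs are simply assigned as 0, 1, 2, ... in order of creation. The names `make-id-registry` and `get-or-create-id!` are hypothetical.

```clojure
(defn make-id-registry
  "Creates an empty value->id registry (assumption: IDs live in memory only)."
  []
  (atom {}))

(defn get-or-create-id!
  "Returns the existing ID for value, or assigns the next free integer.
  swap! retries the pure update function on contention, so concurrent
  callers never receive two different IDs for the same value."
  [registry value]
  (get (swap! registry
              (fn [m]
                (if (contains? m value)
                  m
                  (assoc m value (count m)))))
       value))

(def species-ids (make-id-registry))

(get-or-create-id! species-ids "cat")  ;; => 0
(get-or-create-id! species-ids "dog")  ;; => 1
(get-or-create-id! species-ids "cat")  ;; => 0
```

A database-backed version needs the same check-then-insert step to be atomic, for example through a unique constraint on the value column, which is where the messiness mentioned above tends to come from.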
Have you looked at Datalevin? It's based on LMDB, a key-value store. I think it should be a good fit for your use case.
I’ll have a look! Thanks, @U011NGC5FFY!