This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-05-03
I'll take a deep look at what you're doing differently to make the transducer performant -- my take on this certainly wasn't (not on MS github anymore, sorry).
The only real performance advantage of using transducers, when they are applicable, is that no intermediate collections are created during the transformations.
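To illustrate the point about intermediate collections, here is a minimal sketch comparing a sequence pipeline with a transducer pipeline for tokenizing documents. The names `docs`, `tokens-seq`, `tokenize-xf`, `tokens-xf` and `term-counts`, as well as the tiny stop-word set, are made up for this example and are not taken from any of the libraries mentioned in this thread.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sample data, only for illustration.
(def docs ["The quick brown fox" "The lazy dog" "A quick dog"])

;; Sequence version: map, mapcat and remove each return a new lazy seq,
;; so three intermediate sequences are created along the way.
(defn tokens-seq [docs]
  (->> docs
       (map str/lower-case)
       (mapcat #(str/split % #"\s+"))
       (remove #{"the" "a"})))

;; Transducer version: the same steps composed into one transformation.
;; Nothing is built until it is applied to a source and a sink.
(def tokenize-xf
  (comp (map str/lower-case)
        (mapcat #(str/split % #"\s+"))
        (remove #{"the" "a"})))

;; Pour the tokens straight into a vector in a single pass.
(defn tokens-xf [docs]
  (into [] tokenize-xf docs))

;; The same transducer can feed a reduction directly, here a term count,
;; without ever materializing the token collection.
(defn term-counts [docs]
  (transduce tokenize-xf
             (completing (fn [m t] (update m t (fnil inc 0))))
             {}
             docs))
```

Both versions yield the same tokens; the difference is only in how many temporary sequences are allocated, so any speed-up will depend on the size of the corpus and the number of steps.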
Yeah. I was hoping to be able to allow for incremental updates to the TF-IDF calculation without going over the entire collection again but couldn't find a way to do it.
I don’t think that’s possible as the document frequencies will change whenever your corpus changes and every tf-idf score is a product of the document frequency table.
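A small sketch of why that is, assuming the textbook weighting tf × log(N / df) with raw term counts as tf. The libraries linked in this thread may use different weighting or smoothing; the function names and the sample corpus below are hypothetical.

```clojure
;; Each document is assumed to be an already tokenized seq of terms.

(defn document-frequencies
  "Map of term -> number of documents that contain the term."
  [docs]
  (frequencies (mapcat distinct docs)))

(defn idf
  "Inverse document frequency for a corpus of n docs and a term found in df docs."
  [n df]
  (Math/log (/ (double n) df)))

(defn tf-idf
  "Seq of maps, one per document, from term -> tf * idf."
  [docs]
  (let [n  (count docs)
        df (document-frequencies docs)]
    (for [doc docs]
      (into {}
            (map (fn [[term tf]] [term (* tf (idf n (df term)))]))
            (frequencies doc)))))

;; Hypothetical corpus, then the same corpus with one more document.
(def corpus [["clojure" "is" "fun"] ["java" "is" "verbose"]])
(def bigger (conj corpus ["clojure" "transducers"]))

;; Score of "clojure" in the FIRST document, before and after the addition.
(def score-before (get (first (tf-idf corpus)) "clojure"))
(def score-after  (get (first (tf-idf bigger)) "clojure"))
```

Here the first document itself is unchanged, but its score for "clojure" drops from log(2/1) to log(3/2) because both N and the document frequency changed. Term counts and the document frequency table can be updated incrementally, but under this weighting the scores themselves have to be recomputed.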
I am just working on improving my own TF-IDF implementation: https://github.com/scicloj/scicloj.ml.smile/blob/d70c7e3caff93935d05ab81ed6b2d1e4846ad42b/src/scicloj/ml/smile/nlp.clj#L281 To be released soon. I would definitely prefer to re-use something existing, so I will have a look.
Let's discuss here: https://github.com/kuhumcst/tf-idf/issues/1
Sorry to revive this old thread, but I put up my (by now pretty old) implementation on codeberg: https://codeberg.org/schaueho/tfidf
Just to summarize: it seems we have (at least) 3 implementations of TF-IDF in Clojure:
https://github.com/kuhumcst/tf-idf
https://codeberg.org/schaueho/tfidf
https://github.com/scicloj/scicloj.ml.smile/blob/d70c7e3caff93935d05ab81ed6b2d1e4846ad42b/src/scicloj/ml/smile/nlp.clj#L281
The last one, mine, is very slow compared to at least the first. I now have a use case where mine is "too slow" while the first one would be "fast enough", so I will come back to it.