
I'll take a deep look at what you're doing differently to make the transducer performant -- my take on this certainly wasn't (not on MS github anymore, sorry).


The only real performance advantage of using transducers, where they apply, is that no intermediate collections are built between the transformation steps.
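A minimal sketch of that difference, assuming a toy token list (`words`, `total-length-seq`, `total-length-xf` and `xform` are made-up names for illustration, not taken from any of the implementations discussed here). The sequence version builds a lazy sequence at every step, while the transducer version composes the steps into one reducing function and walks the input once:

```clojure
(require '[clojure.string :as str])

;; Hypothetical sample input, for illustration only.
(def words ["The" "quick" "brown" "fox" "the" "lazy" "dog" "a"])

;; Sequence pipeline: every step returns a new lazy sequence
;; that is then consumed by the next step.
(defn total-length-seq [ws]
  (->> ws
       (map str/lower-case)
       (filter #(> (count %) 3))
       (map count)
       (reduce + 0)))

;; Transducer pipeline: the same steps composed into a single
;; transformation, with no intermediate sequences in between.
(def xform
  (comp (map str/lower-case)
        (filter #(> (count %) 3))
        (map count)))

(defn total-length-xf [ws]
  (transduce xform + 0 ws))
```

Both functions return the same total; whether the transducer form is measurably faster depends on the size of the input and the cost of each step.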


Yeah. I was hoping to be able to allow for incremental updates to the TF-IDF calculation without going over the entire collection again but couldn't find a way to do it.


I don’t think that’s possible, as the document frequencies change whenever your corpus changes and every tf-idf score is derived from the document frequency table.
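A small sketch of why, assuming the plain `log(N / df)` form of IDF and documents represented as token vectors (the function names and the tiny corpus are hypothetical, and other TF-IDF variants weight things differently). Adding one document changes the scores of a document that was not touched:

```clojure
(defn doc-frequencies
  "Map of term -> number of documents that contain it."
  [docs]
  (frequencies (mapcat distinct docs)))

(defn idf
  "Unsmoothed inverse document frequency, log(N / df)."
  [n-docs df]
  (Math/log (/ (double n-docs) df)))

(defn tf-idf
  "Map of term -> tf-idf score for `doc`, relative to the corpus `docs`."
  [docs doc]
  (let [n   (count docs)
        dfs (doc-frequencies docs)]
    (into {}
          (map (fn [[term tf]] [term (* tf (idf n (dfs term)))]))
          (frequencies doc))))

;; Hypothetical toy corpus.
(def d1 ["cat" "sat"])
(def d2 ["dog" "sat"])
(def d3 ["cat" "ran"])
(def corpus [d1 d2])

;; Scores for d1 before and after d3 joins the corpus.
(def before (tf-idf corpus d1))
(def after  (tf-idf (conj corpus d3) d1))
```

In `before`, "sat" scores 0 because it occurs in every document; in `after`, both "cat" and "sat" have different scores even though `d1` itself is unchanged, since `N` and the document frequencies both moved.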


yes, exactly

Carsten Behring 16:10:23

I am just working on improving my own TFIDF implementation, to be released soon. I would definitely re-use something existing, so I will have a look.


Sorry to revive this old thread, but I put up my (by now pretty old) implementation on codeberg:

Carsten Behring 10:11:13

Just to summarize: it seems we have (at least) 3 implementations of TFIDF in Clojure: The last one, mine, is very slow compared to at least the first. I now have a use case where mine is "too slow" while the first one would be "fast enough", so I will come back to it.