This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2023-09-12
Has anyone ever built an “auto-tagger” feature? Something like: given a block of text and a vector of tag names, return a vector of the tag names that are similar to the block of text.
I've done keyword extraction, i.e., given a block of text, pick a few words from that text that are representative of the content (with tf-idf). Not sure if that's exactly what you're describing here, though? tf-idf might be a good first step. fastText is pretty convenient for getting word embeddings, which may be useful too.
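A minimal sketch of tf-idf keyword extraction in plain Python (stdlib only), plus a naive way to turn it into the auto-tagger asked about: keep only the candidate tags that show up among the top-scoring keywords. The function names `tfidf_keywords` and `auto_tag`, the regex tokenizer, and the raw `log(N/df)` idf are all illustrative assumptions, not the poster's actual code. It only matches single-word tags literally; for fuzzier matching, comparing word embeddings (e.g. from fastText) would be the natural next step.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric runs (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_keywords(doc, corpus, k=3):
    """Return the k terms of `doc` with the highest tf-idf score.

    `corpus` is a list of other documents used to estimate document frequency;
    `doc` itself is included so every term in it has df >= 1.
    """
    docs = [tokenize(d) for d in corpus] + [tokenize(doc)]
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    tf = Counter(docs[-1])
    total = sum(tf.values())
    scores = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Highest score first; ties broken alphabetically so results are deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def auto_tag(doc, tags, corpus, k=5):
    """Hypothetical auto-tagger: keep the candidate tags found among the top-k keywords."""
    keywords = set(tfidf_keywords(doc, corpus, k))
    return [tag for tag in tags if tag.lower() in keywords]


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "the dog ate the bone"]
    doc = "clojure macros and clojure repl the"
    print(tfidf_keywords(doc, corpus, k=3))
    print(auto_tag(doc, ["clojure", "python", "repl"], corpus, k=4))
```

Running it prints `['clojure', 'and', 'macros']` and then `['clojure', 'repl']`. Note that "and" ranks highly only because the toy corpus is tiny; with a realistic corpus (or a stopword list) common words get pushed down, which is the whole point of the idf term.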
Just Python, actually.
I generate a CSV with Clojure and then call the Python script as a subprocess. Then I also call fastText as a subprocess.
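A rough sketch of the CSV + subprocess handoff described above. In the poster's setup the parent is a Clojure program (which could shell out with something like `clojure.java.shell/sh`); here Python plays both roles so the example is self-contained. The child script, the `id`/`text` columns, and the word-count "analysis" are hypothetical stand-ins for the real script; fastText would be invoked the same way through its own command-line tool.

```python
import csv
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical child script: reads the CSV path from argv and prints
# "<id> <word count>" for each row. Stands in for the real analysis script.
CHILD_SCRIPT = """
import csv, sys
with open(sys.argv[1], newline="") as f:
    for row in csv.DictReader(f):
        print(row["id"], len(row["text"].split()))
"""


def run_pipeline(rows):
    """Write `rows` (dicts with 'id' and 'text') to a temp CSV, run the child, parse stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "input.csv"
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["id", "text"])
            writer.writeheader()
            writer.writerows(rows)
        # check=True raises CalledProcessError if the child exits non-zero.
        result = subprocess.run(
            [sys.executable, "-c", CHILD_SCRIPT, str(path)],
            capture_output=True,
            text=True,
            check=True,
        )
    counts = {}
    for line in result.stdout.splitlines():
        row_id, n = line.split()
        counts[row_id] = int(n)
    return counts


if __name__ == "__main__":
    print(run_pipeline([{"id": "a", "text": "hello world"},
                        {"id": "b", "text": "one two three"}]))
```

The appeal of this design is that the two sides share nothing but a file format and an exit code, so either side can be swapped or run by hand for debugging.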
When I tried libpython-clj several years ago, it ran about 30% slower than calling Python as a subprocess 🤷 In general I find CSV + subprocess pretty convenient.
Looks like Spark MLlib can do tf-idf; I would definitely check that out first: https://spark.apache.org/docs/latest/mllib-feature-extraction.html I've been using Spark MLlib for its collaborative filtering algorithm (after previously using Python for that), and it's awesome.