#data-science
2022-09-16
vonadz 13:09:16

Anyone know of any plug-and-play libs that can extract keywords or topic tags from text?

vonadz 17:09:35

Thanks 🙂 I replied in the gist

Rupert (All Street) 15:09:36

Do you have a list of topics that you are interested in matching? If so, you could start with clojure.string/includes? or a regex. Going down the data science route:
• If you know the categories, you could train a supervised classifier for each tag.
• If you don't know the categories, you could detect rare terms with TF-IDF or use clustering such as K-means.
Finally, large language models like GPT-J are good one-shot/few-shot learners, so you could probably just prompt one to suggest tags for a text (no fine-tuning required). A notable downside is that they are slow to run, especially if you don't have a GPU handy.
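A minimal sketch of the TF-IDF idea above, written in Python purely for illustration (the thread itself is about Clojure). Each word in one document is scored by its term frequency times its inverse document frequency across a small corpus, so words that are frequent in that document but rare elsewhere rise to the top as candidate tags. The naive tokenizer, the smoothed IDF formula, and the helper name `top_tfidf_terms` are assumptions chosen for this example, not an existing library API.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())


def top_tfidf_terms(docs, doc_index, k=3):
    """Return the k highest-scoring TF-IDF terms for docs[doc_index].

    Hypothetical helper for illustration. Uses a smoothed IDF:
        idf(t) = log((1 + N) / (1 + df(t))) + 1
    where N is the number of documents and df(t) is how many contain t.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term at most once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    counts = Counter(tokenized[doc_index])
    total = sum(counts.values())

    scores = {}
    for term, count in counts.items():
        tf = count / total
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1
        scores[term] = tf * idf

    # Highest score first; ties broken alphabetically so output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "clojure repl and clojure macros in the repl",
]
print(top_tfidf_terms(docs, 2, k=2))  # prints ['clojure', 'repl']
```

On real text you would also want stopword removal: in this tiny corpus, "the" still comes out on top for the first sentence because it appears twice there, even though it occurs in every document.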

aaelony 15:09:41

maybe this LDA implementation still works... https://github.com/davidandrzej/chisel

Carsten Behring 15:09:13

Could you give some more details on what you mean by "extract topics"?