#clojure-europe
2020-12-06
orestis 08:12:44

Thanks @paul.legato. What I was thinking is more like “hash tagging” to let people write in the topic of their post.

paul.legato 18:12:21

Unless they’re highly motivated to do so, most people won’t bother even with hash tagging, in my experience.

orestis 08:12:31

An interesting thing that I saw: Power BI is one term, but PowerBI is also a term and they are the exact same term. Obviously Elasticsearch doesn’t pick that up and I’m not sure ngrams will work there, without a custom glossary of terms...
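
To make the ngram doubt concrete, here is a minimal plain-Python sketch of character trigram overlap between the two spellings. It is an illustration only, not how Elasticsearch's analyzers work; the helper names `char_ngrams` and `jaccard` are hypothetical.

```python
def char_ngrams(term, n=3):
    """Return the set of character n-grams of a lowercased term (spaces kept)."""
    text = term.lower()
    if len(text) < n:
        # Too short to slice: treat the whole term as a single gram.
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


spaced = char_ngrams("Power BI")   # includes grams that span the space
joined = char_ngrams("PowerBI")
print(sorted(spaced & joined))     # grams shared by both spellings
print(jaccard(spaced, joined))     # overlap score between 0 and 1
```

Only the grams inside "power" are shared, so the two spellings overlap partially but are far from identical. The grams around the space differ, which is why ngrams alone may not be enough here.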

paul.legato 18:12:02

Yes, there’s a huge rabbit hole of such optimizations that can be done. Your stemmer step might take care of making those things identical, or you could use some sort of string distance comparison like Levenshtein distance instead of strict equality when deciding whether the term matches some other term.
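
A rough sketch of both ideas in plain Python: a normalization step followed by a Levenshtein comparison with a small tolerance instead of strict equality. The `normalize` function is a simple stand-in (lowercase and drop whitespace), not a real stemmer. The function names and the `max_distance` threshold are assumptions made for illustration.

```python
def normalize(term):
    """Lowercase and remove all whitespace (a stand-in for an analyzer step)."""
    return "".join(term.lower().split())


def levenshtein(a, b):
    """Edit distance: minimum number of single-character inserts, deletes, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def terms_match(a, b, max_distance=1):
    """Treat two terms as the same if their normalized forms are within max_distance edits."""
    return levenshtein(normalize(a), normalize(b)) <= max_distance


print(levenshtein("Power BI", "PowerBI"))   # raw distance: the single space
print(terms_match("Power BI", "PowerBI"))   # identical after normalization
print(terms_match("Power BI", "Tableau"))   # unrelated terms should not match
```

The threshold is the tricky part: too loose and unrelated short terms start to collide, too strict and you are back to exact matching.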

orestis 08:12:06

(How many more examples are there that I don’t even know? Hence why I think users are best placed to highlight terms on their own)

paul.legato 18:12:42

They certainly are best placed to do so. The problem is rather that most users don’t care.

paul.legato 18:12:54

They view the machine as a black box into which they dump random information and magically get useful results out. Every extra step you introduce into that process cuts compliance by half, unless they’re highly motivated (e.g. their boss orders them to do it).

paul.legato 18:12:58

If you can’t force them to do the tagging out-of-band like that, it’s probably a non-starter unless the process you are building delivers 100x value over the next-best solution / whatever they’re doing now. 2x or 10x likely isn’t enough.

paul.legato 18:12:30

Many shops fake it with humans on staff paid to do the tagging, so customers don’t have to.