Steven Deobald 07:10:10

Reading this thread has left me with another naive, open-ended question: Are any of you familiar with how linguistics systems deal with the raw components of language? The best tool available for Pali (at the moment) is the DPR: ...and although it's incredibly detailed, it just sort of brute-forces word compounds against a massive dictionary. That dictionary includes components used in compounds, but the component relationships and relationships to Sanskrit and Latin are just hard-coded within word definitions, as far as I can tell. Wiktionary has some very basic understanding of word components: Their SPARQL endpoint seems to be down (or inaccessible from Kashmir) at the moment: I don't think this granularity would ever apply to the data on but it will ultimately be required for Pariyatti's sister project, I see but I'm never sure about the significance of projects like this. Are there others I should be reading about?
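To make the "brute-force word compounds against a dictionary" idea concrete, here is a minimal sketch of that kind of lookup. It is not the DPR's actual code: the `LEXICON` contents and the `split_compound` helper are hypothetical, and the sketch assumes components join without any sound changes at the boundaries, which real Pali compounds often do not.

```python
from functools import lru_cache

# Hypothetical toy lexicon; a real dictionary would hold many thousands of entries.
LEXICON = frozenset({"dhamma", "pada", "cakka", "pavattana", "sutta"})


def split_compound(word, lexicon=LEXICON):
    """Return every way to cut `word` into a sequence of lexicon entries.

    Pure string matching: no sandhi handling and no linguistic knowledge,
    which is exactly the limitation of a brute-force approach.
    """

    @lru_cache(maxsize=None)
    def splits(start):
        # Reaching the end of the word means one complete (empty) remainder.
        if start == len(word):
            return [[]]
        results = []
        # Try every prefix of the remaining text as a candidate component.
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece in lexicon:
                for rest in splits(end):
                    results.append([piece] + rest)
        return results

    return splits(0)


if __name__ == "__main__":
    print(split_compound("dhammapada"))
    print(split_compound("dhammacakkapavattanasutta"))
```

A word with no full decomposition returns an empty list. Note that nothing in the result says how the components relate to each other or to cognates in other languages, which is the gap the question is about.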


Linguistics isn’t really my area of expertise, so please take my comments with a pinch of salt. The biggest model I know of in the knowledge representation (KR) of linguistics is WordNet. It’s a long-standing project that essentially provides a machine-readable thesaurus of natural language terms and gives some idea of their proximity to each other. My understanding is that it’s a great dataset with bindings into many ecosystems, and it’s still widely used, especially to add some knowledge of synonyms to search engines. Being KR, it’s probably considered old hat these days, with ML language models taking centre stage; however, I think there’s a lot of progress in hybrid approaches that combine KR and ML, so I don’t think it will disappear anytime soon. Like you, I’d be somewhat sceptical of the long-term viability and maintenance of the OKFN stuff. They have their fingers in a lot of pies, and I suspect that, like many people, they are forced to chase income streams. That’s not to say that they don’t do good work; they absolutely do.
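As a rough illustration of what "proximity" between terms can mean in a thesaurus-like graph, here is a small sketch. The `HYPERNYMS` data is a made-up toy, not real WordNet content, and `build_graph` and `distance` are hypothetical helpers; actual WordNet bindings (for example the one shipped with NLTK) expose their own, richer similarity measures.

```python
from collections import deque

# Toy "is a kind of" links, invented for illustration only.
HYPERNYMS = {
    "dog": ["canine"],
    "wolf": ["canine"],
    "canine": ["carnivore"],
    "cat": ["feline"],
    "feline": ["carnivore"],
    "carnivore": [],
}


def build_graph(hypernyms):
    """Turn child -> parents links into an undirected adjacency map."""
    graph = {term: set() for term in hypernyms}
    for term, parents in hypernyms.items():
        for parent in parents:
            graph.setdefault(parent, set()).add(term)
            graph[term].add(parent)
    return graph


def distance(a, b, graph):
    """Breadth-first search: number of links on the shortest path from a to b."""
    if a not in graph or b not in graph:
        return None
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        term, steps = queue.popleft()
        if term == b:
            return steps
        for nxt in graph[term]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return None  # the two terms are not connected


if __name__ == "__main__":
    graph = build_graph(HYPERNYMS)
    print(distance("dog", "wolf", graph))
    print(distance("dog", "cat", graph))
```

Fewer links means the terms are treated as closer, so "dog" and "wolf" come out nearer to each other than "dog" and "cat" in this toy graph.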

Steven Deobald 10:10:22

Oh yeah, wordnet... I remember that.