I added JSON-LD support to the core endpoints of my Clojure web service, so now my Python MCP server which uses this web service is very happy to get semantically meaningful data back. LLMs that consume web services really seem to like both JSON and well-defined semantics, so JSON-LD is a pretty good fit.
Very cool. Can you share what kind of data it is, or anything? Very interesting either way. Any tips for doing this? 😀
It's for the Danish WordNet, which is basically a huge graph of relations between the different word senses used in the Danish language. As for tips: I was already using Transit to transmit the different RDF resources as Clojure maps to my consuming CLJS web app + had an optional Turtle format download for them which used Donatello. The only real change was converting the Clojure map into JSON in a way that would make it valid JSON-LD and with the keys presented in a semantic order.
I basically just go through the Clojure map and collect all prefixes, then I put the mapping between prefixes and URIs in the "@context" key of the JSON data we are building. The rest is mostly just renaming the keys in the Clojure map. One thing I had to figure out was how to present supporting data, e.g. relation labels, and I figured that the most logical way was to put those in a "@graph" at the end. I also append a short "rdfs:comment" to help the consuming LLM make more sense of the data.
Code is here: https://github.com/kuhumcst/DanNet/blob/master/src/main/dk/cst/dannet/db/export/json_ld.clj
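For readers who want the gist without reading the Clojure, here is a rough Python sketch of the steps described above: collect the prefixes in use, put them in "@context", rename keys, and append supporting data in a trailing "@graph". It is not a port of the linked code. The `PREFIXES` registry, the `ex:` namespace, the function names and the sample data are all made up for illustration, and the real implementation may lay things out differently.

```python
import json

# Hypothetical prefix registry. The rdf/rdfs URIs are the standard W3C ones;
# "ex" is a placeholder namespace, not DanNet's.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/",
}

def prefix_of(term):
    """Return the prefix of a compact name like 'rdfs:label', else None."""
    if isinstance(term, str) and ":" in term and "://" not in term:
        return term.split(":", 1)[0]
    return None

def collect_prefixes(value, found):
    """Walk nested dicts/lists and record every known prefix that is used."""
    if isinstance(value, dict):
        for k, v in value.items():
            collect_prefixes(k, found)
            collect_prefixes(v, found)
    elif isinstance(value, list):
        for v in value:
            collect_prefixes(v, found)
    else:
        p = prefix_of(value)
        if p in PREFIXES:
            found.add(p)
    return found

def to_json_ld(subject, entity, supporting=(), comment=None):
    """Build a JSON-LD-shaped dict: @context first, @graph last."""
    body = {"@id": subject}
    for key, value in entity.items():
        # Rename the one key that has a JSON-LD keyword equivalent.
        body["@type" if key == "rdf:type" else key] = value
    if comment:
        body["rdfs:comment"] = comment
    if supporting:
        body["@graph"] = list(supporting)  # supporting data, e.g. relation labels
    used = collect_prefixes(body, set())
    context = {p: PREFIXES[p] for p in sorted(used)}
    return {"@context": context, **body}

doc = to_json_ld(
    "ex:synset-1",
    {
        "rdf:type": "ex:Synset",
        "rdfs:label": "kage",
        "ex:hypernym": {"@id": "ex:synset-2"},
    },
    supporting=[{"@id": "ex:hypernym", "rdfs:label": "hypernym"}],
    comment="A word sense and its relations.",
)
print(json.dumps(doc, indent=2, ensure_ascii=False))
```

Python dicts keep insertion order, so the serialized output keeps "@context" at the top and "@graph" at the bottom, which is one simple way to get the "semantic order" of keys.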
The MCP server itself is mostly vibe-coded in Python. I researched best practices and have been prompting Claude to analyse my Clojure code and build the MCP server based on that.
I just did a naïve EDN->JSON conversion at first, but hit a wall, so I figured that the added context of using JSON-LD would make it much easier for the LLM to understand what the data represents... which it did! And that makes a lot of sense too, as any LLM will no doubt have lots of implicit knowledge about RDF and the structure of RDF/JSON-LD, while it has to essentially resort to guessing from its immediate context whenever it's just a random JSON blob.
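To make the "random JSON blob" contrast concrete, here is a tiny Python sketch of what the "@context" buys a consumer: every prefixed key can be resolved to a full, globally defined IRI, while a bare key like "label" resolves to nothing. The `expand` helper and the sample data are hypothetical, and this only handles simple prefix mappings; a real JSON-LD processor does far more.

```python
def expand(term, context):
    """Expand 'prefix:local' to a full IRI if the prefix is in the context."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return term  # no known prefix: the term stays opaque

# Hypothetical payloads: the same fact as plain JSON and as JSON-LD.
plain = {"label": "kage"}
json_ld = {
    "@context": {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"},
    "rdfs:label": "kage",
}

context = json_ld["@context"]
expanded = {
    expand(key, context): value
    for key, value in json_ld.items()
    if key != "@context"
}
print(expanded)
print({expand(key, context): value for key, value in plain.items()})
```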
Super interesting, thanks for writing that up! More to follow, I’m sure, once I have digested this 😀