The narrower the domain, the fewer examples you need. Training an assistant narrowly scoped to “Clojure code” would be far less burdensome to set up than a general-purpose assistant. Wild guess: I’ll bet it would already be significantly useful, if not perfect, just from training it on the chat logs and whatever Clojure code happens to be on GitHub.
Someone was just discussing this: https://clojurians.slack.com/archives/C0CB40N8K/p1682376798438999
So there are probably two kinds of code assistants here: 1. a conversational, pair-programming code assistant, and 2. a linting, code-completion assistant
We'll probably want both, because the conversational one probably won't need to be as fast as the completion one
I bought the GPT copilot thing and just the code completion is worth it
For good conversational pair programming, from what I'm reading about current open-source models, you'll have a hard time getting human-like creativity out of a sub-50B model. But perhaps the code completion bot could be one of these 7B models fine-tuned for code.
It'll take some experimentation
#clojurellm now exists