So, I had this random idea after playing with some agentic LLM tools. The current approach of using many MCPs for different tasks feels both messy and not terribly efficient. It got me thinking that you could just use a Babashka nREPL instead. You'd have a similar lifecycle to MCP, but the target is the REPL itself. It effectively becomes a single tool where the LLM can just write ad-hoc scripts to solve whatever task you give it. It could read the file system, access databases, pull stuff from the web, etc., all from a single endpoint. On top of that, you could leverage REPL state to reuse functions and results that have already been created. You could even have a web UI sitting on top of all that to render and nicely format the results, like displaying graphs, tables, etc. I slapped together a quick prototype to test the idea. It's completely vibe coded (hence being written in JS), but it illustrates the concept. I suspect you could get a lot of mileage out of a REPL-based tool like this, especially compared to what current MCP-based tools are able to do. Curious if anybody else has played around with a similar concept here? https://github.com/yogthos/repl-talk
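To make the "single tool with REPL state" idea concrete, here's a minimal sketch. It's in Python rather than Babashka, and it is not how the linked prototype works; `ReplTool` and its `eval` method are made-up names for illustration. The only point is that one eval endpoint with a persistent namespace lets later calls reuse earlier data and functions.

```python
import contextlib
import io
import traceback


class ReplTool:
    """A single 'eval' tool whose namespace survives across calls (hypothetical name)."""

    def __init__(self):
        # Shared namespace: anything defined in one call is visible in the next.
        self.namespace = {}

    def eval(self, code: str) -> dict:
        """Run a snippet and report captured stdout plus any error text."""
        out = io.StringIO()
        try:
            with contextlib.redirect_stdout(out):
                exec(code, self.namespace)
            return {"ok": True, "stdout": out.getvalue(), "error": None}
        except Exception:
            return {"ok": False, "stdout": out.getvalue(), "error": traceback.format_exc()}


repl = ReplTool()
# Call 1: bind some data (stands in for a db query result).
repl.eval("rows = [{'id': 1, 'total': 40}, {'id': 2, 'total': 2}]")
# Call 2: define a helper.
repl.eval("def grand_total(rs):\n    return sum(r['total'] for r in rs)")
# Call 3: reuse both without resending either.
result = repl.eval("print(grand_total(rows))")
```

Each call here would be a separate tool invocation from the model, and the third one only works because the first two left state behind.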
Interesting idea. Fellow Canadian @bhauman may have thoughts about it too :)
would love to chat about it with @bhauman, looking at clojure-mcp was what got me thinking about this originally :) This whole thing wasn't really well designed, but it was interesting to see what the loop would look like. I think the key benefit of the repl is being able to track state and accumulate helper functions. I haven't really explored that much yet, but it tends to be a real problem that the LLM doesn't have a persistent state to work on, which results in duplicating work and having to shuffle data from one tool to another. I expect you can get the feedback loop to be a lot tighter as well. Another thing I noticed that seems to work well is using the errors from the REPL to create a loop where the LLM fixes the code: if it gets an error, you send back the code it wrote along with the error, and have it iterate. This way even smaller local models can converge on an answer over time.
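A rough sketch of that error loop, again in Python for illustration. `repair_loop`, `run_snippet` and `fake_model` are hypothetical names, and `fake_model` is just a stand-in for a real LLM call so the example runs on its own.

```python
import traceback


def run_snippet(code: str, namespace: dict):
    """Execute code in the given namespace; return None on success or the error text."""
    try:
        exec(code, namespace)
        return None
    except Exception:
        return traceback.format_exc()


def repair_loop(task: str, ask_model, max_attempts: int = 5):
    """Ask for code, run it, and feed code plus error back until it runs cleanly."""
    namespace = {}
    prompt = task
    for attempt in range(1, max_attempts + 1):
        code = ask_model(prompt)
        error = run_snippet(code, namespace)
        if error is None:
            return code, namespace, attempt
        # Send back both the failing code and the error it produced.
        prompt = (
            f"{task}\n\nYour previous code:\n{code}\n\n"
            f"It failed with:\n{error}\nPlease fix it."
        )
    raise RuntimeError(f"no working code after {max_attempts} attempts")


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM: wrong on the first try, right once it sees an error."""
    if "It failed with" in prompt:
        return "answer = sum(range(1, 11))"
    return "answer = sum(range(1, 11)"  # missing closing paren


code, ns, attempts = repair_loop("Sum the integers 1 through 10 into `answer`.", fake_model)
```

The attempt cap matters in practice, since a model that can't fix the error would otherwise churn forever.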
I had a similar idea and wrote something that works in babashka. It's bash based instead of repl, but it works pretty well for smaller local models. Most models 7B or larger have plenty of bash training to use a bash tool quite well. https://gitlab.com/michaelwhitford/agentus/-/blob/main/src/us/whitford/agentus/agents/ouroboros.clj?ref_type=heads
Yeah calling out to bash is a handy trick as well, I'm largely curious if you could get similar benefits to what we see using the repl workflow, but with the models driving it. The key benefit with the REPL is being able to build up state over time and having an instant feedback loop where you write a function, see the output instantly, and iterate. My intuition is that we can leverage that to compensate for the stateless nature of the models, and also to allow models to work on a much smaller scope. They just need to think about the specific function they're writing at each step, and iterate on that.
Ouroboros runs in a full agent loop, it works and writes its own bash scripts as tools. It works very well for my chatbot.
nice, I'll have to play with it 🙂
I just want to point out that this is not a new idea, it's an idea that's been kicking around this channel for a long time. Evaluation in general has proven a good fit for LLMs... I think providing a tool (MCP or direct) which is an SCI repl where you expose the functions that you want to make available to the LLM makes a bunch of sense. Bash tools are a great fit, but we have to pay attention to nesting one evaluation env inside another evaluation env and think about the levels of escaping that the LLM needs to generate etc...
This is my current go-to tool https://github.com/bhauman/clojure-mcp-light
and I've added this note to clojure mcp https://github.com/bhauman/clojure-mcp?tab=readme-ov-file#important-update
But for editing Clojure it seems that for now we need to provide tools (MCP or otherwise) or hook into the existing editing tools via hooks like Claude Hooks. Yesterday I tried using gemini-cli, crush, and codex-cli and they all failed hard while trying to edit Clojure... and they failed quickly. And gemini-cli, crush, and codex-cli don't provide hooks for their editing tools so you have to plug in something like ClojureMCP to see if you can get them to work. That being said I am going to see if providing a clj-repair-parens gives them a way out of the ParenEditDeathLoop, which is the phenomenon where once the paren error exists it cannot be corrected by the LLM and the LLM will keep churning. And this in turn raises the question: can we create a clj-edit-tool that can effectively be used from the Bash tool? And maybe we can system prompt/command/skill the LLM to use that. One other thing: folks seem to be speaking about MCP as if it's some poorly thought out thing, but it does have its place in allowing us to write tools once and then provide them to different LLM clients. It just so happens that when an LLM client already has a bash tool we can already write bash tools to do many things. But you can use an MCP to write a better bash tool with no permissions blocking that runs all your code in the sandboxed environment of your dreams, and that tool can be used in any client.
I'm still on the "I don't want random code to affect my machine" side of things, but paired with https://clojurians.slack.com/archives/C06MAR553/p1763063788960999 maybe this will be an easier setup than clojure-mcp + docker was?
hm I see claude code has a sandboxing feature https://code.claude.com/docs/en/sandboxing
I realize MCP is more practical for just getting things done, I was mostly curious to explore what happens when you have the LLM drive the repl. From my playing around it seems that if you use clj-kondo and send both the kondo error and the code the LLM produced then it tends to fix it pretty reliably, even with small local models like qwen8b
the most interesting aspect for me has been the ability to build up state and share it across different calls to the LLM, like if say you have it pull some data from the db, and then you can have it use a def to bind it, and you can reference it in subsequent actions
you can do the same thing with functions, like if it writes a function to do something once, it can be reused later
do you mean, because of the REPL?
(not sure if I follow, I'm still quite a noob when it comes to LLM tooling)
yeah cause LLM itself has no state
and I think the repl state could work as sort of an external memory for it
doesn't it just look at the state of the previous file it edited? (while the REPL serves as a runtime reflection of that state)?
or does it really interrogate the REPL with source, ns-publics, doc, etc.?
I haven't got that far with it yet, but the use case I was playing with wasn't so much code editing, but rather having the agent do stuff on the system
on a related note, there's also this tool that's interesting https://github.com/universal-tool-calling-protocol/code-mode
progressive tool discovery is interesting... SCI is such a natural fit for the Clojure community, I don't see why we couldn't make a wrapper library that provided this via a single tool...
yeah sci could definitely be a good fit, I just really like the idea of having the llm talk to a single tool where it can just write little scripts to accomplish tasks
that's what I'm envisioning for SCI. You make an eval environment tool as a single tool for the LLM built on top of SCI then you configure it to expose the functions you want to expose to the environment. You can also make fun composable tools that operate on lazy sequences fork/join and all kinds of fun functional stuff.
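For a feel of what "one eval tool, only the functions you choose to expose" could look like, here's a loose Python analogy. This is not SCI and not its API; `make_env` and `list_files` are hypothetical, and stripping `__builtins__` in Python is illustrative only, not a real security sandbox.

```python
def make_env(exposed: dict) -> dict:
    """Build a namespace that contains only the functions we chose to expose."""
    # Emptying __builtins__ hides open(), __import__ and friends from the snippet.
    # Assumption: illustrative only; this is NOT a real security boundary in Python.
    env = {"__builtins__": {}}
    env.update(exposed)
    return env


def list_files(prefix: str):
    """Hypothetical tool: a pretend directory listing backed by fixed data."""
    files = ["notes/a.txt", "notes/b.txt", "src/core.clj"]
    return [f for f in files if f.startswith(prefix)]


# Only these three names exist inside the eval environment.
env = make_env({"list_files": list_files, "len": len, "sorted": sorted})

# A snippet the model might write, composing the exposed functions.
exec("count = len(list_files('notes/'))", env)

# Anything that was not exposed is simply undefined.
try:
    exec("open('/etc/passwd')", env)
    blocked = False
except NameError:
    blocked = True
```

The design choice is that the configuration is just a map of names to functions, so adding a capability means adding one entry rather than registering a whole new tool.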
yeah exactly
instead of using a bunch of tools on the shell, you could make a bunch of tools that all run within a single sci runtime
it seems like llms are a great fit for functional style as well since it inherently reduces the scope they need to consider
or you make a library and you perhaps mark the functions you want to expose with metadata 🙂
yeah there's lots of ways you could cut it, I think it'd be interesting to track the functions dynamically so when your llm adds new ones locally they stick around
and this way you can grow your very own environment organically over time
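One way "new functions stick around" might be sketched: after each eval, notice which callables are new and save the snippet that defined them, then replay those snippets in the next session. Everything here (`GrowingEnv`, the JSON file layout) is a hypothetical illustration in Python, and it assumes the saved snippets are pure definitions that are safe to re-run.

```python
import json
import os
import tempfile


class GrowingEnv:
    """Namespace that remembers which snippet defined each new function."""

    def __init__(self, store_path: str):
        self.store_path = store_path
        self.namespace = {}
        self.sources = {}  # function name -> snippet that defined it
        if os.path.exists(store_path):
            with open(store_path) as f:
                saved = json.load(f)
            # Replay saved definitions so earlier helpers are available again.
            for name, snippet in saved.items():
                exec(snippet, self.namespace)
                self.sources[name] = snippet

    def eval(self, snippet: str) -> list:
        """Run a snippet; record and persist any functions it introduced."""
        before = {k: v for k, v in self.namespace.items() if callable(v)}
        exec(snippet, self.namespace)
        new = [k for k, v in self.namespace.items()
               if callable(v) and before.get(k) is not v]
        for name in new:
            self.sources[name] = snippet
        with open(self.store_path, "w") as f:
            json.dump(self.sources, f)
        return sorted(new)


path = os.path.join(tempfile.mkdtemp(), "functions.json")
first = GrowingEnv(path)
added = first.eval("def slugify(s):\n    return s.lower().replace(' ', '-')")

second = GrowingEnv(path)  # a later session picks the helper back up
slug = second.namespace["slugify"]("Hello REPL World")
```

A real version would probably want versioning per function rather than a flat file, but the shape of the loop is the same.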
incidentally, there's this idea of residential programming, a fun talk on it you might enjoy https://www.youtube.com/watch?v=Kgw9fblSOx4
but basically what he demos in the talk is using the db to track and version individual functions instead of entire programs
but of course this is not that much different than the llm writing bash tools...
for itself.
yeah same basic idea
that's why I think the state management aspect of it is the most interesting
basically I'm envisioning a repl driven workflow the way human devs use the repl
you write a function play with it till it works the way you want, get a result, then move on to do the next step, and so on
so you get the llm to break up a task into a series of steps, and then it can work on implementing each one, and when it works right, move to the next
yes, you do know that's how this all started for me is having the LLM work in the REPL exclusively before writing things to file..
it's just that over time the models got better and better and better to where that step became just something to entertain myself with
the models are definitely a lot more capable now, but I still find lots of times where I wish they could iterate faster than they do
when the code works in one shot it's great, but when they make mistakes it's sometimes a pain to explain what went wrong without them actually looking at the output and iterating
oh yes of course the REPL is essential
also this was an interesting paper https://arxiv.org/abs/2509.16198
it basically suggests that you can use a graph to describe the project structure and that helps the LLM stay consistent on large scale tasks they normally fail at
and I wonder if the repl state could be used as that
cause you have the functions and their relationships available in the repl
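As a rough illustration of reading a graph out of live state (this is not the method from the paper, just a guess at what "repl state as the graph" could mean), you can walk the functions in a namespace and record which other functions each one references. `call_graph` is a hypothetical helper, and the approach is approximate since it matches on names only.

```python
import types


def call_graph(namespace: dict) -> dict:
    """Map each user-defined function to the other namespace functions it references."""
    funcs = {k: v for k, v in namespace.items() if isinstance(v, types.FunctionType)}
    graph = {}
    for name, fn in funcs.items():
        # co_names holds the global and attribute names used in the function body.
        referenced = set(fn.__code__.co_names)
        graph[name] = sorted(referenced & (funcs.keys() - {name}))
    return graph


# Pretend these definitions accumulated over several eval calls.
ns = {}
exec(
    "def fetch(): return [3, 1, 2]\n"
    "def clean(xs): return sorted(xs)\n"
    "def report(): return clean(fetch())\n",
    ns,
)
graph = call_graph(ns)
```

A summary like this could be handed back to the model as context instead of the full source of every function.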
do you use Claude Code? and if so do you use the plan-mode?
I've been using cursor, and it has a similar feature in it
not sure how well it compares, but it defines steps and then checks them off as it goes, it definitely seems to work better
yeah I'd have to say no... Claude Code is different altogether now... its performance is way beyond where I thought it was
I'd say way way
yeah a friend of mine uses it, and he's been pretty impressed with it
I'd give it a real try before creating a bunch of tools... I'm finding it starting without context and producing real tangible results in a remarkably short period of time.
I'm still somewhat leery about wholly relying on a subscription tool though, ideally I'd like to be able to run local models to do stuff
so I am curious about how much tooling can help a small model do more
yes and that's a valid pursuit but you may want to know what the top performing experience is as a benchmark. ask @cfleming, I've invested a bunch of time into improving perf when Claude Code was miles ahead in efficacy.
yeah that's fair, this stuff is changing so rapidly now in general, like if you tried these tools even half a year ago it's a completely different world now
who knows what things will look like six months down the road 🙂
jobless 🙂
haha I think there's going to be a need for a human in the loop for a while yet because the model can't evaluate correctness in a semantic sense
but the nature of work is likely going to change from writing code by hand to doing more business analysis tasks
it's like we're living through a version of the industrial revolution here
I'm not really afraid of that but I'm throwing cljs bugs in a 20k line plus complex codebase at Claude Code and it simply finds the source of the bugs, albeit with a little coaching here and there... but it's outperforming me for sure
oh yeah I find these things are amazing at doing code analysis
I've been doing stuff like throwing cursor a service and telling it to show me a sample curl for calling an endpoint and sample data
like before I'd have to actually run the service and interrogate it, but now it can do it from just looking at the code
I've also found asking it to make mermaidjs diagrams is a handy trick
might be useful for claude as well
cause then you can look at the diagram of the plan it produced and say change step x
and that formalizes it a lot more than just talking to it about it in plain english
There is something there just on the edge of my awareness. Use a git repo as a memory, and vector database, and code repo, and AI state to feed a state machine/statechart to control the agent. Each action checked in to the repo, so you can reify it into a graph and traverse it from any other agent, branching as needed. Similarity search is too good for narrowing down the tool selection to be left out, but it's complicated and finicky with different embedding models and vector databases, and similarity algorithms. Using git repos as memory alone would be a big boost for existing systems. I can see people creating git repos alongside their libraries, frameworks, etc, and you point your agent at the memory repo when you want the AI to use it.
I have been playing with using a git repo commit hook to add embeddings to my vector database, but it feels very "tacked-on" and fragile.
hmm I wonder if datascript might be a good fit for the db here
or xtdb since it has history, and then you don't even need git