babashka

yogthos 2025-11-15T23:12:15.810009Z

So, I had this random idea after playing with some agentic LLM tools. The current approach of using many MCPs for different tasks feels both messy and not terribly efficient. It got me thinking that you could just use a Babashka nREPL instead. You'd have a similar lifecycle to MCP, but the target is the REPL itself. It effectively becomes a single tool where the LLM can just write ad-hoc scripts to solve whatever task you give it. It could read the file system, access databases, pull stuff from the web, etc., all from a single endpoint. On top of that, you could leverage REPL state to reuse functions and results that have already been created. You could even have a web UI sitting on top of all that to render and nicely format the results, like displaying graphs, tables, etc. I slapped together a quick prototype to test the idea. It's completely vibe coded (hence being written in JS), but it illustrates the concept. I suspect you could get a lot of mileage out of a REPL-based tool like this, especially compared to what current MCP-based tools are able to do. Curious if anybody else has played around with a similar concept here? https://github.com/yogthos/repl-talk

borkdude 2025-11-15T23:16:50.646529Z

Interesting idea. Fellow Canadian @bhauman may have thoughts about it too :)

yogthos 2025-11-15T23:27:27.431159Z

would love to chat about it with @bhauman, looking at clojure-mcp was what got me thinking about this originally :) This whole thing wasn't really well designed, but it was interesting to see what the loop would look like. I think the key benefit of the repl can be to track state and accumulate helper functions. I haven't really explored that much yet, but that tends to be a real problem where the LLM doesn't have a persistent state to work on, which results in duplicating work and having to shuffle data from one tool to another. I expect you can get the feedback loop to be a lot tighter as well. Another thing I noticed that seems to work well is that you can use the errors from the REPL to create a loop where the LLM fixes the code: if it gets an error, you send back the code it wrote along with the error, and have it iterate. This way even smaller local models can converge on an answer over time.
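A minimal sketch of the error feedback loop described above, written in Python purely for illustration (the prototype and Babashka work differently). `eval_with_feedback` and `fake_model` are hypothetical names, and `fake_model` is a stand-in for a real LLM call: it returns buggy code first and a fix once it is shown the error.

```python
def eval_with_feedback(generate, task, max_attempts=3):
    """Ask `generate` for code, run it, and feed any error back until it runs."""
    env = {}          # stands in for persistent REPL state
    feedback = None   # nothing to report on the first attempt
    for attempt in range(1, max_attempts + 1):
        code = generate(task, feedback)
        try:
            exec(code, env)
            return {"ok": True, "attempts": attempt, "result": env.get("result")}
        except Exception as exc:
            # Send back both the code that failed and the error it produced.
            feedback = {"code": code, "error": f"{type(exc).__name__}: {exc}"}
    return {"ok": False, "attempts": max_attempts, "error": feedback["error"]}


def fake_model(task, feedback):
    # Hypothetical stand-in for an LLM: the first answer references an
    # undefined name, the second one is corrected after seeing the error.
    if feedback is None:
        return "result = sum(numbers)"
    return "numbers = [1, 2, 3]\nresult = sum(numbers)"


outcome = eval_with_feedback(fake_model, "sum the numbers 1 to 3")
print(outcome)
```

The cap on attempts is an assumption added here so a model that never converges does not loop forever.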

michaelwhitford 2025-11-16T00:21:54.212169Z

I had a similar idea and wrote something that works in babashka. It's bash based instead of repl, but it works pretty well for smaller local models. Most models 7B or larger have plenty of bash training to use a bash tool quite well. https://gitlab.com/michaelwhitford/agentus/-/blob/main/src/us/whitford/agentus/agents/ouroboros.clj?ref_type=heads

yogthos 2025-11-16T00:29:01.275259Z

Yeah calling out to bash is a handy trick as well, I'm largely curious if you could get similar benefits we see using the repl workflow with the models driving it. The key benefit with the REPL is being able to build up state over time and having an instant feedback loop where you write a function, see the output instantly, and iterate. My intuition is that we can leverage that to compensate for the stateless nature of the models, and also by allowing models to work on a much smaller scope. They just need to think about the specific function they're writing at each step, and iterate on that.

michaelwhitford 2025-11-16T00:32:13.527629Z

Ouroboros runs in a full agent loop, it works and writes its own bash scripts as tools. It works very well for my chatbot.

yogthos 2025-11-16T00:33:07.698419Z

nice, I'll have to play with it 🙂

bhauman 2025-11-16T18:50:28.966909Z

I just want to point out that this is not a new idea, it's an idea that's been kicking around this channel for a long time. Evaluation in general has proven a good fit for LLMs... I think providing a tool (MCP or direct) which is an SCI repl where you expose the functions that you want to make available to the LLM makes a bunch of sense. Bash tools are a great fit, we have to pay attention to nesting one evaluation env inside another evaluation env and think about the levels of escaping that the LLM needs to generate etc...

bhauman 2025-11-16T18:54:16.832109Z

This is my current goto tool https://github.com/bhauman/clojure-mcp-light

bhauman 2025-11-16T18:54:41.797789Z

and I've added this note to clojure mcp https://github.com/bhauman/clojure-mcp?tab=readme-ov-file#important-update

bhauman 2025-11-16T18:55:13.602609Z

But for editing Clojure it seems that for now we need to provide tools (MCP or otherwise) or hook into the existing editing tools via hooks like Claude Hooks. Yesterday I tried using gemini-cli, crush, and codex-cli and they all failed hard while trying to edit Clojure... and they failed quickly. And gemini-cli, crush, and codex-cli don't provide hooks for their editing tools so you have to plug in something like ClojureMCP to see if you can get them to work. That being said I am going to see if providing a clj-repair-parens gives them a way out of the ParenEditDeathLoop, which is the phenomenon where once the paren error exists it cannot be corrected by the LLM and the LLM will keep churning. And this, in turn, raises the question: can we create a clj-edit-tool that can effectively be used from the Bash tool? And maybe we can system prompt/command/skill the LLM to use that. One other thing: folks seem to be speaking about MCP as if it's some poorly thought out thing, it does have its place in allowing us to write tools once and then provide them to different LLM clients. It just so happens that when an LLM client already has a bash tool we can already write bash tools to do many things. But you can use an MCP to write a better bash tool with no permissions blocking that runs all your code in the sandboxed environment of your dreams, and that tool can be used in any client.

borkdude 2025-11-16T18:56:31.072649Z

I'm still on the "I don't want random code to affect my machine" side of things, but paired with https://clojurians.slack.com/archives/C06MAR553/p1763063788960999 maybe this will be an easier setup than clojure-mcp + docker was?

borkdude 2025-11-16T19:17:25.115199Z

hm I see claude code has a sandboxing feature https://code.claude.com/docs/en/sandboxing

yogthos 2025-11-16T20:00:25.290799Z

I realize MCP is more practical for just getting things done, I was mostly curious to explore a bit whether you could have the LLM drive it. From my playing around it seems that if you use clj-kondo and send both the kondo error and the code the LLM produced, then it tends to fix it pretty reliably, even with small local models like qwen8b

yogthos 2025-11-16T20:02:10.644899Z

the most interesting aspect for me has been the ability to build up state and share it across different calls to the LLM, like if say you have it pull some data from the db, and then you can have it use a def to bind it, and you can reference it in subsequent actions

yogthos 2025-11-16T20:03:52.797509Z

you can do the same thing with functions, like if it writes a function to do something once, it can be reused later
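A rough sketch of that idea, in Python for illustration only: one long-lived namespace plays the role of the REPL, so a value bound in one call and a function defined in another are both still there for a third. `ReplSession` and the sample data are hypothetical, and in practice each snippet would come from a separate LLM call.

```python
class ReplSession:
    """A long-lived namespace that survives across separate eval calls."""

    def __init__(self):
        self.ns = {}

    def eval(self, code):
        exec(code, self.ns)
        # Report what is now defined, so the caller can tell the model
        # which names it may reuse in later snippets.
        return sorted(name for name in self.ns if not name.startswith("__"))


session = ReplSession()

# Call 1: bind some data (a stand-in for rows pulled from a database).
session.eval("rows = [{'item': 'a', 'price': 3}, {'item': 'b', 'price': 4}]")

# Call 2: define a helper once.
session.eval("def total(rows):\n    return sum(r['price'] for r in rows)")

# Call 3: reuse both the data and the helper without recreating either.
defined = session.eval("answer = total(rows)")
print(defined, session.ns["answer"])
```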

borkdude 2025-11-16T20:04:22.277049Z

do you mean, because of the REPL?

borkdude 2025-11-16T20:04:36.109579Z

(not sure if I follow, I'm still quite a noob when it comes to LLM tooling)

yogthos 2025-11-16T20:04:49.055899Z

yeah cause LLM itself has no state

yogthos 2025-11-16T20:05:33.953679Z

and I think the repl state could work as sort of an external memory for it

borkdude 2025-11-16T20:06:11.011609Z

doesn't it just look at the state of the previous file it edited? (while the REPL serves as a runtime reflection of that state)?

borkdude 2025-11-16T20:06:25.230079Z

or does it really interrogate the REPL with source, ns-publics, doc, etc.?

yogthos 2025-11-16T20:07:40.281789Z

I haven't got that far with it yet, but the use case I was playing with wasn't so much code editing, but rather having the agent do stuff on the system

yogthos 2025-11-16T20:16:54.919929Z

on a related note, there's also this tool that's interesting https://github.com/universal-tool-calling-protocol/code-mode

bhauman 2025-11-16T20:30:18.434809Z

progressive tool discovery is interesting... SCI is such a natural fit for the Clojure community, I don't see why we couldn't make a wrapper library that provided this via a single tool...

yogthos 2025-11-16T20:31:29.948939Z

yeah sci could definitely be a good fit, I just really like the idea of having the llm talk to a single tool where it can just write little scripts to accomplish tasks

bhauman 2025-11-16T20:43:12.120889Z

that's what I'm envisioning for SCI. You make an eval environment tool as a single tool for the LLM built on top of SCI then you configure it to expose the functions you want to expose to the environment. You can also make fun composable tools that operate on lazy sequences fork/join and all kinds of fun functional stuff.
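To make the shape of that concrete, here is a loose Python analogy, not SCI's API: a single eval tool that only sees the functions it was configured with. `make_eval_tool` and `list_files` are hypothetical names. Note the big assumption: emptying `__builtins__` in Python only hides names and is not a real security boundary, so this shows the allowlist idea rather than actual sandboxing.

```python
def make_eval_tool(exposed):
    """Return one `eval_tool(code)` function that sees only the `exposed` names."""

    def eval_tool(code):
        env = {"__builtins__": {}}   # hide Python's builtins from the snippet
        env.update(exposed)          # add only what was explicitly allowed
        try:
            return {"ok": True, "value": eval(code, env)}
        except Exception as exc:
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}

    return eval_tool


def list_files(prefix):
    # Hypothetical exposed function; a real one would read the file system.
    names = ("notes.md", "todo.md", "data.csv")
    return [name for name in names if name.startswith(prefix)]


tool = make_eval_tool({"list_files": list_files, "len": len})

allowed = tool("len(list_files('no'))")   # uses only exposed functions
blocked = tool("open('/etc/passwd')")     # `open` was never exposed
print(allowed, blocked)
```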

yogthos 2025-11-16T20:43:35.402069Z

yeah exactly

yogthos 2025-11-16T20:43:54.366109Z

instead of using a bunch of tools on the shell, you could make a bunch of tools that all run within a single sci runtime

yogthos 2025-11-16T20:44:24.434589Z

it seems like llms are a great fit for functional style as well since it inherently reduces the scope they need to consider

bhauman 2025-11-16T20:44:29.190509Z

or you make a library and you perhaps mark the functions you want to expose with metadata 🙂
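One way the "mark functions with metadata" idea could look, sketched in Python as an analogy to Clojure var metadata. The `exposed` decorator, the `exposed_to_llm` attribute, and the two sample functions are all hypothetical.

```python
def exposed(fn):
    """Tag a function as safe to offer to the LLM's eval environment."""
    fn.exposed_to_llm = True
    return fn


@exposed
def fetch_rows(table):
    # Hypothetical read-only helper.
    return [{"table": table, "id": 1}]


def drop_table(table):
    # Deliberately left untagged, so it is never offered to the model.
    return f"dropped {table}"


def collect_exposed(namespace):
    """Pick out only the callables that carry the exposure tag."""
    return {
        name: obj
        for name, obj in namespace.items()
        if callable(obj) and getattr(obj, "exposed_to_llm", False)
    }


library = {"fetch_rows": fetch_rows, "drop_table": drop_table}
tools = collect_exposed(library)
print(sorted(tools))
```

The resulting dict could then be handed to whatever eval tool sits in front of the model, so the allowlist lives next to the function definitions instead of in separate config.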

yogthos 2025-11-16T20:45:07.748339Z

yeah there's lots of ways you could cut it, I think it'd be interesting to track the functions dynamically so when your llm adds new ones locally they stick around

yogthos 2025-11-16T20:45:23.950059Z

and this way you can grow your very own environment organically over time

yogthos 2025-11-16T20:46:02.927489Z

incidentally, there's this idea of residential programming, a fun talk on it you might enjoy https://www.youtube.com/watch?v=Kgw9fblSOx4

yogthos 2025-11-16T20:46:25.175629Z

but basically what he demos in the talk is using the db to track and version individual functions instead of entire programs

bhauman 2025-11-16T20:46:44.576089Z

but of course this is not that much different than the llm writing bash tools...

bhauman 2025-11-16T20:46:52.693409Z

for itself.

yogthos 2025-11-16T20:47:04.369159Z

yeah same basic idea

yogthos 2025-11-16T20:47:23.254799Z

that's why I think the state management aspect of it is the most interesting

yogthos 2025-11-16T20:47:49.743619Z

basically I'm envisioning a repl driven workflow the way human devs use the repl

yogthos 2025-11-16T20:48:08.706049Z

you write a function play with it till it works the way you want, get a result, then move on to do the next step, and so on

yogthos 2025-11-16T20:48:41.839759Z

so you get the llm to break up a task into a series of steps, and then it can work on implementing each one, and when it works right, move to the next

bhauman 2025-11-16T20:49:27.001869Z

yes, you do know that's how this all started for me is having the LLM work in the REPL exclusively before writing things to file..

bhauman 2025-11-16T20:50:00.823939Z

it's just that over time the models got better and better and better to where that step became just something to entertain myself with

yogthos 2025-11-16T20:51:10.362769Z

the models are definitely a lot more capable now, but I still find lots of times where I wish they could iterate faster than they do

yogthos 2025-11-16T20:51:58.855959Z

when the code works in one shot it's great, but when they make mistakes it's sometimes a pain to explain what went wrong without them actually looking at the output and iterating

bhauman 2025-11-16T20:52:30.729169Z

oh yes of course the REPL is essential

yogthos 2025-11-16T20:53:35.018769Z

also this was an interesting paper https://arxiv.org/abs/2509.16198

yogthos 2025-11-16T20:54:25.598989Z

it basically suggests that you can use a graph to describe the project structure and that helps the LLM stay consistent on large scale tasks they normally fail at

yogthos 2025-11-16T20:54:55.314599Z

and I wonder if the repl state could be used as that

yogthos 2025-11-16T20:55:07.918709Z

cause you have functions and relationships available in the repl

bhauman 2025-11-16T20:56:10.147529Z

do you use Claude Code? and if so do you use the plan-mode?

yogthos 2025-11-16T20:56:46.030659Z

I've been using cursor, and it has a similar feature in it

yogthos 2025-11-16T20:57:11.166429Z

not sure how well it compares, but it defines steps and then checks them off as it goes, it definitely seems to work better

bhauman 2025-11-16T20:57:36.729179Z

yeah I'd have to say no... Claude Code is different altogether now... its performance is way beyond where I thought it was

bhauman 2025-11-16T20:57:53.845099Z

I'd say way way

yogthos 2025-11-16T20:57:59.095579Z

yeah a friend of mine uses it, and he's been pretty impressed with it

bhauman 2025-11-16T20:59:29.651989Z

I'd give it a real try before creating a bunch of tools... I'm finding it starting without context and producing real tangible results in a remarkably short period of time.

yogthos 2025-11-16T21:00:05.576609Z

I'm still somewhat leery about wholly relying on a subscription tool though, ideally I'd like to be able to run local models to do stuff

yogthos 2025-11-16T21:00:19.944959Z

so I am curious about how much tooling can help a small model do more

bhauman 2025-11-16T21:03:07.127589Z

yes and that's a valid pursuit but you may want to know what the top performing experience is as a benchmark. ask @cfleming, I've invested a bunch of time into improving perf when Claude Code was miles ahead in efficacy.

yogthos 2025-11-16T21:03:53.112479Z

yeah that's fair, this stuff is changing so rapidly now in general, like if you tried these tools even half a year ago it's a completely different world now

yogthos 2025-11-16T21:04:14.073139Z

who knows what things will look like six months down the road 🙂

bhauman 2025-11-16T21:04:22.434149Z

jobless 🙂

yogthos 2025-11-16T21:04:57.554079Z

haha I think there's going to be a need for a human in the loop for a while yet because the model can't evaluate correctness in a semantic sense

yogthos 2025-11-16T21:05:17.550289Z

but the nature of work is likely going to change from writing code by hand to doing more business analysis tasks

yogthos 2025-11-16T21:06:00.253329Z

it's like we're living through a version of the industrial revolution here

bhauman 2025-11-16T21:06:18.169889Z

I'm not really afraid of that but I'm throwing cljs bugs in a 20k line plus complex codebase at Claude Code and it simply finds the source of the bugs, albeit with a little coaching here and there... but it's outperforming me for sure

yogthos 2025-11-16T21:06:51.630509Z

oh yeah I find these things are amazing at doing code analysis

yogthos 2025-11-16T21:07:14.368479Z

I've been doing stuff like throwing cursor a service and telling it to show me a sample curl for calling an endpoint and sample data

yogthos 2025-11-16T21:07:31.948029Z

like before I'd have to actually run the service and interrogate it, but now it can do it from just looking at the code

yogthos 2025-11-16T21:07:47.509709Z

I've also found asking it to make mermaidjs diagrams is a handy trick

yogthos 2025-11-16T21:07:53.007119Z

might be useful for claude as well

yogthos 2025-11-16T21:08:05.493109Z

cause then you can look at the diagram of the plan it produced and say change step x

yogthos 2025-11-16T21:08:27.114519Z

and that formalizes it a lot more than just talking to it about it in plain english

michaelwhitford 2025-11-17T03:22:06.547889Z

There is something there just on the edge of my awareness. Use a git repo as a memory, and vector database, and code repo, and AI state to feed a state machine/statechart to control the agent. Each action checked in to the repo, so you can reify it into a graph and traverse it from any other agent, branching as needed. Similarity search is too good for narrowing down the tool selection to be left out, but it's complicated and finicky with different embedding models and vector databases, and similarity algorithms. Using git repos as memory alone would be a big boost for existing systems. I can see people creating git repos alongside their libraries, frameworks, etc, and you point your agent at the memory repo when you want the AI to use it.

michaelwhitford 2025-11-17T03:27:55.068499Z

I have been playing with using a git repo commit hook to add embeddings to my vector database, but it feels very "tacked-on" and fragile.

yogthos 2025-11-17T04:53:57.979579Z

hmm I wonder if datascript might be a good fit for the db here

yogthos 2025-11-17T04:54:50.258629Z

or xtdb since it has history, and then you don't even need git