ai

2024-10-04T08:26:39.277209Z

I’ve always thought Clojure has a competitive advantage because it is a better-designed language, but what if that advantage is cancelled out by LLM training data? When you write in a boring but popular framework like React, you get better Copilot suggestions, which make you more productive overall even though the underlying technology is not as good.

bdbrodie 2024-10-04T11:23:09.449529Z

re: Copilot. We had a GitHub representative visiting the accelerator where I sit. A couple of things they mentioned are worth sharing: the completions in the code editor window still use GPT-3.5 for many users. If you pay for Business or higher, you start getting GPT-4o-mini. The “chat” window, however, uses GPT-4 Turbo at a minimum, so the quality is higher. Enterprise has some amazing planning features coming soon. I think I have to pay for it just to see what they’re cooking.

2024-10-04T15:27:03.786389Z

ok

Benjamin 2024-10-10T09:39:27.076509Z

• Arguably, Copilot is useful for Python / TypeScript / React because of their verbosity and lack of generality. Clojure addresses these shortcomings in its own proven, powerful way.
• I don't know if this is true, but generating Clojure code might lead to higher-quality code:
  1. because the Clojure code examples out there are of higher quality;
  2. because the AI is prompted to use an elegant, functional Lisp with a focus on immutability, so the resulting code benefits from (some of) the properties that make Clojure powerful.
• Interactivity is still a hard and uncharted growth area for current mainstream languages (and workflows). Clojure already has a proven, working, simple way of doing interactive programming, which generating code only adds to and does not replace, imo.

Rupert (Sevva/All Street) 2024-10-07T11:50:35.985099Z

I don't think it will be a hindrance to Clojure at all.
• Programming skills are transferable, e.g. an LLM that is trained on Python can pick up Clojure more quickly than if it had never been trained on Python.
• LLMs are getting so good that they can learn a lot from even a single example during training! You don't need vast, repetitive datasets for every programming language.
• LLMs can do a lot "in context", so in the future, if you created a brand-new programming language or DSL, you could just put the documentation in your prompt and the LLM would be able to use your language or DSL even if it had never seen it during training.
• Clojure is a small language (very little syntax), so it's easy for an LLM to pick up.

bdbrodie 2024-10-07T12:04:47.105639Z

I can attest to this as well. Sonnet 3.5 and 4o both do a great job with Electric v3, which neither model has seen before. You just need to provide one or two examples. It completely changed my mental model of what will be possible once we can put together reliable agents that are self-healing, self-improving, etc.

jiriknesl 2024-10-04T08:57:02.525189Z

That’s true. LLMs know Python and JavaScript much better. I hope the GitHub Next team will train on smaller languages too. There are still 77,900 Clojure repos on GitHub, which isn’t a small number. But there are 10.2M Python repos and 20.3M JS repos, and most likely many repos in backend languages (PHP, Ruby, Python…) contain some JS too. In edge-case situations that require true engineering, it doesn’t matter much. In generic web app/mobile app development, it might matter a lot.

Oliver Marks 2024-10-04T09:54:58.857049Z

I wonder how compatibility affects that. With React and Python, the way you write them has changed: how many of those Python repos are for Python 2, or for older versions of React? Surely they are part of the training data. Old Clojure code and libraries seem to have fewer breaking changes. Probably someone has solved this by only using repos with up-to-date languages and libraries. Better support for more niche languages would certainly be nice.

2024-10-04T11:19:09.835369Z

The early GPT-4 made a lot of mistakes with Clojure, but the current SOTA models write really good Clojure. I have them hooked up to the REPL, and Sonnet 3.5 can steer itself clear of errors and debug. It's obvious to me that they train it in such environments. The "reasoning" that o1 does is already something that Sonnet 3.5 does if you hook it up to a REPL: it sees the output and steers towards the stated goals without getting stuck in endless loops (as much...).
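A rough sketch of the feedback loop described above: the model proposes code, the code is evaluated, and any error or failed check becomes the next prompt. This is only an illustration under stated assumptions: Python's `exec` stands in for a Clojure REPL, and `repl_loop`, `propose`, `check`, and `fake_model` are hypothetical names, with `fake_model` replacing a real call to a model such as Sonnet 3.5. The `max_steps` cap is one simple way to avoid the endless loops mentioned.

```python
import traceback
from typing import Callable, Iterable, Optional


def repl_loop(propose: Callable[[str], str],
              check: Callable[[dict], bool],
              goal: str,
              max_steps: int = 5) -> Optional[str]:
    """Ask `propose` for code, evaluate it, and feed the result back until `check` passes."""
    feedback = goal
    for _ in range(max_steps):
        code = propose(feedback)           # hypothetical model call
        namespace: dict = {}
        try:
            exec(code, namespace)          # stand-in for sending the form to a REPL
            ok = check(namespace)          # did the code reach the stated goal?
        except Exception:
            feedback = traceback.format_exc()  # the error text becomes the next prompt
            continue
        if ok:
            return code
        feedback = f"Code ran but did not meet the goal: {goal}"
    return None                            # give up instead of looping forever


def fake_model(attempts: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: returns canned attempts in order, ignoring feedback."""
    it = iter(attempts)
    return lambda feedback: next(it)


attempts = [
    "def inc(x): return x +",      # syntax error
    "def inc(x): return x - 1",    # runs, but fails the check
    "def inc(x): return x + 1",    # correct
]
result = repl_loop(fake_model(attempts), lambda ns: ns["inc"](1) == 2, "write inc")
print(result)  # def inc(x): return x + 1
```

A real setup would feed the feedback string to the model and would sandbox evaluation rather than calling `exec` on model output directly.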

bdbrodie 2024-10-04T11:19:17.603619Z

Sonnet 3.5 is surprisingly capable, especially when used through wrappers like Cursor. Including more code in the training set was one of the main goals for the model, and it significantly helped general reasoning. Also, quality over quantity matters for the latest models. Until proven otherwise, I believe functional code will continue to compose better than imperative code. And transformers are still just getting started; there are so many promising research efforts underway.

2024-10-04T11:23:47.477909Z

The point about boilerplate was brought up in the #clojurellm channel a while back, and people there seemed to assume that LLMs mostly provide value by removing boilerplate. I think the opposite is true for now: you get so much value out of Clojure's data focus, not having to paste a ton of type information everywhere. Part of the success of LLMs in coding is that they are so to the point, whereas many programmers can't even consider not putting their code behind a bunch of unnecessary layers of abstraction.

2024-10-04T11:37:54.067519Z

It's an interesting field, particularly @b.brodie’s point about functional composability. As models advance, they will always have a bigger mainstream-language base to train on. This may keep code generation better for mainstream languages even as Clojure generation abilities continue to advance.

2024-10-04T11:40:01.550519Z

As for the point about frameworks like React: I find that if you ask an LLM to write ClojureScript, it quickly defaults to a bunch of Hiccup/Reagent. But that is not what I use; I use helix (closer to React), and if there's one thing LLMs are good at, it's "transpiling". I find that giving even one example React component in my "style" makes the better LLMs pick up nearly all the syntactic differences and one-shot port big components, even ones as big as some of these multi-library ones: https://ui.aceternity.com/components/text-reveal-card

2024-10-04T11:50:50.920609Z

I also think the issue of bigger codebases and running out of quality human data is not that important for coding domains. It seems that high-quality human text data is mostly exhausted, but in domains where you can check the results, you can generate vast amounts of synthetic data. So for coding, that provides a ton of leverage. Also, an interesting fact is that training on math and coding makes models improve in other domains too, so you'd assume that training in one programming language somewhat transfers to others.
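A minimal sketch of why checkable domains help: sample many candidate programs, run them against tests, and keep only the ones that pass as synthetic training data. The candidates below are hardcoded stand-ins for model samples, and `passes` and `filter_synthetic` are hypothetical names; real pipelines are far more elaborate (deduplication, sandboxing, broader test generation).

```python
from typing import List, Tuple

# (positional args, expected result) pairs act as the automatic checker
TestCase = Tuple[tuple, object]


def passes(source: str, fn_name: str, tests: List[TestCase]) -> bool:
    """Run a candidate solution and report whether it satisfies every test case."""
    namespace: dict = {}
    try:
        exec(source, namespace)
        fn = namespace[fn_name]
        return all(fn(*args) == expected for args, expected in tests)
    except Exception:
        return False  # syntax errors, missing names, and crashes all count as failures


def filter_synthetic(samples: List[str], fn_name: str, tests: List[TestCase]) -> List[str]:
    """Keep only the samples whose results can be verified as correct."""
    return [s for s in samples if passes(s, fn_name, tests)]


# Hypothetical model samples for the task "reverse a list"
samples = [
    "def rev(xs): return xs[::-1]",
    "def rev(xs): return sorted(xs)",            # wrong: sorts instead of reversing
    "def rev(xs): return list(reversed(xs))",
    "def rev(xs) return xs",                     # syntax error
]
tests = [(([1, 2, 3],), [3, 2, 1]), (([],), []), (([2, 1],), [1, 2])]

kept = filter_synthetic(samples, "rev", tests)
print(len(kept))  # 2
```

The design choice that matters is that the checker, not a human, decides what is kept, which is what makes this kind of data cheap to generate at scale.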

bdbrodie 2024-10-04T12:12:37.656239Z

Couldn't agree more. Synthetic data will be a game changer; so many silos can be opened up with realistic (but not real) data.