https://www.youtube.com/watch?v=SKBG1sqdyIU, coming in ~January. This is a pretty significant advance in capabilities, especially in math and science. One data point that's extremely relevant for programmers: it now scores high enough on https://www.youtube.com/watch?v=SKBG1sqdyIU to be in the top 200 programmers in the world (and makes similar leaps on other programming benchmarks like SWE-bench). It doubles the performance of o1 on https://arcprize.org/blog/oai-o3-pub-breakthrough (poor performance on ARC-AGI is something I've https://www.lesswrong.com/posts/k38sJNLk7YbJA72ST/llm-generality-is-a-timeline-crux#ARC_AGI as some of the strongest evidence against general reasoning capabilities of LLMs) and 1.5xes the performance of the best system trained specifically for ARC-AGI. I think we're now clearly at the point of models being able to program at skilled human level; IMHO it's very much worth thinking about how you want to handle that development from a career perspective.
> My main concern is: can AGI solve new problems? I mean those problems never tackled before. Isn't it bound to the domain of its knowledge? @mdallastella - I am an amateur mathematician, and I have a couple of pet theory areas that I enjoy thinking about, that I would love to implement code for, but that there's still a lot of hammock and research time to do. I thought to unpack some of my partial theory to an LLM, in the form of the question, Consider A, looked at this way; A has a B defined, of course; so then, over in area C, couldn't we use B for C? I got some good output from it, and thought about why. I think that there are an enormous number of problems that might have solutions which exist, not in the form of prior research, but in developing a connection between two areas. (That is sometimes considered to be the fundamental quality of being creative.) And so I predict a number of sudden.... well I wrote 'solutions' but let's instead say 'creations', because plenty of the creations will be problems..... a number of sudden creations prompted by the fact that, some approaches to problems require linking two areas of conceptspace together, and in some cases it might be kind of obvious how to do it, but someone had to think to think of the two things at once and that way of selectable context A and context B to create a new union context is something that LLMs were superhuman at quite early. (You can falsify my claim if you find a person who can, with zero preparation, rap using Chaucerian language about marginal tax rate changes). Where was I..... oh yeah, so, the Google Is New era was a time that you could get a degree of attention by taking two concepts and smooshing them together. Mine was "Punk" and "Mathematics," Garrett Lisi is a physicist but also a surfer and media loved that. I think that's happening again on a different layer: genuinely new solutions on the basis, "people have never asked two experts to talk about that before"
Two concepts as objects, paired -> A New Attention
Two concepts as verbs, paired -> A New _? Grammar? Conversation? Dunno, refuse to decide yet
> IMHO it's very much worth thinking about how you want to handle that development from a career perspective.

1. I think there is still time (perhaps years): the performance in benchmarks which are quite constrained/narrow by GPT-O3 doesn't exactly prove it can do well in larger/complex tasks yet.
2. I have a theory along the lines of "Turing complete" - a huge swathe of jobs can theoretically be replaced by software given sufficient time/resources - therefore if you solved programming you've basically solved all those other jobs too. Which makes me suspect that software will be the last job to fall to AI - not the first.
3. Many jobs we could switch to (e.g. PM, Designer, Analyst etc) might also be impacted by AI (perhaps even more so). And if they are not (e.g. Plumber) - they could become hugely oversubscribed (bringing down salaries). We might see the cliff edge ahead of us for programming - but instead of trying to take a different path - it might be more logical to take a leap of faith and see what's over the other side!
4. There's lots of cases where something that is only 99% accurate - is actually 100% useless (the last 1% is important - and can take 80% of the time to achieve).
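One way to make the "99% accurate" point concrete is to look at how errors compound over a multi-step task. This is only a rough sketch: it assumes each step succeeds independently with the same probability, which is an illustrative simplification and not something claimed in the thread. The function name `chain_success` is made up for the example.

```python
def chain_success(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step of a task succeeds.

    Assumes (for illustration only) that steps are independent
    and share the same per-step accuracy.
    """
    return per_step_accuracy ** steps


# Print how a 99%-per-step success rate decays as the task gets longer.
for steps in (1, 10, 100, 500):
    overall = chain_success(0.99, steps)
    print(f"{steps:>4} steps at 99% each -> {overall:.1%} overall")
```

Under that assumption, a 100-step task at 99% per step succeeds only about 37% of the time, which is one reading of why the last 1% can matter so much.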
Thanks for these insights. :/

> .. take a leap of faith and see what's over the other side!

Any idea how we might look over it and try to see?
@rupert thanks for those points. Another reason to be uncertain about impacts, copying from something I said in a discussion in another slack:

> what will happen to all the programmers?

For, I dunno, ~30 years, big increases in programmer productivity didn't lead to https://alcor-bpo.com/software-engineer-unemployment-rate-statistics-and-dispelling-myths/ or https://4dayweek.io/salary/software-engineering-europe-vs-united-states; those both grew as code got more and more powerful and the world wanted more and more of it. But that seemed like it may have started to change https://layoffs.fyi/, before models were good at coding. So with an enormous jump in programmer productivity from AI, I could imagine either a big drop in the number of programmers (with those who remain being more like the leaders of teams of virtual devs) or a new wave of expansion as near-human-level AI starts to accelerate the economy.

re: specific points:

> the performance in benchmarks which are quite constrained/narrow by GPT-O3 doesn't exactly prove it can do well in larger/complex tasks yet

Of course! But it's clearly able to handle larger and more complex tasks as time goes on; already it's far past where it was a year ago -- and I expect it to be way more capable than this in another year or two. At this point I mostly have Claude-3.5-Sonnet write a namespace at a time based on specifications I give about what functions I want and what their ins/outs should be, along with unit tests to confirm that the behavior is right, and then I don't really care about the internals of each function. That said, I'm mostly writing code for my own research projects these days, none of which needs to be part of a large, complex codebase that many people will look at.
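The "spec plus unit tests, ignore the internals" workflow described above could look something like the following sketch. It is in Python rather than Clojure, and everything in it is hypothetical: `slugify`, `check_slugify_spec`, and the specific behaviors are invented for illustration and are not the poster's actual setup. The idea is that the human owns the behavioral spec, and any implementation that passes it is acceptable.

```python
import re
from typing import Callable


def check_slugify_spec(impl: Callable[[str], str]) -> None:
    """Human-written behavioral spec: inputs and the outputs we expect.

    Any implementation that passes is acceptable; its internals
    are deliberately not inspected.
    """
    assert impl("Hello, World!") == "hello-world"
    assert impl("  A  B ") == "a-b"
    assert impl("already-a-slug") == "already-a-slug"
    assert impl("") == ""


def slugify(title: str) -> str:
    # Stand-in for the generated implementation (hypothetical).
    # Lowercase, collapse runs of non-alphanumerics to '-', trim '-'.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


# Accept or reject the implementation purely on observed behavior.
check_slugify_spec(slugify)
```

If the check fails, the cheap move under this approach is to regenerate the function rather than debug it by hand, keeping the spec as the stable artifact.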
I think of the take on microservices that they should each be small enough that if they give you trouble you can just throw them away and quickly rewrite them; that's somewhat the approach I've been starting to take with code; as long as the behavior is right and it's not a performance problem, I can largely ignore the internals in the same way that I can ignore the compiled code. I spent fifteen years caring deeply about the craftsmanship of my code, so it's not the most natural perspective for me. But I can accomplish much more in a shorter time if I let the AI care about the implementation details. YMMV obviously.

> I have a theory along the lines of "Turing complete"...if you solved programming you've basically solved all those other jobs too. Which makes me suspect that software will be the last job to fall to AI - not the first.

I find that somewhat unconvincing because human programmers haven't solved programming in that sense either; we're all incredibly limited and as Rich has pointed out we can only keep a small handful of balls in the air at once. AI doesn't have to solve programming in a strong sense to effectively replace humans; it just needs to be some combination of cheaper and better.

> Many jobs we could switch to (e.g. PM, Designer, Analyst etc) might also be impacted by AI (perhaps even more so). And if they are not (e.g. Plumber) - they could become hugely oversubscribed (bringing down salaries).

100%. I definitely don't feel like I know what jobs are going to look like in a decade. I mostly just think it's really important for programmers, some of whom have felt like LLMs are just stochastic parrots that won't be able to do their jobs for decades if ever, to look up and realize that that picture no longer looks very realistic at all and they need to be thinking hard about how they plan to handle the changes.
> There's lots of cases where something that is only 99% accurate - is actually 100% useless (the last 1% is important - and can take 80% of the time to achieve).

Absolutely. In those kinds of situations we have a lot of tools that we can apply whether the code is being produced by a human or an LLM. It's worth noting that plenty of researchers are exploring hybrid systems where an LLM is paired with formal methods like TLA+ in order to give correctness guarantees.

I'm certainly not saying all the programming jobs are about to instantly disappear. But I think programmers need to realize that things are moving very, very fast. One of the headline benchmarks that o3 did well on is Frontier Math, intended to be extremely challenging even for math PhDs. When it was released -- six weeks ago -- Terence Tao said he expected it to resist AI solving 'for several years at least'; now o3 has jumped from 5% (what the previous best model got) to 25%. I think we're seeing the same level of incredibly rapid advance in programming that we are in math.
> I mostly just think it's really important for programmers, some of whom have felt like LLMs are just stochastic parrots that won't be able to do their jobs for decades if ever, to look up and realize that that picture no longer looks very realistic at all and they need to be thinking hard about how they plan to handle the changes.
I'm back to hanging out on IRC most days (if I can remember to restart irc after a reboot!) and I raised the subject of AI-assisted coding and the folks there were extremely dismissive. Several said some variant of "Yeah, tried it a while back and it was terrible!". I cautioned them that it was advancing incredibly fast and they should revisit it from time to time. Then they switched to "Maybe if they build ethical AI I will take another look..."
I was very skeptical when ChatGPT first launched to the public. When Microsoft announced they were going to integrate it into Bing, I tried it and, yeah, it was pretty bad... lots of hallucinations on purely factual stuff (I picked English opera as a topic since all the material out there about it is fairly dry and accurate -- so I expected the LLM to mirror that). But I kept an eye on it... and once Windows Copilot was available, I started using it instead of search -- because it could gather up multiple results and summarize them and provide links to relevant information far faster than I could research anything via plain ol' search.
And now I'm using it almost every day as an assistant in my editor, to save me searching through documentation and API references etc, for information I need. And I'm just starting to use it to edit my code and add new functions, and review code I write to suggest improved names and docstrings. Previously, I only used it occasionally, and pretty much only to sketch out tests.
It's pretty scary/amazing how fast it has improved. Very much a case of "ignore it at your peril" at this point. The main question is whether it will continue to improve and at what rate -- when will we hit a plateau, or even a wall?
Even then, it's already useful enough. Most AI companies are burning through a lot of money on it tho' -- can it be a sustainable business model?
> The main question is whether it will continue to improve and at what rate -- when will we hit a plateau, or even a wall?

I'll be surprised if it doesn't continue at an extremely fast pace, for a few reasons:
• I think it's starting to be a Moore's law sort of phenomenon, where the pressure/motivation to keep it going keeps resulting in new techniques to make it happen.
• Synthetic data is looking pretty strong at this point -- early research (er, by which I mean like 9 months ago) suggested it might lead to diminishing returns, but people seem to have found ways around that.
• And most of all, I think we're at the point where each new advance speeds up progress as it takes more and more work off the shoulders of top researchers, letting them focus entirely on the parts of the job that AI can't do yet. Initially AI just took care of writing routine reports, and then was able to help with brainstorming, now is able to handle routine code as well, and the process is continuing.
@daslu - to rephrase a well known quote, I think “the only predictable way to predict the future is to make it”.

@eggsyntax Great response. I do believe AI will have a big impact and could really change work for many people. But I'm not sure yet what most developers should do about it.

> IMHO it's very much worth thinking about how you want to handle that development from a career perspective

Let’s see a few options:
• “Change jobs to something related (e.g. PM, Analyst etc)” - as discussed this may not help.
• "Change jobs to something that AI can't do (e.g. Plumber)" - again as discussed this may not help.
• "Quit your job and try and join an AI startup to try to achieve AGI before anyone else" - perhaps not a bad idea, but not an option for everyone.
• "Keep your existing day job but embrace AI in it." - if AI ends up being very effective then everyone will embrace it eventually. I don’t think early adopters (e.g. those 6-24 months ahead) will be massively advantaged compared to the late adopters (there are lots of other factors, e.g. experience, programming talent, soft skills etc, that will differentiate programmers).

Were these the kinds of things you had in mind or something else?
I have another option -- since I'm near the end of my career:
• "Quit your job and watch the whole thing burn to the ground!"

(No, I'm not quitting my job, but I do expect to retire in the next year, and I'm going to enjoy playing with AI & OSS while I sit in my rocking chair on my porch and see the IT industry transformed 🤣)
But, yeah, realistically: option 4 above -- embrace AI in your job. I really don't understand developers who are still holding out at this point and saying AI is useless...
@mathpunk What I meant was: LLM / AGI are passive problem solvers: you can use them as a tool to explore possible solutions or how to apply different knowledge from a domain to another domain, but they are not doing it on their own unless suggested to.
> LLM / AGI are passive problem solvers I think that while this is true of LLMs proper, we'll increasingly continue to see LLMs scaffolded into agentic systems that at minimum go out and do tasks that take multiple steps, and may be long-running agents with goals (truth_terminal is a bizarre but thought-provoking recent example of a long-running agent which among other things now owns a bunch of cryptocurrency, see articles eg https://techcrunch.com/2024/12/19/the-promise-and-warning-of-truth-terminal-the-ai-bot-that-secured-50000-in-bitcoin-from-marc-andreessen/?guccounter=1 and https://www.lesswrong.com/posts/buiTYy75KJDhckDgq/truth-terminal-a-reconstruction-of-events). And I expect AGI to be fully agentic (definitionally it's human-level, and humans are agentic, although of course AGI will have a different set of strengths and weaknesses than humans).
@seancorfield great points, I completely agree with the "ignore at your own peril" point. It's moving fast, and it will radically change software development. I still have serious doubts that it can engineer, but it will become the tooling of the future. That being said, I'm skeptical about o3: o1 isn't that good, and feels like it's just some elaborate chain of thought prompting. I'd rather get more high quality low level LLMs than these APIs with I-don't-know-what happening in the back.
I asked it to write a chess engine in Clojure. It did. A simple one: text UI, "random (but valid) moves by the chess AI".
> Were these the kinds of things you had in mind or something else?

Yeah -- I think basically: anything but ignoring it. I sort of went with option 1 by switching into AI safety research, although that wasn't my primary motivation. If I hadn't done that, I think my personal strategy would be:
1. Try to embrace AI & find ways to be dramatically more productive / effective by using it. I do think that there may well come a point where we see tons of layoffs, where the people 6 - 24 months ahead will stay employed and many others won't.
2. Also have a backup plan for when/if that doesn't work, probably your # 1, not sure what exact version.
3. Try to put more money into index funds as a hedge; if AI becomes good enough to put tons of programmers out of work, I expect that we'll also see substantial economic acceleration that should drive markets up.
> Do you think that the UI/UX around AI systems will reach a point where the managers could just explain to the AI what they want, and have the AI build it completely? What do you think the IT world would look like in such a scenario?

Eventually, yeah, that's my guess. Maybe an interactive system where managers explain what they want, the AI asks follow-up questions as needed ('Is all the inventory in one location or are there multiple warehouses?'), the AI produces an insta-prototype with mocked data, the managers use it & provide feedback ('Oh right, I forgot to mention that we should be able to delete items from inventory'), and then when the managers are happy with the prototype it gets converted to production code. I guess I'd imagine that for a while before that, there'd still be a specialist handling the manager -> requirements -> spec translation, but it would increasingly look like one specialist per software project rather than a whole team.

Tons of guesswork there of course! And I do think there are still substantial open questions about how general and how agentic these systems can be. It's not that I'm certain of any of this, just that I think this sort of a picture is a probable enough outcome that programmers should be thinking hard about it. It's worth noting that even a year ago a lot of good programmers were very skeptical that these systems would ever be of much use for programming, seeing them as (at best) a glorified autocomplete that just grabbed the closest piece of code in the training data.

> Perhaps solving those really hard math problems is less of an indication of general reasoning than I attribute to it

I definitely don't mean that the performance on math is unimpressive! Just that ARC-AGI is especially impressive in my eyes because it was one of the best examples of the apparent limitations of LLMs, and was specifically designed to test certain kinds of generality & ability to understand novel problems.
> (1) O3 does not score that great on ARC-AGI - the high score is from a custom O3 model fine tuned on ARC-AGI. There are other narrow AIs that do quite well on ARC-AGI https://www.lesswrong.com/posts/Ao4enANjWNsYiSFqc/o3?commentId=egnGoeCJbvn3kg5GB this was basically a misunderstanding due to poor communication on OpenAI's part -- it was just the regular o3, and not fine tuned specifically on ARC-AGI; what they did do was include 300 of the 400 public ARC-AGI training examples in the training data corpus (where of course they were a tiny fraction of the corpus).
Thanks for this discussion. Another continuing question we may ask is how we may support our friends and communities here -- and how we may support Clojure -- in preparing for these uncertain times. Studying together and creating spaces for discussion are things we know how to do. Maybe we can try to be clever about choosing a decent learning path for a study group, that will actually be helpful for people and for ourselves.
My main concern is: can AGI solve new problems? I mean those problems never tackled before. Isn't it bound to the domain of its knowledge? If it's so, AGI is a powerful exploring tool and can do the boring parts of programming, while we can focus on problem solving.
That is a truly excellent question. There are a couple of papers showing that LLMs do as well as human scientists at suggesting novel hypotheses to investigate, but it's unclear IMO whether they can combine that with rigorous out-of-domain reasoning. I'm leading a https://docs.google.com/document/d/1f7ky9DP5c02OftKaImoX9Dy_NHENlQG5mAQGxWhaXV4 to investigate that exact question starting in a couple of weeks, and in particular whether they can do science on novel, randomized, toy domains. That said, even if it turns out that LLMs on their own can't, it's quite plausible that they can given some combination of:
• Inference-time compute (ie RL-trained extended chains of thought like we see in o1 / o3)
• Access to GOFAI-ish reasoning tools (eg planners)
• Other various techniques currently under investigation
Really interesting, thanks for sharing!
I think AI can solve new problems and it's a mistake of the category "stochastic parrot" to think otherwise. The most interesting accomplishment of o3 is that frontier-math benchmark that it does much better on. Once that benchmark saturates mathematicians are out of a job. Mathematicians operate purely in a verifiable symbolic space, whereas programmers still have some time left, a big part of our jobs is to interface with the real world, gather requirements etc.
> The most interesting accomplishment of o3 is that frontier-math benchmark that it does much better on. Once that benchmark saturates mathematicians are out of a job.

In my view, as important as that benchmark is, it's overshadowed by the improvement on ARC-AGI, which is explicitly designed to be a test of general reasoning. When/if these systems are capable of general reasoning, they're likely to be able to contribute substantially to AI (and other) research, at which point AI progress will be rapid and accelerating (more thoughts on that https://www.lesswrong.com/posts/k38sJNLk7YbJA72ST/llm-generality-is-a-timeline-crux).

> Mathematicians operate purely in a verifiable symbolic space, whereas programmers still have some time left, a big part of our jobs is to interface with the real world, gather requirements etc.

I suspect that those parts of the job aren't likely to stay out of range for long (except maybe insofar as they rely on personal relationships between people). It seems plausible to me that current systems could already draw out requirements from managers a lot more patiently than I can
> It seems plausible to me that current systems could already draw out requirements from managers a lot more patiently than I can

Do you think that the UI/UX around AI systems will reach a point where the managers could just explain to the AI what they want, and have the AI build it completely? What do you think the IT world would look like in such a scenario?
(it seems to me that in a world where non-programmers could direct AI to write software that works reliably, the actual tech used under the hood stops mattering at all -- some "blub" language that AI can best manipulate would be the ideal choice at that point, yes?)
> In my view, as important as that benchmark is, it's overshadowed by the improvement on ARC-AGI, which is explicitly designed to be a test of general reasoning.

Hmm yeah I'm not too deep into arc vs the hard math benchmark, I've listened to Chollet talk about arc quite a lot and done some of the examples. He always has a grounded perspective so good to listen to with all the hype going on. Perhaps solving those really hard math problems is less of an indication of general reasoning than I attribute to it, and math actually plays into the strengths of LLMs knowing a vast "bag of tricks" and combining them relevantly (or not so relevantly, with a TON of test time compute, as they needed for the frontier math one).

> It seems plausible to me that current systems could already draw out requirements from managers a lot more patiently than I can

Yes, that's where I think it will be going though, programmers doing less programming and more requirements engineering. I wouldn't be surprised if programming can eventually be turned into English->blub/machine-code. I've seen what sonnet 3.5 can already do if you give it shell, a repl and a half decent spec of what you're trying to solve, it's stunning. But it does make a world of difference if that half decent spec makes it to half decent, and not just "draw the rest of the owl". Then again, I'm not sure how long it will take to just give internal communication + codebase + databases to an llm based system and have it understand how to draw the rest of the owl just from that. At that point programmers might really be in trouble.
For now I am getting a ton of value out of llm + shell/repl tools and several specialized workflows.
> AI systems will reach a point where the managers could just explain to the AI what they want

@seancorfield I think there will likely be a range of UIs (e.g. UIs for non programmers may not even show the code that is generated). One UI that may work is keeping things almost exactly as we have them today! PM can create an Issue (e.g. in GitHub), have a back and forth conversation in the comments section with the AI and then the AI can submit a PR.

> it’s overshadowed by the improvement on ARC-AGI

@eggsyntax A couple of things on this.
(1) O3 does not score that great on ARC-AGI - the high score is from a custom O3 model fine tuned on ARC-AGI. There are other narrow AIs that do quite well on ARC-AGI.
(2) ARC-AGI is a bunch of problems that were designed to be hard for AIs but easy for people - the creators will likely publish a successor to it at some point.

> Once that benchmark saturates mathematicians are out of a job.

@bbss The frontier math benchmark is certainly hard (even for strong mathematicians) - but it only contains calculations (e.g. the answer is always a number) and doesn’t contain derivations or purely symbolic answers - so it only represents a fraction of the capabilities of a mathematician.
Okay, more than ready to admit that sentence was way too strongly worded.