Amazon's Deep Java Library seems to be broadly similar to PyTorch - it even claims to be able to load some PyTorch models. Anyone have any opinions of DJL and how much it overlaps (or doesn't) with PyTorch? https://docs.djl.ai/
@kimi.im (author of https://github.com/scicloj/clj-djl) has had some experience with it. Also, it may be worth asking at the https://scicloj.github.io/docs/community/chat/ chat, where a few AI-minded people are present.
Funny how circular this stuff gets real fast. What if instead of manually porting stuff from python+pytorch, we just train/fine tune a model to learn neanderthal/deep-diamond/clj-djl really well, in particular ways to express things from pytorch in one of those instead? Are the current best approaches to that fine tuning llama? This just came up on a search: https://lightning.ai/pages/community/tutorial/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama/ There's also https://huggingface.co/blog/codeparrot but that's from ancient times (late 2021). Alternatively, is just doing advanced prompting stuff with gpt4 good enough for this? My results have been not amazing trying it with ChatGPT. I'm waiting on gpt4 api access though, maybe more is possible with that when it comes to prompt engineering?
I think you really need to figure out what you want to do. If all you want to do is fool around with this stuff (*NN) in Clojure, then DeepDiamond is probably what you want. If you want pytorch but in clojure, then libpython-clj may be what you want - assuming it is able to pull that stuff in. If you want to actually implement a transformer (architecture) in Clojure, again, DeepDiamond (and probably Neanderthal) are what you should be reaching for. If you have daydreams of building an LLM the likes of Bert or GPT or llama or some such, you really need to realize that is not really plausible. The resource requirements are too vast. While "Open"AI is now really ClosedAI, there have been some estimates that training GPT4 likely used in the neighborhood of 75,000 Nvidia Tesla P40s running 24/7 for some months.
Agree that some startups will be doing more than just inference. There are at least two types of fine tuning: (1) update all weights in the network, (2) freeze nearly all weights and just update a subset of weights - LoRA is an example of this. Option (2) obviously requires less compute, but both are viable options for startups currently. Having said that, most startups won't need to run fine tuning anywhere near as often as they run inference logic (which will be used much more frequently). So if I had to prioritise the work, I would certainly port the inference part to Clojure first - then port the training/fine tuning code later.
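To make option (2) a bit more concrete, here is a rough sketch of the LoRA idea in plain Python: the pretrained weight matrix W stays frozen and only two small low-rank matrices A and B are trained, so the layer computes W x + (alpha/rank) * B (A x). The layer size (4096) and rank (8) are illustrative assumptions, not numbers from this discussion, and all function names are made up for the example.

```python
def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(W, A, B, x, alpha, rank):
    """Frozen base layer plus a scaled low-rank update: W x + (alpha/rank) * B (A x)."""
    base = matvec(W, x)                # W is frozen during fine tuning
    update = matvec(B, matvec(A, x))   # only A and B would receive gradients
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def full_finetune_params(d_in, d_out):
    """Trainable parameters when every weight of a d_out x d_in layer is updated."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Trainable parameters for LoRA: A is rank x d_in, B is d_out x rank."""
    return rank * d_in + d_out * rank


# Assumed sizes, for illustration only.
full = full_finetune_params(4096, 4096)
lora = lora_params(4096, 4096, 8)
print(full, lora, lora / full)
```

With these assumed sizes the low-rank update trains well under 1% of the parameters of that one layer, which is the intuition behind the lower compute cost of option (2).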
makes sense
Well, maybe - at this point, I'm not sure you will be able to get your hands on any of the latest models, as all this stuff has effectively "closed up". Older stuff should be available though. Again, it really depends on what your goals are.
There's some open stuff coming out. It's not as good at longer range instructions like gpt4, but I'm thinking these smaller models might be able to be turned into coding assistants, perhaps without a massive model.
Open Assistant's pythia model is being worked on to bring it up to parity with llama, I believe
Yes, there's also https://stability.ai/blog/stability-ai-launches-the-first-of-its-stablelm-suite-of-language-models in progress.
Red Pajama is an open version of the llama original training dataset https://www.together.xyz/blog/redpajama
So it's only a matter of time before a fully public version of llama is out
Prolly a month or two
I heard Stability's initial model releases are pretty bad
Something might have gotten messed up in the training
If you are just interested in "coding assistants" might make more sense to just use an existing high end model via the APIs they offer.
I want a free clojure assistant that runs locally in vs code
So preferably a sub-1B model
We might not be there yet
But I'm hoping we get like a 10x increase in efficiency over the next year
OTOH, it is not all that crazy to implement a transformer in DeepDiamond, for more constrained tasks that are de novo.
I'm not all that interested in LLMs, but transformers can be used in many other domains.
Like this is running inference in the browser, against WebGPU: https://mlc.ai/web-llm/
It's super slow
But if we get a few orders of magnitude increase in efficiency, you could imagine doing client side inference for a lot of stuff
I still think you would be better off going the API route, even if it doesn't offer the exact kind of delivery you would like.
For some stuff for sure
The narrative that all AI needs to be run in the cloud seems rather convenient for cloud companies. The progress in Open Source AI + the potential for neural network architecture optimisations + hardware specialisation improvements - means that local AI could certainly be viable.
Yeah, I think it'll be a year
Define "AI" - I think in many cases it already is
But LLMs? not so much
For LLM like functionality yeah
We need Small Language Models that can compete with LLMs on narrow tasks
But LLM is just a neural network - so nothing stopping it running locally like the others.
There's probably a lot of space for improvement and optimization
One possibility is to go the FNET route for more "optimal" capability. Not sure if the "Big Boys" are on that or not
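For context, FNet (as I understand the paper) replaces the self-attention sublayer with a parameter-free Fourier transform: a DFT along the hidden dimension, then along the sequence dimension, keeping only the real part. A naive sketch with made-up function names, using an O(n^2) DFT rather than an FFT:

```python
import cmath


def dft(values):
    """Naive discrete Fourier transform of a list of (complex) numbers."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def fourier_mixing(x):
    """x is seq_len x hidden (list of token vectors).

    Applies a DFT along the hidden dimension, then along the sequence
    dimension, and returns the real part, with the same shape as x.
    """
    hidden = len(x[0])
    rows = [dft(row) for row in x]                                   # mix within each token
    cols = [dft([row[j] for row in rows]) for j in range(hidden)]    # mix across tokens
    return [[cols[j][k].real for j in range(hidden)] for k in range(len(x))]


print(fourier_mixing([[1.0, 2.0], [3.0, 4.0]]))
```

There are no learned weights in this mixing step, which is where the efficiency argument comes from; the rest of the block (feed-forward layers, normalization) is left out here.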
Yeah, but you need a few good GPUs to get decent token speed out of current LLMs
even at inference time
It's the training resources that kill you
Sure - they're expensive for an individual - but many startups that can raise a few million dollars can train small LLMs. Fine tuning of open models can be done for not very much money.
LLMs are so general that most startups don't need to train from scratch - they are better off using an open source LLM.
For sure, you can start making money now on it
that's best done via the api route
For sure, for a small team that can't solve the problem of catching up with Open AI right now, sure. But what about Open AI stealing your business idea?
probably nothing you can do about that
seriously
And I think open source ecosystem will pull ahead of Open AI in the next year
Like linux vs windows
OTOH, why would they want to? It would need to be pulling in a boatload of $$
what do you mean?
it's not anything like linux vs windows - the resource requirements are vastly different
There are open source models like pythia that prove that open source LLM is viable already.
It's been shown that high quality fine tuning on less powerful open source LLMs is also very effective.
Many people have use cases like "Extract the product name from this sentence", "Does this sentence contain a double negative?" - which just need a small LLM, not the full power of GPT4.
I mean like in terms of community buy in, people will want to buy into the open option when possible, if there's reasonable tradeoffs
For those sorts of tasks you don't really need LLMs at all
They also have these methods where they train a smaller model from a larger model and that can end up being better than just training the small model on the same larger amount of data. So there might be various ways to compress these networks
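Training a smaller model from a larger one is usually called knowledge distillation. A minimal sketch of the usual soft-target loss, assuming both models produce logits over the same classes: soften both distributions with a temperature T, take the KL divergence from teacher to student, and scale by T^2. The function names and the default temperature are assumptions for illustration; in practice this term is normally combined with the ordinary loss on the true labels.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)   # soft targets from the large model
    q = softmax(student_logits, temperature)   # predictions of the small model
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]
print(distillation_loss([4.0, 1.0, 0.5], teacher))  # student matches the teacher
print(distillation_loss([0.5, 1.0, 4.0], teacher))  # student disagrees
```

The soft targets carry more information per example than a single hard label, which is one common explanation for why the distilled model can beat the same small model trained directly.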
for a lot of these smaller tasks you likely do not need this stuff at all. LLMs are the latest "hammer"
It's true (perhaps my examples weren't very good) - but there are still use cases where the quality of open source LLMs is enough - and the open source ones will keep getting better.
I haven't run the numbers recently but I think you can get a good regime of fine tuning going for a few hundred bucks a month on lambdalabs https://lambdalabs.com/service/gpu-cloud
in the cloud - which is certainly ok by me
Wutcu got against local inf, mayn
But they're just using standard GPUs that can be purchased and run locally if it's more economical to do that.
actually, absolutely nothing
I'm just kidding
But yeah, you can run and "own" your own models on there. Just pay for pure compute and not Open AI's API fees
if you have a decent datacenter, yes, you might be able to sort of compete.
We understand that LLM from scratch is expensive - but it turns out that if you have an open source LLM model, then running and fine tuning are not necessarily that expensive. You can do fine tuning of small open source LLMs with 4 to 8 GPUs in a few hours. You can just use the cloud GPU providers so you don't even have to own the GPUs - total cost could be less than $500 - and you may only have to do this very infrequently (e.g. annually). Savings over Cloud AI API can be significant and you get more flexibility in controlling how the model generates text.
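The arithmetic behind that estimate can be sketched as GPUs x hours x price per GPU-hour. The hourly price below is a hypothetical placeholder, not a quote from any provider; only the "4 to 8 GPUs", "a few hours" and "$500" figures come from the message above.

```python
def finetune_cost(num_gpus, hours, price_per_gpu_hour):
    """Total rental cost of one fine tuning run, in dollars."""
    return num_gpus * hours * price_per_gpu_hour


def max_hourly_price(budget, num_gpus, hours):
    """Highest price per GPU-hour that still fits within the budget."""
    return budget / (num_gpus * hours)


HYPOTHETICAL_PRICE = 2.00  # USD per GPU-hour, assumed for illustration only

for gpus in (4, 8):
    for hours in (3, 6):
        print(gpus, "GPUs,", hours, "hours:", finetune_cost(gpus, hours, HYPOTHETICAL_PRICE))

# How expensive could a GPU-hour get before 8 GPUs for 6 hours exceeds $500?
print(max_hourly_price(500, 8, 6))
```

Plugging in a provider's actual rate in place of the placeholder gives a quick sanity check on whether a given run stays under the $500 figure.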
I agree that many companies should just use Cloud AI APIs - but I wouldn't try to discourage those that want to or need to do local AI instead.
Yup. And you can pursue both paths in parallel. Build on top of OpenAI APIs and other public inf providers, while also building out a private inf capability
And if OpenAI accelerates in capability away from open source models, you won't be missing out. But if open source models pull ahead, you'll be ready to go private
I'm not here to throw a wet towel on you. I think my major points are: (1) if you need an LLM, it's probably best to use APIs; (2) you likely don't even need an LLM for most (all?) of these small tasks; (3) a transformer != an LLM. So, you can have local "AI" with transformer capabilities w/o any LLM
Open AI have GPT4 - but most of their users don't use it because GPT3.5 is good enough and cheap enough. There is no GPT5 currently being trained - that was confirmed a week ago. So Open Source could reach parity with GPT3.5 this year.
Can't agree with number 1. It really depends
We're just at the beginning of our use of LLMs - we don't yet know enough about how they will be used (e.g. a chat may be just one call to an AI API - but if you build an AI agent, that may take 100s or 1000s of calls each).
They're good at coding bots, natural language interface to documentation for customers, analytics, data enrichment, there's a large and growing number of use cases for an LLM. If the LLM feature isn't core to your business, like just a documentation portal for your actual product, then sure, just use text priming and prompt massaging to get a simple documentation bot working through the API.
But if you're like a data enrichment company, a private LLM might be something you should be investing in
I think education is going to be heavily disrupted by LLMs
Possibly in both senses of the word!
lol
Data cleaning and enrichment services. Creativity applications
Currently you have one teacher per, say, 20 students - but soon you can have one AI teacher per student going at the student's own pace. That could be huge.
exactly
To every child in the world at almost no cost!
That's different
And possibly the same quality of AI teacher. How do the private schools justify their extra cost (aside from better buildings)?
yeah, we'll probably want a human in the loop at some level, but LLMs will definitely be able to teach k-12 within the next decade
The schools probably won't build their own AI teachers - there will be just a few companies supplying them to all the schools.
Yeah, some zucker-gates-page ngo no doubt
Trying to keep innovation going in the open source space will be key there
UNESCO has published a clear and helpful "Quick Start Guide to ChatGPT and AI in Higher Education": https://www.iesalc.unesco.org/wp-content/uploads/2023/04/ChatGPT-and-Artificial-Intelligence-in-higher-education-Quick-Start-guide_EN_FINAL.pdf
That's just using the AI available today.
ChatGPT can already generate infinite multiple choice questions in various subjects
There could be an "infinite 4clojure" website that generates truly unique questions on the fly, that could give out genuine accreditations with zero human teachers in the loop
And customised to the child e.g. "I want answers explained by Pokemon characters" . "Make the science questions into a story set in the Lord of the Rings universe"
Yeah, custom children's movies and games authored on the fly, bringing in custom educational content tailored to what areas of the subject the student least understands
Include Spaced Repetition techniques that the AI is constantly keeping in mind.
Yeah, the tutor would have to be able to keep a stable theory of mind of the student going
So likely webcam enabled interaction with the bot, for understanding the mind state of the student
These kids are going to be impressively educated! However, I do wonder if some may find it not so motivating if/when they feel that no matter how good they are - they may never be as good as their teacher in certain subjects, e.g. they dreamed of being a journalist or a fiction writer.
They'll have new dreams I think, stuff we haven't thought of
Hopefully.
Isn't that like a question of whether there will be more to discover or less to discover?
Probably undecidable but I figure there will be more
We may see kids discovering things in grade school, assisted by AI
Yes - it could be like that. The more negative side is AGIs taking jobs and outcompeting people for the interesting work available, e.g. I want to be a script writer, but AGI writes much more interesting film scripts for so much less money that no one will hire me to do that job.
If I had to choose between watching a good movie made just by AI, and a good movie made by Rupert with the assistance of an AI, I'd probably click on the Rupert one first, just because I know I may be the only one to ever watch the AI only one. I think we'll have a lot more independent content creators
And there'll be a "nostalgia effect" for content connected to real humans
Yeah - I'm generally optimistic on the AI future we're headed into.
Same. We need a way to make the jobs transition gradual, or it might get kinda hectic. But I'm optimistic
yeah, but we'll probably have fine tuning work flows that don't require full training runs. So we'll probably be able to take a model trained in a python env over thousands of gpus and then fine tune it in another language
As an FYI, DeepDiamond has all the usual tools in the box for "real" work: weight decay, momentum, adaptive learning rates, dropout, SGD, batching, convolutions, et al. So, you can certainly build real things with it using typical repl level interactive development (even on the GPU). It does not have auto diff. If you need that, pytorch, or better yet, Julia is what you should be looking at.
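For readers who have not met these terms, here is a generic sketch of one SGD update with momentum and L2 weight decay in plain Python. This is not DeepDiamond's API, just the textbook update rule with made-up names; the gradient is supplied by hand, which is what you do when there is no auto diff.

```python
def sgd_step(weights, grads, velocity, lr=0.1, momentum=0.9, weight_decay=0.0):
    """One SGD update with momentum and L2 weight decay.

    Returns the new weights and the new velocity as fresh lists.
    """
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + weight_decay * w   # weight decay adds a pull towards zero
        v = momentum * v + g       # momentum accumulates past gradients
        new_velocity.append(v)
        new_weights.append(w - lr * v)
    return new_weights, new_velocity


# Toy problem: minimise f(w) = (w - 3)^2, whose hand-derived gradient is 2 * (w - 3).
w, v = [0.0], [0.0]
for _ in range(200):
    grad = [2.0 * (w[0] - 3.0)]
    w, v = sgd_step(w, grad, v, lr=0.1, momentum=0.9)
print(w[0])
```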
I spent a few weeks doing stable diffusion text-to-image stuff using python. It was great. There are new techniques coming out every other day and you could try them in like an hour. For better or worse, you don't have to know a thing about how they work to try them out. You can use the pre-trained models and the libraries built on top of those techniques. But I can't get past my strong distaste for python and its ecosystem/dependency hell (which is particularly bad with all this new fast changing stuff, I think), etc. I just want to be able to do this stuff in something that doesn't have to use python at all. libpython-clj is very cool and I have used it a bit, but unfortunately it doesn't really get you past needing to know python well and be very deep into the ecosystem. I realize this is irrational, and I could get more done if I could somehow just accept using python. But I tried this for weeks and after that am strongly motivated to at least explore the possibility of not using python, enough to at least figure out the cost/benefit of such alternatives. My working assumption at the moment is that the big pretrained models are just data and could maybe be converted to something usable outside of python. And then another big assumption is that all these new training techniques are "not that hard" for some definition of that.
> As an FYI, DeepDiamond has all the usual tools in the box for "real" work
Yeah, my first step towards this is getting through Dragan's deep learning book, which I've been reading as fast as possible
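On the "models are just data" assumption: for checkpoints stored in the safetensors layout this seems to hold fairly literally. As I understand that format, a file is an 8-byte little-endian header length, a JSON header describing each tensor (dtype, shape, byte offsets), and then the raw tensor bytes. The sketch below writes and reads that layout with only the standard library; it handles only F32 tensors, the function names are made up, and pickle-based PyTorch .pt files are a different story.

```python
import json
import struct


def write_checkpoint(tensors):
    """tensors maps name -> (shape, flat list of floats). Returns the file bytes."""
    header, payload = {}, b""
    for name, (shape, values) in tensors.items():
        raw = struct.pack("<%df" % len(values), *values)  # little-endian float32
        header[name] = {
            "dtype": "F32",
            "shape": list(shape),
            # offsets are relative to the start of the data section
            "data_offsets": [len(payload), len(payload) + len(raw)],
        }
        payload += raw
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


def read_checkpoint(blob):
    """Parse bytes in the layout above back into name -> (shape, flat list of floats)."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    data = blob[8 + header_len:]
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":   # optional free-form metadata entry
            continue
        if info["dtype"] != "F32":
            raise ValueError("this sketch only handles F32 tensors")
        begin, end = info["data_offsets"]
        count = (end - begin) // 4
        values = list(struct.unpack("<%df" % count, data[begin:end]))
        tensors[name] = (info["shape"], values)
    return tensors


blob = write_checkpoint({"layer.weight": ((2, 2), [1.0, -2.5, 0.25, 4.0])})
print(read_checkpoint(blob))
```

Nothing in the reader is Python-specific: a length prefix, a JSON parse and a byte buffer are all easy to do from the JVM, which is what makes loading weights from Clojure plausible.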
Like, being able to try the python stuff out fast is great, but then after you get it working, you have someone's Google Colab which has its own ad-hoc dependency management that git clones some fork of some big library and does a bunch of other stuff, and doesn't really expose any functions for reuse. Its goal is to just be a quick and dirty notebook-as-a-gui. So then if I want to take the next step and build something on top of this, it's sort of a dead end. I could take the notebook code and the scripts it uses and spend a while understanding what they're doing, and write my own script with more emphasis on reusability, or just build what I want directly. But then that basically involves just levelling up my python skills a lot. And it's unclear to me at this point if it's possible to spend the same amount of time but work from the bottom up in clojure, learn the underlying techniques instead, and end up with something halfway decent. I'm skeptical at this point, but at the very least I know I would enjoy it more
What's your goal? e.g. Creating whole new neural network architectures? Training new models? Running inference? Inference code is much simpler than training code to write/use. A lot of simple python code can be ported to libpython-clj with basically the same line count. That's good if you want to make use of the existing python ecosystem. Rewriting some of the inference code into pure clojure (including using DeepDiamond/CUDA etc) is certainly doable but will take a while, and with the pace of the current innovation it may take work to keep it up-to-date. I've written neural network training and inference code from scratch in the past. I agree that working with Python ML code is a pain! There are often manual dependency installation steps, there's loads of copy and pasted code (within a project and between projects), little documentation/comments, slow performance and no multithreading etc. It may be that you need to drop one level deeper, e.g. just work with the huggingface transformers or tensorflow libraries, which are more stable than the libraries on top of them. We did that and the experience became better for us. @carsten.behring's advice of creating a polyglot codebase seems quite good too. Keep the absolute minimum necessary in Python (via libpython-clj, command line or HTTP) and everything else in Clojure.
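As a rough illustration of the HTTP flavour of that polyglot setup, the Python side can be a very thin JSON service around the model call, using only the standard library. Everything here is a sketch under assumptions: run_model is a hypothetical placeholder for real inference, and the port and the request shape ({"text": ...}) are made up for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_model(text):
    """Hypothetical stand-in for real inference; it only counts whitespace tokens."""
    return {"tokens": len(text.split())}


def handle_json(body):
    """Decode a JSON request body, run the model, and encode the JSON response."""
    request = json.loads(body.decode("utf-8"))
    return json.dumps(run_model(request["text"])).encode("utf-8")


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", "0"))
        payload = handle_json(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port=8000):
    """Build (but do not start) a local-only server; call .serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), InferenceHandler)
```

Running make_server().serve_forever() starts the service, and the Clojure side can then POST JSON to it with whatever HTTP client it already uses, so the Python surface area stays at a single function.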