ai

2023-04-17T17:57:28.177319Z

Amazon's Deep Java Library seems to be broadly similar to PyTorch - it even claims to be able to load some PyTorch models. Anyone have any opinions of DJL and how much it overlaps (or doesn't) with PyTorch? https://docs.djl.ai/

๐Ÿ‘ 1
Daniel Slutsky 2023-04-17T18:06:30.146989Z

@kimi.im (author of https://github.com/scicloj/clj-djl) has had some experience with it. Also, it may be worth asking at the https://scicloj.github.io/docs/community/chat/ chat, where a few AI-minded people are present.

๐Ÿ‘ 1
2023-04-17T19:37:45.684499Z

Funny how circular this stuff gets real fast. What if, instead of manually porting stuff from python+pytorch, we just train/fine-tune a model to learn neanderthal/deep-diamond/clj-djl really well, in particular ways to express pytorch code in one of those instead? Is the current best approach to that fine-tuning llama? This just came up in a search: https://lightning.ai/pages/community/tutorial/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama/ There's also https://huggingface.co/blog/codeparrot but that's from ancient times (late 2021). Alternatively, is just doing advanced prompting stuff with gpt4 good enough for this? My results trying it with ChatGPT have not been amazing. I'm still waiting on gpt4 api access though; maybe more is possible with that when it comes to prompt engineering?

jsa-aerial 2023-04-17T21:09:16.985129Z

I think you really need to figure out what you want to do. If all you want to do is fool around with this stuff (*NN) in Clojure, then DeepDiamond is probably what you want. If you want pytorch but in clojure, then libpython-clj may be what you want - assuming it is able to pull that stuff in. If you want to actually implement a transformer (architecture) in Clojure, again, DeepDiamond (and probably Neanderthal) are what you should be reaching for. If you have daydreams of building an LLM the likes of Bert or GPT or llama or some such, you really need to realize that is not really plausible. The resource requirements are too vast. While "Open"AI is now really ClosedAI, there have been some estimates that training GPT4 likely used in the neighborhood of 75,000 Nvidia Tesla P40s running 24/7 for some months.

Rupert (Sevva/All Street) 2023-04-24T07:15:46.323459Z

Agree that some startups will be doing more than just inference. There are at least two types of fine tuning: (1) update all weights in the network, (2) freeze nearly all weights and just update a subset of weights - LoRA is an example of this. Option (2) obviously requires less compute, but both are viable options for startups currently. Having said that, most startups won't need to run fine tuning anywhere near as often as they run inference (which will be used much more frequently). So if I had to prioritise the work, I would certainly port the inference part to Clojure first - then port the training/fine tuning code later.
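A minimal sketch of option (2) in plain Python, with no ML library. Strictly speaking, LoRA freezes all of the original weights and trains two small added matrices whose product is a low-rank update: the effective weight is `W + (alpha/r)·B·A`, with `B` (d×r) starting at zero and `A` (r×k) starting small and random. The list-of-lists matrix helpers, the class name `LoRALinear`, and the default `alpha` are illustrative assumptions, not the API of any LoRA library.

```python
import random

def matmul(X, Y):
    """Multiply matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def matadd(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

class LoRALinear:
    """A frozen d x k weight W plus a trainable rank-r update (alpha / r) * B @ A."""

    def __init__(self, W, r, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d, k = len(W), len(W[0])
        self.W = W          # frozen by convention: a training loop would only touch A and B
        self.r = r
        self.alpha = alpha
        # A (r x k) starts small and random; B (d x r) starts at zero,
        # so before any training the adapted layer behaves exactly like W.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d)]

    def effective_weight(self):
        s = self.alpha / self.r
        delta = [[s * v for v in row] for row in matmul(self.B, self.A)]
        return matadd(self.W, delta)

    def forward(self, x):
        """Apply the adapted layer to a k x 1 column vector."""
        return matmul(self.effective_weight(), x)

    def trainable_params(self):
        """Entries in A and B: r * (d + k), versus d * k for full fine-tuning."""
        return self.r * (len(self.W) + len(self.W[0]))
```

The point is the parameter count: for d = k = 4096 and r = 8, `trainable_params()` is 8 × (4096 + 4096) = 65,536, versus 16,777,216 entries in `W` itself, 256 times fewer.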

john 2023-04-24T15:38:02.305039Z

makes sense

jsa-aerial 2023-04-24T15:45:55.397909Z

Well, maybe - at this point, I'm not sure you will be able to get your hands on any of the latest models, as all this stuff has effectively "closed up". Older stuff should be available though. Again, it really depends on what your goals are.

john 2023-04-24T15:47:34.632929Z

There's some open stuff coming out. It's not as good at longer range instructions like gpt4, but I'm thinking these smaller models might be able to be turned into coding assistants, perhaps without a massive model.

john 2023-04-24T15:48:25.297999Z

Open Assistant's pythia model is being worked on to bring it up to parity with llama, I believe

Rupert (Sevva/All Street) 2023-04-24T15:49:29.036519Z

Yes, there's also https://stability.ai/blog/stability-ai-launches-the-first-of-its-stablelm-suite-of-language-models in progress.

john 2023-04-24T15:49:46.610639Z

RedPajama is an open reproduction of the original llama training dataset https://www.together.xyz/blog/redpajama

john 2023-04-24T15:50:12.614089Z

So it's only a matter of time before a fully public version of llama is out

john 2023-04-24T15:50:22.121599Z

Prolly a month or two

john 2023-04-24T15:50:59.910489Z

I heard Stability's initial model releases are pretty bad

john 2023-04-24T15:51:14.429149Z

Something might have gotten messed up in the training

jsa-aerial 2023-04-24T15:53:04.621109Z

If you are just interested in "coding assistants", it might make more sense to just use an existing high-end model via the APIs they offer.

john 2023-04-24T15:53:41.163189Z

I want a free clojure assistant that runs locally in vs code

john 2023-04-24T15:54:00.755599Z

So preferably a sub-1B model

john 2023-04-24T15:54:39.479549Z

We might not be there yet

john 2023-04-24T15:55:02.954589Z

But I'm hoping we get like a 10x increase in efficiency over the next year

jsa-aerial 2023-04-24T15:55:05.156489Z

OTOH, it is not all that crazy to implement a transformer in DeepDiamond, for more constrained tasks that are de novo.
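The central operation of a transformer layer is scaled dot-product attention, softmax(QKᵀ/√d)·V. A minimal, library-free Python sketch of just that operation (no multi-head projections, masking, or batching) shows how small the core piece is; porting it to DeepDiamond/Neanderthal tensors would be a separate exercise, and nothing here reflects their APIs.

```python
import math

def softmax(xs):
    m = max(xs)                                   # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of row vectors.

    Q: n x d queries, K: m x d keys, V: m x dv values.
    Returns (n x dv outputs, n x m attention weights).
    """
    d = len(Q[0])
    weights = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights.append(softmax(scores))
    # each output row is the attention-weighted average of the value rows
    out = [[sum(w * v[j] for w, v in zip(ws, V)) for j in range(len(V[0]))] for ws in weights]
    return out, weights
```

With identical keys every query attends uniformly, so each output is just the mean of the value rows; distinct keys are what let a query pick out specific positions.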

jsa-aerial 2023-04-24T15:56:44.327179Z

I'm not all that interested in LLMs, but transformers can be used in many other domains.

john 2023-04-24T15:57:36.969429Z

Like this is running inference in the browser, against WebGPU: https://mlc.ai/web-llm/

john 2023-04-24T15:57:54.130519Z

It's super slow

john 2023-04-24T15:58:51.155279Z

But if we get a few orders of magnitude increase in efficiency, you could imagine doing client side inference for a lot of stuff

jsa-aerial 2023-04-24T16:00:34.912479Z

I still think you would be better off going the API route, even if it doesn't offer the exact kind of delivery you would like.

john 2023-04-24T16:01:14.814369Z

For some stuff for sure

Rupert (Sevva/All Street) 2023-04-24T16:01:58.605369Z

The narrative that all AI needs to be run in the cloud seems rather convenient for cloud companies. The progress in Open Source AI + the potential for neural network architecture optimisations + hardware specialisation improvements - means that local AI could certainly be viable.

john 2023-04-24T16:02:42.333339Z

Yeah, I think it'll be a year

jsa-aerial 2023-04-24T16:02:51.616439Z

Define "AI" - I think in many cases it already is

jsa-aerial 2023-04-24T16:03:18.033859Z

But LLMs? not so much

john 2023-04-24T16:03:40.501929Z

For LLM like functionality yeah

john 2023-04-24T16:04:05.501189Z

We need Small Language Models that can compete with LLMs on narrow tasks

Rupert (Sevva/All Street) 2023-04-24T16:04:27.177329Z

But LLM is just a neural network - so nothing stopping it running locally like the others.

john 2023-04-24T16:04:36.490849Z

There's probably a lot of space for improvement and optimization

jsa-aerial 2023-04-24T16:04:53.613569Z

One possibility is to go the FNet route for more "optimal" capability. Not sure if the "Big Boys" are on that or not
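For reference, FNet replaces each self-attention sublayer with a parameter-free 2D discrete Fourier transform over the sequence and hidden dimensions, keeping only the real part; the feed-forward sublayers stay as they are. A minimal pure-Python sketch of just that mixing step, using a naive O(n²) DFT for clarity (a real implementation would use an FFT); the function names are illustrative.

```python
import cmath

def dft(xs):
    """Naive O(n^2) discrete Fourier transform of a list of complex numbers."""
    n = len(xs)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(xs))
            for k in range(n)]

def fnet_mixing(X):
    """FNet-style token mixing: Re(DFT_seq(DFT_hidden(X))).

    X is a seq_len x hidden matrix (list of rows). No learned parameters are involved.
    """
    rows = [dft([complex(v) for v in row]) for row in X]   # DFT along the hidden axis
    cols = [dft(list(col)) for col in zip(*rows)]          # DFT along the sequence axis
    return [[c.real for c in row] for row in zip(*cols)]   # back to seq_len x hidden, real part
```

Because the transform has no weights, this sublayer costs nothing to train, which is the appeal when training resources are the bottleneck.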

john 2023-04-24T16:05:08.932089Z

Yeah, but you need a few good GPUs to get decent token speed out of current LLMs

john 2023-04-24T16:05:26.623509Z

even at inference time

jsa-aerial 2023-04-24T16:05:26.858199Z

It's the training resources that kill you

Rupert (Sevva/All Street) 2023-04-24T16:06:32.857159Z

Sure - they're expensive for an individual - but many startups that can raise a few million dollars can train small LLMs. Fine tuning of open models can be done for not very much money.

Rupert (Sevva/All Street) 2023-04-24T16:07:12.111159Z

LLMs are so general that most startups don't need to train from scratch - they are better off using an open source LLM.

john 2023-04-24T16:07:36.052879Z

For sure, you can start making money now on it

jsa-aerial 2023-04-24T16:07:58.806609Z

that's best done via the api route

john 2023-04-24T16:09:25.047609Z

For sure, for a small team that can't solve the problem of catching up with Open AI right now, sure. But what about Open AI stealing your business idea?

jsa-aerial 2023-04-24T16:09:58.725129Z

probably nothing you can do about that

jsa-aerial 2023-04-24T16:10:04.514569Z

seriously

john 2023-04-24T16:10:08.718579Z

And I think the open source ecosystem will pull ahead of Open AI in the next year

john 2023-04-24T16:10:31.749659Z

Like linux vs windows

jsa-aerial 2023-04-24T16:10:42.659819Z

OTOH, why would they want to? It would need to be pulling in a boatload of $$

john 2023-04-24T16:11:16.599669Z

what do you mean?

jsa-aerial 2023-04-24T16:11:34.215949Z

It's not anything like linux vs windows - the resource requirements are vastly different

Rupert (Sevva/All Street) 2023-04-24T16:11:57.898399Z

There are open source models like pythia that prove that open source LLM is viable already.

Rupert (Sevva/All Street) 2023-04-24T16:12:31.086349Z

It's been shown that high quality fine tuning on less powerful open source LLMs is also very effective.

Rupert (Sevva/All Street) 2023-04-24T16:13:22.112289Z

Many people have use cases like "Extract the product name from this sentence" or "Does this sentence contain a double negative?" - which just need a small LLM, not the full power of GPT4.

john 2023-04-24T16:13:33.124639Z

I mean in terms of community buy-in: people will want to buy into the open option when possible, if the tradeoffs are reasonable

jsa-aerial 2023-04-24T16:15:19.461169Z

For those sorts of tasks you don't really need LLMs at all

john 2023-04-24T16:15:27.786579Z

They also have these methods where they train a smaller model from a larger model, and that can end up being better than just training the small model on the same large amount of data. So there might be various ways to compress these networks
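The technique described here is usually called knowledge distillation. A minimal sketch of its classic soft-target loss in plain Python: the student is trained to match the teacher's temperature-softened output distribution, measured as a KL divergence and scaled by T². The default temperature and the helper names are illustrative assumptions; in practice this term is usually mixed with the ordinary hard-label loss.

```python
import math

def softmax(logits, T=1.0):
    """Softmax of logits divided by temperature T (higher T = softer distribution)."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher_T || student_T) * T^2: the soft-target term of knowledge distillation."""
    p = softmax(teacher_logits, T)   # teacher's softened distribution (the "soft targets")
    q = softmax(student_logits, T)   # student's softened distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * T * T                # T^2 keeps gradient scale comparable across temperatures
```

The softened teacher distribution carries more information per example than a one-hot label (how wrong each wrong answer is), which is one explanation for why a distilled small model can beat one trained on the raw data alone.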

jsa-aerial 2023-04-24T16:16:40.691079Z

for a lot of these smaller tasks you likely do not need this stuff at all. LLMs are the latest "hammer"

Rupert (Sevva/All Street) 2023-04-24T16:18:19.838939Z

It's true (perhaps my examples weren't very good) - but there are still use cases where the quality of open source LLMs is enough - and the open source ones will keep getting better.

john 2023-04-24T16:18:53.310629Z

I haven't run the numbers recently but I think you can get a good regime of fine tuning going for a few hundred bucks a month on lambdalabs https://lambdalabs.com/service/gpu-cloud

๐Ÿ‘ 1
jsa-aerial 2023-04-24T16:21:47.373369Z

in the cloud - which is certainly ok by me

john 2023-04-24T16:22:33.877669Z

Wutcu got against local inf, mayn

Rupert (Sevva/All Street) 2023-04-24T16:23:15.142119Z

But they're just using standard GPUs that can be purchased and run locally if it's more economical to do that.

jsa-aerial 2023-04-24T16:23:15.475569Z

actually, absolutely nothing 😄

john 2023-04-24T16:23:26.763219Z

I'm just kidding

john 2023-04-24T16:23:54.527229Z

But yeah, you can run and "own" your own models on there. Just pay for pure compute and not Open AI's API fees

jsa-aerial 2023-04-24T16:24:42.418419Z

if you have a decent datacenter, yes, you might be able to sort of compete.

Rupert (Sevva/All Street) 2023-04-24T16:28:19.432009Z

We understand that training an LLM from scratch is expensive - but it turns out that if you have an open source LLM, then running and fine tuning it are not necessarily that expensive. You can fine tune small open source LLMs with 4 to 8 GPUs in a few hours. You can just use the cloud GPU providers so you don't even have to own the GPUs - total cost could be less than $500 - and you may only have to do this very infrequently (e.g. annually). Savings over Cloud AI APIs can be significant and you get more flexibility in controlling how the model generates text.

Rupert (Sevva/All Street) 2023-04-24T16:30:42.133309Z

I agree that many companies should just use Cloud AI APIs - but I wouldn't try to discourage those that want to or need to do local AI instead.

john 2023-04-24T16:30:51.478719Z

Yup. And you can pursue both paths in parallel. Build on top of OpenAI APIs and other public inf providers, while also building out a private inf capability

john 2023-04-24T16:32:46.792819Z

And if OpenAI accelerates in capability away from open source models, you won't be missing out. But if open source models pull ahead, you'll be ready to go private

jsa-aerial 2023-04-24T16:33:39.423049Z

I'm not here to throw a wet towel on you. I think my major points are:
1. if you need an LLM, it's probably best to use the APIs
2. you likely don't even need an LLM for most (all?) of these small tasks
3. a transformer is not= LLM

So, you can have local "AI" with transformer capabilities w/o any LLM

Rupert (Sevva/All Street) 2023-04-24T16:33:39.830109Z

Open AI have GPT4 - but most of their users don't use it because GPT3.5 is good enough and cheap enough. There is no GPT5 currently being trained - that was confirmed a week ago. So Open Source could reach parity with GPT3.5 this year.

john 2023-04-24T16:35:11.944019Z

Can't agree with number 1. It really depends

Rupert (Sevva/All Street) 2023-04-24T16:36:07.054359Z

We're just at the beginning of our use of LLMs - we don't yet know enough about how they will be used (e.g. a chat may be just one call to an AI API - but if you build an AI agent, each task may take 100s or 1000s of calls).

john 2023-04-24T16:38:49.151439Z

They're good at coding bots, natural language interface to documentation for customers, analytics, data enrichment, there's a large and growing number of use cases for an LLM. If the LLM feature isn't core to your business, like just a documentation portal for your actual product, then sure, just use text priming and prompt massaging to get a simple documentation bot working through the API.

john 2023-04-24T16:39:46.106369Z

But if you're like a data enrichment company, a private LLM might be something you should be investing in

john 2023-04-24T16:45:11.350319Z

I think education is going to be heavily disrupted by LLMs

Rupert (Sevva/All Street) 2023-04-24T16:45:44.840069Z

Possibly in both senses of the word!

john 2023-04-24T16:45:55.630739Z

lol

john 2023-04-24T16:46:33.353509Z

Data cleaning and enrichment services. Creativity applications

Rupert (Sevva/All Street) 2023-04-24T16:46:33.372259Z

Currently you have one teacher per, say, 20 students - but soon you can have one AI teacher per student going at the student's own pace. That could be huge.

john 2023-04-24T16:46:50.013199Z

exactly

john 2023-04-24T16:47:02.537189Z

To every child in the world at almost no cost!

john 2023-04-24T16:47:11.578219Z

That's different

Rupert (Sevva/All Street) 2023-04-24T16:47:41.605789Z

And possibly the same quality of AI teacher. How do the private schools justify their extra cost (aside from better buildings)?

john 2023-04-24T16:48:36.226169Z

yeah, we'll probably want a human in the loop at some level, but LLMs will definitely be able to teach K-12 within the next decade

Rupert (Sevva/All Street) 2023-04-24T16:48:54.133729Z

The schools probably won't build their own AI teachers - there will be just a few companies supplying them to all the schools.

john 2023-04-24T16:49:35.479399Z

Yeah, some zucker-gates-page ngo no doubt

john 2023-04-24T16:50:11.492099Z

Trying to keep innovation going in the open source space will be key there

Rupert (Sevva/All Street) 2023-04-24T16:51:03.442539Z

UNESCO has published a clear and helpful "Quick Start Guide to ChatGPT and AI in Higher Education", [https://www.iesalc.unesco.org/wp-content/uploads/2023/04/ChatGPT-and-Artificial-Intelligence-in-higher-education-Quick-Start-guide_EN_FINAL.pdf]

Rupert (Sevva/All Street) 2023-04-24T16:52:27.867789Z

That's just using the AI available today.

john 2023-04-24T16:52:32.781059Z

ChatGPT can already generate infinite multiple choice questions in various subjects

john 2023-04-24T16:53:50.445179Z

There could be an "infinite 4clojure" website that generates truly unique questions on the fly, that could give out genuine accreditations with zero human teachers in the loop

Rupert (Sevva/All Street) 2023-04-24T16:54:56.838989Z

And customised to the child, e.g. "I want answers explained by Pokemon characters", "Make the science questions into a story set in the Lord of the Rings universe"

john 2023-04-24T16:56:13.171569Z

Yeah, custom children's movies and games authored on the fly, bringing in custom educational content tailored to the areas of the subject the student least understands

Rupert (Sevva/All Street) 2023-04-24T16:56:58.234069Z

Include Spaced Repetition techniques that the AI is constantly keeping in mind.
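One well-known spaced-repetition scheduler a tutor could build on is SuperMemo's SM-2 algorithm. Below is a minimal sketch of its commonly published form (recall graded 0 to 5, easiness factor starting at 2.5 and floored at 1.3); the `Card` type and its field names are hypothetical, and many real systems use modified variants of this schedule.

```python
from dataclasses import dataclass

@dataclass
class Card:
    repetitions: int = 0      # consecutive successful reviews
    interval: int = 0         # days until the next review
    easiness: float = 2.5     # SM-2 easiness factor (EF)

def review(card: Card, quality: int) -> Card:
    """Return the updated card after a review graded 0 (blackout) to 5 (perfect recall)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality >= 3:                         # successful recall: grow the interval
        if card.repetitions == 0:
            interval = 1
        elif card.repetitions == 1:
            interval = 6
        else:
            interval = round(card.interval * card.easiness)
        repetitions = card.repetitions + 1
    else:                                    # failed recall: restart the sequence
        repetitions = 0
        interval = 1
    # harder recalls lower the easiness factor, perfect ones raise it slightly
    easiness = card.easiness + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return Card(repetitions, interval, max(1.3, easiness))
```

Three perfect reviews in a row give intervals of 1, 6 and 16 days; an AI tutor would additionally decide what goes on each card and how to grade the student's answer.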

john 2023-04-24T16:57:22.536649Z

Yeah, the tutor would have to be able to keep a stable theory of mind of the student going

john 2023-04-24T16:57:55.531889Z

So likely webcam enabled interaction with the bot, for understanding the mind state of the student

Rupert (Sevva/All Street) 2023-04-24T16:58:38.452659Z

These kids are going to be impressively educated! However, I do wonder if some may find it not so motivating if/when they feel that no matter how good they are, they may never be as good as their teacher in certain subjects, e.g. if they dreamed of being a journalist or a fiction writer.

john 2023-04-24T16:59:48.003459Z

They'll have new dreams I think, stuff we haven't thought of

Rupert (Sevva/All Street) 2023-04-24T17:00:17.201169Z

Hopefully.

john 2023-04-24T17:02:05.262179Z

Isn't that like a question of whether there will be more to discover or less to discover?

john 2023-04-24T17:02:24.716309Z

Probably undecidable but I figure there will be more

john 2023-04-24T17:03:21.112699Z

We may see kids discovering things in grade school, assisted by AI

Rupert (Sevva/All Street) 2023-04-24T17:05:19.191279Z

Yes - it could be like that. The more negative side is AGIs taking jobs and outcompeting people for the interesting work available, e.g. I want to be a script writer, but AGI writes much more interesting film scripts for so much less money that no one will hire me to do that job.

john 2023-04-24T17:07:59.191179Z

If I had to choose between watching a good movie made just by AI and a good movie made by Rupert with the assistance of an AI, I'd probably click on the Rupert one first, just because I know I may be the only one to ever watch the AI-only one. I think we'll have a lot more independent content creators

john 2023-04-24T17:08:45.071039Z

And there'll be a "nostalgia effect" for content connected to real humans

Rupert (Sevva/All Street) 2023-04-24T17:10:25.255899Z

Yeah - I'm generally optimistic on the AI future we're headed into.

john 2023-04-24T17:12:54.808299Z

Same. We need a way to make the jobs transition gradual, or it might get kinda hectic. But I'm optimistic

john 2023-04-23T23:04:47.690729Z

yeah, but we'll probably have fine-tuning workflows that don't require full training runs. So we'll probably be able to take models trained in a python env over thousands of gpus and then fine-tune them in another language

jsa-aerial 2023-04-17T21:22:40.205939Z

As an FYI, DeepDiamond has all the usual tools in the box for "real" work: weight decay, momentum, adaptive learning rates, dropout, SGD, batching, convolutions, etc. So you can certainly build real things with it using typical REPL-level interactive development (even on the GPU). It does not have autodiff. If you need that, pytorch, or better yet, Julia is what you should be looking at.
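Without autodiff, gradients have to come from hand-derived backward passes. A minimal, library-agnostic sketch of the classic update rule behind "SGD with momentum and weight decay", applied to a toy least-squares fit whose gradient is derived by hand. The learning rate, momentum, data, and function names are arbitrary assumptions, and this is not DeepDiamond's API.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.05, momentum=0.9, weight_decay=0.0):
    """One SGD step with classical (heavy-ball) momentum and L2 weight decay, in place."""
    for i, (p, g) in enumerate(zip(params, grads)):
        g = g + weight_decay * p                   # weight decay as an L2 term on the gradient
        velocity[i] = momentum * velocity[i] - lr * g
        params[i] = p + velocity[i]

def grad_mse(w, b, xs, ys):
    """Hand-derived gradient of mean((w*x + b - y)^2) with respect to w and b."""
    n = len(xs)
    errs = [w * x + b - y for x, y in zip(xs, ys)]
    dw = sum(2 * e * x for e, x in zip(errs, xs)) / n
    db = sum(2 * e for e in errs) / n
    return [dw, db]

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]           # toy data generated from y = 2x + 1
params, velocity = [0.0, 0.0], [0.0, 0.0]
for _ in range(500):
    sgd_momentum_step(params, grad_mse(params[0], params[1], xs, ys), velocity)
```

With `weight_decay=0.0` this converges to w = 2, b = 1; a positive `weight_decay` pulls both slightly toward zero, which is the intended regularizing effect.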

2023-04-17T21:27:37.039909Z

I spent a few weeks doing stable diffusion text-to-image stuff using python. It was great. There are new techniques coming out every other day and you could try them in like an hour. For better or worse, you don't have to know a thing about how they work to try them out: you can use the pre-trained models and the libraries built on those techniques. But I can't get past my strong distaste for python and its ecosystem/dependency hell (which I think is particularly bad with all this new, fast-changing stuff), etc. I just want to be able to do this stuff in something that doesn't have to use python at all. libpython-clj is very cool and I have used it a bit, but unfortunately it doesn't really get you past needing to know python well and be very deep into the ecosystem. I realize this is irrational, and I could get more done if I could somehow just accept using python. But I tried this for weeks, and after that I am strongly motivated to at least explore the possibility of not using python, enough to figure out the cost/benefit of such alternatives. My working assumption at the moment is that the big pretrained models are just data and could maybe be converted to something usable outside of python. Another big assumption is that all these new training techniques are "not that hard" for some definition of that.

> As an FYI, DeepDiamond has all the usual tools in the box for "real" work

Yeah, my first step towards this is getting through Dragan's deep learning book, which I've been reading as fast as possible

2023-04-17T21:39:15.715249Z

Like, being able to try the python stuff out fast is great, but then after you get it working, you have someone's Google Colab, which has its own ad-hoc dependency management that git clones some fork of some big library and does a bunch of other stuff, and doesn't really expose any functions for reuse; its goal is just to be a quick and dirty notebook-as-a-gui. So if I want to take the next step and build something on top of this, it's sort of a dead end. I could take the notebook code and the scripts it uses, spend a while understanding what they're doing, and write my own script with more emphasis on reusability, or just build what I want directly. But that basically involves levelling up my python skills a lot. And it's unclear to me at this point whether it's possible to spend the same amount of time working from the bottom up in clojure, learning the underlying techniques instead, and end up with something halfway decent. I'm skeptical at this point, but at the very least I know I would enjoy it more

Rupert (Sevva/All Street) 2023-04-18T16:33:08.452359Z

What's your goal? e.g. Creating whole new neural network architectures? Training new models? Running inference? Inference code is much simpler than training code to write/use. A lot of simple python code can be written in libpython-clj with basically the same line count. That's good if you want to make use of the existing python ecosystem. Rewriting some of the inference code in pure clojure (including using DeepDiamond/CUDA etc.) is certainly doable but will take a while, and with the current pace of innovation it may take work to keep it up to date. I've written neural network training and inference code from scratch in the past. I agree that working with Python ML code is a pain! There are often manual dependency installation steps, there's loads of copy-and-pasted code (within a project and between projects), little documentation/comments, slow performance, no multithreading, etc. It may be that you need to drop one level deeper (e.g. just work with the huggingface transformers or tensorflow libraries), which are more stable than the libraries on top of them. We did that and the experience became better for us. @carsten.behring's advice of creating a polyglot codebase seems quite good too: keep the absolute minimum necessary in Python (via libpython-clj, the command line or HTTP) and everything else in Clojure.
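A minimal sketch of the "keep the absolute minimum in Python, behind HTTP" option, using only the standard library: one JSON endpoint whose handler calls a single model function, so the Clojure side only needs an HTTP client. `run_model` is a hypothetical placeholder standing in for whatever Python-only inference call you actually need (e.g. a huggingface transformers pipeline); the port, the `serve` helper, and the `{"text": ...}` request shape are assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_model(text: str) -> str:
    """Hypothetical placeholder for the one Python-only inference call you need."""
    return text.upper()

def handle_payload(body: bytes) -> bytes:
    """Decode a JSON request like {"text": "..."}, run the model, encode a JSON reply."""
    request = json.loads(body)
    return json.dumps({"output": run_model(request["text"])}).encode("utf-8")

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            reply, status = handle_payload(self.rfile.read(length)), 200
        except (ValueError, KeyError, TypeError) as exc:   # bad JSON or wrong request shape
            reply, status = json.dumps({"error": str(exc)}).encode("utf-8"), 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

def serve(port: int = 8000) -> None:
    """Block and serve POST requests on localhost:port until interrupted."""
    HTTPServer(("localhost", port), InferenceHandler).serve_forever()
```

After calling `serve()`, any HTTP client on the Clojure side (for example Java's `java.net.http.HttpClient` via interop) can POST `{"text": "..."}` to `http://localhost:8000/`. Keeping the request handling in the pure `handle_payload` function makes the Python part easy to test without starting a server.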