data-science

phronmophobic 2024-11-21T15:06:11.640509Z

Hi @daslu, @hoppy and I have made some headway into understanding what an FFI wrapper for Stan with BridgeStan might look like. However, it unfortunately seems like the architecture of BridgeStan means that there's not much benefit over CmdStan. More details in 🧵

phronmophobic 2024-11-21T15:10:35.192679Z

Here's the basic flow for a typical usage of BridgeStan:
1. Write a .stan file with a model description.
2. Use the stanc CLI tool to compile the model to a shared library.
3. Load the shared library from the JVM.
4. Construct the model and pass it data via a JSON filename or string.
5. Run the model.
Typically, wrapping a native library has the benefits of improving performance and simplifying deployment. Passing data via a JSON string negates the benefits of shared memory, so performance is probably not much better. Requiring stanc from a local Stan installation means that deploying models isn't much easier unless you can precompile the model ahead of time. If you need to tweak and recompile models, then you need to shell out to stanc regardless.
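To make the shared-memory point concrete, here is a minimal sketch of the difference between handing data over as a JSON string (as in step 4 above) and handing over a view onto the same buffer. It uses only the Python standard library and does not call BridgeStan; the variable names and the `N`/`y` payload shape are assumptions for illustration.

```python
import array
import json

# Hypothetical observations that a model would consume.
observations = array.array("d", [1.5, 2.5, 3.5])

# Route 1: a JSON string. The receiver parses it into its own copy.
payload = json.dumps({"N": len(observations), "y": observations.tolist()})
received = json.loads(payload)["y"]

# Route 2: a shared buffer. The view reads the same memory, with no copy.
shared = memoryview(observations)

# A later change to the source shows up through the shared view,
# but not in the copy that went through JSON.
observations[0] = 10.0

print(received[0])  # 1.5
print(shared[0])    # 10.0
```

The JSON route serializes and re-parses every value on each hand-off, which is the cost the message above is pointing at.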

phronmophobic 2024-11-21T15:12:45.830059Z

We were wondering whether there are any other benefits of a bridgestan approach for existing use cases.

Daniel Slutsky 2024-11-21T15:43:45.308439Z

Hi! That is great. Thanks for these notes. For using Stan "the usual way", indeed it may not add much. Bridgestan will mostly be helpful in writing our own inference algorithms, and can be a foundation for a Clojure library for Bayesian Statistics. I will write more a bit later.

hoppy 2024-11-21T15:51:23.755029Z

so the use case of "I wanna bundle my prebaked stan model with some higher-level library" is what we would be shooting at? I see some point in that.

๐Ÿ‘ 1
โœจ 1
Daniel Slutsky 2024-11-21T15:52:15.226319Z

that's a perfect way to explain it!

hoppy 2024-11-21T15:54:00.097669Z

I've done something similar before with Sqlite extension modules. The thing that makes this hard is that you are effectively proposing shipping a boatload of different platform specific libs and picking the right one at runtime. You would need your stan models compiled for quite a few platforms, and hope it all mates up on the target.
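A rough sketch of what "picking the right one at runtime" could look like. The file-name scheme and the `model_library_name` helper are hypothetical, not a BridgeStan convention, and a matching OS and CPU is necessary but may not be sufficient for a prebuilt library to load on the target.

```python
import platform

# Hypothetical naming scheme for prebuilt model libraries (an assumption).
EXTENSIONS = {"Linux": "so", "Darwin": "dylib", "Windows": "dll"}


def model_library_name(model, system=None, machine=None):
    """Return the prebuilt library name for a platform, or raise if unsupported."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    # Normalise common aliases for the same CPU architecture.
    machine = {"amd64": "x86_64", "aarch64": "arm64"}.get(machine, machine)
    if system not in EXTENSIONS:
        raise RuntimeError(f"no prebuilt library for {system}/{machine}")
    return f"{model}_model-{system.lower()}-{machine}.{EXTENSIONS[system]}"


print(model_library_name("bernoulli", "Linux", "x86_64"))
print(model_library_name("bernoulli", "Darwin", "arm64"))
print(model_library_name("bernoulli", "Windows", "AMD64"))
```

Every entry in that table is one more artifact to build and ship per model, which is the "boatload" being described.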

Daniel Slutsky 2024-11-21T15:54:35.859609Z

Ohh I see

hoppy 2024-11-21T15:55:12.799889Z

stan-model -> stanc -> g++ -> dirty native .so

Daniel Slutsky 2024-11-21T15:56:02.616569Z

That is a good question. I will try to learn more about how they did it in other languages.

hoppy 2024-11-21T15:57:42.306669Z

It looks like the intent is that you pull bridgestan and it pulls the pieces together for you to do the native build yourself with minimal ceremony. You just need a viable g++ toolchain for your platform available.

๐Ÿ‘ 1
hoppy 2024-11-21T15:58:03.660789Z

I would do that, then run my python to snarf in the .so with an import.

hoppy 2024-11-21T15:59:05.714219Z

the "bridge" part is effectively a loader with a "clean" interface that in turn loads the stanc-generated gnarl

hoppy 2024-11-21T16:02:54.618359Z

it goes circular - want to avoid native compilation, so we strap on a prebuilt library that has virtually no chance of working on the target unless it's built natively.

Daniel Slutsky 2024-11-21T16:07:03.950129Z

As we discussed, this is not high priority for our short-term needs. We thought it would be an easy use case for exploring interop, but it turned out to be more complicated. Maybe the next step should be to justify it with a use case, e.g. by first using the BridgeStan bindings of another language and connecting to them from Clojure through some form of inter-process communication.

Daniel Slutsky 2024-11-21T16:08:28.101679Z

(To motivate our interest in it, there is a currently going discussion in the Zulip chat about a topic called Active Inference, where our program not only infers but also acts. Then, observations are interlaced with actions, in a changing environment. To explore such systems in Clojure, BridgeStan seems like the right level of abstraction we could hope for. But this is not a high priority as far as I understand.)

Daniel Slutsky 2024-11-21T16:10:11.502749Z

So, if this seems troublesome on the implementation side, I would suggest putting it on hold until we demonstrate what we want (which may take some time). Does that make sense?

hoppy 2024-11-21T16:11:21.954689Z

I would bring up CmdStan in this context, and ask what sort of use case breaks this or makes it ugly

➕ 1
hoppy 2024-11-21T16:12:31.820399Z

we can certainly make the bridgestan c-shim into an FFI type deal, which would at least give you the benefit of not having to restart an executable on every call into the model.

hoppy 2024-11-21T16:13:03.444339Z

you would still be left with having to have a native-built .so for your box in hand - it would just give you a way to manage lifecycle a little better.

Daniel Slutsky 2024-11-21T16:14:56.660489Z

> I would bring up CmdStan in this context, and ask what sort of use case breaks this or makes it ugly
Exactly. I'm proposing that we explore some use cases (like active inference) where https://scicloj.github.io/cmdstan-clj/ is not good enough. The Active Inference project can be one such case.

hoppy 2024-11-21T16:16:46.804299Z

I lack knowledge of the quantity of back-and-forth traffic between "app" and "model". I think if I had to attack the problem bridgestan was trying to solve, I would have approached it as some sort of RPC thing with language neutrality on the app side, rather than cranking out a bunch of native bindings. That's invalid, of course, if we are slinging K's or M's over the interface.
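A minimal sketch of the RPC idea, assuming newline-delimited JSON over a pipe to one long-lived worker process. The worker here evaluates a standard normal log density as a stand-in for a compiled model; it does not use BridgeStan, and `ModelClient`, the message fields, and the wire format are all hypothetical.

```python
import json
import math
import subprocess
import sys

# Worker source: stands in for a process that keeps a compiled model loaded.
# The "model" is a standard normal log density, purely as a placeholder.
WORKER = r'''
import json, math, sys
for line in sys.stdin:
    request = json.loads(line)
    theta = request["theta"]
    value = sum(-0.5 * t * t - 0.5 * math.log(2 * math.pi) for t in theta)
    sys.stdout.write(json.dumps({"id": request["id"], "log_density": value}) + "\n")
    sys.stdout.flush()
'''


class ModelClient:
    """Talks to one long-lived worker over newline-delimited JSON."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", WORKER],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
        )
        self.next_id = 0

    def log_density(self, theta):
        # One request line out, one reply line back.
        self.next_id += 1
        self.proc.stdin.write(json.dumps({"id": self.next_id, "theta": theta}) + "\n")
        self.proc.stdin.flush()
        reply = json.loads(self.proc.stdout.readline())
        assert reply["id"] == self.next_id
        return reply["log_density"]

    def close(self):
        # Closing stdin ends the worker's read loop.
        self.proc.stdin.close()
        self.proc.wait()


client = ModelClient()
at_zero = client.log_density([0.0])
at_one_two = client.log_density([1.0, 2.0])
client.close()
```

The worker is started once and reused across calls, so the app side only needs pipes and JSON, in whatever language. Each call still pays for serialization, which is why the approach stops making sense if large payloads cross the interface on every call.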

Daniel Slutsky 2024-11-21T16:17:46.047989Z

That makes sense. We can at least do that for a proof-of-concept.

Daniel Slutsky 2024-11-21T16:18:07.692009Z

Many thanks for looking into this, and I'm sorry that it got complicated.

hoppy 2024-11-21T16:18:24.940309Z

meh - software does that

Daniel Slutsky 2024-11-21T16:19:57.868989Z

And if you are curious about the use cases on the statistics side, I'd be glad to look together.

hoppy 2024-11-21T16:21:07.029509Z

yes, I'd be interested to see this action

๐Ÿ‘ 1
Daniel Slutsky 2024-11-21T16:21:17.118239Z

> a currently going discussion in the Zulip chat about a topic called Active Inference
for reference: https://clojurians.zulipchat.com/#narrow/channel/151924-data-science/topic/active.20inference

hoppy 2024-11-21T16:21:49.808179Z

I need to reincarnate my zulip creds

๐Ÿ™ 1
Daniel Slutsky 2024-11-21T16:22:47.361009Z

> yes, I'd be interested to see this action
great, I'll write some thoughts later

hoppy 2024-11-21T16:26:38.498589Z

title it "why not cmdstan"?

hoppy 2024-11-21T16:26:51.417589Z

then maybe we can solve that problem

Daniel Slutsky 2024-11-21T17:37:20.070029Z

Just started a Zulip topic thread on that: https://clojurians.zulipchat.com/#narrow/channel/151924-data-science/topic/why.20not.20cmdstan.3F.20.2F.20do.20we.20need.20bridgestan.3F There, we can keep chatting with a few of the relevant people.