sci

whilo 2024-11-24T09:19:24.535979Z

In what ways do people manipulate and make use of the interpreter state (context) of sci? I have seen sci/fork. I would be interested in forking a full runtime state and exploiting Clojure's memory semantics. It seems that doing this with sci might be a fairly clean way.

john 2024-11-25T12:47:41.915409Z

@whilo does datahike do memory mapping?

whilo 2024-11-25T17:09:24.341019Z

You mean as in memory mapped files?

john 2024-11-25T17:11:36.949819Z

My understanding of memory mapping is more-so for in memory data between two different runtimes? But I don't know much about it.

john 2024-11-25T17:12:42.639919Z

But if it's mapped to a file, does that mean updates on data structures would atomically save to disk?

whilo 2024-11-25T17:17:50.596159Z

Datahike's memory model is that after a transaction (commit) everything is atomically stored in the underlying store and is efficiently accessible to datahike instances outside of the VM. All they need is access to the store, e.g. (distributed) file system or S3. Datahike stores the persistent data structure fragments individually in the underlying store, which means that readers can pick up deltas efficiently without having to reason about full snapshots/commits.

john 2024-11-25T17:20:36.899519Z

Interesting. I'll def be using that in the near future. Thanks

👍 1
whilo 2024-11-25T17:27:56.677229Z

I am able to fork, btw. Atm I walk over defs with data structures assigned and also copy atoms (ideally it should handle all mutable memory types). For a smallish SCI context this takes around a millisecond on my machine, so it could still be a bit cheaper to fork.
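
A minimal sketch of the forking idea described above, in plain JVM Clojure and not taken from whilo's implementation. It assumes the state is a flat map from symbols to values; the names fork-state, parent and child are hypothetical. Immutable values are shared as-is, and each atom is replaced by a fresh atom holding the same current value.

```clojure
;; Hypothetical helper: copy a flat map of definitions, giving the fork
;; its own atoms so writes in the fork do not leak into the parent.
(defn fork-state
  [state]
  (into {}
        (map (fn [[k v]]
               [k (if (instance? clojure.lang.Atom v)
                    (atom @v) ; fresh atom, same (immutable) current value
                    v)]))     ; immutable value, safe to share
        state))

(def parent {'config  {:mode :fast}
             'counter (atom 0)})

(def child (fork-state parent))

;; Mutating the child's atom leaves the parent's atom untouched.
(swap! (get child 'counter) inc)
```

Other mutable reference types (refs, agents, volatiles, Java objects) are not handled here and would each need their own copying rule.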

john 2024-11-25T17:31:09.808379Z

I basically want randomish access to terabytes or petabytes of data, while only keeping megabytes of it in memory. But I don't mean to derail this convo - does datahike have a slack channel here?

john 2024-11-25T17:32:26.566909Z

Nm found it

borkdude 2024-11-24T09:20:11.418319Z

what is "clojure's memory semantics"?

whilo 2024-11-24T09:20:36.705259Z

copy-on-write

whilo 2024-11-24T09:21:14.409939Z

i would handle mutable bits of state manually

whilo 2024-11-24T09:21:26.841889Z

i want to be able to execute code speculatively

whilo 2024-11-24T09:21:37.711299Z

in the background

whilo 2024-11-24T09:22:50.975679Z

for instance, i can track invocations and time spent per defn macro and could jit compile/partially evaluate functions to speed sci up

whilo 2024-11-24T09:25:27.191639Z

datahike has a git-api already, if i keep things in this memory model i could even snapshot sci, save it in a database and run its continuation on another machine

borkdude 2024-11-24T09:26:56.158479Z

I don't know how you would do snapshotting, since the SCI state is an object that references functions. You might be able to do it, but SCI doesn't support it out of the box

whilo 2024-11-24T09:27:29.881139Z

as long as the functions are stateless and recompilable you just need to retain the code

whilo 2024-11-24T09:27:43.567439Z

unfortunately neither clojure nor sci do this atm. as far as i can see

whilo 2024-11-24T09:28:20.040639Z

in sci it might be interesting to keep the analyzed value in the Eval nodes to retain pointers to syntax

borkdude 2024-11-24T09:29:00.309099Z

SCI doesn't only execute stuff from code though, it's meant to interact with the host system so many functions aren't executed from source at all

borkdude 2024-11-24T09:29:16.886409Z

e.g. clojure.core itself isn't executed from source

whilo 2024-11-24T09:29:23.339229Z

the easiest approach is to just assume that a base environment is given

whilo 2024-11-24T09:29:33.733019Z

right, it does not have to be self-contained

whilo 2024-11-24T09:29:42.003259Z

i will provide a runtime anyway

whilo 2024-11-24T09:29:54.231689Z

datahike also needs to be loaded etc.

whilo 2024-11-24T09:30:40.420519Z

i just want to fork (and maybe merge) runtime contexts of the full interpreter in general

whilo 2024-11-24T09:32:07.511679Z

this won't work if it shares mutable code and breaks if the context is reset. but my understanding is that it can work

whilo 2024-11-24T10:05:19.047229Z

the reason to retain code, e.g. for defns, is to be able to reinterpret/recompile it given the runtime context. clojure would be treated like black box native primitives

whilo 2024-11-24T10:06:53.675009Z

there are also non-standard forms of interpretation, e.g. type checking, that could be executed at runtime, given the types that are actually observed

Macroz 2024-11-24T10:18:24.107079Z

I thought copy-on-write means an entirely different thing vs. persistent data structures

Macroz 2024-11-24T10:23:11.414959Z

As for the original question: I use it in quite a simple way. In a web app, there are certain extension points where functionality can be specified by the maintainer-users of the "white label" product we are creating. So anyone can host their own instance. It's possible to write plugins outside of the normal development process (as long as there exists an extension point) that use SCI for the execution. The plugin has some setup code that is run after the code is loaded (from the filesystem). This uses sci/fork once to have a separate environment per plugin. Then each use of the plugin (e.g. the extension point of a user logging in) also sci/forks the plugin and provides more context (user attributes etc.).
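
A minimal sketch of the two levels of forking described above, not Macroz's actual code. It uses sci.core's init, fork, merge-opts and eval-string*; the host function greet, the plugin function on-login, the binding current-user and the helper run-extension are all hypothetical names chosen for illustration.

```clojure
(require '[sci.core :as sci])

;; Base context with the host functionality exposed to plugins
;; (greet is a made-up example of such a function).
(def base-ctx
  (sci/init {:namespaces {'user {'greet (fn [name] (str "hello " name))}}}))

;; One fork per plugin: setup code runs once, after the plugin is loaded.
(def plugin-ctx (sci/fork base-ctx))
(sci/eval-string* plugin-ctx
                  "(defn on-login [user] (greet (:name user)))")

;; One further fork per use of the extension point, with extra context
;; (here: the user attributes) merged into the forked context only.
(defn run-extension
  [user]
  (let [call-ctx (sci/merge-opts (sci/fork plugin-ctx)
                                 {:namespaces {'user {'current-user user}}})]
    (sci/eval-string* call-ctx "(on-login current-user)")))

(run-extension {:name "ada"})
```

Because every call works on its own fork, definitions made while handling one login are not visible to the plugin context or to later calls.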

❤️ 1
Macroz 2024-11-24T10:26:43.278909Z

I think the only problem is that users are likely more familiar with other programming languages, and we ourselves use CLJ/CLJS in the product with all the bells and whistles already. Therefore a more sane version of this plugin would be to e.g. call a shell script with agreed upon stdin and stdout use. Then anyone can use anything.

Macroz 2024-11-24T10:28:11.605539Z

I have experimented with SCI in another case also, that is a long-time dream of mine. I am basically building an editor so SCI or a variant would provide the equivalent of elisp for Emacs. Because it's easy to reach users in a browser, I'm developing prototypes in it for now, and SCI works for this just fine.

Macroz 2024-11-24T10:29:18.852669Z

In this scenario, I will be providing a lot of host functionality to the scripts. There'll be graphics manipulation stuff, a kind of a graph data model etc. Still I think the use is pretty conventional.

whilo 2024-11-24T18:17:19.503779Z

ZFS or other copy-on-write memory systems do the same thing that Clojure does when it updates its persistent data structures https://hypirion.com/musings/understanding-persistent-vector-pt-1. Datomic/DataScript/Datahike do the same with extension to durable media.

whilo 2024-11-24T18:19:02.149479Z

I am working on an AI system (assistant) that is a simulation engine instantiating a fairly general calculus of intelligence https://github.com/whilo/simmis. I am working from two ends, one immediate practicality through helping with memory management https://simm.is/screen/491680819, the other a solid distributed simulation semantics to be able to run speculatively in the background.

Macroz 2024-11-24T18:21:24.220419Z

But Clojure does not do a copy on write, it makes an eager "copy" that shares most of the original structure

Macroz 2024-11-24T18:22:50.772039Z

Like if you have a tree, it re-creates the path from the root to the target, pointing to the shared sub-trees and not copying but creating the new top-level

Macroz 2024-11-24T18:23:51.735129Z

If I do a modification then the new tree is already created, there will be no copy

Macroz 2024-11-24T18:26:14.287019Z

Every "user" of a data structure just points to the same immutable data
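
A small sketch of the structural sharing described in the messages above, in plain Clojure; v1 and v2 are made-up names. Updating one branch creates a new root and a new path to the changed value, while the untouched branch is the very same object in both versions.

```clojure
(def v1 {:a {:x 1} :b {:y 2}})

;; "Modify" a nested value; v1 itself is unchanged.
(def v2 (assoc-in v1 [:a :x] 42))

(get-in v1 [:a :x])            ;; => 1
(get-in v2 [:a :x])            ;; => 42

;; The untouched branch is shared, not copied.
(identical? (:b v1) (:b v2))   ;; => true

;; The path to the change was re-created.
(identical? (:a v1) (:a v2))   ;; => false
```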

whilo 2024-11-24T18:45:02.580379Z

ZFS does neither, nor does Linux when it forks a process. Copy-on-write is a description of the semantics, not the implementation usually.

whilo 2024-11-24T18:45:31.555729Z

Immutable or persistent are less precise in my experience, because the data structures are not just immutable and persistent is an overloaded term.

whilo 2024-11-24T18:46:24.776079Z

I get what you are saying though. The edge Clojure has is that it can create cheap copies for its own data structures, which is much leaner than these other options.

whilo 2024-11-24T18:47:13.424759Z

Nonetheless I think Clojurians have not explored full stack integrations with those a lot yet as far as I know. I think ZFS could be very powerful to turn mutable external systems into copy-on-write semantics, e.g. mutable index datastructures/databases.

whilo 2024-11-24T23:53:37.954929Z

Is your argument that copy-on-write is somehow lazy? The closest to this I have seen is the hitchhiker-tree on which we built datahike initially, but it still does copy-on-write eagerly on each write operation (because you have to put the written information somewhere into the copy) https://github.com/datacrypt-project/hitchhiker-tree.

Macroz 2024-11-25T06:27:28.242599Z

My mental model is probably loaded with all those languages and libraries that make a physical copy of the (whole) data when there is a modification, whereas functional persistent data structures make a shared data structure instead with just enough newness. I would always call them functional persistent data structures and not CoWs. However these things were not taught at the university when I was there so my definitions may be out of date. 🤷

Macroz 2024-11-25T06:31:01.541929Z

E.g. in the C++ world CoW often seems to be the word for a memory copy made on write, where no sharing or functional data structure is present.

whilo 2024-11-25T07:23:47.677089Z

Fair, maybe persistent is best, but nobody outside of functional programming associates this with the right notion of persistence; they all think about persistent on disk. CoW has the benefit that it is conceptually clear to everybody what is meant, and having a language that does this cheaply by default is maybe a better sales pitch.

Macroz 2024-11-25T07:24:15.005989Z

Agreed, persistent or durable can be confusing to someone too.

whilo 2024-11-25T07:24:47.648019Z

I also started to describe datahike as "git-like".

Macroz 2024-11-25T07:25:04.878059Z

One could perhaps emphasize that no actual copying is done ever, it's even faster than naive CoW.

whilo 2024-11-25T07:25:40.984439Z

Yes

Macroz 2024-11-25T07:26:19.712579Z

Btw. Datahike is cool stuff 👍

whilo 2024-11-25T07:26:42.313699Z

Thank you 🙂

whilo 2024-11-25T07:27:06.837699Z

Still much to do, but it is coming together somewhat nicely.

whilo 2024-11-25T07:29:27.926019Z

Are you interested in using similar memory semantics for a full Clojure runtime? I think it actually could work with SCI. I also thought about doing it in Clojure directly, but it is difficult to keep the general JVM/Clojure environments (forks) isolated from each other, I think. Maybe I am wrong though and I don't need SCI for that.

whilo 2024-11-25T07:31:00.189119Z

SCI is going to slow down the simulator, I think it would require some JITing or just lifting of pure code to the JVM context to be reasonable, but I think this can be done automatically.

whilo 2024-11-24T09:24:11.077289Z

And a question out of more theoretic interest in staging towers of interpreters, is sci metacircular?

borkdude 2024-11-25T10:13:29.722379Z

SCI probably isn't suited for pure academic work, it is just a pragmatic project, initially born to get Clojure working in a graalvm native image (babashka) and later also to make eval work in CLJS, with limitations. e.g. implementing custom datatypes doesn't work very well in SCI

whilo 2024-11-25T17:13:15.141229Z

I know, your philosophy makes sense. I am not saying that you should provide or guarantee it, just that it would be nice if Clojure would be a bit more friendly to (scheme/racket like) research ideas.

whilo 2024-11-25T17:13:23.119139Z

You mean deftype is not working?

borkdude 2024-11-25T17:13:36.764479Z

yes

borkdude 2024-11-25T17:13:55.114999Z

well it does work, but not fully

borkdude 2024-11-25T17:14:57.857709Z

of course SCI does work on itself if you provide its own namespaces via the namespace configuration. it's exposed in bb for example

borkdude 2024-11-25T17:15:14.541479Z

$ bb -e '(sci.core/eval-string "(+ 1 2 3)")'
6

whilo 2024-11-25T17:19:27.237829Z

As long as this lifting is transparently working on the nested context that is fine for me. It means you cannot change the core primitives, but since it is Lisp you can kind of easily work around that.

whilo 2024-11-25T17:21:20.948549Z

I was thinking a lot about how to do non-standard interpretations and compilations on a SCI context yesterday. Some old school AGI systems synthesize towers of languages, but I think you can maybe get there also if you can swap the context on the outside and have some special syntax to steer that from the inside.

borkdude 2024-11-24T09:24:59.767619Z

sci is metacircular, my latest Strange Loop talk clarifies this a bit.

whilo 2024-11-24T09:26:05.177049Z

Nice, I will take a look.

borkdude 2024-11-24T09:27:53.515959Z

One note: SCI cannot execute its own code, so it's not self-hosted, which is a different concept than meta-circularity

whilo 2024-11-24T09:56:30.916739Z

nice talk

whilo 2024-11-25T07:34:46.587029Z

@borkdude do you have a sense of how big the gap is to self-hosting?

whilo 2024-11-25T07:35:35.424319Z

It is not critical for me, but I think it is a strong theoretical property that would allow exploring some more theoretical work from the Scheme community.