In what ways do people manipulate and make use of the interpreter state (context) of sci? I have seen sci/fork. I would be interested in forking a full runtime state and exploiting Clojure's memory semantics. It seems that doing this with sci might be a fairly clean way.
@whilo does datahike do memory mapping?
You mean as in memory mapped files?
My understanding is that memory mapping is more for sharing in-memory data between two different runtimes? But I don't know much about it.
But if it's mapped to a file, does that mean updates on data structures would atomically save to disk?
Datahike's memory model is that after a transaction (commit) everything is atomically stored in the underlying store and is efficiently accessible to datahike instances outside of the VM. All they need is access to the store, e.g. (distributed) file system or S3. Datahike stores the persistent data structure fragments individually in the underlying store, which means that readers can pick up deltas efficiently without having to reason about full snapshots/commits.
Interesting. I'll def be using that in the near future. Thanks
I am able to fork btw. Atm I walk over defs with data structures assigned and also copy atoms (ideally it should handle all mutable memory types). For a smallish SCI context this takes around a millisecond on my machine, so it could still be a bit cheaper to fork.
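A minimal sketch of the forking idea described above, not the actual implementation: immutable values are shared by reference and only atoms are copied. The state map and the names `fork-state`, `parent`, and `child` are hypothetical, and the `clojure.lang.Atom` check assumes the JVM.

```clojure
;; Hypothetical stand-in for the defs of an interpreter context:
;; a map from symbol to value, where some values are atoms.
(defn fork-state
  "Returns a copy of `state` in which every atom is replaced by a fresh atom
  holding the same (immutable) value. Non-atom values are shared as-is."
  [state]
  (into {}
        (map (fn [[sym v]]
               [sym (if (instance? clojure.lang.Atom v)
                      (atom @v) ; new mutable cell, same immutable contents
                      v)]))
        state))

(def parent {'config  {:depth 3}
             'counter (atom 0)})

(def child (fork-state parent))

;; Mutating the child's atom leaves the parent untouched.
(swap! (get child 'counter) inc)

[@(get parent 'counter) @(get child 'counter)]
;; => [0 1]

(identical? (get parent 'config) (get child 'config))
;; => true, the immutable map is shared, not copied
```

Other mutable reference types (refs, agents, volatiles, host objects) would need their own handling, which this sketch leaves out.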
I basically want randomish access to terabytes or petabytes of data, while only keeping megabytes of it in memory. But I don't mean to derail this convo - does datahike have a slack channel here?
Nm found it
what is "clojure's memory semantics"?
copy-on-write
i would handle mutable bits of state manually
i want to be able to execute code speculatively
in the background
for instance, i can track invocations and time spent per defn macro and could jit compile/partially evaluate functions to speed sci up
datahike has a git-api already, if i keep things in this memory model i could even snapshot sci, save it in a database and run its continuation on another machine
I don't know how you would do snapshotting, since the SCI state is an object that references functions. You might be able to do it, but SCI doesn't support it out of the box
as long as the functions are stateless and recompilable you just need to retain the code
unfortunately neither clojure nor sci does this atm, as far as i can see
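One way to read "just retain the code" is as a log of evaluated source strings that can be replayed into a fresh context. A minimal sketch, assuming every recorded form is a pure, stateless definition; `source-log`, `eval-and-record!`, and `replay` are hypothetical names, and only `sci/init` and `sci/eval-string*` come from SCI.

```clojure
(require '[sci.core :as sci])

;; Hypothetical log of every source string evaluated so far.
(def source-log (atom []))

(defn eval-and-record!
  "Evaluates `code` in `ctx` and, if that succeeds, appends it to the log."
  [ctx code]
  (let [result (sci/eval-string* ctx code)]
    (swap! source-log conj code)
    result))

(defn replay
  "Builds a fresh context and re-evaluates the recorded sources in order."
  [sources]
  (let [ctx (sci/init {})]
    (run! #(sci/eval-string* ctx %) sources)
    ctx))

(def ctx (sci/init {}))
(eval-and-record! ctx "(defn square [x] (* x x))")

;; The log is plain data, so it could be stored and replayed elsewhere.
(def restored (replay @source-log))
(sci/eval-string* restored "(square 7)")
;; => 49
```

This only covers definitions made from source. Host functions passed in through the context options would still have to be supplied again by the receiving runtime.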
in sci it might be interesting to keep the analyzed value in the Eval nodes to retain pointers to syntax
SCI doesn't only execute stuff from code though, it's meant to interact with the host system so many functions aren't executed from source at all
e.g. clojure.core itself isn't executed from source
the easiest approach is to just assume that a base environment is given
right, it does not have to be self-contained
i will provide a runtime anyway
datahike also needs to be loaded etc.
i just want to fork (and maybe merge) runtime contexts of the full interpreter in general
this won't work if it shares mutable code and breaks if the context is reset. but my understanding is that it can work
the reason to retain code, e.g. for defns, is to be able to reinterpret/recompile it given the runtime context. clojure would be treated like black box native primitives
there are also non-standard forms of interpretation, e.g. type checking, that could be executed at runtime, given the types that are actually observed
I thought copy-on-write means an entirely different thing vs. persistent data structures
As for the original question: I use it in quite a simple way.
In a web app, there are certain extension points where functionality can be specified by the maintainer-users of the "white label" product we are creating. So anyone can host their own instance. It's possible to write plugins outside of the normal development process (as long as there exists an extension point), that use SCI for the execution. The plugin has some setup code that is run after the code is loaded (from filesystem). This uses the sci/fork once to have a separate environment per plugin. Then each use of the plugin (e.g. extension point of user logging in) also sci/forks the plugin and provides more context (user attributes etc.).
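The two levels of forking described above could look roughly like this. It is a sketch under assumptions, not the actual plugin code: the host function `log`, the var names, and `run-extension-point` are hypothetical, while `sci/init`, `sci/fork`, and `sci/eval-string*` are SCI's public API.

```clojure
(require '[sci.core :as sci])

;; Base context shared by all plugins; `log` is a hypothetical host function.
(def base-ctx
  (sci/init {:bindings {'log (fn [msg] (str "LOG: " msg))}}))

;; One fork per plugin: vars created by the setup code
;; are not visible in base-ctx.
(def plugin-ctx (sci/fork base-ctx))
(sci/eval-string* plugin-ctx "(def greeting \"Welcome\")")

;; One fork per use of the plugin: per-call vars stay out of plugin-ctx.
(defn run-extension-point [user-name]
  (let [call-ctx (sci/fork plugin-ctx)]
    (sci/eval-string* call-ctx (str "(def current-user " (pr-str user-name) ")"))
    (sci/eval-string* call-ctx "(log (str greeting \", \" current-user))")))

(run-extension-point "Ada")
;; => "LOG: Welcome, Ada"
```

Note that a fork isolates newly created vars. Mutable objects reachable from both contexts, such as atoms or host objects, are still shared.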
I think the only problem is that users are likely more familiar with other programming languages, and we ourselves use CLJ/CLJS in the product with all the bells and whistles already. Therefore a more sane version of this plugin would be to e.g. call a shell script with agreed upon stdin and stdout use. Then anyone can use anything.
I have experimented with SCI in another case also, that is a long-time dream of mine. I am basically building an editor so SCI or a variant would provide the equivalent of elisp for Emacs. Because it's easy to reach users in a browser, I'm developing prototypes in it for now, and SCI works for this just fine.
In this scenario, I will be providing a lot of host functionality to the scripts. There'll be graphics manipulation stuff, a kind of a graph data model etc. Still I think the use is pretty conventional.
ZFS or other copy-on-write memory systems do the same thing that Clojure does when it updates its persistent data structures (see https://hypirion.com/musings/understanding-persistent-vector-pt-1). Datomic/DataScript/Datahike do the same, extended to durable media.
I am working on an AI system (assistant) that is a simulation engine instantiating a fairly general calculus of intelligence https://github.com/whilo/simmis. I am working from two ends, one immediate practicality through helping with memory management https://simm.is/screen/491680819, the other a solid distributed simulation semantics to be able to run speculatively in the background.
But Clojure does not do a copy on write, it makes an eager "copy" that shares most of the original structure
Like if you have a tree, it re-creates the path from the root to the target, pointing to the shared sub-trees and not copying but creating the new top-level
If I do a modification then the new tree is already created, there will be no copy
Every "user" of a data structure just points to the same immutable data
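The path-copying behaviour described in the last few messages can be observed at the level of nested maps. A small sketch with made-up data: only the maps on the path to the changed key are re-created, and the untouched subtree is the very same object in both versions. (The internal tries of large vectors and maps apply the same idea node by node.)

```clojure
(def original {:a {:x 1 :y 2}
               :b {:z 3}})

;; "Modify" a leaf; this returns a new top-level map.
(def updated (assoc-in original [:a :x] 42))

;; The untouched subtree is shared, not copied.
(identical? (:b original) (:b updated))
;; => true

;; The path from the root to the change was re-created.
(identical? (:a original) (:a updated))
;; => false

;; The original is unaffected.
(get-in original [:a :x])
;; => 1
```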
ZFS does neither, nor does Linux when it forks a process. Copy-on-write is a description of the semantics, not the implementation usually.
"Immutable" or "persistent" are less precise in my experience, because the data structures are not just immutable, and "persistent" is an overloaded term.
I get what you are saying though. The edge Clojure has is that it can create cheap copies for its own data structures, which is much leaner than these other options.
Nonetheless I think Clojurians have not explored full stack integrations with those a lot yet as far as I know. I think ZFS could be very powerful to turn mutable external systems into copy-on-write semantics, e.g. mutable index datastructures/databases.
Is your argument that copy-on-write is somehow lazy? The closest to this I have seen is the hitchhiker-tree (https://github.com/datacrypt-project/hitchhiker-tree), on which we built datahike initially, but it still does copy-on-write eagerly on each write operation (because you have to put the written information somewhere into the copy).
My mental model is probably loaded with all those languages and libraries that make a physical copy of the (whole) data when there is a modification, whereas functional persistent data structures make a shared data structure instead with just enough newness. I would always call them functional persistent data structures and not CoWs. However these things were not taught at the university when I was there so my definitions may be out of date. 🤷
E.g. in the C++ world, CoW often seems to mean a memory copy on write, with no sharing or functional data structure involved.
Fair, maybe persistent is best, but nobody outside of functional programming associates this with the right notion of persistence; they all think about persistence on disk. CoW has the benefit that it is conceptually clear to everybody what is meant, and having a language that does this cheaply by default is maybe a better sales pitch.
Agreed, persistent or durable can be confusing to someone too.
I also started to describe datahike as "git-like".
One could perhaps emphasize that no full copy is ever made, only the changed path is rebuilt, so it's even faster than naive CoW.
Yes
Btw. Datahike is cool stuff 👍
Thank you 🙂
Still much to do, but it is coming together somewhat nicely.
Are you interested in using similar memory semantics for a full Clojure runtime? I think it actually could work with SCI. I also thought about doing it in Clojure directly, but it is difficult to keep the general JVM/Clojure environments (forks) isolated from each other, I think. Maybe I am wrong though and I don't need SCI for that.
SCI is going to slow down the simulator, I think it would require some JITing or just lifting of pure code to the JVM context to be reasonable, but I think this can be done automatically.
And a question out of more theoretic interest in staging towers of interpreters, is sci metacircular?
SCI probably isn't suited for pure academic work, it is just a pragmatic project, initially born to get Clojure working in a graalvm native image (babashka) and later also to make eval work in CLJS, with limitations. e.g. implementing custom datatypes doesn't work very well in SCI
I know, your philosophy makes sense. I am not saying that you should provide or guarantee it, just that it would be nice if Clojure would be a bit more friendly to (scheme/racket like) research ideas.
You mean deftype is not working?
yes
well it does work, but not fully
of course SCI does work on itself if you provide its own namespaces via the namespace configuration. It's exposed in bb, for example:
$ bb -e '(sci.core/eval-string "(+ 1 2 3)")'
6
As long as this lifting is transparently working on the nested context that is fine for me. It means you cannot change the core primitives, but since it is Lisp you can kind of easily work around that.
I was thinking a lot about how to do non-standard interpretations and compilations on a SCI context yesterday. Some old school AGI systems synthesize towers of languages, but I think you can maybe get there also if you can swap the context on the outside and have some special syntax to steer that from the inside.
sci is metacircular; my latest Strange Loop talk clarifies this a bit.
Nice, I will take a look.
One note: SCI cannot execute its own code, so it's not self-hosted, which is a different concept than meta-circularity
nice talk
@borkdude do you have a sense of how big the gap is to self-hosting?
It is not critical for me, but I think it is a strong theoretical property that would make it possible to explore some more theoretical work from the Scheme community.