2020-11-17 asami | Clojure Slack Archive

asami 2020-11-17

quoll 2020-11-17T16:34:21.100600Z

The plan at this point is to implement the block abstraction over various types of storage. Mapped block files are specifically for a single-machine setup on the JVM. IndexedDB is how we’re doing a single-machine setup on JavaScript.

quoll 2020-11-17T16:36:34.101600Z

My hope is that if we implement blocks on a system that does distribution, then scaling out gets managed for us.

borkdude 2020-11-17T16:42:28.102Z

@quoll Have you also looked at the https://github.com/replikativ/konserve stuff?

borkdude 2020-11-17T16:43:04.102600Z

I'm not deeply familiar with it, just wondered, as this is used by datahike

quoll 2020-11-17T16:43:25.102900Z

Nope 🙂 But I can

quoll 2020-11-17T16:44:06.103600Z

The design was informed by changes I was trying to make in Mulgara, along with some thinking about what Datomic does with provisioning

quoll 2020-11-17T16:49:13.105200Z

One issue is that querying has been written synchronously. That’s making working with things like IndexedDB awkward. I see that konserve is doing everything asynchronously

mpenet 2020-11-17T17:55:40.105700Z

Yes datahike is full async by default.

mpenet 2020-11-17T17:57:02.107900Z

If I recall correctly, they have flags to make the runtime sync/async (at hh-tree level at least), then use macros to do error handling & co, since they cannot rely on just one way of doing things

mpenet 2020-11-17T17:57:28.108500Z

Add cljs support into the mix and that can get quite confusing (but necessary)

mpenet 2020-11-17T17:58:22.109400Z

Doing io with some async facade is a good approach in this context tho

whilo 2020-11-17T19:20:51.124900Z

Hey @quoll, @borkdude and @mpenet. This is a good point to introduce myself. I am Christian, one of the architects of Datahike. We, the team at lambdaforge, have had a look into your storage layout whitepaper lately and your work looks very good. We have some questions about your design choices and would love to compare and combine your AVL ideas with the hitchhiker-tree concepts to mutually improve our storage layers. Regarding konserve, we have designed it exactly to be a minimum viable portable abstraction for asynchronous storage. Since you map your AVL tree with binary layouts directly, konserve's edn serialization facilities might not be needed, but other than that it should be usable and is a potential point of collaboration. As @mpenet has pointed out, to port our stack we have introduced a restricted, core.async-based, monadic, async/await-like asynchronous DSL that we can compile away for the JVM, because there synchronous IO turns out to be much faster. Based on this abstraction @grounded_sage has had a running prototype of Datahike in cljs since yesterday, and we are very interested in your thoughts and work on asynchronous programming in cljs.

grounded_sage 2020-11-17T19:20:56.125100Z

@grounded_sage has joined the channel

whilo 2020-11-17T19:22:05.126Z

Since we seem to have very similar long-term goals in terms of reach and functionality and our core objective is not the promotion of the current implementation details of Datahike, we are in fact open to collaborate on any level you see fit including a potential joint project and shared funding.

👍 4

quoll 2020-11-17T19:22:38.126500Z

I haven’t been on the CLJS work recently. That’s being worked on by @noprompt. I’m trying to wrap up the JVM code so that I can join him

quoll 2020-11-17T19:26:11.129400Z

Well, in my case, a lot of it grew out of frustration with the Mulgara codebase (since it is all Java), and wanting to make some changes to the design. I have a strong bias to basing this work on the Mulgara architecture, since that implementation has a record of being very fast. In particular, the AVL trees turned out to be a very effective choice

quoll 2020-11-17T19:28:18.130400Z

I saw David Greenberg’s talk on hitchhiker trees at Strangeloop, and I’ve wanted to try them out. I haven’t yet though

whilo 2020-11-17T19:28:47.131400Z

That makes sense. @mpenet has helped to improve the performance of the hitchhiker-tree quite a bit btw.

quoll 2020-11-17T19:29:21.132600Z

Right now, my opportunity to work on these has been fortuitous for me. I was doing it in my evenings and weekends. But then I mentioned it at work, and my manager, and eventually HIS manager thought it sounded like a great idea, and told me to work on it

whilo 2020-11-17T19:29:36.133Z

Since we already also have our implementation we can compare it with the AVL tree.

👍 1

quoll 2020-11-17T19:30:35.133900Z

Well, the AVL trees are implemented here: https://github.com/threatgrid/asami/blob/storage/src/asami/durable/tree.cljc

quoll 2020-11-17T19:32:24.135800Z

It stores the node balance in the top two bits of the “left” pointer. I’m thinking I should probably change this to the topmost bit of both the left and the right. That’s because these numbers will never be negative, so there’s no issue with using those bits, and it makes it easier to switch between 64 bits and 32 bits if we want to
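A minimal sketch of the bit-packing idea described above, assuming a 64-bit JVM long, a non-negative pointer that fits in 62 bits, and a balance of -1, 0 or 1 stored as `balance + 1` in the top two bits. The names `pack-left`, `unpack-pointer` and `unpack-balance` and the encoding are hypothetical illustrations, not the code in `tree.cljc`:

```clojure
;; Hypothetical layout: bits 63-62 hold (balance + 1), bits 61-0 hold the pointer.
;; JVM Clojure only: ClojureScript bit operations work on 32-bit values.
(def balance-shift 62)
(def pointer-mask (dec (bit-shift-left 1 balance-shift))) ; low 62 bits set

(defn pack-left
  "Combines a non-negative pointer (at most 62 bits) with a balance of -1, 0 or 1."
  [pointer balance]
  (bit-or (bit-and pointer pointer-mask)
          (bit-shift-left (long (inc balance)) balance-shift)))

(defn unpack-pointer
  "Recovers the pointer by masking off the top two bits."
  [packed]
  (bit-and packed pointer-mask))

(defn unpack-balance
  "Recovers the balance from the top two bits."
  [packed]
  (dec (unsigned-bit-shift-right packed balance-shift)))

;; Round trip for each possible balance
(doseq [b [-1 0 1]]
  (let [packed (pack-left 12345 b)]
    (println b (unpack-pointer packed) (unpack-balance packed))))
```

Because the pointer is never negative, its top bits are otherwise unused, which is what makes them available for the balance. A packed value with balance 1 is a negative long, so readers must always mask before using it as a pointer.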

whilo 2020-11-17T19:31:05.134800Z

Does this mean you are constrained by your management in what Asami can become as a project?

quoll 2020-11-17T19:32:46.136200Z

Sort of. It’s a weird situation 🙂

quoll 2020-11-17T19:39:12.142Z

The story is that I was building a rules engine, and showed my manager. He loved it and told me to work on it during working hours. I said that it was Open Source, and he was happy to do that. That became Naga. He also said that he wasn’t interested in using a commercial database (I was planning on Datomic as my first back-end). “Can you build your own?” So I built a minimal in-memory store. Then he asked for more features. One of them was to port to ClojureScript. Then more features. And more. Eventually, I decided to pull it out and turn the storage into its own project. That is Asami. Other members of the team use Asami, and I try to be as responsive as possible to them. Working with them it became clear that they were more familiar with the Datomic API, and so I wrapped the Graph/query API in a new namespace that started to present something similar to what Datomic looks like. Most of this is being driven by what I want to do next. But my primary focus is always to make it useful for my team.

whilo 2020-11-17T19:41:24.142200Z

Ok, that sounds reasonable to me.

quoll 2020-11-17T19:45:30.143900Z

Right now, the primary focus is durability. As soon as I can get a release out for that, I’ll be moving back to the public API, for functions like with, and also to clear the bug/feature backlog

whilo 2020-11-17T19:48:24.144500Z

Would you be interested then in comparing the durability bits and see whether we can help each other there?

quoll 2020-11-17T19:48:37.144700Z

sure

whilo 2020-11-17T19:52:30.146300Z

Which format of discussion would be most appropriate in your opinion? We are currently doing a lot of shared programming/discussion sessions and we could do one of those together, for example.

quoll 2020-11-17T19:53:58.147100Z

It will depend on timing 🙂 I’m on the East coast of the USA (near Washington DC) (UTC-5)

noprompt 2020-11-17T19:54:43.147400Z

West Coast PST

whilo 2020-11-17T19:57:33.148900Z

I am in Vancouver, BC, (PST) and the rest of the team is in Europe (CET). So mornings in PST work well at the moment, e.g. 8 or 9 am.

whilo 2020-11-17T19:58:26.149200Z

@noprompt Would this work for you?

noprompt 2020-11-17T19:59:39.150Z

Unfortunately, no. I have 3 children I’m responsible for at those times. 🙂

noprompt 2020-11-17T20:00:27.151200Z

However, I am content to communicate asynchronously here and elsewhere. Paula and I are also on the same team.

grischoun 2020-11-17T20:04:29.154700Z

Hi. I am Chrislain and also a member of the lambdaforge team. I’ll be happy to join the call.

noprompt 2020-11-17T20:07:01.156100Z

Will konserve stick to a hard dependency on core.async?

whilo 2020-11-18T19:38:21.165500Z

@noprompt, @alekcz360 Besides removing the core.async dependency we might need to add additional protocols to facilitate low-level block based access to konserve.

alekcz 2020-11-18T19:43:39.165700Z

Removing core.async shouldn't be an issue. The code is structured well enough for it to be done painlessly. I think it'd be worthwhile, though we should provide an optional namespace that offers that kind of interface for current users.

alekcz 2020-11-18T19:48:16.165900Z

@whilo @noprompt I'd need some guidance with the block based storage bit. I'm not yet familiar with how asami's storage works.

whilo 2020-11-18T19:50:00.166300Z

Yes, we should discuss this together, I think.

whilo 2020-11-17T21:35:34.159200Z

The interface can easily be made callback-based (which is general), also with core.async. If you do not want the internals to use core.async then we would need to rewrite everything in a CPS/callback style. Which programming model would you prefer?

noprompt 2020-11-17T21:42:16.159400Z

Personally, I prefer CPS/callback style and in particular the pattern of using promise style [resolve reject] as it is trivial to implement combinators, etc. while minimizing logic common when using “handlers” i.e.

(fn [error value] (if error ,,,))
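As a rough illustration of the `[resolve reject]` pattern mentioned here, the sketch below treats an async operation as a plain function of two callbacks and builds one combinator on top of it. The names `succeed`, `fail` and `then` are hypothetical and do not come from konserve or Asami:

```clojure
;; An "operation" is just a function taking [resolve reject] callbacks.
(defn succeed
  "Operation that immediately resolves with v."
  [v]
  (fn [resolve _reject] (resolve v)))

(defn fail
  "Operation that immediately rejects with e."
  [e]
  (fn [_resolve reject] (reject e)))

(defn then
  "Chains op into f, where f takes the resolved value and returns a new
  operation. A rejection skips f and goes straight to reject."
  [op f]
  (fn [resolve reject]
    (op (fn [value] ((f value) resolve reject))
        reject)))

;; Usage: run a chained operation and record the outcome.
(def outcome (atom nil))

((then (succeed 1) (fn [x] (succeed (inc x))))
 (fn [v] (reset! outcome [:ok v]))
 (fn [e] (reset! outcome [:error e])))
```

The error branch never has to be checked by hand inside `then`: success and failure travel on separate functions, which is the contrast with the single `(fn [error value] ...)` handler shape.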

whilo 2020-11-17T21:52:34.159700Z

I see. Have you had bad experiences with core.async?

noprompt 2020-11-17T22:05:46.159900Z

Yes, however, those experiences are few.

noprompt 2020-11-17T22:09:04.160100Z

My perspective is that of a consumer. It is occasionally not desirable, as a consumer, to take on core.async as a dependency.

borkdude 2020-11-17T22:09:35.160300Z

Fully agree!

borkdude 2020-11-17T22:10:13.160500Z

Build the tooling in a low level way that optionally people can build core.async on top of that. Callbacks are fine for this.

noprompt 2020-11-17T22:10:44.160700Z

Also, CPS/Callbacks merely rely on functions and assume little else which is very inclusive.

👍 1

borkdude 2020-11-17T22:12:44.161Z

I made the "mistake" to couple babashka.pods async functions to core.async, but I changed my mind and switched to callbacks. I don't want to force core.async on consumers.

👍 1

borkdude 2020-11-17T22:13:40.161200Z

Core.async is quite heavy, you're pulling in tools.analyzer, etc. And who knows, a few years from now there's going to be another Clojure async thing. Callbacks will still be there.

borkdude 2020-11-17T22:13:59.161400Z

Maybe project loom will bring interesting things in this regard

mpenet 2020-11-17T22:37:52.161700Z

Loom is jvm only. I quite like core.async personally but for konserve callbacks might be good enough. Callbacks are ok as long as you're not doing a lot of composition with async values, when you are things become hairy fast imho.

mpenet 2020-11-17T22:39:25.161900Z

That said, konserve just needs some form of Promise; it could be CompletableFuture on the JVM and js/Promise on cljs otherwise. Or just callbacks
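One way the two options could meet, sketched for the JVM only: a callback-style operation (a function of `[resolve reject]`) is wrapped in a `java.util.concurrent.CompletableFuture`. The helper name `op->future` is hypothetical and is not part of konserve; a cljs variant would wrap `js/Promise` instead.

```clojure
(import 'java.util.concurrent.CompletableFuture)

(defn op->future
  "Runs a callback-style operation and exposes its result as a
  CompletableFuture. The reject callback must receive a Throwable."
  [op]
  (let [fut (CompletableFuture.)]
    (op (fn [value] (.complete fut value))
        (fn [^Throwable error] (.completeExceptionally fut error)))
    fut))

;; Usage: a consumer that prefers futures can block on the result.
(def answer
  (.get (op->future (fn [resolve _reject] (resolve 42)))))
```

The library itself only needs the callbacks; the adapter lives on the consumer side, so neither the future type nor any async library becomes a dependency of the storage layer.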

borkdude 2020-11-17T22:42:34.162100Z

I quite like core.async btw, that's not the point

noprompt 2020-11-17T22:42:46.162300Z

Speaking from experience, I was confronted with a decision earlier this year to base a workflow interpreter we use on top of promises but ultimately decided that functions accepting resolve and reject callbacks gave us the most cross-platform flexibility because, again, just functions.

noprompt 2020-11-17T22:44:17.162500Z

Yeah, that’s my feeling too @borkdude. It’s a great dependency for an app but not necessarily for a library unless, say, it was specifically meant to augment/enhance.

mpenet 2020-11-17T22:51:14.162700Z

@borkdude not sure what you mean, I think we're all saying essentially the same thing

borkdude 2020-11-17T22:53:29.162900Z

Agreed :)

mpenet 2020-11-17T22:55:06.163100Z

I think on hh-tree the discussion about this same issue was that we actually needed better ways to compose async values, but that's not needed at konserve level, hh-tree/datahike could turn whatever konserve returns into what it wants.

whilo 2020-11-18T05:33:50.163300Z

Let me first provide some background on my position before replying to async for konserve. I agree that callbacks are more portable and can be made composable (continuations are a universal framework of computation), but they still require a lot of things to be defined to have similar semantics between libraries and runtimes, i.e. error handling, the way the event loop schedules them (on the JVM a threadpool integration) and how synchronisation happens atomically between two different processes, which is still implicitly the responsibility of the library maintainer porting code. I think the relationship of the Clojure community to core.async is a bit contradictory in this respect, because these things are language level abstractions that need to be standardized or you always need to manually glue the interfaces together and make sure that the different forms of parallelism between libraries do not deadlock or screw each other over. In that sense I think Clojure should have got first class async semantics in its compiler instead of bailing out to CPS as the general interface. Even the language of "lean" callbacks, JS, has now done that with async and await. The biggest annoyance with callbacks for me is still the inversion of control flow that is forced on me as a programmer (few programmers like to program directly in CPS style) and most importantly the forced sequential nature of a monadic promise based approach. With core.async I can write down nested compositions of functions doing aggregated immutable reads concisely, which I did a lot for replikativ and kept the codebase very concise and easy to verify that way, e.g. for fetching tree fragments over the wire or mutating durable storage in a critical section, while the callbacks would have cluttered a lot of my code. In that sense I am not happy with having to use callbacks as a library developer.
I know this suggestion has been made by the Clojure core team and I think it is debatable that this is a good approach because applications tend to factor out and become libraries. I think it is just evading the problem from their side to be honest and I initially thought 7 years ago when it came out that core.async would have been deeper integrated by now.

whilo 2020-11-18T05:38:51.163500Z

Having said all this, konserve was initially implemented deliberately with callbacks. By now the control flow in the filestore has become a bit more complicated with the use of the asynchronous NIO API, so we decided to use core.async internally to make the error handling easier and have more understandable sequential code. But if you would like to not have the core.async dependency in general we can definitely remove it from the interface and either factor out the filestore with the dependency on it or maybe @noprompt can help me to port it to promises, if this makes sense.

grounded_sage 2020-11-17T21:28:02.156700Z

I’m happy to join the call as well.

grounded_sage 2020-11-17T21:33:49.159100Z

@noprompt as far as I am aware we are open to async alternatives. Our main focus has been keeping the codebase cross platform with as little divergences as possible and good error handling. But I would defer to @whilo for a more detailed answer to your question.

👍 1
