holy-lambda

whilo 2022-12-16T23:08:49.601729Z

Hey everyone! I have a beginner's question. Is there a way to run a singleton service for AWS lambdas? I am thinking about hosting the Datahike transactor for a group of functions that can run their queries locally inside the lambda, but need to coordinate their transactions with the transactor for strong consistency.

whilo 2023-03-04T23:18:32.081469Z

I have implemented https://github.com/replikativ/konserve-s3 and https://github.com/replikativ/datahike-s3 taking inspiration from @steveb8n's link above. I still need to figure out how to release the two projects with our deployment pipeline, but you can just use the github SHAs in deps.edn for now. @viesti I would be down to pair and see how you would wire it up with lambda if you have some time. I don't have enough time myself right now to get into holy-lambda and the AWS stack unfortunately. Latency is as expected higher than with local storage, but if you do not write a lot, caches can stay warm and queries would perform with one roundtrip only (checking whether the DB has changed). Maybe latency is also much better if you access from AWS directly.

👍 1
🎉 1
whilo 2023-03-04T23:53:17.438529Z

Lmk what you think 🙂

whilo 2023-03-28T09:15:15.427359Z

With this PR we now have close to optimal latency and automatic transaction batching under backpressure https://github.com/replikativ/datahike/pull/618 (I still need to clean it up a bit for it to be merged). This should benefit the S3 backend the most as it has high latency, but can handle high throughput. Any suggestions of how I could build a demo project as simple as possible, ideally as a starter template for holy-lambda?

👍 1
viesti 2023-03-28T09:21:39.851629Z

I have some ideas but haven't gotten them out of my head :D

whilo 2023-03-28T09:25:37.393849Z

hehe

viesti 2023-03-28T09:26:14.849819Z

starter template is a neat idea, I think that if the template is serverless, then it should contain the single writer setup (which I haven't yet gotten around to try out), which is in a significant part, about creating the infra with some tool (I prefer terraform)

viesti 2023-03-28T09:29:18.676839Z

also, although holy-lambda does great things to make life easier when using native-image, now that https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html is around, making the lambda could be simpler by just using the jvm11 runtime and writing a class that implements the lambda entrypoint required by that runtime. so in a template, holy-lambda is kind of optional even. but if the demo would be an app with a frontend and say a rest api, then the ring adapter in holy-lambda is really useful

viesti 2023-03-28T09:29:49.763639Z

but I guess first have to go and try that single writer setup :)

viesti 2023-03-28T09:30:19.522789Z

S3 backend also useful outside lambda I think, but should definitely be tried out in a Lambda :)

whilo 2023-03-28T09:33:06.148279Z

yes, it is interesting for us, because we cannot easily offer a hosted service right now, but might get contracts and support by offering datahike on lambda (that is just a guess by me) and it is a good starting point to then offer the writer as an EC2 instance

viesti 2023-03-28T09:33:37.857139Z

ooh, interesting :)

whilo 2023-03-28T09:33:43.007379Z

i am fine also with snapstart, honestly i am n00b on aws, my mind is mostly on distributed persistent data structures

viesti 2023-03-28T09:34:46.147609Z

too crowded and not enough time for one mind to contain it all :)

whilo 2023-03-28T09:37:25.297179Z

yeah, i can maybe figure it out between 30 and 40 o'clock 😉

whilo 2023-03-28T09:37:43.680339Z

but i think i should learn a bit more about it now

whilo 2023-03-28T09:38:29.727119Z

so i would be down to also take pointers if you are super busy or pair if you have some time at some point

viesti 2023-03-28T09:42:26.365439Z

basically, the aws's own jvm11 runtime (suggest jvm11 over jvm8) takes an uberjar with a class that implements the com.amazonaws.services.lambda.runtime.RequestStreamHandler interface, found in com.amazonaws/aws-lambda-java-core {:mvn/version "1.2.1"}, which you need to include into the uberjar

viesti 2023-03-28T09:42:55.153299Z

implement that and you have an uberjar with a lambda compatible entrypoint

viesti 2023-03-28T09:44:26.291489Z

make a lambda on aws console, upload the jar, then name the handler class, sounds a bit minimalistic, but that's the start 😄
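
A minimal sketch of the entrypoint described in these steps, assuming the uberjar is AOT-compiled and `com.amazonaws/aws-lambda-java-core` is on the classpath. The namespace `example.handler` and the class name `example.Handler` are hypothetical, and the handler only echoes the request body back:

```clojure
(ns example.handler
  ;; gen-class only takes effect during AOT compilation; the generated class
  ;; implements the interface named in the messages above.
  (:gen-class
   :name example.Handler
   :implements [com.amazonaws.services.lambda.runtime.RequestStreamHandler]))

(defn -handleRequest
  "Backs RequestStreamHandler.handleRequest(InputStream, OutputStream, Context).
   Reads the whole request and writes it back unchanged."
  [_this in out _context]
  ;; spit opens a writer on the output stream, writes, and closes it.
  (spit out (slurp in)))
```

With a class like this in the uploaded jar, the handler named in the console would be `example.Handler`.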

viesti 2023-03-28T09:46:15.631629Z

not sure if it helps, but my mind was focused on hacking on a bit different thing, there's some bits that you could steal from this, if it helps and terraform suits you :) https://github.com/viesti/clj-lambda-sideloader/tree/main/example

viesti 2023-03-28T09:47:08.147279Z

been incrementing a slack reminder couple of weeks for a weekend to look into this datahike thing 😄

whilo 2023-03-28T09:50:07.450069Z

hehe, no worries

whilo 2023-03-28T09:50:21.318929Z

thanks, these steps sound doable

whilo 2023-03-28T09:51:22.376679Z

what is crac stuff?

whilo 2023-03-28T09:52:05.568299Z

found it

whilo 2023-03-28T09:54:58.471579Z

to get started with datahike it should be enough to copy this snippet and use it in a project with your S3 settings https://github.com/replikativ/datahike-s3#run-datahike-in-your-repl

👌 1
whilo 2023-03-28T09:55:13.191189Z

if not then i need to fix and simplify it

viesti 2023-03-28T16:49:42.845559Z

Hmm, tried it out a bit, d/delete-database seems to delete the whole bucket, which was a bit unexpected I think 🙂

viesti 2023-03-28T16:52:23.501859Z

I'm thinking that there could be a prefix in the store configuration, and then the prefix would be used to "name" a database, so you could then remove all files under a prefix if needed

viesti 2023-03-28T16:53:19.165939Z

I think S3 buckets are quite long-lasting things, re-creating a bucket with the same name (if you deleted it accidentally) can take some time, since AWS reserves also DNS name for a bucket

viesti 2023-03-28T16:53:32.546119Z

sometimes at least

viesti 2023-03-28T16:54:00.574759Z

but anyway, managed hello world in lambda yay, can put the code & terraform to github soon

viesti 2023-03-28T16:55:08.967239Z

ah, the other thing, deleting a bucket is quite, hmm, heavy operation, one would not want to grant that for a backend (though I failed to limit delete access in my test 😄)

viesti 2023-03-28T17:53:01.485139Z

whipped out something quick and dirty https://github.com/viesti/clj-lambda-datahike

whilo 2023-03-28T17:56:22.381749Z

❤️

whilo 2023-03-28T17:56:27.727919Z

that is awesome!

whilo 2023-03-28T17:56:54.580829Z

i agree about deleting the bucket and will look into prefixing keys

whilo 2023-03-28T18:01:20.603209Z

how would you carve out the singleton lambda for transact? i think splitting the example into a transact lambda and two different query lambdas would be a good starting point for a template

viesti 2023-03-28T18:19:16.764439Z

we could also have same lambda source code, but say an environment variable that toggles the deployed instance to work as transactor or query node

viesti 2023-03-28T18:19:50.030319Z

so deploy two lambda function instances, but configure them differently
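
A rough sketch of that toggle, using only clojure.core. The variable name `DATAHIKE_ROLE` and the function names are hypothetical, and the two handler functions are passed in so the dispatch itself stays independent of Datahike:

```clojure
(ns example.role)

(defn role
  "Picks the role from an environment map. DATAHIKE_ROLE is a hypothetical
   variable name; anything other than \"transactor\" means query node."
  [env]
  (if (= "transactor" (get env "DATAHIKE_ROLE"))
    :transactor
    :query))

(defn handle
  "Routes a request to the transact or query function based on the role."
  [env request transact-fn query-fn]
  (case (role env)
    :transactor (transact-fn request)
    :query      (query-fn request)))

;; In a deployed function the env map would come from (System/getenv).
```

Both deployed functions would then share one jar, with only the environment variable differing between them.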

whilo 2023-03-28T18:23:44.102979Z

that makes sense

whilo 2023-03-28T22:53:03.673829Z

i fixed the bucket deletion issue with datahike-s3 0.1.8

viesti 2023-03-29T05:32:49.676429Z

Nice! 🙂

viesti 2022-12-18T19:57:43.048989Z

@whilo Lambda has concurrency configuration, so you could have a transactor lambda with reserved concurrency of 1 and then reader lambdas, with unlimited concurrency > • Reserved concurrency – Reserved concurrency guarantees the maximum number of concurrent instances for the function. When a function has reserved concurrency, no other function can use that concurrency. There is no charge for configuring reserved concurrency for a function. https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html

viesti 2022-12-18T19:59:46.965829Z

I have actually been thinking about this same thing for DataHike, but haven't had the time/energy to look into it 😄

viesti 2022-12-18T20:01:38.901089Z

I think the thing that set me off was that I didn't figure out if DataHike could be backed by just S3 & DynamoDB (there was some old trial of persistence layers that used S3 and DynamoDB that I looked into the summer of last year, but those weren't up to date with latest DataHike at that time, if I recall correctly)

viesti 2022-12-18T20:02:39.433629Z

IIRC, DataHike supports SQL database as backing store, so Aurora Serverless v1 could be an option, but that has cold start in the order of ~30 seconds, which is annoying

viesti 2022-12-18T20:03:38.801749Z

there's also Serverless PostgreSQL options with better cold start (and more recent PostgreSQL versions), like https://neon.tech/

viesti 2022-12-18T20:06:57.479919Z

but to me, for a Datalog database, throwing all that querying capability of a SQL database out the window and using it only as triple store feels wrong :D

viesti 2022-12-18T20:07:34.127839Z

so would be very interesting to see a S3 + DynamoDB backing for Datahike

viesti 2022-12-18T20:10:33.981479Z

I haven't wrapped my head around if the querying lambdas would need to build some kind of query index in their memory, or could this query index then reside in the memory of the transactor lambda. With provisioned concurrency, one could keep such a transactor process always running even, although that incurs a cost

viesti 2022-12-18T20:12:19.928199Z

anyway, I suggest looking at concurrency control, specifically Reserved concurrency 🙂

whilo 2022-12-18T22:30:31.138409Z

Thank you for the contextualization. Yes, using concurrency of one could work. The S3 backend needs to be ported, but we simplified our backend, you only need to implement this protocol https://github.com/replikativ/konserve/blob/main/src/konserve/filestore.clj#L95 and not all methods are needed https://github.com/replikativ/konserve/blob/main/doc/backend.org#backing-store-protocols. So porting the old backend into a reliable backend should be an effort of a few hours max, hopefully. I don't have experience with AWS unfortunately, but I would be down to do a pairing session and make it happen.

whilo 2022-12-18T22:32:26.371639Z

My take on caching would be to leave it to AWS and just wrap services with different service qualities and then pick the respective backend for your project. Datahike has native image support now, so lambdas should already be fairly fast to fire up. I would look into holy-lambda more to prepackage Datahike, but maybe it is just good enough to add it as a dependency to a project actually.

whilo 2022-12-18T22:32:59.548369Z

@viesti What would be a good test case application in your mind?

whilo 2022-12-18T22:36:20.270859Z

I would probably opt for S3 first and speculate that many simple applications can cope with its latency. Maybe DynamoDB as an alternative is then a good combination for apps where you are willing to pay for the latency. But I think some experimentation with simple setups would be a good start.

viesti 2022-12-19T07:18:37.433929Z

> I would probably opt for S3 first and speculate that many simple applications can cope with its latency. This sounds like a good rationale, I didn't actually have a good grounding to talk about DynamoDB, just that have seen it come up with Datomic 😄 S3 has some interesting properties, like https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/.

👍 1
viesti 2022-12-19T07:24:46.629319Z

Just few weeks ago, the JVM AWS Lambda runtime (java 11 currently) got support where after deployment, it creates a VM-level snapshot of the Lambda process, and on invoke, loads this snapshot, which then avoids the slow cold start of a JVM process (in my trial of a Reitit Ring app, cold start went from ~7 seconds into 500 milliseconds). This is called https://aws.amazon.com/blogs/aws/new-accelerate-your-lambda-functions-with-lambda-snapstart/. So, native-image support isn't strictly needed for fast cold start, although I think it is good to keep the code so that it is supported. I'm not familiar with Datahike, but doesn't native-image then though prevent use of clojure.core/eval , where one could evaluate code to use in a query, for example, for explorative purposes? Not sure if this would be a use-case though, maybe one does explorative queries in some other way, than running against a Lambda-based infra.

viesti 2022-12-19T07:27:51.517489Z

> I would be down to do a pairing session and make it happen. I'd be interested, just have to get better in my time management :D

viesti 2022-12-19T07:32:42.945209Z

> What would be a good test case application in your mind? I don't actually have experience in Datalog databases, but probably something that deals with aspects where Datalog is a very good choice? I guess for Lambda + S3, something that fleshes out the compute and the persistence part, but still is not pathological in that sense.

whilo 2022-12-20T09:29:58.755439Z

Btw. we don't even use SQL as a triple store, just as a blob store. That is also why I think it is not a good default backend, it is very wasteful.

👍 1
whilo 2022-12-20T09:30:18.676649Z

Snapstart sounds cool, but 500 ms is still quite some time.

viesti 2022-12-20T09:30:34.948109Z

depends on the app, can be lower

viesti 2022-12-20T09:30:54.425869Z

and that is just the cold start, when lambda has the process running, response times are lower

viesti 2022-12-20T09:31:26.649219Z

especially after jvm hotspot kicks in

whilo 2022-12-20T09:31:52.908689Z

I don't think we need eval, I think almost all Clojure applications using Datahike can be natively compiled.

steveb8n 2022-12-20T09:32:27.628319Z

I'm building on snapstart and learning lots. Touch base if you want more info

whilo 2022-12-20T09:32:33.980269Z

Ok, it is very cool to have options for sure. I just want to aim for the most simple setup that is resource efficient, but maybe not fastest.

viesti 2022-12-20T09:32:53.618439Z

yup, I think I was going a bit too far, I haven't used Datalog databases, so was wondering how people do explorative queries, but I guess that happens at the repl, not in the deployed app 🙂

whilo 2022-12-20T09:33:59.672319Z

Yes. I am a fan of JIT compilers and interactive setups, but native image compilation provides interesting options to scale out like this.

viesti 2022-12-20T09:34:00.924469Z

but I guess the interesting thing is that if a transactor lambda with reserved concurrency would fit the singleton transactor requirement

👍 1
whilo 2022-12-20T09:35:25.072339Z

@steveb8n I see.

viesti 2022-12-20T09:35:50.876839Z

for that S3 store I'd go for plain AWS Java SDK v2

whilo 2022-12-20T09:36:35.580069Z

Also, not sure how important this is, but I have not tried AOT compiling Datahike lately on the JVM.

whilo 2022-12-20T09:37:03.547959Z

I am a bit worried about firing up the JVM to run a query to be honest.

whilo 2022-12-20T09:37:23.302249Z

I think the latency impact will be seconds.

whilo 2022-12-20T09:37:38.882619Z

And the compute spent massive compared to what the query execution costs.

viesti 2022-12-20T09:37:50.806579Z

depends on activity

whilo 2022-12-20T09:37:54.333159Z

(for simple queries)

viesti 2022-12-20T09:38:05.150529Z

if there more following queries, it'll be efficient in the long run

whilo 2022-12-20T09:38:39.383599Z

I see, but often you have to consider the worst case latency for your app.

steveb8n 2022-12-20T09:38:46.858599Z

Warm JVM is approx 2x faster than graal native

viesti 2022-12-20T09:38:49.222509Z

you need AOT to be able to do native-image, so I guess AOT for Datahike works then? 🙂

whilo 2022-12-20T09:39:00.371389Z

I guess so, too.

steveb8n 2022-12-20T09:39:21.463219Z

Although that could improve given the size of the graal team

viesti 2022-12-20T09:39:22.224919Z

the Snapstart freezes a Firecracker VM process, so what wakes up, is a warm JVM

viesti 2022-12-20T09:39:43.857329Z

freezes during deploys, then on cold start, thaws it

whilo 2022-12-20T09:39:57.234559Z

I think they should just turn the JVM into an OS at this point.

steveb8n 2022-12-20T09:39:58.481959Z

Not entirely but pretty close

viesti 2022-12-20T09:40:10.943619Z

heh, they would be tied to JVM then 🙂

viesti 2022-12-20T09:40:40.350899Z

I'm expecting Snapstart to be available for other runtimes when they figure out how to offer stable random numbers, that don't get frozen

steveb8n 2022-12-20T09:41:05.898999Z

However ssl connects are slow first time due to handshake. Snapstart can't fix networking

steveb8n 2022-12-20T09:41:36.792649Z

All AWS APIs are ssl calls e.g. S3

viesti 2022-12-20T09:42:08.276489Z

https://docs.aws.amazon.com/lambda/latest/dg/snapstart-uniqueness.html, there's a scanner that operates on bytecode level to check for patterns that one would want to avoid with Snapstart

steveb8n 2022-12-20T09:42:15.572429Z

I'll test holy lambda vs snapstart soon

viesti 2022-12-20T09:42:35.414869Z

(the findbugs successor)

viesti 2022-12-20T09:43:30.255069Z

I'd think this snapstart would be great for say ML stuff, where you'd load a model in memory, then freeze the process, then thaw it upon first request and do inference

steveb8n 2022-12-20T09:45:42.215739Z

It's excellent for CPU bound tasks. Still figuring out how to use it for ssl calls ie AWS APIs

whilo 2022-12-20T09:46:40.906999Z

Deep learning requires a lot of GPU memory, just loading this will always be slow in current stacks.

whilo 2022-12-20T09:47:14.708709Z

What is the best library to use to implement the S3 backend for Datahike?

whilo 2022-12-20T09:47:35.433179Z

I could try to use the Java API directly.

👍 1
whilo 2022-12-20T09:48:00.376239Z

Ideally I would like to have an API that can also be used asynchronously, e.g. with callbacks for the http requests.

whilo 2022-12-20T09:48:20.707299Z

We have a dual async/sync stack.

viesti 2022-12-20T09:48:36.191679Z

I think I would also use the Java API, just make sure to use the v2 SDK πŸ™‚

👍 1
steveb8n 2022-12-20T09:49:45.486729Z

I used v2 java SDK with holy lambda. Worked well.

👍 1
steveb8n 2022-12-20T09:50:11.264989Z

Also give you a choice of http clients

viesti 2022-12-20T09:50:41.824209Z

not sure if it's necessary here, but that java api has more support for things like multipart download, efficient syncing of large data, but we don't need that here. Generally I think it tracks new S3 features well, and has pluggable http client support (aws has their "common runtime" which is a optimized C library I think)

whilo 2022-12-20T09:51:18.463489Z

Ok, cool. Thanks!

whilo 2022-12-20T09:51:44.605899Z

If you feel like pairing over it, lmk.

👀 1
viesti 2022-12-20T09:53:14.688359Z

those aws java sdk libs ship with native-image configurations, haven't looked into how much they matter, but they make effort to have the libraries graalvm native-image compatible

whilo 2022-12-20T09:53:34.010009Z

Cool! That is good.

viesti 2022-12-20T20:09:31.310849Z

Apropos, when developing, you can use for example Minio via docker image, it has good support for the S3 API, so one can use the AWS Java SDK against Minio. https://min.io/docs/minio/container/index.html Continuing that thought, an S3 backend for Datahike would allow using any other object store that implements the S3 API, which I think is interesting.

viesti 2022-12-20T20:19:06.521459Z

I think I'd be interested in a pairing session, I just don't know when and might be the slower one that benefits most :D

viesti 2023-03-29T15:37:18.083139Z

Took a look, I think I forgot to say, that it might be neat to be able to specify a prefix, so you could have multiple databases in a single bucket, something like

{:store {:backend :s3
         :bucket "datahike-s3-instance"
         :prefix "my-db-1"
         :region "us-west-1"}}

👍 1
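
To illustrate how such a `:prefix` could separate databases inside one bucket, here is a small sketch using only clojure.core and clojure.string. It is not how datahike-s3 implements anything; the function names and the `/` separator are assumptions made for the example:

```clojure
(ns example.prefix
  (:require [clojure.string :as str]))

(defn object-key
  "Builds the S3 object key for a store key under a database prefix.
   The \"/\" separator is an assumption for this sketch."
  [prefix k]
  (str prefix "/" k))

(defn keys-under
  "Selects the object keys that belong to one database, for example to
   delete that database without touching the rest of the bucket."
  [prefix all-keys]
  ;; The trailing separator keeps \"my-db-1\" from matching \"my-db-10/...\".
  (filter #(str/starts-with? % (str prefix "/")) all-keys))
```

Deleting a database would then mean removing only the keys returned by `keys-under`, so the backend would not need permission to delete the bucket itself.
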
viesti 2023-03-29T17:29:30.507759Z

tried out with separate writer and reader lambdas, but what happens is a bit interesting

0% bb run write '{"data": [{"name": "Alice", "age": 32}]}'
{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}
{"result":"ok","status":"ok"}
0% bb run read
{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}
{"result":[[3,"Alice",32]],"status":"ok"}
0% bb run write '{"data": [{"name": "Bob", "age": 42}]}'
{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}
{"result":"ok","status":"ok"}
0% bb run read
{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}
{"result":[[3,"Alice",32]],"status":"ok"}

viesti 2023-03-29T17:30:01.095629Z

so what is going on here is that the reader lambda has stale db reference, since it doesn't show the write that the writer did

viesti 2023-03-29T17:30:40.697149Z

so, one could have concurrency=1 lambda, all reads & writes through same process, lol

viesti 2023-03-29T17:30:59.500659Z

but, how should one exactly use many reading processes with datahike?

viesti 2023-03-29T17:31:31.759229Z

is there a way to tell datahike to "go refresh caches from the persistent store"

whilo 2023-03-29T17:50:08.577149Z

oh sorry, there is one boolean flag streaming? that needs to be changed

whilo 2023-03-29T17:51:49.019179Z

this can be done by using a different config for the query endpoints

viesti 2023-03-29T17:52:37.332509Z

aa!

whilo 2023-03-29T18:01:15.849729Z

injected this for the query connection before you use it (swap! (:wrapped-atom conn) (fn [db] (update db :writer #(assoc % :streaming? false))))

whilo 2023-03-29T18:01:39.559389Z

that forces the connection to refetch from the underlying store every time you access it

whilo 2023-03-29T18:02:58.939389Z

that would be here https://github.com/viesti/clj-lambda-datahike/blob/main/src/clj_lambda_datahike/core.clj#L31, before the call to d/q
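
The workaround from the messages above could be wrapped in a small helper. This is only a sketch: the `swap!` form is copied from the chat, while the namespace and the name `disable-streaming!` are made up, and `:wrapped-atom` and `:writer` are internals that may change between Datahike versions:

```clojure
(ns example.refetch)

(defn disable-streaming!
  "Sets :streaming? to false on the connection's writer so the connection
   refetches from the underlying store on each access. Returns the connection."
  [conn]
  (swap! (:wrapped-atom conn)
         (fn [db] (update db :writer #(assoc % :streaming? false))))
  conn)

;; Hypothetical use in a query handler, before the call to d/q:
;; (d/q query @(disable-streaming! conn))
```

Calling it once when the query lambda creates its connection should be enough, since the flag stays set on that connection.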

whilo 2023-03-29T18:05:41.775589Z

There will be a cleaner way to do this through the config.

viesti 2023-03-29T18:06:53.570459Z

oh nice, I'll try that 🙂

viesti 2023-03-29T18:07:20.880889Z

How long has that option been around? I think I was looking for something like that maybe 1-2 years ago

whilo 2023-03-29T18:07:59.789109Z

the PR was merged last week 😅

viesti 2023-03-29T18:08:23.208399Z

ok 🙂

whilo 2023-03-29T18:08:28.266949Z

currently it sets this when you have a remote transactor in form of datahike-server

viesti 2023-03-29T18:08:56.102429Z

well, and at that time, s3 backend wasn't around, which was the thing that I was actually looking for 🙂

viesti 2023-03-29T18:09:08.238229Z

ah

viesti 2023-03-29T18:11:49.558409Z

well I'll be damned, I guess it worked!

0% bb run write '{"data": [{"name": "Pedro jr", "age": 15}]}'
{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}
{"result":"ok","status":"ok"}
0% bb run read
{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}
{"result":[[6,"Pedro jr",15],[5,"Pablo",55],[4,"Bob",42],[3,"Alice",32]],"status":"ok"}
0% bb run write '{"data": [{"name": "Pedro", "age": 59}]}'
{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}
{"result":"ok","status":"ok"}
0% bb run read
{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}
{"result":[[6,"Pedro jr",15],[5,"Pablo",55],[4,"Bob",42],[3,"Alice",32],[7,"Pedro",59]],"status":"ok"}
both Pedros visible after each read

whilo 2023-03-29T18:12:13.328239Z

❤️❤️❤️

whilo 2023-03-29T18:12:21.232879Z

this is awesome!

whilo 2023-03-29T18:12:34.802189Z

finally all the nights spent pay off 🙂

viesti 2023-03-29T18:13:02.863769Z

yes! 🙂

whilo 2023-03-29T18:15:24.278599Z

what do you think?

whilo 2023-03-29T18:29:34.985289Z

i have a PR for datahike that significantly reduces write latency btw., and could do auto batching in case we run the transact calls async in the lambda https://github.com/replikativ/datahike/pull/618

viesti 2023-03-29T18:35:59.777439Z

walking the dog outside, -2 and fingers freezing, still a bit bewildered and glad that I could help, thinking that would need to do some demo with frontend, say todo list :) then also thinking about a perf suite and snapstart setup for eliminating cold starts

viesti 2023-03-29T18:36:39.640669Z

to really have a serverless database, even datalog style, for Clojure, is just wicked :)

whilo 2023-03-29T18:38:25.664279Z

fortunately freezing stopped here in vancouver already 🙂 my partner in montreal is still freezing though

viesti 2023-03-29T18:39:39.848129Z

here in Finland winter came back, but probably only for a short while :)

whilo 2023-03-29T18:42:04.782179Z

still have to visit finland unfortunately, never did it when i lived in germany

whilo 2023-03-29T18:42:24.925269Z

winter in vancouver is not as cold, but very humid, so the cold sticks

whilo 2023-03-29T18:44:00.936259Z

what you say makes sense. with anything you can help i would be super grateful, as i am thinly stretched atm. also with my AI research (which hopefully i can integrate into Datahike as probabilistic inference)

👀 1
whilo 2023-03-29T18:44:37.305589Z

i also need to do sales again as soon as there is something interesting to sell 🙂 atm. we do not make a lot of revenue with datahike and that slows its development

viesti 2023-03-29T18:45:54.359949Z

I'm surprised in a positive way that Datahike can provide revenue :)

whilo 2023-03-29T18:46:43.900549Z

yeah, we were somewhat lucky. we suck in sales

😄 1
viesti 2023-03-29T18:46:48.873079Z

should make some noise somewhere about this lambda trial :)

whilo 2023-03-29T18:47:02.321909Z

but i also needed to first get the distributed use case done before i wanted to go out and pitch it

viesti 2023-03-29T18:47:16.962949Z

yeah

whilo 2023-03-29T18:47:37.331309Z

i would write a blog post as soon as we have a project template that people can use to build prototypes and small apps

whilo 2023-03-29T18:48:32.003119Z

is it possible to fetch and process multiple requests in lambda that return asynchronously?

viesti 2023-03-29T18:50:17.307729Z

lambda is event by event, although there is async invoke to dispatch without waiting but then the event size is quite limited

whilo 2023-03-29T18:50:56.596779Z

ok, that is our business case for a server then

whilo 2023-03-29T18:51:21.329529Z

the server can process multiple requests in parallel and batch them, which gives you better scale on S3

whilo 2023-03-29T18:51:49.196049Z

it is particularly helpful on S3 because of the high latency, it helps in general ofc.
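
As a rough illustration of why batching helps on a high-latency store, the sketch below merges the tx-data of all pending requests into one transaction, so the store round trip is paid once per batch rather than once per request. It uses only clojure.core, the names are hypothetical, it is not how the Datahike PR implements batching, and it does not return per-request results:

```clojure
(ns example.batch)

(defn batch-tx-data
  "Concatenates the tx-data vectors of all pending requests into one vector."
  [pending]
  (into [] cat pending))

(defn transact-batched!
  "Atomically drains the pending atom and calls transact-fn once with the
   merged tx-data. Returns nil when nothing is pending."
  [pending transact-fn]
  (let [[old _] (reset-vals! pending [])]
    (when (seq old)
      (transact-fn (batch-tx-data old)))))
```

A long-running server could call `transact-batched!` in a loop while requests keep appending to the atom; in a setup with a real connection, `transact-fn` would wrap the actual transact call.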

whilo 2023-03-29T18:53:04.233579Z

or do you think there is a better approach?

viesti 2023-03-29T18:53:56.682709Z

i wonder putting write request to say sqs or another queue supported by lambda and then batching off the queue

whilo 2023-03-29T18:55:59.084729Z

that would also do probably, i have no experience with this

whilo 2023-03-29T18:56:31.224559Z

i think you want tx responses though, so the client needs to be notified only after tx call

viesti 2023-03-29T18:57:43.257859Z

ah hmm

viesti 2023-03-29T19:04:18.785309Z

> By default, Lambda polls up to 10 messages in your queue at once and sends that batch to your function. To avoid invoking the function with a small number of records, you can tell the event source to buffer records for up to 5 minutes by configuring a batch window. Before invoking the function, Lambda continues to poll messages from the SQS standard queue until the batch window expires, the invocation payload size quota is reached, or the configured maximum batch size is reached. > https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html it's been some time I did these things, I remember there being some batch size and window size tunable with stuff like Kinesis Firehose, which allowed to put event processing into lambda, when going through Kinesis

viesti 2023-03-29T19:04:59.752559Z

these serverless things are kind of, well, you don't configure a single server, but a host of services :)

viesti 2023-03-29T19:06:15.790739Z

when inside aws, lambda talks quite fast to nearby aws services, but yeah, some kind of benchmark would be neat

whilo 2023-03-29T19:15:45.414419Z

right

whilo 2023-03-29T19:18:14.186489Z

it might be nonetheless reasonable to just run an EC2 instance for transact then, not sure how the prices of the services compare

viesti 2023-03-29T19:23:54.759199Z

if you would have enough traffic that lambda is kept running all the time, then ec2 is cheaper, but it gets more complex, since nowadays you can even buy compute capacity for lambda upfront and benefit from discounts the same way as it was for reserved instances of ec2's or databases

viesti 2023-03-29T19:24:58.886249Z

for on and off traffic, this kind of setup with Lambda doing writes, with fast enough cold start, is appealing

viesti 2023-03-29T19:28:23.145739Z

so depends on the use case I think

whilo 2023-03-29T19:42:01.524129Z

right, i will think about it

whilo 2023-03-29T19:42:13.740509Z

one thing that is nice is if we can also host our setup in other environments

whilo 2023-03-29T19:42:28.481719Z

S3 support is now fairly general in many environments and we have other store backends

whilo 2023-03-29T19:42:41.101369Z

there are also other lambda runtimes that we probably can cover

whilo 2023-03-29T19:43:28.452429Z

how would you like to proceed from here?

viesti 2023-03-29T19:54:03.179559Z

will proceed to bed now :D, but with other lambda runtimes you probably mean say GCP Cloud Run, since the other JVM option in AWS would be a custom runtime. I tried GCP Cloud Run when it came out, it probably has advanced since, I think it even has an option to keep the compute that runs the process "warm" without throttling, as opposed to Lambda, where the process runs only when an event is processed, otherwise it is frozen, so you can't do background processing, can only execute while handling an event, though there is upper processing limit in cloud run too I think. I'd want to setup snapstart for the aws lambda next, not sure what after that, some kind of write and read benchmark would probably be neat, read side scaling is interesting, but would have to figure out suitable benchmark scenario, does datahike have benchmarks available?

viesti 2023-03-29T19:54:34.521479Z

but off to bed this side of the globe now :)

whilo 2023-03-29T20:08:11.249149Z

have a good night! thanks for all the input 🙂

whilo 2023-03-29T20:08:19.331509Z

we have https://github.com/replikativ/datahike/blob/main/doc/benchmarking.md, but this probably needs to be adjusted a bit

whilo 2023-03-29T20:08:43.588669Z

i think it is also fine to just write up a synthetic benchmark of your own to get started

👌 1
whilo 2022-12-18T02:33:45.712229Z

I understand your point, but I think this is not really true. Most operations typically just query the database and then it is a very good fit for lambda functions, because they provide horizontal read scaling, a model that is very compatible with Datomic/Datahike's decoupled, scalable readers. The point is nonetheless that sometimes you will need to update the database and transact into it and might want to do this from your lambdas. In this case I guess it would be ideal to have a service running that would be reachable from the lambdas and well integrated. I could just deploy such a transactor into AWS, but I was wondering whether there was already a notion for such services in lambda land.

whilo 2022-12-18T02:36:50.486899Z

For instance if you just want to query a static Datahike database in lambdas this would make perfect sense I think, because queries need zero coordination and can be executed in milliseconds with minimal reads from the index.

steveb8n 2022-12-18T03:58:07.499259Z

agreed. read can be in parallel so suitable for lambdas. I just can't think of a way to maintain a singleton for keeping writes in serial. maybe ec2 for writes only? or build a disk/storage format which can reconcile writes i.e. don't need a singleton writer

whilo 2022-12-18T05:02:10.915889Z

Ok, cool. Thanks for validating, I will think about it.

whilo 2023-03-05T23:00:46.400929Z

Both repos are released now, so you can just depend on datahike-s3 and datahike and develop against that.

whilo 2023-03-05T23:01:59.060449Z

I would be in particular interested to have MVP example with one lambda covering the transact function to S3 (which is guaranteed to only run once at a time) and some query in another lambda. If you can help me setup an example project for that I would be very happy.

viesti 2023-03-06T05:53:18.628459Z

Awesome! :)

😊 1
steveb8n 2022-12-17T00:48:14.252249Z

Doesn't really fit lambda. Single server would be ec2 or ecs