What is the status of the :jdbc backend right now? The docs say it is supported, but it is not working for me. Moreover, the code (for example store.clj) has no methods for :jdbc, only for :mem and :file...
@sasha_bogdanov_dev I'm using it in production with 3M+ datoms.
The S3 backend could give you subsecond latency, but I haven't tried it inside an AWS data center yet. As long as you run in a single VM, the read caches should generally be hot, so once your app is up, read latencies should not be affected much by the latency of the store. It depends on your data set and use case, though.
If you do not need to transact things in a strict order, use transact! to transact asynchronously. The writer will then batch transactions automatically, and the latency of the store will not compound even if you have many concurrent transactions (in the ideal case you wait only a bit longer than one roundtrip to the store).
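To make the batching argument concrete, here is a toy model (a hypothetical sketch, not Datahike's writer): asynchronous calls only enqueue, and a single writer drains everything pending and persists it in one store roundtrip. `BatchingWriter`, `transact_async` and `flush` are made-up names, there are no real threads, and `store_roundtrips` just counts simulated writes.

```python
from collections import deque


class BatchingWriter:
    """Toy single writer that batches queued transactions.

    Hypothetical model for illustration only: it is not Datahike's writer,
    and `store_roundtrips` just counts simulated writes to the store.
    """

    def __init__(self):
        self.queue = deque()
        self.log = []              # committed transactions, in commit order
        self.store_roundtrips = 0  # simulated roundtrips to the backing store

    def transact_async(self, tx):
        # Like an asynchronous transact: enqueue and return immediately.
        self.queue.append(tx)

    def flush(self):
        # Drain everything pending and persist it in ONE store roundtrip.
        if not self.queue:
            return 0
        batch = list(self.queue)
        self.queue.clear()
        self.log.extend(batch)
        self.store_roundtrips += 1
        return len(batch)


writer = BatchingWriter()
for i in range(100):  # 100 concurrent, order-insensitive transactions
    writer.transact_async({"tx": i})
committed = writer.flush()
print(committed, writer.store_roundtrips)  # 100 1
```

A synchronous write per transaction would instead cost one roundtrip each (100 here), which is the compounding latency that batching avoids.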
Alternatively, EFS mounted on an EC2 instance, used with the file backend, might be faster and cheaper.
The advantage of S3 is that you can give outside readers read access, and they can join and query your datahike database freely, with no coordination or additional infrastructure needed.
Yes, I will check different approaches.
Thank you for the suggestions.
Cool. Keep me posted on what works and doesn't work for you.
If dynamo is worth it, I am happy to support it. I just wanted to cover wider and cheaper ground with S3 first, and the file backend should also work well in many cases.
About DynamoDB: I am trying it now with the DataScript IStorage protocol (because it is much simpler), and if it works reasonably well, we can implement it in Datahike. File storage could be nice, but EFS from Lambda is not as robust as expected: Datahike hangs on attempts to persist data, Datalevin sometimes had "resource not Available" errors, etc.
I see.
Why do you need lambdas?
If you have a persistent distributed database like Datomic or Datahike, you can just scale EC2 instances in front of it if needed.
In my experience, lambdas are great for developers who do not have a good distributed programming model.
It should be much cheaper to scale up a single EC2 instance first, unless you need extreme unmanaged blitz scaling.
Which might also not work with lambdas.
I am not claiming I understand your requirements, I am just curious.
I have multiple small applications (there will be many if all goes well), each with very low load, some almost completely idle. So it really is a case for a serverless setup. And I want one storage for all of them (with separate databases/tables/etc., of course). I tried Datomic Cloud. It could be great, but CloudFormation somehow refused to deploy the official templates and I gave up on it))) Maybe I did something wrong, but it was really annoying.
Right, I think somehow they made Datomic difficult to get running in general.
I haven't used it in a long time to be honest.
And it's mostly a hobby project, so a good landscape to try things.
Ok, that makes sense.
Did you try datahike's S3 backend?
I kind of made it for the lambda use case. I am just not so convinced about lambdas myself anymore.
It is easy to deploy multiple small apps on a single EC2 instance and you can even keep REPLs to them open over SSH if you want.
Not yet. It was too hard to believe that such storage could be fast enough for real-time applications.
I had a latency of around 400ms from my laptop here in Canada to the nearest US AWS data center.
I think it will be lower inside of AWS.
But I did not have enough time to try things out. I get that lambdas and EFS are not a very good fit.
(Although that is on Amazon)
> It is easy to deploy multiple small apps on a single EC2 instance and you can even keep REPLs to them open over SSH if you want.
Yes, I can, but honestly I am happy with SQS queues between the API and the Lambdas. So, for now I will stay on Lambdas))
> But I did not have enough time to try things out. I get that lambdas and EFS are not a very good fit.
For databases I think it is not good; as plain file storage it could be nice. And maybe I did something wrong. This landscape is relatively new for me (around a year).
Ok. It should be easy.
@viesti's Lambda template for datahike uses S3: https://github.com/viesti/clj-lambda-datahike/blob/main/src/clj_lambda_datahike/core.clj
But I get that it might be too slow for you. Do you need real-time writes or just fast reads?
It is not natively compiled yet, though. @viesti mentioned that there are warm start JVM options now that also reduce Lambda warmup time. I am not an expert in this.
> But I get that it might be too slow for you. Do you need real-time writes or just fast reads?
Read speed is critical, writes less so.
Can you put a number on it?
In terms of milliseconds.
I have concerns about those "new" warm start options, but I need to try them too.
150-200ms is definitely the limit
Also, is your database written to a lot? If it is only written sporadically, most reads can be cached and you will only need one roundtrip to S3 per read.
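Why a mostly-read database needs only one roundtrip per read can be shown with a toy read-through cache. This is a hypothetical sketch under simplifying assumptions: a one-level "tree" instead of a real index, a `CountingStore` standing in for S3, and made-up names throughout. Nodes written once are cached forever; only the overwritten root is fetched on every read.

```python
class CountingStore:
    """Stands in for S3: every get() is one network roundtrip."""

    def __init__(self, data):
        self.data = dict(data)
        self.roundtrips = 0

    def get(self, key):
        self.roundtrips += 1
        return self.data[key]


class CachedReader:
    """Caches immutable index nodes; always re-reads the mutable root."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def node(self, key):
        # Index nodes are written once and never changed, so caching is safe.
        if key not in self.cache:
            self.cache[key] = self.store.get(key)
        return self.cache[key]

    def read(self, attr):
        # The root is overwritten on every commit, so fetch it on each read.
        root = self.store.get("root")
        return self.node(root["children"][attr])["value"]


store = CountingStore({
    "root": {"children": {"a": "node-1", "b": "node-2"}},
    "node-1": {"value": 1},
    "node-2": {"value": 2},
})
reader = CachedReader(store)
reader.read("a")                  # cold read: root + node-1
before = store.roundtrips         # 2
reader.read("a")                  # warm read: only the root
print(store.roundtrips - before)  # 1
```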
I see, ok then S3 is maybe too slow.
EC2 instance with file backend should be fine though.
I need to try it anyway, but right now I am working on another approach.
Cool, thanks for contextualizing.
No problem!
> It is not native compiled yet though. @viesti mentioned that there are warm start JVM options now that also reduce lambda warmup time. I am not an expert in this.
The AWS Lambda Java runtime supports creating a VM snapshot at publish time for a Lambda version, which helps with cold starts, but it comes with its own caveats too (publishing takes a couple of minutes, the VM snapshots are cached for up to two weeks, and rarely used Lambdas get evicted from the SnapStart cache, so cold start times might vary without, for example, an hourly ping): https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html
Maybe this would work https://github.com/passren/DynamoDB-JDBC
It also wouldn't be super hard to add a konserve-dynamodb backend.
But it would make sense to know why it is really needed.
@whilo @sasha_bogdanov_dev I explored this a while back but I got nervous that DynamoDB is not strongly consistent by default. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadConsistency.html
Strongly consistent reads are double the cost of eventually consistent reads.
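The "double the cost" point follows from how DynamoDB meters reads, as I understand its documented model (worth re-checking against the current AWS docs): one read capacity unit (RCU) covers one strongly consistent read of an item up to 4 KB, or two eventually consistent reads, with item size rounded up to the next 4 KB. A small sketch for a single-item read (transactional reads, which cost more, are ignored):

```python
import math


def read_capacity_units(item_size_bytes, strongly_consistent):
    """Read capacity units for one single-item read (GetItem-style)."""
    # Reads are metered in 4 KB blocks, rounding the item size up.
    blocks = max(1, math.ceil(item_size_bytes / 4096))
    # One RCU covers one strongly consistent read of up to 4 KB,
    # or two eventually consistent reads of the same size.
    return blocks if strongly_consistent else blocks / 2


print(read_capacity_units(6000, True))   # 2
print(read_capacity_units(6000, False))  # 1.0
```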
There's probably a work around, but that's why I stayed clear of it.
Most values we write only exist once, i.e. either you read the right one or there is none.
The exceptions are the root entries for the db, which are overwritten.
In principle reading an old value of those would just mean you didn't fetch the latest snapshot.
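The argument that a stale read only costs freshness can be illustrated with a toy content-addressed store. This is a hypothetical sketch, not konserve's or Datahike's storage layout: values are stored once under the hash of their content and never overwritten, and only the root pointer changes. A reader that still sees the old root gets an older but complete snapshot, never a mix of old and new data.

```python
import hashlib
import json


class ToyStore:
    """Write-once values keyed by content hash, plus one overwritten root."""

    def __init__(self):
        self.blobs = {}
        self.root = None

    def put_blob(self, value):
        key = hashlib.sha256(json.dumps(value, sort_keys=True).encode()).hexdigest()
        self.blobs.setdefault(key, value)  # never overwritten once written
        return key

    def commit(self, db_value):
        # Only the root pointer is mutable; it moves to the new snapshot.
        self.root = self.put_blob(db_value)


store = ToyStore()
store.commit({"users": ["ann"]})
stale_root = store.root  # what an eventually consistent read might still return
store.commit({"users": ["ann", "bob"]})

# Both snapshots remain readable and internally consistent.
print(store.blobs[stale_root])  # {'users': ['ann']}
print(store.blobs[store.root])  # {'users': ['ann', 'bob']}
```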
Hi, thank you for your investigations! Yes, I know about the consistency model, but it is not clear what they mean by "writes may not be immediately available to read". If the delay is a couple of milliseconds, that is ok for my case; if it is seconds, then no. I need to try it and find out.
How so? I don't understand...
So you need to include datahike.jdbc. I sent the link in the chat.
Oh okay thank you
Including it allows :jdbc to work
I'm currently walking. Will send you an example config a bit later
I think I am okay with the next steps, do not worry.
Cool beans :muscle:
Oh no.
1. Unhandled java.lang.IllegalArgumentException
No implementation of method: :-connect of protocol: #'datahike.connector/PConnector found for class: clojure.lang.PersistentHashSet
core_deftype.clj: 584 clojure.core/-cache-protocol-fn
core_deftype.clj: 576 clojure.core/-cache-protocol-fn
connector.cljc: 18 datahike.connector$eval47679$fn__47680$G__47670__47685/invoke
connector.cljc: 201 datahike.connector$connect/invokeStatic
connector.cljc: 197 datahike.connector$connect/invoke
REPL: 33 datahike-sandbox.core/-main
REPL: 11 datahike-sandbox.core/-main
RestFn.java: 397 clojure.lang.RestFn/invoke
REPL: 41 datahike-sandbox.core/eval62264
What does your setup code look like?
Oh wait. It's ok, my fault.
It would be nice to support DynamoDB (it has a JDBC driver). I had a look at the code and I do not think I can create a PR quickly.
Is it possible to implement?
As I see it, changes are needed not only in konserve-jdbc, correct?
I think it should be enough to fix konserve-jdbc to connect to dynamo. If you can provide a PR for that it should make datahike work with dynamo.
Looks like not. next.jdbc needs to be patched, at least in this function: https://github.com/seancorfield/next-jdbc/blob/218cf8263727ce662483fbf26ab08bd9cf22cfad/src/next/jdbc/connection.clj#L140
The problem is that the DynamoDB driver (I used CData's) uses uncommon keys in the connection URL or spec.
And if we pass a complete connection URL, there is no classname parameter...
So can you use the URL and just make sure it supports the classname? It seems the spec translation is an additional step that translates it into a URL first, right?
I think this mapping should be a multimethod and not a closed map that cannot be extended from the outside to new SQL types, but it is what it is. Probably easiest to add dynamo there and open a PR.
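The difference between a closed map and an open, extensible dispatch can be sketched with a registry that other modules add to, roughly what a Clojure multimethod gives you. This is a hypothetical Python illustration, not next.jdbc's code: `register_dbtype` and `spec_to_url` are made-up names, and "mydb" with its URL format is invented purely to show an outside extension. Only the PostgreSQL URL shape is a real format.

```python
URL_BUILDERS = {}  # dbtype -> function(spec) -> JDBC URL; open for extension


def register_dbtype(dbtype):
    """Decorator that lets any module add a new dbtype from the outside."""
    def wrap(fn):
        URL_BUILDERS[dbtype] = fn
        return fn
    return wrap


@register_dbtype("postgresql")
def _postgres_url(spec):
    return f"jdbc:postgresql://{spec['host']}:{spec.get('port', 5432)}/{spec['dbname']}"


def spec_to_url(spec):
    try:
        builder = URL_BUILDERS[spec["dbtype"]]
    except KeyError:
        raise ValueError(f"unknown dbtype: {spec['dbtype']}") from None
    return builder(spec)


# A third party extends the mapping without patching the module above.
@register_dbtype("mydb")  # hypothetical database type and URL format
def _mydb_url(spec):
    return f"jdbc:mydb:Region={spec['region']};"


print(spec_to_url({"dbtype": "postgresql", "host": "localhost", "dbname": "app"}))
print(spec_to_url({"dbtype": "mydb", "region": "us-east-1"}))
```

With a closed map, adding "mydb" would require a PR to the library itself, which is the situation described above.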
Yes, that is definitely the easiest way.
But there is another problem: Could not find a valid license for using CData JDBC Driver for Amazon DynamoDB 2024 on this system.
It looks better to give up on DynamoDB via JDBC. I will take a look at a direct integration with taoensso/faraday later.
There's a separate library you load.