#datomic
2019-12-10
kosengan03:12:39

Hi Channel 👋 <https://clojurians.slack.com/archives/C03S1KBA2/p1575946578414400> I'm pondering the pros and cons of using Datomic in production at scale, and got the comment below on the same question. Please share your thoughts!

johnj04:12:09

@thegobinath small data, high reads is the sweet spot

steveb8n04:12:09

IMHO the current biggest con is no export tools. The recommendation seems to be that backups are not required, but that just doesn't fly in the enterprise world where I provide services. Not sure what to do about this yet.

kosengan05:12:57

Disadvantages

It can be slow, as Datalog is just going to be slower than equivalent SQL (assuming an equivalent SQL statement can be written).
If you are writing a LOT, you could maybe need to worry about the single transactor getting overwhelmed. This seems unlikely for most cases, but it's something to think about (you could do a sort of shard, though, and probably save yourself; but this isn't a DB for e.g. storing stock tick data).
It's a bit tricky to get up and running with, and it's expensive, and the licensing and price makes it difficult to use a hosted instance with it: you'll need to be dealing with sysadminning this yourself instead of using something like Postgres on Heroku or Mongo at MongoHQ
Source: <https://stackoverflow.com/questions/21245555/when-should-i-use-datomic>
So what's the current situation with regard to the disadvantages of Datomic described in this Stack Overflow thread?

henrik08:12:24

@U064X3EF3 mentioned that Cloud doesn't use a single transactor. I'm not sure of the details, but I presume that if you create more than one DB in Cloud, there's no need to sync writes between those DBs, as they are isolated from one another. The sysadmin bit also doesn't really apply to Cloud (though you'll have to deal with AWS in some capacity). Getting it up and running is pretty much just clicking through a wizard. Pricing-wise, a solo deployment lands at around $30-$40/month. For production, it depends a lot on usage.

Alex Miller (Clojure team)13:12:16

Saying that datalog is slow compared to sql is literally nonsense (in the literal sense of "literal"). The rest of that post reads as out of date (pre-Cloud). The whole point of Cloud is that the environment is largely built for you and makes the best use of AWS.

marshall11:12:03

The presumption that datalog is slower than sql is incorrect

val_waeselynck12:12:44

@thegobinath I'd say the number 1 disadvantage of Datomic is the time you have to spend explaining why you're using it... and the fact that it's not open-source of course, which is a deal breaker for some people. For the rest, I think it's more objective to talk in terms of limitations rather than disadvantages. Let me lay those out:

👍 4
val_waeselynck12:12:58

1. Datomic is not low-level storage. Don't use it for high-churn data, blobs, etc. Use it for accumulating facts, and only that.

val_waeselynck12:12:30

2. Datomic will be challenging if you have a high write throughput or data size (official rule of thumb: 10 billion datoms is the limit). It will be even more challenging if the relationships in the data have poor locality (this is a rare condition: a large graph with long-range relationships is an example; the usual enterprise system will be fine).

val_waeselynck12:12:46

3. Most developers don't know it. I don't think it's hard to learn, especially for juniors, but your developers have to be able and willing to learn.

val_waeselynck12:12:05

4. It's pretty much married to the JVM as a platform. You can call it from other platforms, but you'll lose many of the advantages.

val_waeselynck12:12:48

5. It's not lean in terms of computational resources: the minimum deployment will have a high footprint.

val_waeselynck12:12:09

6. It has essentially no support for anything but 'relational' queries (fulltext etc.), and performs poorly on big aggregation queries.

val_waeselynck12:12:04

7. It's not a bitemporal system; people often have misplaced expectations here because of Datomic's temporal reputation.

mpenet12:12:31

8. AWS only (not considering on-prem)

val_waeselynck12:12:05

Yes, if we're only considering Cloud I could add a few more limitations.

val_waeselynck12:12:17

I still believe Datomic is the best technical option for the most mainstream use case of databases: online information systems with high reads and non-trivial transactional writes, a natural relational / graphical data model, and acting as a source of information for downstream systems.

🚀 4
henrik12:12:10

Add the Cloud-specific ones, for completeness.

val_waeselynck12:12:04

How so? On-Prem is an option for the others.

henrik12:12:32

Oh, sorry, I didn’t realise the question was about on-prem.

val_waeselynck12:12:55

I don't know that the question was about one specific deployment strategy 🙂

henrik12:12:03

Well, it’s quite a bit more than deployment strategy, right? The “sysadmin” bit in the post above applies more to on-prem than Cloud. And with Cloud, you’re married to CodeDeploy, for better or worse, etc.

val_waeselynck12:12:19

Yes I fully agree, I was only refraining from going into these specifics.

👍 4
johnj03:12:48

"and acting as a source of information for downstream systems" - like some kind of meta database?

val_waeselynck17:12:55

No, like the «sales» system upstream of the «emailing» and «analytics» systems

val_waeselynck12:12:34

@thegobinath note: the SO post you mention predates Datomic Cloud, so some parts of it are no longer true! Especially the "It's a bit tricky to get up and running with" part as mentioned by @henrik

kosengan13:12:27

Ok. Apart from the challenges with learning/deploying, what would it be like if Twitter/Reddit had chosen Datomic (with Clojure, of course)? Reddit uses Postgres+Cassandra; Twitter uses MySQL.

Alex Miller (Clojure team)13:12:13

That seems like something impossible to answer

kosengan13:12:10

Yeah. That's a slightly stupid question :) I'm just considering a similar use case with a similar volume of data transactions.

val_waeselynck14:12:34

Everyone reinvents their own database system at that scale.

val_waeselynck14:12:14

Neither Reddit nor Twitter started with something having the capacity to deal with their current scale, and that's fine

kosengan14:12:34

Makes sense. So one can be safe starting out with Datomic and coming up with their own solutions to deal with scaling. Innovation is born out of necessity :)

henrik14:12:16

The hugely interconnected nature of social graphs, where users can be expected to interact ad hoc with any other user or any piece of content, seems like a problem that's hard to address without talking about a lot of infrastructure beyond the database.

Alex Miller (Clojure team)14:12:30

you might note that Nubank started with Datomic and is now the largest fintech company in Latin America, still using Datomic

kosengan14:12:14

One great example (not to do with DBs) is what Facebook did with php

kosengan14:12:36

Recently, how Discord used Rust to speed up Elixir

Alex Miller (Clojure team)14:12:37

they have done a lot of excellent engineering to allow them to make the most out of Datomic

henrik14:12:40

Nubank's credit cards do seem like something that would be easier to compartmentalize than a social network. No user should interact with any other user's data.

souenzzo15:12:55

But you can add friends and chat with (support) people :thinking_face:

souenzzo15:12:52

There is also a personal timeline; compared to a social network, the only missing feature is a feed from your friends.

henrik15:12:38

Right! But those things seem like they can be cleanly sliced per customer. If they support families, it becomes a different matter. Then you might want to make sure that they sit in the same DB I suppose.

mpenet14:12:20

yes, it's heavily sharded if I recall correctly

mpenet14:12:03

they probably use datomic for other things tho. Every "db" has limitations/tradeoffs

Mark Addleman14:12:05

One point related to Datomic Cloud's single transactor per db model: If I recall correctly, as of a year ago, you cannot use Datomic's datalog to join data across dbs but the problem was an implementation detail. I don't know if that has been resolved. If it's been fixed and your transaction boundaries don't cross dbs, then Datomic might scale very well given query groups

grzm16:12:21

I'm trying to test a database function I intend to use as an entity predicate. My thought is to use it in a query: for example, identifying entities that currently violate the predicate. Something like this:

(d/q '[:find (sample 1 ?e)
       :where
       [?e :some/attr]
       [(com.grzm/valid? $ ?e) ?valid]
       [(not ?valid)]]
     db)
Works in dev. Doesn't work in prod. In prod, I get the following error:
Execution error (ExceptionInfo) at datomic.client.api.async/ares (async.clj:58).
Unable to find data source: $__in__3 in: ($ $__in__2 $__in__3 $__in__4 $__in__5)
- Dev and prod have the same sha deployed.
- Both have the same version of Datomic Cloud (535-8812).
- Dev is solo, prod is, um, production.
Save me, Obi-Wan Kenobi. You're my only hope.

jaret18:12:45

Is this in an Ion?

jaret18:12:37

Am I correct in understanding, the only difference is a solo topology for one system (working) and a production topology for another system (not working)?

grzm18:12:15

That's the only difference I'm aware of. I'm running that query from the repl (it's an ion in the sense that it's an allowed function), only changing my proxy connection between the two.

marshall18:12:35

are you sure the ns with the valid? function is available in both?

marshall18:12:41

i.e. deployed to both

grzm18:12:51

Yes. If I typo the name of the function, I get an "is not allowed by datomic/ion-config.edn" error instead.

jaret19:12:22

Could you try two things?
1. Use an explicit :in
2. Try passing in a specific entity ID to check validity

jaret19:12:59

For number 1, it would look like:

(d/q '[:find (sample 1 ?e)
       :in $
       :where
       [?e :some/attr]
       [(com.grzm/valid? $ ?e) ?valid]
       [(not ?valid)]]
     db)
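For number 2, a minimal sketch might look like this (the entity id is just a placeholder; the attribute and predicate names are the ones from your query above):

(d/q '[:find ?valid
       :in $ ?e
       :where
       [(com.grzm/valid? $ ?e) ?valid]]
     db 12345678901234)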

jaret19:12:34

If all that still fails, I’d like you to log a support case with <mailto:[email protected]|[email protected]> so we don’t lose this to slack archive.

grzm19:12:06

Number 1 fails with the same error.

grzm19:12:08

Number 2 succeeds (passing in an eid, no sample)

grzm19:12:30

So, the question becomes how do I write the query to return entities that fail the predicate? Would be nice to be able to use sample, as I don't want to necessarily perform an exhaustive search.

marshall19:12:08

Aggregations in the find don't change the amount of work performed by the query

marshall19:12:22

They only shape the result

marshall19:12:10

Things like sample and limit do not "short-circuit" the query

grzm19:12:41

(d/q '[:find ?e
       :where
       [?e :some/attr]
       [(com.grzm/valid? $ ?e) ?valid]
       [(not ?valid)]]
     db)
Returns
Execution error (ExceptionInfo) at datomic.client.api.async/ares (async.clj:58).
[?valid] not bound in expression clause: [(not ?valid)]

marshall19:12:27

You may have to use an explicit not join

marshall19:12:35

Does your predicate just return true or false?

marshall19:12:46

If so, I think you want to put your predicate call inside of the not

marshall19:12:59

No need for the valid variable
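Something along these lines, as a sketch reusing the :some/attr and com.grzm/valid? names from the query above (the predicate call moves inside a not-join, so the ?valid binding goes away):

(d/q '[:find ?e
       :where
       [?e :some/attr]
       (not-join [?e]
         [(com.grzm/valid? $ ?e)])]
     db)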

grzm19:12:05

That's promising. Now I'm just getting timeouts and 503s. This is something I can work with. Thanks!

grzm19:12:19

Any idea why sample works in solo and not in production?

marshall20:12:41

not immediately; we’ll look into it though

grzm20:12:26

Want me to open a ticket?

marshall20:12:37

sure, that’d be helpful

👍 4
grzm16:12:02

New wrinkle:

(d/q '[:find ?e
       :in $ ?from ?until
       :where
       [?e :some/time ?t]
       [(<= ?from ?t)]
       [(< ?t ?until)]
       (not-join [?e]
         [(com.grzm/valid? $ ?e)])]
     db from until)

grzm16:12:00

When the range of from/until returns a small set, it completes fine. When it returns a large set (just changing range), it fails with Unable to find data source: $__in__3 in: ($ $__in__2 $__in__3 $__in__4 $__in__5)

marshall16:12:49

Can you file a ticket with that info please

grzm16:12:42

Yup. Haven't done the one from yesterday. Same ticket or two?

grzm17:12:08

More follow-up: there was some data in the production database which was causing one of the subsequent queries within the database function to fail. Given the nature of the error messages, it wasn't obvious to me where in the stack the error was happening.

Aleed17:12:10

@val_waeselynck you mentioned that "Datomic will be challenging if you have a high write throughput or data size." Do you think datomic could work for a note-taking style app? i wanted to take advantage of point-in-time queries, but documents will have high data sizes

johnj17:12:40

Datomic doesn't do well with large strings, so much so that in Cloud they are restricted to 4096 characters.

johnj17:12:02

As @val_waeselynck said, Datomic is not a bitemporal system; you should not rely on tx time to model time in your domain/business logic.

johnj17:12:40

create your own time attrs

Joe Lane17:12:40

Remember @UPH6EL9DH, in Cloud you have access to literally all of AWS and its services. You could put a note at a point in time into S3 backed by CloudFront and store the reference to it in Datomic. You can use CloudSearch or Open Distro for searching as well.

👍 4
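As an illustration of that idea, a minimal sketch with hypothetical attribute names, assuming conn is an open Datomic client connection and the note body was already uploaded to S3 under the given key (e.g. with the AWS SDK):

(require '[datomic.client.api :as d])

;; Datomic stores only the S3 object key and metadata; the note body lives in S3
(def note-schema
  [{:db/ident       :note/id
    :db/valueType   :db.type/uuid
    :db/cardinality :db.cardinality/one
    :db/unique      :db.unique/identity}
   {:db/ident       :note/s3-key
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/doc         "S3 object key under which the note body is stored"}])

(d/transact conn {:tx-data note-schema})

(d/transact conn {:tx-data [{:note/id     (java.util.UUID/randomUUID)
                             :note/s3-key "notes/2019/12/10/my-note.md"}]})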
Alex Miller (Clojure team)17:12:00

I don’t understand how Datomic is not a bitemporal system (if you use it that way).

Alex Miller (Clojure team)17:12:42

You have both transaction times and, if desired, attributes for event times, with the ability to query taking both into account

👍 4
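As a minimal sketch of combining the two axes (the :order/effective-at attribute and the dates are hypothetical): as-of pins the transaction-time view of the database, while the event time is an ordinary attribute filtered in the query.

(require '[datomic.client.api :as d])

;; transaction-time axis: the database as it was recorded up to 2019-12-01
(def db-then (d/as-of (d/db conn) #inst "2019-12-01"))

;; event-time axis: a plain attribute on the entity, filtered in :where
(d/q '[:find ?order
       :in $ ?cutoff
       :where
       [?order :order/effective-at ?t]
       [(<= ?t ?cutoff)]]
     db-then
     #inst "2019-11-30")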
Aleed17:12:44

@U0CJ19XAM thanks for the tip about saving note documents to S3, hadn't considered that.

johnj18:12:08

oh yeah you can, the question is whether you should use Datomic's history features for domain logic, in contrast to just using it for auditing/troubleshooting. https://vvvvalvalval.github.io/posts/2017-07-08-Datomic-this-is-not-the-history-youre-looking-for.html

val_waeselynck19:12:19

https://clojurians.slack.com/archives/C03RZMDSH/p1575999720198400?thread_ts=1575997270.193600&cid=C03RZMDSH Because Datomic provides no support for expressive bitemporal queries, in the same way that MySQL et al provide no support for expressive temporal queries. Choosing to "use it that way" is not enough. Sure, you can encode bitemporal information in Datomic, but it won't be particularly practical to leverage it.

johnj17:12:03

Datomic does not provide a mechanism to declare composite uniqueness constraints - does this still hold now that there are composite tuples?

jaret18:12:03

Did you see this in the docs? Could you throw me a link? Because you're correct, this is no longer true with the addition of composite tuples.

jaret18:12:26

NVM just saw your link.

johnj17:12:55

Ok, that sentence is still in the docs <https://docs.datomic.com/cloud/schema/schema-reference.html#db-unique-identity>

jaret18:12:07

Will correct. You and @U0CJ19XAM are correct. That is no longer true with the introduction of Composite Tuples.

👍 4
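For example, a sketch with hypothetical attribute names: a tuple attribute derived from two other attributes, marked unique, enforces uniqueness on the pair.

[{:db/ident       :reg/course
  :db/valueType   :db.type/ref
  :db/cardinality :db.cardinality/one}
 {:db/ident       :reg/student
  :db/valueType   :db.type/ref
  :db/cardinality :db.cardinality/one}
 ;; Datomic maintains this tuple from the two attributes above;
 ;; :db/unique on it means one registration per course+student pair
 {:db/ident       :reg/course+student
  :db/valueType   :db.type/tuple
  :db/tupleAttrs  [:reg/course :reg/student]
  :db/cardinality :db.cardinality/one
  :db/unique      :db.unique/identity}]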
Lone Ranger18:12:07

whoaaa there is composite uniqueness now? Am I hearing this correctly?

Lone Ranger18:12:38

if so... huzzah!

Ike Mawira19:12:06

Hello, I am having trouble setting up Datomic as mentioned here: https://clojurians.slack.com/archives/C053AK3F9/p1576002374401100. I get:

ActiveMQNotConnectedException AMQ119007: Cannot connect to server(s). Tried with all available servers.  org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.createSessionFactory (ServerLocatorImpl.java:799)
Any reason why I could be getting this error?

Ike Mawira19:12:59

Seems like the issue is the Netty library. When I run

(d/create-database "datomic:")
I get a warning,
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by io.netty.util.internal.PlatformDependent0 (file:/home/ike/Documents/softwares/datomic-pro-0.9.5697/lib/netty-all-4.0.39.Final.jar) to field java.nio.Buffer.address
WARNING: Please consider reporting this to the maintainers of io.netty.util.internal.PlatformDependent0
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
ActiveMQNotConnectedException AMQ119007: Cannot connect to server(s). Tried with all available servers.  org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.createSessionFactory (ServerLocatorImpl.java:799)
While in IntelliJ I get this extra info:
WARNING: All illegal access operations will be denied in a future release
Dec 10, 2019 10:27:44 PM org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector createConnection
ERROR: AMQ214016: Failed to create netty connection
.ssl.SSLException: handshake timed out
	at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source)

marshall19:12:23

@mawiraike are you running a transactor on your local machine?

Ike Mawira19:12:40

Yes, I got the

System started datomic:<DB-NAME>, storing data in: data
message, so I think so.

marshall19:12:48

I would also recommend that you upgrade to a more recent version. That release is 1.5 years old.

marshall20:12:27

if you’ve had this storage system running before, you may be hitting that change ^ with h2

Ike Mawira20:12:24

Okay, thanks @marshall, lemme update and see if it passes.

Jon Walch22:12:22

What would the datalog look like for "give me the top ten users with the most cash"? I tried:

{:query '[:find ?user-name (max 10 ?cash)
          :in $
          :where
          [?user :user/cash ?cash]
          [?user :user/name ?user-name]]
 :args  [db]}

favila22:12:39

Datalog doesn’t do sorting or truncating. You would do this in two queries

favila22:12:15

or one plus a pull

Jon Walch22:12:33

but this gives me every user
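Right, datalog itself won't sort or truncate. A common approach, sketched here with Jon's attribute names (not necessarily the pull-based variant favila had in mind), is to return every user's name and cash and do the ordering and truncation in application code:

(->> (d/q '[:find ?user-name ?cash
            :where
            [?user :user/cash ?cash]
            [?user :user/name ?user-name]]
          db)
     (sort-by second >)   ; highest cash first
     (take 10))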