#datomic
2017-09-10
cjhowe18:09:14

is it a good idea to use datomic as a graph database, or is it better to use neo4j? i know datomic doesn't have the same level of graph capabilities as neo4j, but it can still do most graph database operations, right?

cjhowe18:09:20

i like the capabilities of neo4j's cypher, but i would much prefer to use data. the biggest thing is i can't think of how to add properties to links in datomic...

cjhowe18:09:31

has anyone deployed datomic on heroku?

hmaurer20:09:03

@cjhowe so, regarding the first question: I am new to Datomic/Clojure but yes, Datomic seems like a great fit for exploring graphs

hmaurer20:09:26

it’s essentially a triple-store (well, 5-tuple store really if you account for the transaction id and operation)
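(To make the 5-tuple shape concrete, here is a sketch of what datoms look like; entity and transaction ids are made up for illustration:)

```clojure
;; A Datomic datom is [entity attribute value transaction added?]
;; e.g. asserting that entity 17 has :person/name "Alice" in transaction 1001:
[17 :person/name "Alice" 1001 true]
;; ...and a later retraction of that same fact:
[17 :person/name "Alice" 1002 false]
```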

hmaurer20:09:29

It really depends on what you want to do; neo4j’s cypher is very high-level, you might have to do a lot more work to run the same queries on top of Datomic

hmaurer20:09:41

but Datalog should already get you a long way

hmaurer20:09:15

and if you have more complex query needs you can build them on top of Datomic’s primitives

hmaurer20:09:09

You are right, you cannot add properties to links in Datomic (links = attributes)

hmaurer20:09:40

As far as I am aware, a standard approach would be to reify the link as an entity, which can then have attributes

hmaurer20:09:18

e.g. imagine you had a link between friends, e.g. :person/friends, an attribute with cardinality many and value-type ref

hmaurer20:09:51

now say you would like to store an attribute on that link, e.g. an integer “weight” which indicates how good a friend the person is

hmaurer20:09:04

Neo4j would let you add that attribute directly on the edge

hmaurer20:09:37

With Datomic you can’t do that, but you could “reify” that link as an entity, and have an attribute like :person/friendships

hmaurer20:09:15

which would point to Friendship entities, which would themselves have a :friendship/target pointing to the friend Person entity

hmaurer20:09:41

since Friendship is now an entity, you can add :friendship/weight, or any attribute you want
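(A sketch of that reified-link schema — attribute names follow the ones used in the discussion, and the :db/doc strings are illustrative:)

```clojure
;; Hypothetical schema for reifying a friendship edge as an entity
[{:db/ident       :person/friendships
  :db/valueType   :db.type/ref
  :db/cardinality :db.cardinality/many
  :db/doc         "Points to Friendship entities, one per edge"}
 {:db/ident       :friendship/target
  :db/valueType   :db.type/ref
  :db/cardinality :db.cardinality/one
  :db/doc         "The friend this edge points to"}
 {:db/ident       :friendship/weight
  :db/valueType   :db.type/long
  :db/cardinality :db.cardinality/one
  :db/doc         "Attribute stored on the edge itself"}]
```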

cjhowe20:09:00

i think the power of using plain data structures for queries makes up for some additional query complexity

cjhowe20:09:25

i'm more concerned about cost:performance ratio for graph heavy workloads

hmaurer20:09:52

@cjhowe yeah, I think it really depends on the type of queries you are going to do. If you have very complex graph queries then something like Neo4j might do a lot of the optimization grunt-work for you

cjhowe20:09:54

then again, if it's reads it's cached on the client right?

hmaurer20:09:34

@cjhowe yes the peer keeps a cache. I’m not sure how it determines when to clear a portion of the cache and what to clear though (if it’s full)

cjhowe20:09:37

it seems like running the graph query mostly on the client would help make up for neo4j's extra optimizations

cjhowe20:09:59

it's a lot to give up immutability for query optimization

hmaurer20:09:56

It depends on your dataset and the type of queries but for most use-cases I suspect Datomic will work just fine

hmaurer20:09:28

@cjhowe to be honest, it might just be a lack of understanding/experience on my side, but I kind of wish there were a few higher-level abstractions built on top of Datomic

hmaurer20:09:07

For example, if your use-case is manipulating graphs, I wish there was library built on top of Datomic which offered Cypher-like querying capabilities, amongst other things

cjhowe20:09:16

shortest path queries are pretty common for me

cjhowe20:09:28

i think that could be added through a library too

hmaurer20:09:38

@cjhowe yes I’m sure you could write a function for that which uses Datomic’s low-level API (direct access to the index) to walk the graph
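(A rough sketch of that idea, assuming the reified :person/friendships → :friendship/target schema discussed earlier; this walks the :eavt index directly via d/datoms instead of Datalog, and the naive BFS is illustrative, not optimized:)

```clojure
(require '[datomic.api :as d])

(defn neighbors
  "Entity ids of e's friends, read directly from the :eavt index."
  [db e]
  (for [friendship (d/datoms db :eavt e :person/friendships)
        target     (d/datoms db :eavt (:v friendship) :friendship/target)]
    (:v target)))

(defn reachable?
  "Naive breadth-first search from `from` to `to`."
  [db from to]
  (loop [frontier #{from} seen #{from}]
    (cond
      (contains? frontier to) true
      (empty? frontier)       false
      :else
      (let [next (set (remove seen (mapcat #(neighbors db %) frontier)))]
        (recur next (into seen next))))))
```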

hmaurer20:09:54

How big is your dataset? out of curiosity

cjhowe20:09:24

idk, i'm making a study group app, so however many people use it i guess. it's kind of like tinder, and each one of those tinder-like matches has to find the path with the shortest total weight that starts at and ends with two different users who haven't matched before

cjhowe20:09:45

and then in addition to that, it's trying to choose matches that will cause complete subgraphs of 3+ nodes to appear

cjhowe20:09:13

that's for every time someone does a match, so it's very high volume

hmaurer20:09:21

@cjhowe I see, so I guess you won’t have such a large dataset that you would need to start writing fancy, optimized versions of your pathfinding algorithm

cjhowe20:09:29

not immediately

cjhowe20:09:28

then again, i don't want to shoot myself in the foot

hmaurer20:09:01

My (beginner, ignorant) approach would be: use Datomic, see how far it can get you. If you ever run into a case where Datomic isn’t enough and/or it’s too much work to build the feature you want on top of it, construct a read-only neo4j (or else) replica of your Datomic DB

hmaurer20:09:16

which should be doable with the direct access to the transaction log and the tx-report-queue API
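(A minimal sketch of that replication loop; `replicate-to-neo4j!` is a hypothetical function standing in for whatever pushes datoms into the replica:)

```clojure
(require '[datomic.api :as d])

(defn start-replication!
  "Consume the tx-report-queue and forward each transaction's datoms."
  [conn replicate-to-neo4j!]
  (let [queue (d/tx-report-queue conn)]
    (future
      (loop []
        (let [report (.take queue)]            ; blocks until a tx commits
          ;; :tx-data is the seq of datoms asserted/retracted in this tx
          (replicate-to-neo4j! (:tx-data report)))
        (recur)))))
```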

cjhowe20:09:49

ohhh, that's a good point

hmaurer20:09:12

Datomic will store the history of your data, neo4j won’t

hmaurer20:09:19

so you have more information by using Datomic

hmaurer20:09:45

theoretically you can move to neo4j later, should you want to (ignoring the fact that it would be a pain to rewrite your app)

hmaurer20:09:57

or you can use neo4j as I mentioned, as a read-only replica for some queries

hmaurer20:09:02

should it become necessary

hmaurer20:09:27

the right term would probably be “use neo4j as a materialized view of your Datomic database”

cjhowe20:09:46

i like this

cjhowe21:09:10

thanks for the help! i'll probably do that, since this isn't really a problem right now and i want datomic

hmaurer21:09:25

There is also another benefit of using Datomic: since it’s lower-level, you’ll learn a lot more about how your graph traversals actually run (since you’ll likely implement them yourself).

hmaurer21:09:39

(if that’s something you care about)

cjhowe21:09:11

ah, yeah, that's great! i just took my last math class for my CS degree so i'm ready to deep dive into graph theory

cjhowe21:09:12

how do people transact their datomic schemas? is it best to use a boot/leiningen plugin, or should i transact it every time i start my app server with https://github.com/rkneufeld/conformity ?

hmaurer22:09:55

@val_waeselynck did some nice work writing about this

cjhowe22:09:26

i just read through that, thanks!

cjhowe22:09:42

i guess i just need to know how i should actually run the conformity code at deployment

val_waeselynck09:09:55

The basic idea is to re-transact your schema (idempotent) and run transactions (non-idempotent) prior to executing new code.

val_waeselynck09:09:28

Depending on the write semantics of your app, this protocol may be too naive and present some race conditions (for instance, when migrating the data from an attribute to a new attribute, some Peer may continue to write to the old attribute between the time the migration is run and the time the new code is deployed to the Peer).

val_waeselynck09:09:48

In which case you may want to either:
1. if that works, run the migration after new code is deployed to all online Peers, or in 2 phases
2. Temporarily prevent all Peers from writing
3. Move the write semantics from the Peer code to storage (e.g in the form of a transaction function), so that the switch can be atomic

cjhowe22:09:20

i mean, if i use conformity every time my api server launches, it's a bit of overhead, but it shouldn't do anything if the schema is already there
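(A sketch of running conformity at startup; the norm name and schema are illustrative. `ensure-conforms` is idempotent — norms that have already been conformed are skipped:)

```clojure
(require '[io.rkn.conformity :as c]
         '[datomic.api :as d])

(def norms
  {:my-app/schema-v1
   {:txes [[{:db/ident       :person/name
             :db/valueType   :db.type/string
             :db/cardinality :db.cardinality/one}]]}})

(defn init!
  "Connect and make sure the schema is in place before serving requests."
  [uri]
  (let [conn (d/connect uri)]
    (c/ensure-conforms conn norms)
    conn))
```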

hmaurer22:09:03

@cjhowe it will still create a transaction

hmaurer22:09:17

but without any datoms

hmaurer22:09:39

it should be negligible overhead if you are worried about startup time though

hmaurer22:09:01

the JVM’s startup time is likely an order of magnitude larger than the time to run a transaction for your schema

cjhowe22:09:30

okay, cool

cjhowe22:09:34

thanks again!

hmaurer22:09:13

@cjhowe oh also, regarding your earlier question on Heroku: I think the recommended memory for the transactor (and peers) is pretty high; on the order of 1GB or more

hmaurer22:09:31

I considered using heroku on a project but realised that would be prohibitively expensive

hmaurer22:09:03

you might be able to get away with lower memory on small projects / with the right configuration though; I haven’t investigated

cjhowe22:09:21

hmmm, well, maybe i'll use aws free tier then

cjhowe22:09:28

it seems like datomic was made for that anyways

cjhowe22:09:39

i hope 1GB is enough though

hmaurer22:09:46

@cjhowe right now I’m trying to run Datomic on Kubernetes on Google Cloud

hmaurer22:09:21

haven’t encountered significant issues thus far, but I have only experimented over short periods of time, not under load and/or in production

cjhowe22:09:25

ah, then you have to set up a cassandra instance i guess?

hmaurer22:09:51

oh no, you can run the transactor in dev mode (it will store stuff on the filesystem)

hmaurer22:09:57

for production I’ll likely use mysql or postgres

hmaurer22:09:13

on AWS you can use Dynamo though

hmaurer22:09:48

I also managed to run a transactor on http://hyper.sh

hmaurer22:09:59

AWS will likely be the cheapest option though 🙂

cjhowe22:09:23

ah, well, in my case, i'm just worried about what i can get for very cheap/free since i'm a student

hmaurer22:09:10

I am a student too; I know the struggle!

hmaurer22:09:41

Google Cloud also gives you $300 of credit usable over a 12-month period

val_waeselynck09:09:11

The cheapest you can get for small projects is probably to run your transactor and storage (and maybe also Peers) on one box e.g using Digital Ocean, maybe using the dev storage to save even more memory
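(For reference, a dev-mode transactor properties file for such a single-box setup might look like the sketch below; the memory settings are illustrative guesses for a small box, not recommendations:)

```
protocol=dev
host=localhost
port=4334
data-dir=data
log-dir=log
memory-index-threshold=32m
memory-index-max=128m
object-cache-max=64m
```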

cjhowe22:09:46

i hope ad money will pay for it in the long run, but if not, i can just shut the backend down and take the app off the app store