#datomic
2016-04-26
conaw12:04:29

Anyone have experience allowing end users to define attributes — and even further — allowing end users to define queries using those attributes?

Lambda/Sierra12:04:12

@conaw: I know that people do such things. You have to consider how much you want to trust your users — Datomic's datalog queries allow arbitrary code execution.
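As a concrete illustration of that risk: Datalog expression clauses can call arbitrary functions on the peer, so a query taken verbatim from an end user runs with whatever access the peer process has. A minimal sketch (db here stands for any database value; the property read is purely illustrative):

(require '[datomic.api :as d])

;; an expression clause may invoke any function or Java method available
;; to the peer, regardless of what data the query touches
(d/q '[:find ?home .
       :where [(System/getProperty "user.home") ?home]]
     db)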

Lambda/Sierra12:04:35

User-defined attributes are pretty safe, as long as the names don't clash.

Lambda/Sierra12:04:28

Also, keep in mind that all idents are kept in memory all the time.

conaw12:04:28

Can you have the user-defined attribute stored as an identity — and map it to some generated attribute name — so that the user doesn’t determine what the actual query is?

conaw12:04:42

sorry, stored as an entity

Lambda/Sierra12:04:54

Attributes are entities already.

conaw12:04:27

so presumably you could create a new entity when the user is creating some custom relationship, say :user/custom-attr, and then generate a unique attribute based on that, which would be the actual entity

conaw12:04:06

I may not have explained that clearly, but the idea is that you limit what queries a user can actually run, and avoid the risk of conflicting attribute names, by having some intermediary entity sitting between what the user thinks the attribute is called and the actual attribute that ties an entity to a value
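A rough sketch of that indirection; the helper attributes :custom/label and :custom/attribute, and the generated ident scheme, are invented for illustration rather than anything prescribed by Datomic:

(require '[datomic.api :as d])

(defn install-user-attr!
  "Transacts a collision-free generated attribute plus a mapping entity
  that records the label the user chose for it."
  [conn user-label]
  (let [attr-tempid     (d/tempid :db.part/db)
        generated-ident (keyword "user.attr" (str (d/squuid)))]
    @(d/transact conn
       [;; the attribute that actually holds the data
        {:db/id                 attr-tempid
         :db/ident              generated-ident
         :db/valueType          :db.type/string
         :db/cardinality        :db.cardinality/one
         :db.install/_attribute :db.part/db}
        ;; the intermediary entity: label the user sees -> attribute actually used
        {:db/id            (d/tempid :db.part/user)
         :custom/label     user-label
         :custom/attribute attr-tempid}])
    generated-ident))

Queries are then built server-side from the resolved :custom/attribute value, so user input never reaches the query itself.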

bvulpes17:04:33

(d/q '[:find (count ?e) :in $ :where ... ] db) returns nil if there are no entities matching, and not zero. am i misusing count?

Ben Kamphaus18:04:22

@bvulpes: at present, aggregates return nil instead of 0 when no datoms match. We are considering requests to change this behavior, but for now you'll need to check for the nil case manually if you want to return 0. Note that this applies to aggregates in general, so you can't work around it with a custom aggregate in that same query.
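A common way to handle that today (the :where clause here is a hypothetical stand-in for the elided one above):

(require '[datomic.api :as d])

;; the scalar find spec returns nil when nothing matches, so coerce to 0
(let [n (d/q '[:find (count ?e) .
               :in $
               :where [?e :user/email]]   ; placeholder clause
             db)]
  (or n 0))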

bvulpes18:04:48

@bkamphaus: thank you for the clarification!

p.brc19:04:14

@bkamphaus: I managed to reproduce the issue mentioned a while ago in this channel: datomic seems to ignore the data-dir setting for indexing jobs when setting it via the property file. I have put together what I think is a minimal example that reproduces it here: https://github.com/pebrc/datomic-datadir-issue

Ben Kamphaus20:04:50

@p.brc: thanks for putting together the repro and report, I’ll look into it.

currentoor21:04:36

If I want to store blobs of data as bytes, how big is too big?

currentoor21:04:01

Is under 1MB (when serialized as edn) acceptable?

Lambda/Sierra21:04:11

recommended max value size is 1 KB

therabidbanana22:04:16

@stuartsierra: what's the reasoning behind that recommended size? Are there instances where it might be safe to go over? If you can't store blobs bigger than 1 KB, any recommended approaches for handling them separately?

Lambda/Sierra22:04:32

The way Datomic stores values in its indexes. Index segments in storage are 40-60 KB in size. Large values would bloat the segment sizes.

Lambda/Sierra22:04:12

You can always go over, it's not a hard limit, but performance will degrade with a large quantity of large blob values.

Lambda/Sierra22:04:26

Instead, use a separate blob store and keep just metadata in Datomic.
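One shape that pattern can take; every attribute name below is invented for illustration, with the blob bytes themselves living in S3, Postgres, or whatever store you pick:

(require '[datomic.api :as d])

;; metadata-only schema sketch -- Datomic never sees the payload
[{:db/id                 (d/tempid :db.part/db)
  :db/ident              :blob/key              ; key of the object in the external store
  :db/valueType          :db.type/uuid
  :db/unique             :db.unique/identity
  :db/cardinality        :db.cardinality/one
  :db.install/_attribute :db.part/db}
 {:db/id                 (d/tempid :db.part/db)
  :db/ident              :blob/content-type
  :db/valueType          :db.type/string
  :db/cardinality        :db.cardinality/one
  :db.install/_attribute :db.part/db}
 {:db/id                 (d/tempid :db.part/db)
  :db/ident              :blob/size-bytes
  :db/valueType          :db.type/long
  :db/cardinality        :db.cardinality/one
  :db.install/_attribute :db.part/db}]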

therabidbanana22:04:11

I see - so even if we don't want any indexes on the blob values, they still can bloat the indexes?

Lambda/Sierra22:04:38

yes, the values still have to be stored. All values are stored in indexes.

Ben Kamphaus22:04:31

even if AVET/:db/index isn’t turned on, EAVT and AEVT are covering indexes, so their segments contain all values in Datomic.

therabidbanana22:04:00

I see - thanks for the additional details

therabidbanana22:04:26

Are there any recommendations for ways to store larger blobs that integrate well with datomic holding the metadata? Essentially we just need a key-value DB that's easy to query and join onto datomic data I guess?

bvulpes22:04:25

@therabidbanana: for maximum ops simplification, use a postgres data store and a field of blobs in there

bvulpes22:04:48

then you'll have datomic_kvs and therabidbananas_blobs

bvulpes22:04:12

uuid up and you're done
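A sketch of that pattern, assuming clojure.java.jdbc and the :blob/* metadata attributes sketched earlier; the blobs table, connection details, and column names are all placeholders:

(require '[clojure.java.jdbc :as jdbc]
         '[datomic.api :as d])

;; hypothetical Postgres connection (could be the same database that holds datomic_kvs)
(def pg {:subprotocol "postgresql"
         :subname     "//localhost:5432/myapp"
         :user        "app"
         :password    "secret"})

(defn store-blob!
  "Writes the bytes to Postgres keyed by a squuid and records only the
  metadata in Datomic, returning the key."
  [conn bytes content-type]
  (let [k (d/squuid)]
    (jdbc/insert! pg :blobs {:id k :data bytes})
    @(d/transact conn [{:db/id             (d/tempid :db.part/user)
                        :blob/key          k
                        :blob/content-type content-type
                        :blob/size-bytes   (count bytes)}])
    k))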

therabidbanana22:04:53

That's basically what we're thinking of doing, though we had planned on using Cassandra as the datomic datastore (it's what we've used in another project)

Ben Kamphaus22:04:30

Yeah, it can be reasonable to just use the underlying storage, or on AWS, S3 might be preferred over Dynamo. Also, if you’re transacting a lot of blob data to the underlying storage it could impact performance, so take the volume you expect to be transacting into consideration.

therabidbanana22:04:43

I had heard that Postgres was not a preferable storage backend for Datomic if we can avoid it - not sure if that's still the case?

Ben Kamphaus22:04:23

@therabidbanana: not preferable for what reasons? The storage choice is really contingent on your familiarity and use case.

bvulpes22:04:10

@therabidbanana: i asked some cognitect staff about data store selection criteria and the answer was "budget first, then familiarity. if you can swing the price of DDB, use that."

bvulpes22:04:06

to that end, i use pg as a datastore. granted, the stuff i do is pretty low-throughput, so i've not yet hit performance ceilings or the like, but provisioned reads and writes on ddb get priiiiicey quickly.

bvulpes22:04:17

plus, mother postgres can do no wrong!

therabidbanana22:04:21

@bkamphaus: heard it from @currentoor, so not sure of the exact details - but apparently someone at Cognitect advised against it since we also had familiarity with Cassandra? Maybe that was because our other use case was more likely to have high amounts of writes though.

Ben Kamphaus22:04:44

Scaling up to high throughput would point to Dynamo, Cassandra, etc. yeah. If you’re familiar with Cassandra then it’s less risky to take on.

currentoor22:04:47

yeah I believe we were told cassandra or dynamoDB is preferred because of write scalability, @marshall I believe you mentioned this on the phone

therabidbanana23:04:22

This database, especially without blobs being stored in it, is much less high-throughput on the write side than our other use case though

Ben Kamphaus23:04:42

Yep, given an expectation to scale writes that’s reasonable. Just want to clarify that’s only indicated by scale and not a generic recommendation to avoid Postgres by any means.

therabidbanana23:04:16

Maybe we could swing consolidating in Postgres - we're planning on using it as a store for Quartzite (http://clojurequartz.info/articles/guides.html) already anyway.

ambroise23:04:20

hi I’m trying to get a transactor running in a Docker container, linking it to a mysql database that I am running locally. Things seem fine when I docker run my image: in mysql, when I query, I get

mysql> select * from datomic_kvs;
| id | rev | map | val |
| pod-coord |    9 | {:key "[\"192.168.1.95\" nil 3306 \"QfiEuYt70Bt3Qy7JVPuuW47I4uLze8+jKUCAcrrXCAI=\" \"uIi+Qhy2RQPs8JHqb6pChvuEWoQTeK0S26hPDrjlcNM=\" 1461711438104 \"0.9.5350\" true 2]"} | NULL |
1 row in set (0.00 sec)
in the repl, I then try to create a database, running (datomic.api/create-database "datomic:) and i get
HornetQNotConnectedException HQ119007: Cannot connect to server(s). Tried with all available servers.  org.hornetq.core.client.impl.ServerLocatorImpl.createSessionFactory (ServerLocatorImpl.java:906)
ERROR org.hornetq.core.client - HQ214016: Failed to create netty connection
java.nio.channels.ClosedChannelException: null
	at org.jboss.netty.handler.ssl.SslHandler.channelDisconnected(SslHandler.java:649) ~[netty-3.6.7.Final.jar:na]
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102) ~[netty-3.6.7.Final.jar:na]
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) ~[netty-3.6.7.Final.jar:na]
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) ~[netty-3.6.7.Final.jar:na]
	at org.jboss.netty.channel.Channels.fireChannelDisconnected(Channels.java:396) ~[netty-3.6.7.Final.jar:na]
	at org.jboss.netty.channel.socket.oio.AbstractOioWorker.close(AbstractOioWorker.java:229) ~[netty-3.6.7.Final.jar:na]
	at org.jboss.netty.channel.socket.oio.AbstractOioWorker.run(AbstractOioWorker.java:104) ~[netty-3.6.7.Final.jar:na]
	at org.jboss.netty.channel.socket.oio.OioWorker.run(OioWorker.java:51) ~[netty-3.6.7.Final.jar:na]
	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[netty-3.6.7.Final.jar:na]
	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[netty-3.6.7.Final.jar:na]
	at org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:175) ~[netty-3.6.7.Final.jar:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_74]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_74]
	at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_74]
Any ideas on what I am missing? Also, a side question: does the peer connect to the transactor, or is it the other way around? thanks a lot for any help!

bvulpes23:04:40

@ambroise: peer connects to transactor

bvulpes23:04:54

ensure that you have a clear network path to the transactor from wherever the repl is

ambroise23:04:15

hi @bvulpes, thanks for your help. I’m actually not sure how the repl (peer) gets the network path to the transactor. Is it supposed to be in the mysql database?

ambroise23:04:16

right. because right now, in the mysql db, I have 192.168.1.95 and 3306, which points to the database (and not the docker container)

bvulpes23:04:43

sounds like you have host=<mysqlhost> in your transactor.properties
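For reference, the relevant part of a SQL transactor.properties: host (and the optional alt-host) is what the transactor writes into storage for peers to connect to, while the MySQL address belongs only in sql-url. A sketch with placeholder addresses and credentials:

protocol=sql
host=0.0.0.0
# alt-host should be an address the peer can actually reach,
# e.g. the Docker host -- not the MySQL server
alt-host=192.168.1.95
port=4334
sql-url=jdbc:mysql://192.168.1.95:3306/datomic
sql-user=datomic
sql-password=datomic
sql-driver-class=com.mysql.jdbc.Driver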

bvulpes23:04:15

gimme a cookie

bvulpes23:04:52

(i have spent hours automating transactor.properties lately is all)

ambroise23:04:43

thanks a lot!

kenbier23:04:30

can reverse attribute navigation be used in the :db.cardinality/many situation? something like :foo/_bar [vector-of-lookup-refs], when creating a new foo entity.
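No answer appears in the log; whether the map form accepts a whole collection of lookup refs behind a reverse attribute is worth checking against the transaction docs, but the datoms it would express can always be written explicitly. A sketch, where :foo/bar is assumed to be a cardinality-many ref and :foo/name / :bar/id are placeholder attributes:

(require '[datomic.api :as d])

(let [new-foo (d/tempid :db.part/user)]
  @(d/transact conn
     [{:db/id    new-foo
       :foo/name "example"}
      ;; each lookup ref from the vector becomes one assertion pointing
      ;; at the new entity through :foo/bar
      [:db/add [:bar/id "a"] :foo/bar new-foo]
      [:db/add [:bar/id "b"] :foo/bar new-foo]]))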