#datomic
2018-04-25
cjohansen05:04:01

@rboyd in project.clj:

cjohansen05:04:05

:repositories {"" {:url ""
                   :username [:gpg :env/datomic_username]
                   :password [:gpg :env/datomic_password]}}

👏 4
cjohansen05:04:33

then set DATOMIC_USERNAME and DATOMIC_PASSWORD in your CircleCI environment variables

cjohansen05:04:32

(Build settings -> environment variables)

val_waeselynck10:04:58

For those who worry about GDPR: this Gist demonstrates an alternative to Excision for erasing data from Datomic. Hope this helps, feedback welcome. https://gist.github.com/vvvvalvalval/6e1888995fe1a90722818eefae49beaf

👍 4
😞 8
octahedrion15:04:07

count me among those worried

val_waeselynck15:04:42

@octo221 I take it you are not happy with this solution?

octahedrion15:04:56

it's at least a solution

octahedrion15:04:01

but I don't understand why cloud doesn't support excision since it's such an important feature

octahedrion15:04:15

given that on-prem does

val_waeselynck15:04:59

Neither do I. I will soon publish an article which should help alleviate the lack of Excision on Cloud.

octahedrion15:04:24

have you thought about the possibility of using multiple DBs as an alternative too ?

octahedrion15:04:36

a db per user

val_waeselynck15:04:48

No, my approach is rather a complementary mutable KV store, turns out you can get a loooong way with that.

octahedrion15:04:53

oh wait i just saw another thread on that!

octahedrion15:04:10

hmm i want datomic only

robert-stuttaford10:04:28

@val_waeselynck what’s the biggest database you’ve used this on?

val_waeselynck10:04:57

@robert-stuttaford BandSquare's, about 500k txes and 37M datoms, took about 4 hours to complete on dev storage on my local machine (note that this does not mean 4 hours of downtime).

val_waeselynck10:04:44

Note that pipelining could theoretically be used to speed things up (as soon as you're confident there won't be errors), but I could not get it to work. Unfortunately, I suspect this is a Datomic concurrency bug, but I have not yet worked through a minimal repro.

robert-stuttaford10:04:54

i think it’s important to mention the implications on your gist, @val_waeselynck that this is a much slower process than excision - similar to replacing an engine in a car, rather than removing a tiny piece while it’s driving

robert-stuttaford10:04:16

i wonder how long it’d take to process our 72,891,554 txes

val_waeselynck14:04:54

From my measurements, you probably won't do better than 30k tx/min
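As a back-of-the-envelope check, the two figures quoted in this thread (72,891,554 transactions and a ceiling of roughly 30k tx/min) give a lower bound on processing time. The function name `min-hours` is made up for illustration, and real throughput will depend on storage and hardware.

```clojure
;; Lower bound on processing time, assuming the ~30k tx/min ceiling quoted above.
(defn min-hours
  "Hours needed to process tx-count transactions at tx-per-min."
  [tx-count tx-per-min]
  (/ tx-count tx-per-min 60.0))

(min-hours 72891554 30000)
;; roughly 40.5 hours, at best
```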

dominicm12:04:29

Why do this, if it's slower than excision?

val_waeselynck12:04:55

@U09LZR36F I tried to explain it in the Gist - please tell me if it's not clear?

dominicm12:04:56

Sorry, I should have read the gist closer, you're right, thank you

cjohansen12:04:26

My intention for new systems is to try to avoid this problem by designing around it - using multiple databases

val_waeselynck12:04:28

Sure, on the other hand you may not always get things right upfront 🙂 (I know I haven't) in which case you will probably need a safety net

dominicm12:04:39

@U9MKYDN4Q you mean a mutable one for personal data?

cjohansen12:04:48

Possibly, but not necessarily. Could also be interesting to use multiple datomic databases - maybe even a database per user + one that has all the interconnections. Won’t work in all circumstances though

dominicm12:04:28

Deploying a transactor per-user sounds expensive (in hardware)

cjohansen12:04:17

You can have multiple databases on the same transactor

dominicm12:04:34

Interesting, I did not know that

dominicm12:04:46

is there a significant overhead?

cjohansen12:04:26

Actually, I think maybe this depends on storage backend. Don’t quote me on it 🙂

cjohansen12:04:34

It’s something I’ve been meaning to look into anyway

robert-stuttaford14:04:08

totally can have multiple databases on a transactor, just like you can have multiple dbs on a mysql server or mongo server

robert-stuttaford14:04:33

there is peer memory overhead for each database of course
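A minimal sketch of the database-per-user idea, assuming the Datomic peer library (`datomic.api`) is on the classpath. The `user-<id>` naming scheme and the `user-db-uri` helper are hypothetical, and in-memory URIs are used only so the sketch needs no running transactor.

```clojure
(require '[datomic.api :as d])

;; Hypothetical naming scheme: one database per user id.
;; With a dev transactor the base would look like "datomic:dev://localhost:4334/".
(def base-uri "datomic:mem://")

(defn user-db-uri [user-id]
  (str base-uri "user-" user-id))

;; create-database returns true when it actually creates a database.
(doseq [id [1 2 3]]
  (d/create-database (user-db-uri id)))

;; Each database gets its own connection (and its own peer memory overhead).
(def conn-1 (d/connect (user-db-uri 1)))

;; List the database names known for this storage.
(def db-names (d/get-database-names (str base-uri "*")))

;; Dropping one user's data then means dropping that user's database.
;; Whether the underlying segments are physically erased is a separate question.
(d/delete-database (user-db-uri 2))
```

The trade-off is that anything connecting users has to live in a shared database or be joined at query time.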

folcon15:04:41

Is there an issue with entity id collisions when querying across databases?

val_waeselynck16:04:04

@U0JUM502E no, since in such cases you need to specify explicitly in which db you are matching a particular Datalog clause.

folcon16:04:59

@val_waeselynck Not sure I understand, say you do a query such as

[:find ?e ?like
 :in $db1 $db2
 :where [?e :user/likes ?like]]
wouldn’t you just get a mix of entity id’s?

val_waeselynck16:04:33

You'd have to write it as

[:find ?e ?like
 :in $db1 $db2
 :where [$db1 ?e :user/likes ?like]]

val_waeselynck16:04:57

I think Datalog simply won't let you do what you suggested
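To make the explicit-source point concrete, here is a small sketch, assuming the Datomic peer library is on the classpath. It relies on `d/q` accepting plain collections of tuples as data sources, so `db1` and `db2` below are stand-ins with made-up values, not real databases.

```clojure
(require '[datomic.api :as d])

;; Stand-in "databases": plain collections of [e a v] tuples.
;; Both deliberately use entity id 1.
(def db1 [[1 :user/likes "jazz"]])
(def db2 [[1 :user/likes "rock"]])

;; The clause names its source, so only db1 is matched here.
(def from-db1
  (d/q '[:find ?e ?like
         :in $db1 $db2
         :where [$db1 ?e :user/likes ?like]]
       db1 db2))

;; Matching the same attribute in both sources takes one clause per source,
;; with separate entity variables for each.
(def from-both
  (d/q '[:find ?like1 ?like2
         :in $db1 $db2
         :where
         [$db1 ?e1 :user/likes ?like1]
         [$db2 ?e2 :user/likes ?like2]]
       db1 db2))
```

Because every clause is tied to one source, an entity id is only ever interpreted within the database named in that clause.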

timgilbert15:04:50

Say, I have a question about rules. Is it possible to have variables that only exist inside of the rule be exported to the calling query? Eg, if I have something like this:

(def rules '[[(tracks ?artist)
              [?artist :artist/albums ?album]
              [?album :album/tracks ?tracks]]])
...can I then access the ?tracks value outside of the rule, or bind it to another var or something?

favila15:04:39

if you want an "out" parameter, add it to the rule

timgilbert15:04:29

Ah, so input to a rule doesn't need to already be bound to something?

favila15:04:41

'[(tracks ?artist ?track)
  [?artist :artist/albums ?album]
  [?album :album/tracks ?track]]

favila16:04:03

no, that would be pointless

favila16:04:11

well not pointless completely

favila16:04:31

but it would mean that rules could only serve like predicates or filters

favila16:04:43

rules are actually constraint specifiers

timgilbert16:04:49

Right, that makes sense. I'll mess around with it, thanks!

favila16:04:13

if you want to require a parameter to be bound (sometimes important for performance), surround the arguments with a vector

favila16:04:28

'[(tracks [?artist] ?track)
  [?artist :artist/albums ?album]
  [?album :album/tracks ?track]]

favila16:04:45

that means this rule can only run "in one direction" from a bound artist to an unbound track

timgilbert16:04:53

Ah, ok. I think I was confused about what that syntax meant

favila16:04:55

but a rule can run "backwards" too

favila16:04:08

so the rule name is bad, it's really describing a constraint you want to be satisfied among all rule parameters, not input-output

favila16:04:43

a name like artists-tracks might make that clearer

favila16:04:20

but both 'tracks for artists' and 'artists for tracks' are valid names, because the rule expresses both

timgilbert16:04:47

I see what you mean. artist-tracks would make sense if I used the vector args to make the artist required, yes?

favila16:04:38

I was hoping the name expressed the bidirectionality better (i.e. not with a required-bound arg)

favila16:04:50

you see how it's hard to name a rule

timgilbert16:04:55

Gotcha, yeah

favila16:04:12

but "tracks" definitely works if artist must be bound

favila16:04:37

it's when either none or one or both can be bound that it's hard to name the rule

favila16:04:10

you have to name the constraint the rule itself expresses, not the "output" (because there isn't really any)
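The bidirectional behaviour described above can be sketched as follows, assuming the Datomic peer library is on the classpath and using plain tuples with made-up entity ids as the data source. The rule name `artist-tracks` follows the suggestion in this thread; the same rule is invoked once with the artist bound and once with the track bound.

```clojure
(require '[datomic.api :as d])

;; Toy data as plain [e a v] tuples (entity ids are made up).
(def data [[1 :artist/albums 10]
           [10 :album/tracks 100]
           [10 :album/tracks 101]])

;; The rule relates an artist and a track; neither one is "the output".
(def rules '[[(artist-tracks ?artist ?track)
              [?artist :artist/albums ?album]
              [?album :album/tracks ?track]]])

;; "Forwards": artist bound, tracks found.
(def tracks-of-1
  (d/q '[:find ?track
         :in $ % ?artist
         :where (artist-tracks ?artist ?track)]
       data rules 1))

;; "Backwards": track bound, artists found.
(def artists-of-100
  (d/q '[:find ?artist
         :in $ % ?track
         :where (artist-tracks ?artist ?track)]
       data rules 100))
```

Writing the rule head as `(artist-tracks [?artist] ?track)` would restrict it to the first, artist-bound usage.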

octahedrion18:04:02

if you

d/delete-database
is there any way to get it back ?

favila19:04:06

If gc-deleted-dbs hasn't been run yet, the blocks are still in storage

favila19:04:35

in theory you could shut down the transactor and manipulate the proper values in storage to "resurrect" it

favila19:04:59

but there's no cognitect-blessed way to do it

favila19:04:13

you're deep into undocumented internals at this point

marshall19:04:43

@octo221 Datomic On-Prem or Datomic Cloud?

marshall19:04:13

Can you issue a support ticket to the portal at http://support.cognitect.com

octahedrion19:04:45

@marshall I haven't done it, I was wondering

marshall19:04:19

ah. It may be possible to recover some aspects of the DB, but don’t count on it

octahedrion19:04:03

actually I was hoping that delete meant delete

octahedrion19:04:05

presumably dbs in cloud are stored in S3

marshall19:04:25

s3, dynamodb, efs

octahedrion19:04:20

and delete-database would delegate deletion to whatever AWS deletion mechanism they have

octahedrion19:04:12

meaning it's out of datomic's hands right ?

marshall19:04:10

I don’t believe the individual segments are removed from storage

marshall19:04:16

the db is removed from the catalog

marshall19:04:31

the same as mentioned by @favila for Datomic On Prem

marshall19:04:40

they’re no longer accessible, however

favila19:04:57

cloud will eventually GC them, or no?

marshall19:04:18

good question - let me get back to you

favila19:04:20

(I'm only familiar with on-prem; there you have to schedule the GC yourself)

favila19:04:43

bottom line is delete-database means the db becomes api-inaccessible, but does not guarantee that all bits that back the db were erased.

favila19:04:12

on on-prem, there is a separate process that does that, which you run at will. not sure yet what cloud does, but it's probably a similar process

marshall19:04:28

agreed on both counts and I’m looking into the specifics of cleanup in Cloud

favila19:04:11

not to be too pedantic about it, but "bits are erased" is itself just a guarantee that whatever storage-level api you have cannot access them anymore. e.g. with an sql storage, you may still need to vacuum to remove the bits from the db's storage; and then you may need to write over the blocks on disk; etc

favila19:04:17

it's apis all the way down

marshall19:04:26

yep, very good point

marshall19:04:08

that’s one of the things that I suspect is going to make the GDPR stuff so hard to enforce/define/resolve

octahedrion19:04:12

I thought that if you issued a 'delete' command to an AWS service, then it's Amazon's responsibility to ensure that the deletion is done correctly

favila19:04:19

I don't know what guarantees they make. At a minimum that is a guarantee that a "read" of that same item will not succeed via the s3/dynamodb/whatever api

favila19:04:30

but perhaps some lower-level api could still read it

favila19:04:46

i.e. it's not necessarily a guarantee that the bits are obliterated

octahedrion19:04:55

and of course they could always be copying everything anyway

octahedrion19:04:07

you wouldn't be able to know

octahedrion19:04:05

if you wanted to be sure, could you chase the individual segments and delete them ?

marshall20:04:48

cleanup of deleted dbs is automatic in Cloud

stijn21:04:25

is it possible to launch Elastic Beanstalk instances in the Datomic Cloud VPC and have the ELBs running in the same VPC? Or is it always necessary to set up VPC peering when working with Beanstalk? (i'm a bit in the dark on the AWS concepts)

stijn21:04:29

I tried running elastic beanstalk in the datomic apps security group, but it keeps on insisting that this security group does not exist (probably i'm missing some other configuration option. subnets?)