#datomic
2018-01-18
Oliver George05:01:44

Congratulations for launching Datomic Cloud!

Oliver George05:01:22

Also struggling with the "Authorize a Group" bit. Am I in the right place?

Oliver George08:01:47

I think I'm getting there. Copy in this section might benefit from a review.

Oliver George08:01:03

Yeah, totally works. Nice.

val_waeselynck08:01:39

I had not taken too much interest in Datomic Clients (and thus Datomic Cloud) so far because I assumed db.with() could not be supported - but I now see that it is. How does it work? Does it hold a with-ed database on the server-side, or does it re-apply the speculative transaction on each subsequent query?

marshall13:01:36

@olivergeorge glad you got it sorted

stuarthalloway13:01:26

@val_waeselynck Datomic Cloud holds the with db on the server

val_waeselynck13:01:43

@stuarthalloway interesting, how does this work w.r.t. resource reclamation?

stuarthalloway13:01:27

Reclaimed when space needed

val_waeselynck13:01:07

@stuarthalloway same thing with acquiring the current db of a connection from a client?

stuarthalloway13:01:25

that can always be recovered via a filter

val_waeselynck13:01:21

@stuarthalloway not really, you can't db.with an asOf db properly

val_waeselynck13:01:37

nor can you emulate it properly with a filter AFAICT

stuarthalloway13:01:07

You can’t db.with an asOf db at all, that is not supported in any version of Datomic.

val_waeselynck13:01:34

@stuarthalloway let me ask a bit differently: 1- from a client, when using conn.db(), to what extent can I rely on the returned value not being deleted from under me? 2- same question with db.with().

stuarthalloway13:01:04

normal db value cannot go away, always recoverable via filter

stuarthalloway13:01:12

with value can be dumped from the cache
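
(A minimal sketch of what "recoverable via filter" means in practice, assuming a client-API connection conn; the names and the use of the :t key are illustrative, not taken from the conversation.)

(require '[datomic.client.api :as d])

;; A plain database value can always be reconstructed by time-filtering
;; the current database back to its basis t:
(let [db      (d/db conn)          ; plain db value
      basis-t (:t db)]
  ;; even if the server has dropped its handle on the original value,
  ;; this yields an equivalent view of the database:
  (d/as-of (d/db conn) basis-t))

;; A with-db, by contrast, can be dumped from the server-side cache and
;; has to be rebuilt by re-applying the speculative transaction.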

stuarthalloway13:01:58

once we release query groups you could have 1 or more groups of machines dedicated to with queries, so they are not competing with other uses of the system

stuarthalloway13:01:35

or dedicated to any other read task you wish to isolate, for that matter

val_waeselynck13:01:01

> normal db value cannot go away, always recoverable via filter
@stuarthalloway and this recovery is automatic, right?

val_waeselynck14:01:00

Got it. So no long-lived with'ed dbs, at least not in production

stuarthalloway14:01:57

We will update the docs to make this more clear. Thanks!

souenzzo15:01:48

I think I don't understand one thing about the client API:

(let [db (d/db conn) ;; basis-t = 42
      data (d/q MY-QUERY-1 db)
      tx (long-computation-10h data)
      {:keys [db-after]} (d/with db tx)]
  (d/q MY-QUERY-2 db-after))
- How does the peer know that it can't "free" basis-t 42?
- If the peer does free basis-t 42 and then receives it again, it can recover it by basis-t, but it will not be allowed to do the with. How do the peer and clients talk in this case?

stuarthalloway16:01:19

@U2J4FRT2T I don’t think I understand the question

val_waeselynck16:01:30

I think the example shows that d/with can legitimately be called on a db value that may or may not be resolved using asOf on the server side

stuarthalloway16:01:06

@val_waeselynck will investigate, thanks

val_waeselynck16:01:15

(secretly hoping that this will result in filing a bug leading to db.with() being supported on asOf dbs) 😛

souenzzo16:01:32

The question can be:
- (d/db conn) on the peer returns basis-t 42
- a client that is operating on a db at t=20 asks this peer to do a d/with over t=20
Will the peer use (d/as-of db 20), or some other "internal dark magic"? If it uses d/as-of, it will not be allowed to do the with requested by the client.

stuarthalloway16:01:06

@U2J4FRT2T got it, will get back to you

souenzzo12:01:12

Sorry, but I'm really curious about that. 🙂

chrisblom14:01:18

does anyone know of a library for managing users & permissions using datomic?

roklenarcic14:01:07

quick question: I've noticed that the Cloud variant offers a different feature set than On-Prem. Are you planning to make two very divergent products? I see that you're excited by cloud, but there are some applications where AWS is not an option.

stuarthalloway14:01:56

divergence is not an objective 🙂

stuarthalloway14:01:19

but AWS provides a much richer shared baseline on which to build

stuarthalloway14:01:12

so there will continue to be differences

roklenarcic14:01:32

I understand that there will be options (like CloudSearch integration), which integrate with services offered by AWS, so obviously you can't use those without access to AWS.

potetm14:01:33

Is there any chance that on-prem will get the “generic node” notion? (i.e. no explicit txor needed, per-database dispatching for a node group)

val_waeselynck14:01:05

I do hope the Peer model will continue to be well supported though. In my case / opinion, that's where most of the leverage lies, and I don't think I would have made the switch to Datomic if there were only clients (however comfortable Datomic Cloud makes them).

potetm14:01:43

Yeah my perception is the same. I’ve gotten a lot of leverage from on-board client caching/lazy entity crawling.

potetm14:01:07

But I’ve not used the Client API. Perhaps the difference isn’t as stark as I imagine.

potetm14:01:53

But the “node” thing, multiple durable locations, and encryption at rest are all pretty rad. Would love to see at least a few of those end up in on prem.

stuarthalloway14:01:52

@potetm yes, On-Prem may get nodes, encryption at rest, etc.

stuarthalloway14:01:21

and conversations like these help us prioritize, thanks!

stuarthalloway14:01:10

and Cloud will have a more complete story for keeping code and data colocated, although not necessarily the peer model

johnj15:01:23

for cloud solo, why only two options for the node (t2.small and i3.large)? $30 vs $224; more middle ground would be nice. Also, why an i3.large? What is the NVMe SSD used for?

marshall15:01:30

solo uses only a t2.small; production uses only i3.large instances

marshall15:01:54

the configuration on the marketplace page that shows them both is a limitation of how Marketplace listings work

johnj15:01:08

ah i3.large shows as an option for solo

marshall15:01:27

the SSD provides a large local cache

johnj15:01:44

oh, per the docs I thought only EFS was used for that

cch115:01:51

Is the datomic-socks-proxy the only means of accessing datomic in the cloud? We use a VPN tunnel to connect to our VPC when developing locally - it supports accessing AWS services transparently as though our local machine were in the VPC. Having to run the SOCKS proxy as well seems like a waste.

cch115:01:54

Docs say "To run Datomic Cloud, currently you must have an AWS Account that supports only EC2-VPC in the region in which Datomic Cloud runs." When is that requirement expected to be lifted?

marshall15:01:59

as long as you can resolve the endpoint that should work

cch115:01:05

OK. That is promising.

timgilbert16:01:11

Say, random question but can anyone point me to some good open-source datomic databases / schemas similar to the mbrainz data set used in the tutorial?

timgilbert16:01:05

I did find the seattle sample data included with the distro, too...

uwo16:01:52

could running backups somehow cause this failure on the transactor?

Critical failure, cannot continue: Critical background task failed
ActiveMQInternalErrorException[errorType=INTERNAL_ERROR message=AMQ119000: ClientSession closed while creating session]

uwo16:01:58

(on-prem)

stuarthalloway16:01:36

@uwo running backups can overwhelm your storage depending on config

stuarthalloway16:01:00

particularly if you are running on DynamoDB which is provisioned

uwo16:01:35

that backup is continuing, it’s only the transactor that’s falling over with a failed heartbeat. we’re using sqlserver as a store

stuarthalloway16:01:17

I haven’t seen sqlserver get so overwhelmed that this problem happens

stuarthalloway16:01:10

@uwo if heartbeat interval goes wonky just before failure, be suspicious of storage

marshall16:01:38

@cch1 We originally endeavored to have the system work with both EC2 Classic and EC2-VPC. There is an inherent limitation between EC2 Classic and CloudFormation. We raised this issue with AWS Support and received the following response: “In EC2-Classic, Fn::GetAZs returns all the available AZs including the ones that you do not have access to. In EC2-VPC, Fn::GetAZs returns only AZs you have access to and which have a default subnet. So if a customer removes all but one of their default subnets, Fn::GetAZs will only return the AZ where the remaining subnet resides. Then if you try to use Fn::Select to get the second and third subnets, you will get an error because Fn::Select will try to reference an index that doesn’t exist. This is the downside of this approach. Unfortunately, i checked internally and we do not seem to have a workaround in place to fix this. So if you have a mix of EC2-Classic and EC2-VPC enabled for your account this approach may not be ideal for you” The CFTs we provide for Marketplace are generic and use discovery to set up VPCs/AZs. We’re considering options for you. I’ll get back to you by next Wed.

Desmond16:01:27

Does anyone have any idea why I might be seeing a 10-15 second response time for the first request after restarting my Peer? After that response times are sub-second. I have a small dataset so I'm trying to distinguish between whether the initially slow response is due to the cache not being full yet, which would be concerning as the data grows, or due to the Peer establishing a connection with the Transactor, which I wouldn't really care about.

favila17:01:59

The very first d/connect call for a db always takes a few seconds and appears to be a fixed cost in my experience

favila17:01:31

well, it varies by network speed

favila17:01:12

I suspect part of what is happening is transferring the tx-log since the last index

Desmond18:01:27

ok, good to know. I won't sweat it.

Desmond16:01:21

running on dynamo

stuarthalloway16:01:49

@captaingrover peer has to reload your database

stuarthalloway16:01:40

that is a bounded cost, won’t get bigger as data grows

stuarthalloway16:01:31

in a load-balanced deployment setting, you could load databases you want hot in the peer before telling the load-balancer you are ready for requests
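
(A minimal sketch of the warm-up Stuart describes, assuming the peer API; the db-uri, ready? atom, and health-check wiring are illustrative.)

(require '[datomic.api :as d])

(defonce ready? (atom false))

(defn warm-up!
  "Pay the one-time connect/reload cost before taking traffic."
  [db-uri]
  (let [conn (d/connect db-uri)]   ; slow only on the first call per db
    (d/db conn)                    ; force the database value to load
    (reset! ready? true)))         ; only now report healthy to the LB

;; The load balancer's health check should return success only when
;; @ready? is true.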

Desmond16:01:07

@stuarthalloway great! that's what I wanted to hear.

Desmond16:01:42

I don't need the fancy load-balancer setup yet but I will definitely keep that in mind

luchini17:01:01

Anyone else having the following problem when creating a Datomic Cloud Solo?

luchini17:01:04

> The following resource(s) failed to create: [ExistingS3Datomic, ExistingTables, ExistingFileSystem, EnsureEc2Vpc].

marshall17:01:29

@luchini are you able to see any additional details from the CFT errors?

luchini17:01:23

@marshall this is the whole error:

luchini17:01:41

The following resource(s) failed to create: [StorageF7F305E7]. . Rollback requested by user.
Embedded stack arn:aws:cloudformation:us-east-1:332243152968:stack/datomic-cloud-solo-test-StorageF7F305E7-1NVIQEOD7DDRO/7036b990-fc74-11e7-be75-500c28635c99 was not successfully created: The following resource(s) failed to create: [ExistingS3Datomic, ExistingTables, ExistingFileSystem, EnsureEc2Vpc]. 

marshall17:01:15

Ah. So you can go look at the nested Storage Template and see if it reports an issue

luchini17:01:16

Is there a way to see the Storage Template logs separately? (I’m a complete noob in CloudFormation)

marshall17:01:11

under failed & deleted stacks

marshall17:01:16

you can see the storage one

luchini17:01:19

I did find the Storage Template itself (from your S3) and was reading it through… but it will take me on a tangent 🙂

marshall17:01:14

You should see the Storage stack in the CloudFormation stack list

luchini17:01:35

Found it. It says the failure log is on CloudWatch

marshall17:01:36

it may say Failed or RolledBack in status

marshall17:01:48

then go to Events

marshall17:01:53

and you can find the first event that failed

luchini17:01:16

ExistingS3Datomic Failed to create resource. See the details in CloudWatch Log Stream: 2018/01/18/[$LATEST]d0fb81bad8dd4f83917b5428f051f03f

marshall17:01:35

have you tried to launch unsuccessfully prior to this attempt?

marshall17:01:42

or successfully, for that matter?

luchini17:01:53

yup… a few times… same result always

luchini17:01:29

I can’t find any left over S3 bucket from the previous runs

marshall17:01:54

so I suspect that the first time you had some kind of failure, but now when you try to re-create with the same name you’re hitting a separate issue; if some parts of the system were created in the first failed attempt, they may interfere with creating one with the same name

luchini17:01:41

Even if I can’t find those resources? Like in some kind of delayed naming cache?

luchini17:01:20

You are partially right @marshall. Tried with a different stack name and it got me a bit further but still failed with The following resource(s) failed to create: [EnsureEc2Vpc].

luchini17:01:46

^ in the storage template

marshall17:01:16

as was discussed above with cch1, Datomic Cloud requires EC2-VPC

marshall17:01:46

for now you can either create a new AWS account or start up in a new region in the same account that supports EC2-VPC

luchini17:01:58

ah…. the keyword here is “only EC2-VPC”, correct?

luchini17:01:04

because I assumed my region did support EC2-VPC (but it also has classic for some old stuff that has not migrated yet)

marshall17:01:43

right; if it’s a classic it won’t work

luchini17:01:49

and now I see in the template FailIfEc2ClassicLogGroup 😄

luchini17:01:07

thanks @marshall. This was tremendously helpful

marshall17:01:09

yep; I suspect that log would say the same thing as I just did ^

luchini17:01:14

tip here: maybe rename EnsureEc2Vpc to EnsureEc2VpcOnly (I know, a bit pedantic, but semantics always help 😄 )

luchini17:01:52

congrats to the Cognitect team… we are stoked over here with Datomic Cloud

donmullen20:01:19

@marshall I have a large import I’d like to complete to Datomic Cloud. Using a local peer against a local data store, I can configure the transactor to speed things up a bit, and I understand one can adjust DynamoDB for imports using On-Prem. Are there similar adjustments that can be made for Datomic Cloud? I’m new to CloudFormation, but if you have pointers to docs or advice, that’d be appreciated. Local import currently takes several hours to a datomic:dev database, and I’m doing all the batching/pipelining that is recommended for large imports.
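
(For reference, a minimal sketch of the batching/pipelining pattern mentioned above, using the peer API; batch and in-flight sizes are illustrative, not tuned.)

(require '[datomic.api :as d])

(defn pipeline-import
  "Submit tx-data in batches, keeping a bounded number of transactions in flight."
  [conn tx-data {:keys [batch-size in-flight]
                 :or   {batch-size 1000 in-flight 10}}]
  (loop [batches (partition-all batch-size tx-data)
         pending clojure.lang.PersistentQueue/EMPTY]
    (cond
      ;; too many transactions in flight: wait for the oldest to finish
      (>= (count pending) in-flight)
      (do @(peek pending)
          (recur batches (pop pending)))

      ;; submit the next batch without blocking
      (seq batches)
      (recur (rest batches)
             (conj pending (d/transact-async conn (first batches))))

      ;; drain whatever is still outstanding
      :else
      (run! deref pending))))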

stuarthalloway20:01:59

@donmullen Cloud will autoscale DDB for you, so the import will start slow and speed up, then scale DDB back down automatically

donmullen20:01:25

@stuarthalloway magic. Congrats on the launch.

stuarthalloway20:01:35

@donmullen Cloud will make much less use of DDB than an equivalent On-Prem import

stuarthalloway20:01:47

no DDB writes for indexing

donmullen20:01:51

Timing was perfect - was just getting ready to spin up On-Prem on AWS.

stuarthalloway20:01:11

@donmullen how big is the dataset? Do you hope to stay in the Solo topology?

donmullen20:01:56

Hoping to stay for the short term as we work on queries and data analytics - but the data is very large: millions of rows across five primary data sets and about 6 GB of raw CSV data to import. Is that something Solo can handle?

donmullen20:01:28

@stuarthalloway just did a restore to local dev database from backup - it’s 12 GB in datomic/data.

donmullen20:01:44

Got thru video one - on to video two… 🙂

stuarthalloway20:01:35

@donmullen I have imported full mbrainz (100 million datoms) into Solo. It takes a while, especially after the AWS burst ends and you get only a fractional CPU

donmullen20:01:06

@stuarthalloway How long does mbrainz take running locally against datomic:dev storage? And compared to Solo? My import locally takes about 7.5 hours (I don’t create the indexes until after the import).

donmullen20:01:01

I’m assuming I should leave indexes off the schema until after the data import.

stuarthalloway20:01:54

ever since we added adaptive indexing (http://blog.datomic.com/2014/03/datomic-adaptive-indexing.html), the index thing matters less

stuarthalloway20:01:36

@donmullen Cloud always builds all indexes, :db/index is not even a thing there
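
(An illustrative schema attribute to make the difference concrete; the attribute itself is hypothetical. On-Prem lets you opt an attribute into the AVET index with :db/index, while Cloud indexes everything, so the flag isn't part of the schema there.)

(def email-attr
  {:db/ident       :user/email
   :db/valueType   :db.type/string
   :db/cardinality :db.cardinality/one
   :db/index       true})   ; meaningful On-Prem; omit in Cloud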

stuarthalloway20:01:18

@donmullen your local setup likely has more CPU horsepower than an i3.large (Prod), which in turn has way more CPU than a t2.small

stuarthalloway20:01:06

so if you are already sweating making imports faster, you will likely end up on Production

donmullen20:01:29

yeah - I figure we’ll land there - though once we get the import done ‘right’ we won’t be doing that much (at all?) - and will be incrementally adding data on a weekly basis to the core data set.

fingertoe21:01:15

I am having trouble getting the datomic-socks-proxy to work. It gives me the error “Datomic System not found”, although running the ‘aws ec2 describe-instances’ query seems to find it without issue.

fingertoe21:01:21

I also get an “aws: error: argument command: Invalid choice, valid choices are:” prior to the datomic sys not found error.

stuarthalloway21:01:47

@fingertoe you need a newer AWS CLI client

fingertoe21:01:43

@stuarthalloway That did it! Thanks…