#datomic
2016-08-30
fenton00:08:43

out of the blue i'm unable to connect to datomic. getting: ConnectException Connection refused java.net.PlainSocketImpl.socketConnect (PlainSocketImpl.java:-2)

fenton00:08:58

it's running, the port is open/listening

magnars10:08:21

That was a great interview you did on the defn podcast btw, @robert-stuttaford 🙂

robert-stuttaford10:08:10

why thank you -bends a knee, tips hat-

karol.adamiec11:08:34

transactor autoscaling is terminating my instance and then starting another one, then stopping it, and so on forever. any ideas what could be going wrong and how to get more data?

karol.adamiec12:08:05

after further digging, here is a log from the instance that stopped:

user-data: inflating: datomic-pro-0.9.5390/README-CONSOLE.md
user-data: pid is 1879
user-data: ./startup.sh: line 26: kill: (1879) - No such process
/dev/fd/11: line 1: /sbin/plymouthd: No such file or directory
initctl: Event failed

karol.adamiec12:08:26

seems like an issue with startup.sh, but unable to see what's inside there atm.

robert-stuttaford13:08:06

@jaret hi 🙂 are you around, and able to help with a transactor downtime analysis?

jaret13:08:20

@robert-stuttaford absolutely. What's up? or should I say down?

robert-stuttaford13:08:36

-grin- it's up again, but i'd really like to get a LOT better at analysing root cause

jdkealy13:08:40

when i get the message "Critical failure, cannot continue: Heartbeat failed" how can i find out what the failure was? it's happening every time i try to restart my transactor

marshall13:08:38

@jdkealy Heartbeat failed indicates that the transactor can't write to storage. What storage are you running on?

jdkealy13:08:32

it happens before anything even connects to the app

jdkealy13:08:42

i mean... it happens before the app connects to datomic

marshall13:08:01

during transactor startup, yeah. Are you using our provided cloudformation scripts/etc?

jdkealy13:08:19

yes. it was working for over a month

marshall13:08:14

what changed when you started seeing heartbeat failures?

jdkealy13:08:43

i did change a lot of application code, but now that's not even running

marshall13:08:59

is the transactor starting for some amount of time before you get the heartbeat failure?

jdkealy13:08:13

like 5 seconds perhaps

marshall13:08:52

if you look in cloudformation, do you see active heartbeats for any time or is it immediate failure?

robert-stuttaford13:08:15

guys, i've just had dynamo db issues as well

robert-stuttaford13:08:25

with a system that has been up for months

robert-stuttaford13:08:36

bet you DDB is having a tantrum

marshall13:08:42

aha! perhaps there is a plot afoot 😉

jdkealy13:08:00

i don't see any events since the thing initially launched

robert-stuttaford13:08:26

nothing on aws status

marshall13:08:46

The twittersphere seems to agree

marshall13:08:07

as of about 15 min ago there are reports of a DDB outage. possibly more services out in us-east-1

robert-stuttaford13:08:35

expletives and swearwords

marshall13:08:07

EC2 now shows "elevated launch error rates" on the AWS status page

potetm13:08:08

I should have checked in earlier. We were on the line with the AWS guys half an hour ago. They said "we're updating the status page soon."

robert-stuttaford13:08:45

was that because you had downtime @potetm ?

marshall13:08:45

@potetm I saw your tweet - figured something like that was happening

jaret13:08:20

I need to start following everyone...

potetm13:08:23

Yeah we're out completely right now.

robert-stuttaford13:08:39

how did you know there's an outage, @potetm ?

potetm13:08:51

We were on chat with AWS. Yeah.

marshall13:08:53

nothing like finding out the best source for outage news is twitter vs. official status pages.

robert-stuttaford13:08:00

ass. that's not scalable at all

potetm13:08:08

Yeah, that's why I tweeted about it 🙂 Just doing my duty!

potetm13:08:26

#moreimpactfulthanvoting troll

robert-stuttaford13:08:29

-follows you- think you could keep doing that? -grin-

jaret13:08:32

@robert-stuttaford you mean you can't keep an open chat with AWS 24/7 for status updates?

robert-stuttaford13:08:25

so what do we do now? game of chess? 🙂

potetm13:08:51

Heroes of the Storm

robert-stuttaford13:08:31

looks like things are stable again

robert-stuttaford13:08:17

oh, wait, no. my EC2 console was stale

robert-stuttaford13:08:24

last new transactor was 5 minutes ago

pesterhazy14:08:03

not seeing any dynamo issues (eu-west-1). fingers crossed!

robert-stuttaford14:08:14

Amazon DynamoDB (N. Virginia): Increased latencies. 6:47 AM PDT: We are currently investigating increased API latencies in the US-EAST-1 Region.

ljosa14:08:17

we're also seeing transactor restarts (and flapping between our two transactors, as one tries to take over when the other kills itself). Is it expected behavior for the transactor java process to kill itself and restart when a heartbeat fails? Selected log lines:

2016-08-30 14:01:27.031 INFO  default    datomic.lifecycle - {:tid 18, :pid 7028, :event :transactor/heartbeat-failed, :cause :timeout}
2016-08-30 14:01:27.033 ERROR default    datomic.process - {:tid 120, :pid 7028, :message "Critical failure, cannot continue: Heartbeat failed"}
2016-08-30 14:01:27.057 WARN  default    org.hornetq.core.server - HQ222113: On ManagementService stop, there are 2 unexpected registered MBeans: [core.acceptor.7b92fd66-6eb7-11e6-a9c9-eb6e98878cd4, core.acceptor.7b932477-6eb7-11e6-a9c9-eb6e98878cd4]
2016-08-30 14:01:27.076 INFO  default    org.hornetq.core.server - HQ221002: HornetQ Server version 2.3.17.Final (2.3.17, 123) [5d3fd9ae-ed45-11e5-a317-db72314f6b95] stopped
2016-08-30 14:02:03.345 WARN  default    datomic.slf4j - {:tid 10, :pid 12511, :message "Starting datomic: ..."}

robert-stuttaford14:08:20

are you using your own instance configuration, rather than the AMI provided by Cognitect, @ljosa?

ljosa14:08:46

yes. so runit restarts the process when it quits.

robert-stuttaford14:08:54

yes. it kills itself to allow auto-scaling to notice that it's dead and replace the instance entirely

mitchelkuijpers14:08:10

We are also down 😞

robert-stuttaford14:08:28

we've been stable for 30 mins now

mitchelkuijpers14:08:47

We were stable for 5 minutes and then it went dark again

potetm14:08:10

I'm still having lots of errors, but I'm still able to run queries successfully.

mitchelkuijpers14:08:23

Data reads keep working for us too

mitchelkuijpers14:08:29

but that could be cached

potetm14:08:37

#retriesFTW #ReleaseItPatterns

robert-stuttaford14:08:55

almost certainly cached

ljosa14:08:02

we don't see any SystemErrors in CloudWatch, and the SuccessfulRequestLatencies are normal. In between the failed heartbeats and transactor restarts, things work normally.

mitchelkuijpers14:08:40

And ours is back

ljosa14:08:53

our most recent failed heartbeats were at 14:01Z and 14:11Z. So good for 15 min now.

jdkealy14:08:30

have you guys seen this happen before?

potetm14:08:56

Alright, on your word @ljosa we'll try and bring the backend services up.

potetm14:08:05

I'll send you the bill if it doesn't work troll

mitchelkuijpers14:08:46

@jdkealy There was also a DynamoDB issue a while ago, but it does not happen often

ljosa14:08:44

on the upside, this shows that the transactor HA is working. 🙂

jdkealy14:08:46

i'm back up too... is there any way to protect against this? should i be thinking of backends other than dynamo?

ljosa14:08:18

we had to switch from couchbase to dynamodb, and ddb has been great so far (about 6 months).

marshall14:08:23

Realistically, the kind of downtime DDB has is still an order of magnitude (or more) better than pretty much any option you could run on your own behalf

robert-stuttaford14:08:38

the last time DDB went down was 1 week after a MAJOR launch at Cognician. that was so much fun. September last year

robert-stuttaford14:08:59

yup Marshall totally

robert-stuttaford14:08:51

no issues for 45 mins now

potetm14:08:39

@jdkealy There was an outage similar to this last year. https://aws.amazon.com/message/5467D2/

potetm14:08:44

I agree with marshall about relative ddb uptime though.

potetm14:08:26

Some guy must be watching #dynamodb on twitter. The second I said something about an outage, he likes it and tweets this: https://twitter.com/shirleman/status/770614099726114816

ljosa14:08:50

Cassandra is not a great option for Datomic if you're worried about downtime because Datomic cannot work across Cassandra data centers, and a Cassandra data center must be in a single AWS availability zone.

potetm14:08:39

The fallacy there is that there is zero cost to managing your own machines vs using a hosted service. Even assuming the claims are accurate.

marshall14:08:19

Cassandra is a fine option, but I'd be shocked if you could maintain a Cassandra ring with the same uptime and perf as DDB for anywhere near the cost

marshall14:08:46

not to mention you have to do all the work, which means hiring ops staff

ljosa14:08:09

DDB has been great for us cost wise. Memcached is very effective at reducing the number of DDB requests.

robert-stuttaford14:08:33

has anyone used DDB streams to set up real-time multi-region replication?

robert-stuttaford14:08:48

i wonder how quickly one can shift regions with Datomic and DDB

ljosa14:08:42

there's no guarantee that the replica will be consistent, so you're just praying that Datomic will be okay after being started in the other region, right?

robert-stuttaford14:08:12

well, that's why i'm asking, i guess -- is it even a valid strategy

robert-stuttaford14:08:57

so far we've done the old backup-datomic, restore-datomic thing to switch transactor+storage, just once, back when we moved off of our snowflake transactor+postgres server

ljosa14:08:47

we're doing hourly datomic backups to S3. we populate our dev environment with those. I guess we could in theory restore them to a disaster recovery environment in another region. but realistically, if the entire us-east-1 goes down, we're out of business until it comes back.

robert-stuttaford14:08:50

we're also doing the backups that way

robert-stuttaford14:08:05

although i'm planning to switch from hourly to continuous backups

robert-stuttaford14:08:37

is anything keeping any of you in US-EAST-1 in particular?

potetm14:08:44

No. I want off.

mitchelkuijpers14:08:11

We are creating an Atlassian Connect add-on and their servers are also in US-EAST-1. That is the only reason

ljosa14:08:18

we could operate in other regions for maybe an hour or two before we'd have to shut down because we rely on processing in us-east-1 to turn off ad campaigns when they exceed their daily budgets, etc.

robert-stuttaford14:08:19

i'm about 90% of the way to having a fresh env set up in oregon - new AMIs and Ubuntu LTS and whatnot

robert-stuttaford14:08:06

switching from upstart to systemd has been fun

potetm14:08:43

But the cost of relocating is very non-trivial.

potetm14:08:29

And the gain is theoretical.

jdkealy15:08:59

I've read a bit about laziness in datomic and i wanted to ask a quick Q about my use case... i have accounts, accounts have collections, collections have photos, photos have tags. Tags are often removed / edited and i'd like to mark them as active / inactive. The criterion for being active / inactive is having just ONE photo that is not hidden. Is there any way I can have datomic fetch that criterion without scanning every photo in every collection? i.e. is it possible to write a query that returns true / false and will stop scanning after it hits a truthy value?

marshall15:08:53

@jdkealy Depending on your schema, you might be able to use get-some: http://docs.datomic.com/query.html#get-some
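For reference, a rough sketch of how a get-some clause reads in a query; the :photo/* attribute names here are hypothetical, not from jdkealy's actual schema:

(d/q '[:find ?photo ?label
       :where
       [?photo :photo/collection ?coll]                  ; hypothetical ref attr
       ;; get-some tries each attribute in order and binds a tuple for the
       ;; first one the entity possesses; if none are present, the clause
       ;; fails and the row is dropped
       [(get-some $ ?photo :photo/title :photo/filename) [_ ?label]]]
     db)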

jdkealy15:08:33

oh amazing... i think that might be perfect for my needs

jdkealy15:08:23

is get-some or missing? faster?

jdkealy15:08:46

i guess i'm looking for (not (missing ))
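A hedged sketch of that direction, assuming a hypothetical :photo/collection ref and a :photo/hidden flag asserted only on hidden photos; the `.` find spec returns a single match (or nil), though the engine may still examine all candidate datoms:

(d/q '[:find ?photo .
       :in $ ?coll
       :where
       [?photo :photo/collection ?coll]
       ;; true when ?photo has no :photo/hidden assertion, i.e. it is visible
       [(missing? $ ?photo :photo/hidden)]]
     db collection-id)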

jdkealy15:08:36

anyways, those all look great many thanks

marshall15:08:02

sure. And I actually am not sure which would be faster. I would need to do some testing and thinking 🙂

severed-infinity15:08:16

hey guys, I've got this query

(defn mult-lookup-user [db phones]
  (let [result (d/q '[:find ?e ?phone
                      :in $ [?phone ...]
                      :where [?e :user/phone ?phone]] db phones)]
    (map second result)))

(mult-lookup-user (d/db connect) ["0862561423" "0877654321"])
where it will return only existing phones. running it standalone works perfectly, but my issue is using it with a yada resource; the important section is below
{:post {:parameters {:form {:users [String]}
                     :body [String]}
        :consumes  #{"application/json" "application/x-www-form-urlencoded;q=0.9"}
        :produces  #{"application/json" "application/edn"}
        :response  (fn [ctx]
                     (let [users (or (get-in ctx [:parameters :body])
                                     (get-in ctx [:parameters :form :users]))]
                       (when-let [valid-users (mult-lookup-user (d/db connect) users)]
                         (println "valid" valid-users)
                         (if (seq? valid-users)
                           (json/generate-string valid-users)))))}}
using the same input as when called standalone returns an empty list, but if I include just one value (valid of course) it returns that singular result. Can anyone help explain this issue?

robert-stuttaford17:08:50

@severed-infinity i would trace the inputs going into mult-lookup-user in both cases and compare that to what happens when you call it directly

robert-stuttaford17:08:58

those are south african numbers, right? šŸ™‚

severed-infinity17:08:57

@robert-stuttaford I've removed the println calls for clarity; the input shows the list of numbers coming in, and the results are as follows before and after

["0862561423","0877654321"]
valid ()
they are Irish mobile phone numbers

robert-stuttaford17:08:50

against the same database value?

robert-stuttaford17:08:02

are you printing the result coming from datalog directly?

robert-stuttaford17:08:10

i.e., put (prn :in phones db :out result) before (map second result)

severed-infinity17:08:22

I assume you mean like so

(defn mult-lookup-user [db phones]
  (let [result (d/q '[:find ?e ?phone
                      :in $ [?phone ...]
                      :where [?e :user/phone ?phone]] db phones)]
    (println results)
    (map second result)))

robert-stuttaford17:08:40

well, result not results šŸ™‚

robert-stuttaford17:08:45

and print the inputs

severed-infinity17:08:11

:in ["0862561423" "0877654321"] :out #{[17592186045419 "0862561423"] [17592186045453 "0877654321"]}

robert-stuttaford17:08:31

looking good so far

severed-infinity17:08:52

but when called from the resource model :in ["\"0862561423\",\"0877654321\""] :out #{}

severed-infinity17:08:29

these are the two I am testing currently, as you can see with more than one I get an empty set but with one value I get the results

[0862561423,0877654321]
:in ["0862561423,0877654321"] :out #{}
valid ()
[0862561423]
:in ["0862561423"] :out #{[17592186045419 "0862561423"]}
valid (0862561423)

fenton17:08:30

is there an api to insert into datomic... letting datomic set the :db/id? Seems odd to force the user to create a temp id for all the inserts...

robert-stuttaford17:08:31

looks like you're passing in a string

robert-stuttaford17:08:54

in your yada impl, before you pass the numbers to your query fn, first (clojure.edn/read-string) it

severed-infinity17:08:58

oh, so it does appear to be

robert-stuttaford17:08:10

@fenton, no. you have to make tempids every time

robert-stuttaford17:08:27

which, imho, is a far better tradeoff than some hidden magic you can't control 🙂

fenton17:08:03

@robert-stuttaford I'd have preferred it to create one automatically if not specified. 😞

fenton17:08:41

do people do more than just call 'create temp id'?

robert-stuttaford17:08:42

if you absolutely must have it, write a function that does it for you. speaking as someone who's been there, and learned the hard way, you really just want to get used to providing them
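A minimal sketch of the kind of helper being discussed, assuming datomic.api is required as d; the function name is made up:

(defn with-tempid
  "Assocs a fresh tempid unless the map already carries a :db/id."
  [m]
  (cond-> m
    (not (:db/id m)) (assoc :db/id (d/tempid :db.part/user))))

;; usage
@(d/transact conn [(with-tempid {:user/email "someone@example.com"})])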

fenton17:08:18

@robert-stuttaford ok... it's a minor inconvenience only... and can be abstracted like u suggest. thx! 🙂

robert-stuttaford17:08:13

the danger with the abstraction is it makes it harder for you to use them in more complex ways later on when you realise the full power of the design

robert-stuttaford17:08:51

you end up either ditching the abstraction part of the time, or making more convoluted abstractions. either way, you lose: consistency or simplicity

fenton17:08:06

why is this feature powerful?

robert-stuttaford17:08:24

i know. i've got several tens of thousands of lines of code written over several years by many people which bears the evidence of this

fenton17:08:39

i believe you 🙂

severed-infinity17:08:41

@robert-stuttaford thank you for that; the solution was parsed-users (str/split (first users) #","), though I do not know why an array of strings is turned into an array with a single joined string value
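A small sketch of that workaround, with the input shape reproduced from the logs above: the form parameter arrives as one comma-joined string, so split it before querying:

(require '[clojure.string :as str])

(let [users        ["0862561423,0877654321"]        ; what yada actually delivered
      parsed-users (str/split (first users) #",")]  ; => ["0862561423" "0877654321"]
  (mult-lookup-user (d/db connect) parsed-users))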

robert-stuttaford17:08:51

you can express complex relationships in a single transaction

robert-stuttaford17:08:06

@severed-infinity likely something yada or a middleware is doing

severed-infinity17:08:55

Yea I tried to ask in the yada chat before I got to the datomic stuff, but got no response and continued on with what seemed like a working solution

robert-stuttaford17:08:53

@fenton e.g. transact an order and all of the individual order items together

robert-stuttaford17:08:14

with all the relationships expressed in the same transaction

robert-stuttaford17:08:11

(let [order-id (d/tempid :db.part/user)]   ; one tempid shared across the maps below
  [{:db/id      order-id
    :order/uuid (d/squuid)
    :order/user [:user/email ""]}          ; lookup ref to an existing user
   {:db/id                    (d/tempid :db.part/user)
    :order.item/order         order-id     ; relates this item to the new order
    :order.item/product       [:product/slug "tesla-model-s"]
    :order.item/unit-price    100000
    :order.item/unit-currency :usd}
   {:db/id                    (d/tempid :db.part/user)
    :order.item/order         order-id
    :order.item/product       [:product/slug "starbucks-venti"]
    :order.item/unit-price    20
    :order.item/unit-currency :usd}])

fenton17:08:30

@robert-stuttaford ok...yes that makes good sense for sure.

robert-stuttaford17:08:05

this makes mocking fake databases with d/with to test functions at the repl an absolute pleasure

fenton17:08:42

oh?! i've not seen the d/with thing...

fenton17:08:05

hmm... I'll have to keep that in mind...

fenton17:08:17

already there! šŸ™‚

fenton17:08:39

just trying to understand the d/with part a bit better...how do u use that in the repl for testing?

robert-stuttaford17:08:32

(def mock-db
  (->> some-made-up-tx-that-uses-real-data-and-adds-some-mock-data-like-above
       (d/with some-actual-storage-backed-db-value)
       :db-after))

robert-stuttaford17:08:31

mock-db is a db you can pass into any api fn that takes a db (including d/with!) that you can query against as normal. you'll find all the stuff in storage, and all the stuff in your mock transaction, all together as though it was really transacted

robert-stuttaford17:08:42

you may have heard of time-travel databases, or speculative databases. this is that.

robert-stuttaford17:08:51

it's all just in local memory
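Putting the pieces together, a minimal sketch assuming an existing connection conn and a :user/email attribute in the schema (both placeholders):

(require '[datomic.api :as d])

(let [db      (d/db conn)                          ; real, storage-backed db value
      tx      [{:db/id      (d/tempid :db.part/user)
                :user/email "mock@example.com"}]   ; mock data, never transacted
      mock-db (:db-after (d/with db tx))]
  ;; query the speculative value as though the tx had really happened
  (d/q '[:find ?e .
         :where [?e :user/email "mock@example.com"]]
       mock-db))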

fenton17:08:05

ok, obviously this is something I'll need to know and being slow will take a bit of time to grok...I'll share it with our local clojure meetup for discussion...

robert-stuttaford17:08:35

you'll get it sooner than you think, once you poke at it for a bit

fenton17:08:58

kk @robert-stuttaford thanks for taking the time to hand hold... really appreciate it! 🙂

robert-stuttaford17:08:00

10-15 are about Datomic

robert-stuttaford17:08:18

some good (peer-reviewed and nicely edited) explanation in there!

fenton17:08:46

oh really cool! thx!!! 🙂

fenton19:08:20

@robert-stuttaford I get it now. Pretty straightforward actually. Just to reiterate: d/with allows you to run transactions against a 'seeming' copy of the database. Then you can inspect the results to see that they are what you want them to be, thereby allowing you to test new DB functions on a live database without mucking up the live database.

pheuter19:08:30

If our current Datomic Pro license supports 5 processes, and we're currently running 2 transactors and 3 peers (different environments), what happens when a 4th peer attempts to connect?

pheuter19:08:33

Will it throw an error?

pheuter20:08:06

Or perhaps it will bump another peer off?

pesterhazy20:08:26

don't think it'll ever bump others off

dm320:08:48

license is # of peers per transactor IIRC

dm320:08:09

so you can have 5 peers. 6th peer will not be able to connect

pheuter20:08:13

oh interesting, so not a total number of processes

pheuter20:08:49

we have a big deploy coming up, any documentation around this just to make sure?

pheuter20:08:12

the website seems to suggest otherwise

pheuter20:08:23

as in total process count (transactor + peers)

potetm20:08:31

@pesterhazy that hasn't been my experience. When you cross the limit, the existing peers don't get to keep their connections.

potetm20:08:10

@pheuter my experience has been that it's peer count, txors don't go against the total

potetm20:08:19

But you can always fire it up in AWS to test before you get started.

pheuter20:08:36

makes sense 👍

pesterhazy20:08:31

yeah in my experience the transactors don't count

pheuter20:08:56

> the existing peers don't get to keep their connections
that's scary, no?

pesterhazy20:08:03

but we have some distance to the limit of 5, so ymmv

pesterhazy20:08:28

what I've seen is that you can't connect if the limit is reached

pheuter20:08:06

the website says "processes (transactors + peers)"

pheuter20:08:15

that seems to suggest transactors count towards process count, no?

potetm20:08:06

It does suggest that, but if you turn on CW logging, there's a specific peer count metric. And we ran into problems when that metric was over the max.

potetm20:08:13

I don't believe I've seen the "existing peers don't keep connections" documented anywhere, but that's what appeared to happen to me last week. So, def wanna confirm that with @marshall or @jaret

ljosa20:08:54

it's specifically the number of different IP addresses, it seems.

ljosa20:08:36

we haven't hit the limit of 22, but before we bought the licenses we kept hitting the limit of 2.

marshall20:08:56

The limit is transactor + peers (i.e. a 5 process license would be 1 txor, 4 peers). HA Transactors don't count. Each license is contractually limited to a single production system, so if you have a 5 process license, you should be running no more than 1 transactor and 4 peers concurrently in production

pheuter20:08:19

is ā€œproductionā€ defined as a sql driver vs dev?

pheuter20:08:40

or can we run a sql backed transactor on stage as well without incurring license costs?

marshall20:08:22

you can fully replicate your system on staging/dev/etc

marshall20:08:42

production in this case is defined as your production application that faces users/runs the business, etc

marshall20:08:04

your testing/staging/dev instances can use whatever storage you like

marshall20:08:15

as long as their purpose is for staging, etc, not for production use

pheuter20:08:38

Thanks for clarifying! Makes more sense now.

marshall20:08:48

sure 🙂

ljosa20:08:55

https://clojurians.slack.com/archives/datomic/p1472588454000286 ^ by this I meant that the technically enforced limit is a little more permissive than the agreement, so you'll still have to count manually to stay honest. the tech just prevents massive overruns from happening when you forget.

timgilbert21:08:38

Say, what's the quickest / simplest way to check whether an entity with a given value exists? I'm trying to come up with a system where some things use a String slug as their ID, like {:company/name "Boris LLC"} winds up as {:company/name "Boris LLC" :company/slug "boris-llc"}...

timgilbert21:08:32

So I'm looking at writing a loop where if there's already a [:company/slug "boris-llc"] I generate "boris-llc-1", "boris-llc-2" etc

timgilbert21:08:33

Right now I'm planning on (d/entity db [:company/slug "boris-llc"]) but I thought I'd check to see if anyone has some advice on it first
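A hedged sketch of that loop, assuming :company/slug is a :db/unique attribute (so lookup refs work) and that d/entity returns nil for a lookup ref that doesn't resolve; note the check is still racy without the unique constraint enforced at transaction time:

(defn unique-slug
  "Returns base if unused, else the first free base-1, base-2, ..."
  [db base]
  (->> (cons base (map #(str base "-" %) (iterate inc 1)))
       (remove #(d/entity db [:company/slug %]))   ; nil when the slug is free
       first))

;; (unique-slug db "boris-llc") => "boris-llc", or "boris-llc-1", ...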

bhagany22:08:37

@timgilbert I think that would be fine. I do something like this with query, but my situation is complicated by my slugs not being globally unique. I wish I could do it with entity.

timgilbert22:08:06

Cool, thanks

adammiller22:08:22

@timgilbert believe you could do something like this to avoid a loop and get them all at once:

(q '[:find ?e
     :in $ ?slug-partial
     :where
     [?e :company/slug ?slug]
     [(.startsWith ?slug ?slug-partial)]]
   db "boris-llc")

adammiller22:08:55

that way you could just pass "boris-llc" and get everything that begins with it in one call.

adammiller22:08:20

granted that may not be the most efficient