#datomic
2021-07-15
kenny 14:07:47

We successfully moved to 884-9095! It took about a full day of work. Downtime was more than 1 minute but less than 1 hour, due to replacing the old, self-managed API gateway with the Datomic-managed one and redeploying the app (realized after the fact this could have been avoided). Seriously awesome update, Datomic team. Thanks for making our lives easier.

Tyler Nisonoff 14:07:02

🥳 any advice for others about to embark on this migration? I’m currently trying to figure out how to minimize downtime… given we currently use a SOCKS proxy, I’m guessing I’ll have to upgrade Datomic, then change the app to use the new endpoint, redeploy the app, and incur downtime during that period?

kenny 14:07:46

I think it is very domain specific. Our API gateway downtime was due to needing to sync multiple service updates to point to the new API Gateway endpoint. That could have been minimized by using DNS and switching over with the push of a button. Curious, you're using the SOCKS proxy in prod?
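The push-button DNS switchover kenny mentions could look something like the sketch below: services resolve a stable CNAME you control, and the cutover is a single Route 53 record update rather than a coordinated redeploy. The hosted zone id, record name, and endpoint hostname here are all placeholders, not values from this thread.

```shell
# Hypothetical DNS cutover: repoint a stable CNAME from the old
# self-managed gateway to the new Datomic-managed endpoint in one
# Route 53 update. All identifiers below are placeholders.
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0000000EXAMPLE \
  --change-batch '{
    "Comment": "Cut over api CNAME to the new gateway endpoint",
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "api.example.com",
        "Type": "CNAME",
        "TTL": 60,
        "ResourceRecords": [
          {"Value": "new-gateway.execute-api.us-east-1.amazonaws.com"}
        ]
      }
    }]
  }'
```

A low TTL (60s here) keeps the propagation window short, so downtime is bounded by the TTL rather than by how fast every service can be redeployed.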

Tyler Nisonoff 15:07:03

I am right now (although it’s pre-launch 😛). As you say that, I realize I could avoid the SOCKS proxy in prod by using VPC peering and have just been avoiding doing that, so maybe I’ll just upgrade and fix the connection in one go.

Tyler Nisonoff 15:07:43

(the Datomic client is running in a separate VPC atm, so the SOCKS proxy was a quick way to get access)

kenny 15:07:42

Oh I see. IMO, that's pretty sketchy 🙂 You have a single point of failure: the bastion host. I don't think the SOCKS proxy was intended to be used in a prod HA environment.

Tyler Nisonoff 15:07:53

Yeah, definitely sketch 🙂 Talking this through, I think if I fix that first, the upgrade will be more straightforward, so I’ll start by getting proper inter-VPC access working first. Thanks 🙂

stuarthalloway 15:07:40

@U016TR1B18E you will not need the VPC peering at all post-upgrade, are you making unnecessary work?

Tyler Nisonoff 15:07:50

Possibly! Re-reading the new release instructions, I’m seeing it’s quite different now, so I think the right thing may be: 1. upgrade, 2. fix the inter-VPC connection. The little bit of downtime isn’t problematic for me.

Tyler Nisonoff 15:07:35

seems like I’d want to use the new VPC Endpoint

kenny 15:07:25

I launched an 884-9095 query group stack with the "Default" metrics option accidentally selected. Generally, I don't trust defaults since they always seem to change without notice, so I prefer to be explicit about what I select. In my situation, I cannot change from Default to Detailed because the CF stack detects no changes (in my case the Default is Detailed). I could change from Default -> Basic -> Detailed as a workaround.

redinger 21:07:37

Yeah, this is because CloudFormation evaluates the value of the metrics choice before doing an update. In your case, there was no change. Making any kind of change that results in a changeset would have caused the stack to update, which you discovered by changing to Basic and back. Changing the metrics choice this way also caused your instances to restart, because the metrics setting is passed to the instances. To avoid restarting instances, an alternative is to change something like the MaxSize: temporarily bump it higher, make the metrics change, then lower the MaxSize again. Changing the MaxSize wouldn’t cause instances to cycle. Thanks for the feedback; I’ll give some thought to how we might improve this scenario.
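The workaround described above could be sketched with the AWS CLI roughly as follows. The stack name and parameter keys (`MaxSize`, `Metrics`) are assumptions for illustration; check your query group's CloudFormation template for the actual parameter names and current values.

```shell
# Hypothetical sketch: bump MaxSize alongside the metrics change so
# CloudFormation sees a changeset, even though the effective metrics
# value (Default == Detailed here) is unchanged and instances don't cycle.
STACK=my-query-group   # placeholder stack name

# 1. Change metrics Default -> Detailed while bumping MaxSize; the
#    MaxSize change is what makes the update go through.
aws cloudformation update-stack \
  --stack-name "$STACK" \
  --use-previous-template \
  --parameters ParameterKey=Metrics,ParameterValue=Detailed \
               ParameterKey=MaxSize,ParameterValue=3

# 2. Once the update completes, lower MaxSize back to its original
#    value (again without cycling instances).
aws cloudformation update-stack \
  --stack-name "$STACK" \
  --use-previous-template \
  --parameters ParameterKey=Metrics,UsePreviousValue=true \
               ParameterKey=MaxSize,ParameterValue=2
```

`--use-previous-template` keeps the template itself untouched, so only the parameter values change between updates.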

kenny 21:07:41

Thanks for that info @U0508TN2C.