#datomic
2021-03-24
furkan3ayraktar08:03:15

I’ve asked the same question in https://forum.datomic.com/t/message-listeners-in-ions/860/4 on the Datomic forum, but I wanted to shoot it here as well, thinking that more people might see it. I’m wondering if anyone has a solution for this. My understanding is that it’s okay to run background threads (such as polling from a queue) as long as they are coordinated through Lambda calls. However, this leaves me with another question. Let’s say I have a Lambda ion that starts the background thread after a deployment. I can trigger that Lambda function via an EventBridge rule that watches the Datomic Ion deployments on CodeDeploy. This way, it’s certain that the Lambda function runs after each successful deployment and the background thread is started. However, suppose I have one query group with, say, three nodes, where each node should start a background thread. If I’m understanding correctly, the Lambda ion call will only be executed on one of the nodes in the query group rather than on all of them. Is there a specific Lambda ion type that executes on all of the nodes, or am I missing something else?
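(For concreteness, a sketch of the EventBridge rule described above, using Cognitect’s aws-api; the rule name and application name are made up, and the detail-type should be checked against the CodeDeploy EventBridge documentation:)

```
;; Hypothetical rule: fire a controller Lambda after each successful
;; CodeDeploy deployment of the ion application.
(require '[cognitect.aws.client.api :as aws]
         '[clojure.data.json :as json])

(def events (aws/client {:api :eventbridge}))

(def deployment-success-pattern
  {"source"      ["aws.codedeploy"]
   "detail-type" ["CodeDeploy Deployment State-change Notification"]
   "detail"      {"state"       ["SUCCESS"]
                  "application" ["my-ion-app"]}}) ; application name is made up

(aws/invoke events {:op :PutRule
                    :request {:Name         "ion-deploy-succeeded" ; made up
                              :EventPattern (json/write-str deployment-success-pattern)}})
;; A separate PutTargets call would point this rule at the controller lambda.
```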

tatut11:03:41

you don’t need a lambda to run a background thread on every node

tatut11:03:07

we have a thread that polls for configuration changes in the SSM parameter store; it’s simply started when a particular ns is loaded

tatut11:03:35

well, a j.u.c ExecutorService to be exact
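(A minimal sketch of that pattern, assuming Cognitect’s aws-api SSM client; the parameter name and poll interval are illustrative, and defonce ensures the executor is started once per process as a side effect of loading the ns:)

```
(ns myapp.config-poller
  (:require [cognitect.aws.client.api :as aws])
  (:import (java.util.concurrent Executors TimeUnit)))

(def ssm (aws/client {:api :ssm}))

(defn- refresh-config! []
  ;; Hypothetical parameter name; swap the returned value into an atom, etc.
  (aws/invoke ssm {:op :GetParameter
                   :request {:Name "/myapp/config"}}))

;; Started when this ns is loaded, as described above; survives reloads.
(defonce config-poller
  (doto (Executors/newSingleThreadScheduledExecutor)
    (.scheduleAtFixedRate refresh-config! 0 60 TimeUnit/SECONDS)))
```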

furkan3ayraktar12:03:50

How do you handle things like stopping the background thread, or restarting it if it stops unintentionally for some reason? In the thread I mentioned above, the Datomic team was suggesting controlling the background threads via a Lambda Ion. Do you have a solution for that? Another question regarding your setup: how do you start the background thread initially, after deployment? Do you just have a line in one of your namespaces so that, once it’s loaded, it starts the background thread?

em16:03:56

Also curious about this, and more generally about whether it’s appropriate to put "initialize" (system/start) code in side-effecting function calls rather than a check every time a handler is called?

em19:03:16

Poking around more, it seems like there are serious issues with that approach that could cause deploys to fail, such as calling ion/get-params before any web requests (one such experience report: https://jacobobryant.com/post/2019/ion/, under "Deployment"). Given the official team's response saying that all such stateful system updates should be done through lambda/external calls, @U2BDZ9JG3’s search for a better way of hitting all nodes with a request might be fairly important.

tatut08:03:26

we haven’t had a need to stop the background stuff

furkan3ayraktar14:03:52

@U0CJ19XAM Do you have any best practices / solutions on this topic?

Joe Lane14:03:12

What actual problem are you trying to solve @U2BDZ9JG3?
• "Having background threads do work" isn't a problem, it's a capability.
• "I have to process a lot of data and I need to apply back-pressure to ensure I don't overwhelm my system" is closer to a problem.
I could tell you all sorts of things about lambdas, coordination, and orchestration of stateful things in a distributed system, but I'd rather have the concrete scenario.

furkan3ayraktar15:03:53

Thanks! I’m trying to figure out the best way to implement this, and it would be very helpful if you could point me in the right direction. Here is a concrete scenario. Let’s say I have an SQS queue and a query group named worker-query-group. Each node in the worker-query-group is tasked with running a background thread that continuously polls SQS and processes the messages received. I have this setup:
1. Ion Deploy worker-query-group
2. A new deployment is created in CodeDeploy
3. The CodeDeploy deployment is successful
4. An EventBridge event is triggered after the successful deployment
5. The EventBridge event triggers a Lambda Ion named sqs-poll-controller
6. The sqs-poll-controller Lambda Ion is executed on one of the nodes within the worker-query-group
7. Polling from SQS is started
I can also call the sqs-poll-controller Lambda Ion manually to start/stop the background thread for polling SQS. I have a problem when I have more than one node in the worker-query-group. The Lambda Ion (sqs-poll-controller) executes on only one of the instances, and I’m trying to figure out how I can control all of the nodes within the query group with a Lambda Ion, or in any other way that is recommended. I got this idea of controlling background threads via a Lambda Ion from a Datomic team member’s https://forum.datomic.com/t/message-listeners-in-ions/860/2 and other linked https://forum.datomic.com/t/kafka-consumer-as-an-ion/823/4 of his on the forum.
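(A hedged sketch of what such an sqs-poll-controller entry point might look like on whichever node receives the invocation, assuming Cognitect’s aws-api SQS client and a simple {"action": "start" | "stop"} payload; the queue URL and message handling are placeholders:)

```
(ns myapp.sqs-poll-controller
  (:require [cognitect.aws.client.api :as aws]
            [clojure.data.json :as json]))

(def sqs (aws/client {:api :sqs}))
(def queue-url "https://sqs.eu-west-1.amazonaws.com/123456789012/worker-queue") ; made up

(defonce running? (atom false))

(defn- poll-loop []
  (while @running?
    (let [{:keys [Messages]} (aws/invoke sqs {:op :ReceiveMessage
                                              :request {:QueueUrl queue-url
                                                        :WaitTimeSeconds 20
                                                        :MaxNumberOfMessages 10}})]
      (doseq [{:keys [Body ReceiptHandle]} Messages]
        (println "processing" Body) ; placeholder for the real work
        (aws/invoke sqs {:op :DeleteMessage
                         :request {:QueueUrl queue-url
                                   :ReceiptHandle ReceiptHandle}})))))

;; Lambda ion entry point: Datomic passes a map whose :input is a JSON string.
(defn sqs-poll-controller [{:keys [input]}]
  (let [{:strs [action]} (json/read-str input)]
    (case action
      "start" (when (compare-and-set! running? false true)
                (.start (Thread. ^Runnable poll-loop "sqs-poller")))
      "stop"  (reset! running? false))
    (str "poller running? " @running?)))
```

Note that this only affects the single node that happens to receive the lambda invocation, which is exactly the limitation discussed below.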

Joe Lane15:03:08

Is this actually your use-case? Do you actually have an SQS queue of work?

Joe Lane15:03:29

> I have a problem when I have more than one node in the worker-query-group.
This is a problem with one of many possible SOLUTIONS to problem X. What is problem X?

furkan3ayraktar15:03:19

Yes, I’m not inventing a non-existent issue. I have an SQS queue and a query group dedicated to consuming messages from that queue. Since many messages are pushed to the queue, in order to increase capacity we wanted to add more nodes by increasing the number of instances in the auto-scaling group for the query group. I agree, there might be different solutions to this problem. What I’m trying to learn is the best practice. I’m open to ideas on how to overcome the problem of having one queue full of messages that need to be read and processed.

furkan3ayraktar15:03:48

And the problem you quoted above comes from the fact that I’m trying to control (start/stop) background threads on the nodes via a Lambda Ion. I got that idea from the forum, but you can point me to a totally different approach to the root problem I’m trying to solve, which is: having a queue full of messages and needing nodes to read and process those messages.

Joe Lane16:03:18

Instead of using a pull-based integration (workers polling), you should assess a push-based model combining SQS with Lambda: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html. You would need to enhance one of the roles we create for an ion with an additional policy (documented in the link above) to allow SQS to invoke the lambda. Technically this isn't something we officially support, but I've seen it done successfully. A downside of this approach is that you run the risk of overwhelming the primary group with transactions from your autoscaling query group if you're not careful. The upside is that you don't have any state to manage in a distributed system.
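(A rough sketch of the push-based shape: with the SQS event source mapping and the extra policy in place, the ion lambda entry point receives the standard SQS event JSON as :input. The names and handling below are illustrative, not an official API beyond the ion entry-point contract:)

```
(ns myapp.sqs-push-handler
  (:require [clojure.data.json :as json]))

;; Lambda ion entry point invoked by the SQS event source mapping.
;; :input is the SQS event JSON; each record's "body" holds a message payload.
(defn handle-sqs-event [{:keys [input]}]
  (let [{:strs [Records]} (json/read-str input)]
    (doseq [{:strs [messageId body]} Records]
      ;; Placeholder for the real work (e.g. transacting into Datomic).
      ;; Mind the transaction volume: an autoscaling query group can
      ;; overwhelm the primary group, as noted above.
      (println "received" messageId body))
    (str "processed " (count Records) " messages")))
```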

Joe Lane16:03:35

Using Step Functions with Ions is another approach, depending on the needs of the business problem (e.g., if it is a long-running process that requires human approval steps).

furkan3ayraktar23:03:07

I’ve implemented SQS consumption via Lambda in another project in the past; however, Lambda polling SQS has some issues. You can see a glimpse of them in this post: https://zaccharles.medium.com/lambda-concurrency-limits-and-sqs-triggers-dont-mix-well-sometimes-eb23d90122e0. For that reason, I prefer polling from the query group rather than relying on a Lambda. I’m having a hard time imagining a Step Functions solution to this problem. Also, both approaches will incur additional cost and complexity. Anyway, my understanding is that there is no best practice for communicating with all of the nodes within a Datomic query group. Something like a special kind of Lambda Ion that could trigger a Clojure function on all of the nodes of a query group would be very nice to have, in order to communicate with the running nodes easily.

joshkh15:03:25

i saw in the latest release notes:
• Upgrade: AWS Lambda Runtimes moved to NodeJs 12.x
does this mean faster cold starts for Ions?

Joe Lane15:03:49

No, it means that the NodeJS version used to power the operational automation lambdas has been upgraded. (There is a hard deprecation of the previous version around the end of March, and if you don't upgrade you WILL experience an operational outage when a cluster node goes down, because a new one won't come back up.)

joshkh15:03:01

thanks. i realised my question was a little absurd given that it's a compute upgrade but i thought i'd ask anyway. and thanks for the warning about the hard deprecation. that's good to know.

tatut11:03:54

an outage? what does “cluster node goes down” mean?

Joe Lane13:03:42

If for some reason a machine needs to be replaced (an auto-scaling event, for example), it won’t be.

tatut13:03:59

ok, so we should upgrade compute groups asap

Joe Lane16:03:56

FWIW @joshkh, I looked at https://aws.amazon.com/lambda/pricing/#Provisioned_Concurrency_Pricing and the prices look reasonable for a minimum provisioning of 2. It's ~$7.67 per month (672 hours, specifically) for 2 provisioned lambdas with 256MB (ions use this) processing 1 million requests, each taking 1 second. This doesn't limit you to two Lambdas either.
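(That figure roughly reconstructs from the published rates; the per-GB-second prices below are assumptions taken from the us-east-1 pricing page at the time and vary by region:)

```
;; Approximate provisioned-concurrency cost breakdown (rates assumed).
(let [gb               0.25         ; 256MB
      provisioned      2
      hours            672
      requests         1e6
      duration-s       1
      pc-rate          0.0000041667 ; $ per GB-second provisioned
      pc-duration-rate 0.0000097222 ; $ per GB-second of execution
      request-rate     (/ 0.20 1e6) ; $ per request
      provisioned-cost (* provisioned gb hours 3600 pc-rate)       ; ≈ $5.04
      duration-cost    (* requests duration-s gb pc-duration-rate) ; ≈ $2.43
      request-cost     (* requests request-rate)]                  ; = $0.20
  (+ provisioned-cost duration-cost request-cost))                 ; ≈ $7.67
```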

kenny18:03:21

I've added a feature request to support newer AWS instance types in Datomic Cloud. If you're interested in saving 10-20% off your Datomic AWS bill, please vote for this feature here: https://ask.datomic.com/index.php/604/support-for-recent-aws-instance-types.

kenny18:03:00

For those curious, I did try modifying the query group CF template to manually add those instance types in. This will fail in the ASG with the following message:
> The instance configuration for this AWS Marketplace product is not supported. Please see the AWS Marketplace site for more information about supported instance types, regions, and operating systems. Launching EC2 instance failed.
ARM-based processors will fail with this message:
> The architecture 'arm64' of the specified instance type does not match the architecture 'x86_64' of the specified AMI. Specify an instance type and an AMI that have matching architectures, and try again. You can use 'describe-instance-types' or 'describe-images' to discover the architecture of the instance type or AMI. Launching EC2 instance failed.