Scheduling long-running recurring tasks using AWS

My application depends heavily on AWS services, and I'm looking for the best solution built on them. The web application launches a scheduled task (which presumably repeats indefinitely) that requires a certain amount of resources to complete. One task run usually takes at most 1 minute.

The current idea is to pass jobs through SQS and scale workers on EC2 instances based on the queue size (this part is more or less clear). But I'm struggling to find the right solution for actually launching the tasks at their intervals. Suppose we are dealing with 10,000 jobs. Having the scheduler fire 10k cron jobs at the same time (the work itself is quite simple, just passing the job description via SQS) seems like a crazy idea. So the real question is: how should the scheduler itself scale (considering scenarios like restarting the scheduler, spinning up a new instance, etc.)? Or is a scheduler application redundant altogether, and is it wiser to rely on AWS Lambda (or other scheduling services)? The problem with Lambda functions is a certain limitation: the 128 MB of memory allocated to a single function is actually far more than needed (20 MB seems more than enough).

Alternatively, the worker itself could wait a while and then tell the scheduler to start the job again. Say the frequency is 1 hour:

1. Scheduler sends the job to worker 1
2. Worker 1 performs the job and, after one hour, sends it back to the scheduler
3. Scheduler sends the job again
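The hand-back loop above can be simulated in a few lines. This is only an in-process sketch of the control flow: in the real system each hop would be an SQS message and the wait a one-hour delay, both of which are collapsed here.

```python
from collections import deque


def run_schedule(job, rounds):
    """Simulate the scheduler/worker hand-back loop for a fixed number of rounds."""
    outbox = deque([job])  # the scheduler's outgoing queue
    runs = 0
    for _ in range(rounds):
        current = outbox.popleft()  # 1. scheduler sends the job to a worker
        runs += 1                   # 2. worker performs the job
        outbox.append(current)      #    ...and hands it back after the interval
    return runs                     # 3. scheduler sends it again next round
```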

However, the problem lies in this worker's availability: if it dies mid-cycle, the job is never handed back.

Bottom line: I am trying to create a lightweight scheduler that does not require autoscaling and serves only as a hub for passing along job descriptions. And, of course, no jobs should be lost when the service restarts.

1 answer

Lambda is perfect for this. You have many short runs (~1 minute), and Lambda is made for short processes (up to five minutes now). It is important to know that CPU speed scales directly with RAM: a 1 GB Lambda function is roughly equivalent to a t2.micro instance, if I remember correctly, and 1.5 GB of RAM gives 1.5× the CPU speed. The cost of these functions is so small that you can simply use the larger sizes. A 128 MB function has 1/8 the CPU speed of a micro instance, so I do not recommend actually using those.

As a queuing mechanism, you can use S3 (yes, you read that correctly). Create a bucket and have a Lambda worker trigger on object creation. When you want to schedule a task, put a file into the bucket; Lambda fires and processes it immediately.

There are some limits to keep in mind: you can only have 100 workers running at a time (the total number of concurrently active Lambda instances), but you can ask AWS to raise this limit.

The costs are as follows:

  • S3 PUT requests: $0.005 per 1,000, so $5 per million job submissions (more expensive than SQS).
  • Lambda execution time: assuming normal t2.micro CPU speed (1 GB of RAM), about $0.0001 per job (60 seconds; the first 300,000 seconds are free = 5,000 jobs).
  • Lambda requests: $0.20 per million invocations (the first million are free).
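Worked out per million jobs, using the answer's own figures (the per-job compute price is taken from the answer as stated, not recomputed from current AWS pricing):

```python
S3_PUT_COST = 0.005 / 1000              # USD per S3 PUT request
LAMBDA_REQUEST_COST = 0.20 / 1_000_000  # USD per Lambda invocation
COMPUTE_COST_PER_JOB = 0.0001           # USD for a 60 s run at 1 GB (per the answer)


def cost_per_million_jobs():
    """Total cost of a million jobs in this setup, ignoring free tiers."""
    puts = 1_000_000 * S3_PUT_COST                 # $5.00
    invocations = 1_000_000 * LAMBDA_REQUEST_COST  # $0.20
    compute = 1_000_000 * COMPUTE_COST_PER_JOB     # $100.00
    return puts + invocations + compute
```

Compute time dominates: roughly $105 per million one-minute jobs, with the queueing itself (PUTs plus invocations) contributing only about $5.20 of that.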

This setup does not require any servers on your side. It cannot go down (unless AWS itself does).

(Do not forget to delete the task object from S3 when it is done.)


Source: https://habr.com/ru/post/1237836/
