How to establish a fast and reliable connection between S3 and EC2

EC2 provides a very convenient, scalable, on-demand mechanism for running distributed (parallel) processes, and S3 provides a reliable storage service.

I tried to use EC2 nodes for an ETL process and analytics. This process requires a lot of data (100 GB to 1 TB), which arrives very quickly (several times a day), and substantial computing resources that are needed only for a short duration.

The above design needs

  • High speed / fast connection between S3 and EC2.
  • The S3 → EC2 connection should also be reliable, since scheduling startup, transferring data, running the processes and terminating the nodes should all happen as quickly as possible, not only to save costs but also because SLAs are involved.

But for now

  • The only way to pull data from S3 appears to be over HTTP, so throughput is limited by the transfer limits of the EC2 nodes.
  • In addition, the data ingestion goes over the Internet and can therefore be unreliable for strict scheduling, which means adequate buffer time is needed for every task.

In a private data center installation, you can configure a faster (for example, 10 Gbps) leased line between the storage and physical nodes.

Are there any service options in AWS that would satisfy the above requirements?

+6
3 answers

It depends on how busy the network activity of other EC2 instances on the same physical server is, which specific S3 node you hit at any given time, whether you are in the same region as your S3 endpoint, and so on.

You can benchmark it yourself, but even then the results will vary widely. Sometimes I get a few megabytes per second, and at other times only a couple of hundred kilobytes per second.
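One way to put numbers on this is to time a read and compute the rate. Below is a minimal Python sketch, not a definitive benchmark. The name `measure_throughput` is made up for illustration, and the in-memory stream is only a stand-in: for a real measurement you would pass the response object returned by `urllib.request.urlopen` for an S3 object URL you can read, and repeat the run at different times of day.

```python
import io
import time


def measure_throughput(stream, chunk_size=1024 * 1024):
    """Read a binary stream to the end; return (bytes_read, seconds, MB/s)."""
    total = 0
    start = time.perf_counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small inputs.
    mb_per_s = (total / 1_000_000) / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, mb_per_s


if __name__ == "__main__":
    # Stand-in for a real download: 5 MB held in memory (an assumption
    # made so the sketch runs without network access or credentials).
    fake = io.BytesIO(b"\0" * 5_000_000)
    size, secs, rate = measure_throughput(fake)
    print(f"{size} bytes in {secs:.4f} s = {rate:.1f} MB/s")
```

A single run says little, given how much the rate fluctuates. Collecting several samples and looking at the spread is more informative than any one figure.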

+5

I think there is a better answer now.

There is a separate service, AWS Data Pipeline, which provides reliable data transfer between S3 and EC2.

+3

I found this only recently (although it may have been possible for some time). Cloudberry offers a very fast way to transfer data from S3 to EC2. Speeds range from 40 MB/s to 50 MB/s. Here is the process: download the Cloudberry software from http://www.cloudberrylab.com/free-amazon-s3-explorer-cloudfront-IAM.aspx and connect to S3. Once the files are visible, right-click the file you want to copy and select Web URL. This shows the web URL for the file. Copy the entire URL and, on the AWS VM, use wget to fetch its contents (wget [copied URL]).

I'm still looking for tools to copy data from the VM to S3. S3cmd is slow and breaks too often.
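If wget is not available on the VM, the same fetch can be scripted. This is a minimal Python sketch using only the standard library. The function name `fetch_url` and the command-line usage are assumptions for illustration, and the URL is whatever you copied in the previous step.

```python
import shutil
import sys
import urllib.request


def fetch_url(url, dest_path, chunk_size=1024 * 1024):
    """Stream the contents of url into dest_path; return bytes written."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # Copy in chunks so a large object never has to fit in memory.
        shutil.copyfileobj(response, out, chunk_size)
        return out.tell()


if __name__ == "__main__":
    # Hypothetical usage: python fetch.py "<copied URL>" local_file
    if len(sys.argv) != 3:
        sys.exit("usage: fetch.py <url> <destination file>")
    written = fetch_url(sys.argv[1], sys.argv[2])
    print(f"{written} bytes written to {sys.argv[2]}")
```

Streaming in fixed-size chunks matters here because the files in question can be far larger than the memory of the instance.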
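Until a better tool turns up, a flaky upload can at least be wrapped in a retry loop. The following is a minimal sketch, not a fix for the slowness. `retry` and `upload_with_s3cmd` are hypothetical helper names, the bucket and file names are placeholders, and it assumes s3cmd is installed and configured on the VM.

```python
import subprocess
import time


def retry(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between failures.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


def upload_with_s3cmd(local_path, s3_uri):
    """Run `s3cmd put`; raises CalledProcessError on a non-zero exit."""
    subprocess.run(["s3cmd", "put", local_path, s3_uri], check=True)


if __name__ == "__main__":
    # Placeholder names: replace with your own file and bucket.
    retry(lambda: upload_with_s3cmd("data.bin", "s3://my-bucket/data.bin"))
```

The growing delay between attempts gives a transient network problem time to clear instead of hammering the endpoint.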

+2

Source: https://habr.com/ru/post/918130/

