Reading a CSV from S3 and inserting it into a MySQL table with AWS Lambda

I am trying to automate loading a CSV file into a MySQL table whenever the file is uploaded to an S3 bucket.

My strategy is that S3 fires an event when it receives a file in the specified bucket (let's call it "bucket-file"). This event triggers an AWS Lambda function, which loads and processes the file, inserting each row into a MySQL table (let's call it "target_table").
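For context, that notification wiring can be set up in the console, or scripted with boto3 along these lines (a minimal sketch; the function name, account ID, and region in the ARN are made-up placeholders, and S3 must separately be granted permission to invoke the function):

    import boto3

    s3 = boto3.client('s3')

    # Invoke the Lambda function whenever an object is created in the bucket.
    # The function ARN below is a made-up placeholder.
    s3.put_bucket_notification_configuration(
        Bucket='bucket-file',
        NotificationConfiguration={
            'LambdaFunctionConfigurations': [{
                'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:csv-loader',
                'Events': ['s3:ObjectCreated:*'],
            }]
        },
    )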

We must take into account that the RDS instance is in a VPC.

Current permission configuration for the bucket:

{ "Version": "2008-10-17", "Statement": [ { "Sid": "PublicReadForGetBucketObjects", "Effect": "Allow", "Principal": { "AWS": "*" }, "Action": "s3:GetObject", "Resource": "arn:aws:s3:::bucket-file/*" } ] } 

I created a role with the AmazonS3FullAccess and AWSLambdaVPCAccessExecutionRole managed policies attached, and assigned it to the AWS Lambda function.
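For reference, attaching those managed policies can be scripted with boto3 roughly like this (the role name "lambda-csv-loader" is a placeholder for this example):

    import boto3

    iam = boto3.client('iam')

    # "lambda-csv-loader" is a placeholder role name for this example.
    for policy_arn in [
        'arn:aws:iam::aws:policy/AmazonS3FullAccess',
        'arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole',
    ]:
        iam.attach_role_policy(RoleName='lambda-csv-loader', PolicyArn=policy_arn)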

Lambda code:

    from __future__ import print_function
    import boto3
    import logging
    import sys
    import uuid
    import pymysql
    import csv
    import rds_config

    rds_host = rds_config.rds_host
    name = rds_config.db_username
    password = rds_config.db_password
    db_name = rds_config.db_name

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    # Connect outside the handler so the connection is reused across warm invocations
    try:
        conn = pymysql.connect(rds_host, user=name, passwd=password,
                               db=db_name, connect_timeout=5)
    except Exception as e:
        logger.error("ERROR: Unexpected error: Could not connect to MySQL instance.")
        logger.error(e)
        sys.exit()

    logger.info("SUCCESS: Connection to RDS MySQL instance succeeded")

    s3_client = boto3.client('s3')


    def handler(event, context):
        bucket = event['Records'][0]['s3']['bucket']['name']
        key = event['Records'][0]['s3']['object']['key']
        download_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
        s3_client.download_file(bucket, key, download_path)

        with open(download_path) as f:
            csv_data = csv.reader(f)
            with conn.cursor() as cur:
                for idx, row in enumerate(csv_data):
                    logger.info(row)
                    try:
                        # Unquoted %s placeholders let the driver do the escaping
                        cur.execute('INSERT INTO target_table (column1, column2, column3) '
                                    'VALUES (%s, %s, %s)', row)
                    except Exception as e:
                        logger.error(e)
                    if idx % 100 == 0:
                        conn.commit()
                conn.commit()
        return 'File loaded into RDS: ' + str(download_path)
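For what it's worth, the handler can be smoke-tested locally by hand-building an event shaped like an S3 put notification (the key "data.csv" is a made-up example; point it at a real object before running):

    test_event = {
        'Records': [{
            's3': {
                'bucket': {'name': 'bucket-file'},
                'object': {'key': 'data.csv'},
            }
        }]
    }

    if __name__ == '__main__':
        print(handler(test_event, None))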

I tested the function: S3 sends an event when the file is uploaded, and Lambda connects to the RDS instance and receives the notification. I checked that the bucket name is "bucket-file" and that the key is correct too. The problem is that when the function reaches the s3_client.download_file(bucket, key, download_path) line, it hangs there until the Lambda function times out.
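One way to make this fail fast while debugging, instead of hanging for the whole 60 seconds, is to give the S3 client short timeouts and a single attempt; a sketch, assuming a reasonably recent botocore (the timeout values are arbitrary):

    import boto3
    from botocore.config import Config

    # Short timeouts and a single attempt surface a blocked network path
    # in seconds instead of consuming the whole Lambda execution time.
    s3_client = boto3.client('s3', config=Config(
        connect_timeout=5,
        read_timeout=5,
        retries={'max_attempts': 1},
    ))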

Looking at the logs, I see:

    [INFO] 2017-01-24T14:36:52.102Z SUCCESS: Connection to RDS mysql instance succeeded
    [INFO] 2017-01-24T14:36:53.282Z Starting new HTTPS connection (1): bucket-files.s3.amazonaws.com
    [INFO] 2017-01-24T14:37:23.223Z Starting new HTTPS connection (2): bucket-files.s3.amazonaws.com
    2017-01-24T14:37:48.684Z Task timed out after 60.00 seconds

I also read that when working inside a VPC, accessing an S3 bucket requires creating a VPC endpoint that provides S3 access for the subnet. I tried this solution as well, but the result is the same.

I would appreciate some ideas.

Thanks in advance!

1 answer

I finally got it!

The problem was indeed VPC-related. As I said, I had created a VPC endpoint to make the S3 service available to my VPC, but I had not set up the route table correctly.

So in conclusion: if you are working in a VPC with Lambda and want to access S3, you need to create a VPC endpoint and associate it with the route table of your Lambda's subnets (a sketch below). In addition, if you want to access any other internet service outside your VPC, you need to configure a NAT gateway.
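For example, a minimal boto3 sketch of that endpoint setup (the VPC ID, route table ID, and region are placeholders; passing RouteTableIds makes AWS add the S3 prefix-list route to those tables for you, which was the missing piece here):

    import boto3

    ec2 = boto3.client('ec2')

    # VPC and route table IDs are placeholders; use the ones your Lambda
    # subnets actually route through. Passing RouteTableIds makes AWS add
    # the S3 prefix-list route to those tables automatically.
    ec2.create_vpc_endpoint(
        VpcId='vpc-0123456789abcdef0',
        ServiceName='com.amazonaws.us-east-1.s3',
        RouteTableIds=['rtb-0123456789abcdef0'],
    )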

