AWS Glue Crawler does not create a table

I have a crawler that I created in AWS Glue that does not create a table in the Data Catalog after it completes successfully.

The crawler takes about 20 seconds to run, and the logs show that it completed successfully. The CloudWatch log shows:

  • BENCHMARK: Running Start Crawl for Crawler
  • BENCHMARK: Classification Complete, writing results to DB
  • BENCHMARK: Finished writing to Catalog
  • BENCHMARK: Crawler has finished running and is in ready state

I do not understand why the tables are not being created in the Data Catalog. The AWS documentation does not really help with debugging.

4 answers

Check the IAM role associated with the crawler. Most likely, it does not have the correct permissions.

When you create a crawler and choose to have an IAM role created for it (the default option), the policy that gets created covers only the S3 path you specified. If you later edit the crawler and change only the S3 path, the role associated with the crawler will not have permissions for the new S3 path.
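To illustrate the point above, here is a minimal sketch that checks whether a policy document contains an Allow statement covering a given S3 object ARN. The bucket name, the paths, and the shape of the auto-generated policy are assumptions for illustration only, and the check is deliberately simplified: it ignores Deny statements, Condition blocks, NotAction and NotResource, so it is not a substitute for real IAM policy evaluation.

```python
import fnmatch


def _as_list(value):
    """IAM accepts either a single string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value)


def policy_allows(policy, action, resource_arn):
    """Return True if some Allow statement matches both the action and the ARN.

    Simplified on purpose: Deny, Condition, NotAction and NotResource are
    ignored, so a True result means "probably allowed", not proof.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        # Action names are case-insensitive; resource ARNs are matched as-is.
        action_ok = any(
            fnmatch.fnmatchcase(action.lower(), a.lower()) for a in actions
        )
        resource_ok = any(fnmatch.fnmatchcase(resource_arn, r) for r in resources)
        if action_ok and resource_ok:
            return True
    return False


# Hypothetical policy scoped to the crawler's ORIGINAL S3 path (assumed shape).
auto_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": ["arn:aws:s3:::example-bucket/old-path/*"],
        }
    ],
}

old_object = "arn:aws:s3:::example-bucket/old-path/data.csv"
new_object = "arn:aws:s3:::example-bucket/new-path/data.csv"

print(policy_allows(auto_policy, "s3:GetObject", old_object))  # True
print(policy_allows(auto_policy, "s3:GetObject", new_object))  # False
```

If the second check comes back False for the path the crawler now points at, the role's policy needs to be extended to that path.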


If you already have tables in the target database, the crawler may associate your new files with an existing table rather than creating a new one.

This happens when there are similarities in the data or folder structure that Glue can interpret as partitions.

Also, I have sometimes had to refresh the database's table list before new tables showed up.
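One way to check this outside the console is to compare the crawler's table counters with the tables actually present in the database. The sketch below assumes a boto3 Glue client is passed in as `glue`; the function name `summarize_crawl` and the crawler and database names in the usage comment are hypothetical. It relies on the `get_crawler_metrics` and `get_tables` calls of the Glue API.

```python
def summarize_crawl(glue, crawler_name, database):
    """Collect the crawler's table counters and the tables in the database.

    `glue` is expected to behave like boto3.client("glue").
    """
    metrics = glue.get_crawler_metrics(CrawlerNameList=[crawler_name])
    counters = metrics["CrawlerMetricsList"][0]

    tables = []
    kwargs = {"DatabaseName": database}
    # get_tables is paginated: keep following NextToken until it is absent.
    while True:
        page = glue.get_tables(**kwargs)
        for table in page["TableList"]:
            tables.append(
                {
                    "name": table["Name"],
                    "location": table.get("StorageDescriptor", {}).get("Location"),
                    "partition_keys": [
                        key["Name"] for key in table.get("PartitionKeys", [])
                    ],
                }
            )
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token

    return {
        "tables_created": counters.get("TablesCreated", 0),
        "tables_updated": counters.get("TablesUpdated", 0),
        "tables": tables,
    }


# Usage (hypothetical names, requires boto3 and AWS credentials):
#   import boto3
#   report = summarize_crawl(boto3.client("glue"), "my-crawler", "my-database")
#   print(report)
```

If the report shows no tables created but some updated, and an existing table's location is a parent of the new files with partition keys listed, the new data was most likely folded into that table.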


You can try excluding some files in the S3 bucket from the crawl; the excluded files should then show up in the log. I find this helpful for debugging what the crawler is doing.
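Exclusions can be set in the console, or programmatically through the `Exclusions` list of an S3 target. The following is only a sketch: the helper name `set_exclusions`, the crawler name, the path, and the glob patterns are all hypothetical, and `glue` is assumed to be a boto3 Glue client. Note that `update_crawler` replaces the crawler's targets with whatever is passed in, so any other existing targets would need to be included as well.

```python
def build_s3_targets(path, exclusions):
    """Build the Targets argument for a single S3 path with exclude patterns."""
    return {"S3Targets": [{"Path": path, "Exclusions": list(exclusions)}]}


def set_exclusions(glue, crawler_name, path, exclusions):
    """Point the crawler at `path` and exclude objects matching the patterns.

    `glue` is expected to behave like boto3.client("glue"). This overwrites
    the crawler's current targets.
    """
    targets = build_s3_targets(path, exclusions)
    glue.update_crawler(Name=crawler_name, Targets=targets)
    return targets


# Usage (hypothetical names, requires boto3 and AWS credentials):
#   import boto3
#   set_exclusions(
#       boto3.client("glue"),
#       "my-crawler",
#       "s3://example-bucket/new-path/",
#       ["**.tmp", "archive/**"],
#   )
```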


Here is my example IAM policy JSON, which allows Glue to access S3 and create a table.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "ec2:DeleteTags",
        "ec2:CreateTags"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:instance/*",
        "arn:aws:ec2:*:*:security-group/*",
        "arn:aws:ec2:*:*:network-interface/*"
      ],
      "Condition": {
        "ForAllValues:StringEquals": {
          "aws:TagKeys": "aws-glue-service-resource"
        }
      }
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "iam:GetRole",
        "cloudwatch:PutMetricData",
        "ec2:DeleteNetworkInterface",
        "s3:ListBucket",
        "s3:GetBucketAcl",
        "logs:PutLogEvents",
        "ec2:DescribeVpcAttribute",
        "glue:*",
        "ec2:DescribeSecurityGroups",
        "ec2:CreateNetworkInterface",
        "s3:GetObject",
        "s3:PutObject",
        "logs:CreateLogStream",
        "s3:ListAllMyBuckets",
        "ec2:DescribeNetworkInterfaces",
        "logs:AssociateKmsKey",
        "ec2:DescribeVpcEndpoints",
        "iam:ListRolePolicies",
        "s3:DeleteObject",
        "ec2:DescribeSubnets",
        "iam:GetRolePolicy",
        "s3:GetBucketLocation",
        "ec2:DescribeRouteTables"
      ],
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor2",
      "Effect": "Allow",
      "Action": "s3:CreateBucket",
      "Resource": "arn:aws:s3:::aws-glue-*"
    },
    {
      "Sid": "VisualEditor3",
      "Effect": "Allow",
      "Action": "logs:CreateLogGroup",
      "Resource": "*"
    }
  ]
}


Source: https://habr.com/ru/post/1273060/

