Overview
DvSum Data Quality can perform data quality checks on data stored in Amazon Simple Storage Service (S3). DvSum uses Amazon Athena to provide access to the files, and it uses AWS Glue Data Catalog to crawl the files and to gather metadata.
In order to define a data source in DvSum you must first configure an AWS user with appropriate permissions for S3, Athena, and Glue. Follow the steps below to create a new user with the required permissions or to verify that an existing user has the permissions needed.
The detailed steps below explain how to create a new user in the AWS Console. A command-line version of the same steps is provided as well. Follow either the Detailed Steps or the Command Line Instructions.
Detailed Steps
Step 1: Add a New User
Typically a new user is created to be used with DvSum. Using an existing user is of course valid as well, but it's important to validate that the user has all permissions documented below.
To add a new user, open the AWS Console, navigate to IAM → Access Management → Users, and click the "Create user" button.
1.1 Set User Details
Set the user name to any valid name, and click the "Next" button.
1.2 Set Permissions
Select "Attach policies directly".
1.3 Create a Policy named "dvsum-s3-source-policy"
Click on the "Create policy" button and you will be redirected to the Create Policy page.
Then click on the JSON button, and paste the JSON provided below. Click the "Next" button.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ConfirmPoliciesAndPassRole",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole",
        "iam:GetUser",
        "iam:ListAttachedUserPolicies"
      ],
      "Resource": "*"
    }
  ]
}
1.4 Review and create the Policy
Important: the policy name must be exactly the name provided here. The description is optional.
- Policy name: dvsum-s3-source-policy
- Policy Description: Policy used by DvSum to confirm permissions
Click the "Create policy" button.
1.5 Add Permissions policies
Return to the Create user wizard already in progress from earlier. Select the following permission policies:
- AmazonS3FullAccess (AWS managed)
- AmazonAthenaFullAccess (AWS managed)
- AWSGlueServiceRole (AWS managed)
- dvsum-s3-source-policy (customer managed)
Click the "Next" button.
1.6 Review and Create User
Review User details and Permissions Summary. Then click the "Create user" button.
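Whether the user is new or existing, its attached policies can also be checked from the command line. The sketch below is one possible way to do that, assuming the AWS CLI is installed and configured with credentials that may list the user's policies. The helper name missing_policies and the user name dvsum-user are illustrative and not part of DvSum.

```shell
# Policies that step 1.5 attaches to the user.
REQUIRED_POLICIES="AmazonS3FullAccess AmazonAthenaFullAccess AWSGlueServiceRole dvsum-s3-source-policy"

# Hypothetical helper: reads attached policy names on stdin (one per line)
# and prints every required policy that is not among them.
missing_policies() {
  attached=$(cat)
  for policy in $REQUIRED_POLICIES; do
    printf '%s\n' "$attached" | grep -qxF "$policy" || printf '%s\n' "$policy"
  done
}

# Usage (needs AWS credentials; dvsum-user is an example name):
#   aws iam list-attached-user-policies --user-name dvsum-user \
#     --query 'AttachedPolicies[].PolicyName' --output text \
#     | tr '\t' '\n' | missing_policies
```

Empty output means all four policies are attached directly to the user. Policies inherited through a group or added inline are not covered by this check.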
Step 2: Add Role
AWS Glue requires a role with appropriate permissions which will be passed to the crawler when it runs.
2.1 Create role
Navigate to IAM → Access management → Roles and click the "Create role" button.
2.2 Define trust policy
In the first step of the Create role wizard, select "Custom trust policy". Paste the JSON provided below into the Custom trust policy editor, then click the "Next" button.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
2.3 Add permissions
Add the following permission policies to the role.
- AWSGlueServiceRole
- AmazonS3FullAccess
Click the "Next" button.
2.4 Name, review, and create
Important: the role name must be exactly the name provided here. The description is optional.
- Role name: dvsum-glue-service-role
- Role Description: Role used by DvSum to grant the Glue Crawler permission to access files
Click the "Create role" button.
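To confirm that the role trusts the Glue service, its trust policy can be read back with the AWS CLI. This is a minimal sketch, assuming the CLI is configured with credentials that may call iam get-role and that the trust policy contains the single statement shown in step 2.2. The helper name role_trusts_glue is hypothetical.

```shell
# Hypothetical helper: succeeds only if the first statement of the role's
# trust policy names glue.amazonaws.com as the trusted service.
role_trusts_glue() {
  service=$(aws iam get-role --role-name "$1" \
    --query 'Role.AssumeRolePolicyDocument.Statement[0].Principal.Service' \
    --output text) || return 1
  [ "$service" = "glue.amazonaws.com" ]
}

# Usage (needs AWS credentials):
#   role_trusts_glue dvsum-glue-service-role && echo "trust policy OK"
```

The policies attached to the role can be listed in a similar way with aws iam list-attached-role-policies --role-name dvsum-glue-service-role.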
Step 3: Generate Access Key
3.1 Navigate to Security Credentials
Navigate to IAM → Access Management → Users. Select the user you just created, and click the "Security credentials" tab.
3.2 Create Access Key
Click the "Create access key" button.
Select "Command Line Interface (CLI)". Click the "Next" button.
3.3 Add a description Tag
A description for the access key is optional, but setting it is a best practice.
- Description tag value: Access key used by DvSum to access S3, Athena, and Glue services
Click the "Create access key" button.
3.4 Retrieve access keys
Save the Access key and the Secret access key. The secret access key cannot be viewed again after you leave this page. You will use these values when configuring the S3 data source in DvSum.
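Before entering the key in DvSum, you may want to confirm that it works and belongs to the intended user. The sketch below assumes the AWS CLI is installed, that the new key has been exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the current shell, and that the user was created without an IAM path. The helper name key_belongs_to_user is hypothetical.

```shell
# Hypothetical helper: succeeds if the exported access key belongs to the
# IAM user named in $1 (assumes the user has no IAM path).
key_belongs_to_user() {
  arn=$(aws sts get-caller-identity --query Arn --output text) || return 1
  case "$arn" in
    *":user/$1") return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (after exporting the new key in this shell):
#   key_belongs_to_user dvsum-user && echo "access key is valid"
```

A failure here usually means the key was copied incorrectly or belongs to a different user.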
Next Steps
Now that you have an AWS Access Key associated with a user that has all required permissions, the next step is to follow the instructions on how to Configure Amazon S3 as a Data Source in DvSum.
Command Line Instructions
The steps below achieve the same results as the detailed steps above using the AWS CLI.
# Create a new user (use any valid name for the user)
aws iam create-user --user-name dvsum-user
# Attach these 3 AWS managed policies
aws iam attach-user-policy \
  --policy-arn arn:aws:iam::aws:policy/AmazonAthenaFullAccess \
  --user-name dvsum-user
aws iam attach-user-policy \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess \
  --user-name dvsum-user
aws iam attach-user-policy \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole \
  --user-name dvsum-user
# Create dvsum-s3-source-policy (Use this exact name)
aws iam create-policy \
  --policy-name dvsum-s3-source-policy \
  --policy-document \
'{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GetUserInfo",
      "Effect": "Allow",
      "Action": [
        "iam:GetUser",
        "iam:ListAttachedUserPolicies",
        "iam:PassRole"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}'
# Attach the policy (replace <account-id> with your 12-digit AWS account ID)
aws iam attach-user-policy \
  --policy-arn arn:aws:iam::<account-id>:policy/dvsum-s3-source-policy \
  --user-name dvsum-user
# Create role for crawler
aws iam create-role \
  --role-name dvsum-glue-service-role \
  --assume-role-policy-document \
'{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}'
# Attach AWSGlueServiceRole policy to role
aws iam attach-role-policy \
  --role-name dvsum-glue-service-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole
# Attach AmazonS3FullAccess policy to role
aws iam attach-role-policy \
  --role-name dvsum-glue-service-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
# Create an access key
aws iam create-access-key --user-name dvsum-user
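The ARN of the customer managed policy contains your AWS account ID, so it differs from account to account. One possible way to build it without looking the ID up by hand is sketched below, assuming the AWS CLI is configured for the same account in which the policy was created. The helper name custom_policy_arn is hypothetical.

```shell
# Hypothetical helper: prints the ARN of a customer managed policy named $1
# in the account of the currently configured credentials.
custom_policy_arn() {
  account_id=$(aws sts get-caller-identity --query Account --output text) || return 1
  printf 'arn:aws:iam::%s:policy/%s\n' "$account_id" "$1"
}

# Usage (needs AWS credentials; dvsum-user is an example name):
#   aws iam attach-user-policy \
#     --policy-arn "$(custom_policy_arn dvsum-s3-source-policy)" \
#     --user-name dvsum-user
```

This form assumes the policy was created without a custom path, as in the create-policy command above.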