S3 Source Configuration on AWS

Overview

DvSum Data Quality can perform data quality checks on data stored in Amazon Simple Storage Service (S3). DvSum uses Amazon Athena to provide access to the files, and it uses AWS Glue Data Catalog to crawl the files and to gather metadata.

In order to define a data source in DvSum you must first configure an AWS user with appropriate permissions for S3, Athena, and Glue. Follow the steps below to create a new user with the required permissions or to verify that an existing user has the permissions needed.

The Detailed Steps below explain how to create a new user in the AWS Console. A command-line version of the same steps is also provided. Follow either the Detailed Steps or the Command Line Instructions.

Detailed Steps

Step 1: Add a New User

Typically, a new user is created for use with DvSum. An existing user also works, provided that it has all of the permissions documented below.

To add a new user, open the AWS Console, navigate to IAM → Access Management → Users, and click the "Create user" button.

 

1.1 Set User Details

Set the user name to any valid name, and click the "Next" button.



1.2 Set Permissions

Select "Attach policies directly".

 

1.3 Create a Policy named "dvsum-s3-source-policy"

Click on the "Create policy" button and you will be redirected to the Create Policy page.


Then click on the JSON button, and paste the JSON provided below. Click the "Next" button.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ConfirmPoliciesAndPassRole",
            "Effect": "Allow",
            "Action": [
                "iam:PassRole",
                "iam:GetUser",
                "iam:ListAttachedUserPolicies"
            ],
            "Resource": "*"
        }
    ]
}

 

1.4 Review and create the Policy

Important: the policy name must be exactly the name provided here. The description is optional.

  • Policy name: dvsum-s3-source-policy
  • Policy Description: Policy used by DvSum to confirm permissions

Click the "Create policy" button.

 

1.5 Add Permissions policies

Return to the Create user wizard already in progress from earlier. Select the following permission policies:

    • AmazonS3FullAccess (AWS managed)
    • AmazonAthenaFullAccess (AWS managed)
    • AWSGlueServiceRole (AWS managed)
    • dvsum-s3-source-policy (customer managed)

Click the "Next" button.

 

1.6 Review and Create User

Review User details and Permissions Summary. Then click the "Create user" button.
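
Whether you created a new user or reused an existing one, the attached policies can optionally be checked from the AWS CLI against the four names listed in step 1.5. The sketch below is a minimal example and not part of DvSum; the function name missing_dvsum_policies is a hypothetical helper, and dvsum-user is an example user name. It only inspects policies attached directly to the user, so permissions inherited through groups are not detected.

```shell
# Hypothetical helper: prints each required policy that is NOT directly
# attached to the given IAM user. Prints nothing when all four are present.
missing_dvsum_policies() {
  user_name="$1"
  # Tab-separated names of the policies attached directly to the user.
  attached="$(aws iam list-attached-user-policies \
    --user-name "$user_name" \
    --query 'AttachedPolicies[].PolicyName' \
    --output text)" || return 1
  # Turn tabs and newlines into spaces so whole names can be matched.
  attached=" $(printf '%s' "$attached" | tr '\t\n' '  ') "
  for required in AmazonS3FullAccess AmazonAthenaFullAccess \
                  AWSGlueServiceRole dvsum-s3-source-policy; do
    case "$attached" in
      *" $required "*) ;;                 # policy is attached
      *) printf '%s\n' "$required" ;;     # policy is missing
    esac
  done
}

# Usage (requires configured AWS credentials):
#   missing_dvsum_policies dvsum-user
```

If the command prints nothing, all four policies are attached to the user.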

 

Step 2: Add Role

AWS Glue requires a role with appropriate permissions which will be passed to the crawler when it runs. 

2.1 Create role

Navigate to IAM → Access management → Roles and click the "Create role" button.

 

2.2 Define trust policy

In the first step of the Create role wizard, select "Custom trust policy". Paste the JSON provided below into the Custom trust policy editor, and click the "Next" button.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "glue.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

 

2.3 Add permissions

Add the following permission policies to the role.

  • AWSGlueServiceRole 
  • AmazonS3FullAccess

Click the "Next" button.

 

2.4 Name, review, and create

Important: the role name must be exactly the name provided here. The description is optional.

  • Role name: dvsum-glue-service-role
  • Role Description: Role used by DvSum to grant the Glue Crawler permission to access files

Click the "Create role" button.
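
As an optional check, the trust policy of the new role can be read back with the AWS CLI to confirm that the Glue service is allowed to assume it. This is a minimal sketch: role_trusts_glue is a hypothetical helper name, and it assumes the trust policy has a single statement, as in the JSON from step 2.2, and that the CLI returns the trust policy as decoded JSON.

```shell
# Hypothetical helper: succeeds only if the first statement of the role's
# trust policy names glue.amazonaws.com as the trusted service.
role_trusts_glue() {
  role_name="${1:-dvsum-glue-service-role}"
  service="$(aws iam get-role \
    --role-name "$role_name" \
    --query 'Role.AssumeRolePolicyDocument.Statement[0].Principal.Service' \
    --output text)" || return 1
  [ "$service" = "glue.amazonaws.com" ]
}

# Usage (requires configured AWS credentials):
#   role_trusts_glue && echo "trust policy OK"
```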

 

Step 3: Generate Access Key

3.1 Navigate to Security credentials

Navigate to IAM → Access Management → Users. Select the user you just created, and click the "Security credentials" tab.

 

3.2 Create Access Key

Click the "Create access key" button.

 

Select "Command Line Interface (CLI)". Click the "Next" button.

3.3 Add a description tag

A description for the access key is optional, but setting it is a best practice.

  • Description tag value: Access key used by DvSum to access S3, Athena, and Glue services

Click the "Create access key" button.

 

3.4 Retrieve access keys

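Save the access key and the secret access key. The secret access key is shown only at creation time, so store it securely before leaving the page. You will use both values when configuring the S3 data source in DvSum.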

 

Next Steps

Now that you have an AWS access key associated with a user that has all required permissions, the next step is to follow the instructions in "Amazon S3 as a source" to define a data source in DvSum.

 

 

Command Line Instructions

The commands below achieve the same result as the Detailed Steps above, using the AWS CLI.

# Create a new user (use any valid name for the user)
aws iam create-user --user-name dvsum-user

# Attach these 3 AWS managed policies
aws iam attach-user-policy \
--policy-arn arn:aws:iam::aws:policy/AmazonAthenaFullAccess \
--user-name dvsum-user
aws iam attach-user-policy \
--policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess \
--user-name dvsum-user
aws iam attach-user-policy \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole \
--user-name dvsum-user

# Create dvsum-s3-source-policy (Use this exact name)
aws iam create-policy \
--policy-name dvsum-s3-source-policy \
--policy-document \
'{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GetUserInfo",
            "Effect": "Allow",
            "Action": [
                "iam:GetUser",
                "iam:ListAttachedUserPolicies",
                "iam:PassRole"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}'

# Attach the policy (replace 318630576054 in the policy-arn with your own AWS account ID)
aws iam attach-user-policy \
--policy-arn arn:aws:iam::318630576054:policy/dvsum-s3-source-policy \
--user-name dvsum-user
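
Rather than typing the account ID by hand, the policy ARN can be built from the account that the current CLI credentials belong to. The sketch below is one possible approach, not a DvSum requirement; customer_policy_arn is a hypothetical helper name, and the ARN format assumes the standard "aws" partition (not GovCloud or China regions).

```shell
# Hypothetical helper: prints the ARN of a customer managed policy in the
# account of the current CLI credentials (standard "aws" partition assumed).
customer_policy_arn() {
  policy_name="$1"
  account_id="$(aws sts get-caller-identity \
    --query Account --output text)" || return 1
  printf 'arn:aws:iam::%s:policy/%s\n' "$account_id" "$policy_name"
}

# Usage (requires configured AWS credentials):
#   aws iam attach-user-policy \
#     --policy-arn "$(customer_policy_arn dvsum-s3-source-policy)" \
#     --user-name dvsum-user
```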

# Create role for crawler
aws iam create-role \
--role-name dvsum-glue-service-role \
--assume-role-policy-document \
'{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "glue.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}'

# Attach AWSGlueServiceRole policy to role
aws iam attach-role-policy \
--role-name dvsum-glue-service-role \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole

# Attach AmazonS3FullAccess policy to role
aws iam attach-role-policy \
--role-name dvsum-glue-service-role \
--policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

# Create an access key
aws iam create-access-key --user-name dvsum-user
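
The create-access-key command prints a JSON document by default. If only the two values needed by DvSum are wanted, the output can be narrowed with --query, as in the sketch below. The function name create_dvsum_access_key is a hypothetical helper, and the labels it prints are illustrative only.

```shell
# Hypothetical helper: creates an access key for the given user and prints
# the key ID and secret on two labeled lines so they can be recorded.
create_dvsum_access_key() {
  user_name="$1"
  keys="$(aws iam create-access-key \
    --user-name "$user_name" \
    --query 'AccessKey.[AccessKeyId,SecretAccessKey]' \
    --output text)" || return 1
  # Text output is one tab-separated line: <access key ID> <secret>.
  access_key_id="$(printf '%s' "$keys" | cut -f1)"
  secret_access_key="$(printf '%s' "$keys" | cut -f2)"
  printf 'Access key ID:     %s\n' "$access_key_id"
  printf 'Secret access key: %s\n' "$secret_access_key"
}

# Usage (requires configured AWS credentials; creates a real key):
#   create_dvsum_access_key dvsum-user
```

Each call creates a new key, so run it once and store the output securely rather than leaving it in shell history or logs.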
