Introduction to Data Preparation Workbench

Introducing DvSum Data Preparation Workbench, a place where you can access data from source conveniently, prepare and perform cleansing actions and commit validated data back to source. Let's start.

On your DvSum homepage, you will see Data Preparation in left navigation. Go to "Datasets"

mceclip0.png

You can create multiple datasets on a single table. With every dataset you create, the idea is to filter some part of data for analysis that is according to your requirements. That filtered data will open in a workbench.

Note: For now, Data Preparation Workbench functionality is only applicable for Oracle and SQL Server sources only

mceclip0.png

 

Create Dataset

There are 3 simple steps to create a dataset.

Step 1 - Click on Create button and provide source and table name on which dataset is to be created.

mceclip1.png

 

Step 2 - Choose Columns and Apply filters (if required)

Here, you can choose the columns you want to see on workbench and the order by which you want to view them on Workbench. You can also apply filters to dataset according to your requirements. 

mceclip2.png

 

Step 3 - Provide Unique Identifier and Sharing.

For this step, you have to mention the Unique identifier of the table. If you want to share this dataset with others, select the required option.

mceclip3.png

This is it. You will see the created dataset on dataset page and when you click on the name, it will open a workbench fetching the data accordingly.

mceclip4.png

 

Workbench Operations

There are multiple operations you can perform on workbench. Here is a list of few:

1. Inline Editing

Click on a cell and start editing. The changes will automatically be saved and you can commit the changes to write-back to source.

2. Drag And Copy

You can select a cell and drag cell-selection outline to the place where you want the copy to appear and release the mouse button. Pressing CTRL + D will copy the value to all selected cells.

mceclip5.png

mceclip6.png

 

3. Find and Replace

This is a bulk-update operation performed on a single column. You can even choose to fill all blanks with a particular value across the data source.

Select a column and you will see Find and Replace Option. 

mceclip7.png

mceclip8.png

 

4. Delete Rows

This is a row level operation. Once you select row(s), you will see the option to Remove Rows. You can remove single or multiple (at max 100) rows at a time. Please note that this will permanently remove the records form your data source and you will not be able to Undo after Committing.

mceclip9.png

 

5. Apply Validations for Write-back

Sometimes there might be certain validations required on a column for write-back operation. For example, Customer_ID should have no blanks, or for a string Column "Country" the input value should have only 2 characters (US, UK etc) 

Such validations can be applied from Profiling page using Master Completeness Ruleset (MDC Ruleset) on a table.

There are 3 types of data constraints you can apply from MDC ruleset

1. Blanks Check

On a column where blanks check is applied, workbench will highlight null fields are red and it will not take "Blank" as an input

2. Value Range

Allow only a specific range of values as valid

3. Data Format Constraints

Validations on string (character length limit), Number (only integers, whole numbers or decimal values) are to be allowed.

mceclip10.png

These rules are reflected on Workbench. You will see a small badge for each of the rules above (B, VR, DF) below the column name on which rule is applied. 

When you hover on the DF badge, it will show you the validation and highlight the cells which do not meet the criteria.

mceclip11.png

 

6. Apply DvSum Rules as Filters

The DvSum rules, like Custom Query, Blanks, Value Range, MDC Ruleset, Orphan Records, Integrity Check created on a particular table can help you filter out the data on workbench. You can select one rule at a time on workbench and it will bring only those records which are shown as exception on rule detail page. 

Click on the "Select a Rule" drop-down where you will see only the above mentioned rules. If the rule has not been executed, it will show as disabled. Similarly, rules other than the listed will be disabled.

mceclip12.png

 

The selected rule will start to show on top and it will fetch the records according to rule definition.  This is helpful when you want to perform cleansing actions on selected data. 

mceclip13.png

 

This is the basic introduction to Data Preparation workbench. Please use our new feature and let us know if you have any feedback.

 

 

 

Have more questions? Submit a request

0 Comments

Please sign in to leave a comment.
Powered by Zendesk