In this article:
- Overview
- Prerequisites
- Step-by-Step Configuration
  - Step 1: Adding a Databricks Source in DvSum
  - Step 2: Configure Connection
  - Step 3: Select Database
  - Step 4: Save & Test Connection
  - Step 5: Scan the Data Source
- Authentication Options
  - Scenario 1: Authentication via Access Token
  - Scenario 2: Authentication via Client Secret
- Reviewing Scan Insights
Overview:
Azure Databricks is an optimized platform for Azure, offering tight integration with services like Azure Data Lake Storage, Azure Data Factory, Azure Synapse Analytics, and Power BI. It allows data storage in a unified, open lakehouse while consolidating analytics and AI workloads.
This article describes the process of configuring Databricks as a data source in DvSum, facilitating the integration of data cataloging and profiling. The steps outlined apply to both DvSum Data Insights (DI) and DvSum Data Quality (DQ), with minor platform-specific variations.
Adding Databricks source in DvSum:
Prerequisites:
Enabling Query History for Databricks
Before configuring Databricks as a source, ensure that query history is enabled for your Databricks account. This is crucial for tracking data lineage and gaining insights into usage patterns. For more information, refer to the Enabling Query History for Data Sources article.
Cluster and Account Setup: To authenticate the Databricks source, you need an account on the Azure Databricks portal with a running cluster that is attached to a database. On the Azure Databricks portal, go to the Compute tab and start your cluster if it is stopped.
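If you prefer to check the cluster from a script rather than the Compute tab, the following is a minimal sketch that uses the Databricks Clusters REST API (`/api/2.0/clusters/get` and `/api/2.0/clusters/start`). It is not part of the DvSum setup itself. The host, token, and cluster ID passed in are placeholders you would supply, and the helper names are illustrative.

```python
import json
import urllib.request


def build_cluster_request(host, token, cluster_id, action="get"):
    """Build (but do not send) a Clusters API request.

    action="get" reads the cluster state; action="start" starts a stopped cluster.
    """
    base = f"https://{host}/api/2.0/clusters"
    headers = {"Authorization": f"Bearer {token}"}
    if action == "get":
        url = f"{base}/get?cluster_id={cluster_id}"
        return urllib.request.Request(url, headers=headers)
    if action == "start":
        body = json.dumps({"cluster_id": cluster_id}).encode("utf-8")
        headers["Content-Type"] = "application/json"
        return urllib.request.Request(
            f"{base}/start", data=body, headers=headers, method="POST"
        )
    raise ValueError(f"unsupported action: {action}")


def needs_start(cluster_info):
    """Return True if the cluster is stopped and must be started first."""
    return cluster_info.get("state") == "TERMINATED"


def ensure_running(host, token, cluster_id):
    """Fetch the cluster state and request a start if it is stopped.

    Returns the state observed before any start request was sent.
    """
    with urllib.request.urlopen(build_cluster_request(host, token, cluster_id)) as resp:
        info = json.load(resp)
    if needs_start(info):
        start = build_cluster_request(host, token, cluster_id, "start")
        urllib.request.urlopen(start).close()
    return info["state"]
```

Calling `ensure_running(host, token, cluster_id)` sends the requests. A start request only begins the startup, so the cluster may remain in the `PENDING` state for a few minutes before it reaches `RUNNING`.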
Step-by-Step Configuration
Step 1: Adding a Databricks Source in DvSum
- Navigate to Data Sources.
- Click on Add Source.
- In the modal, select Databricks.
- Provide a source name and click Save.
Step 2: Configure Connection
Once the source is saved, you are redirected to the connection settings page of the new Databricks source. First, select the On-Premise Web Service checkbox, then select a SAWS that is set up and currently running. The host information can then be authenticated with either an Access Token or a Client Secret.
Note: By default, the SAWS type is Cloud. For more information about Cloud SAWS, refer to the Cloud SAWS documentation.
Authentication Options
You can authenticate using either:
- Access Token
- Client Secret
Scenario 1: Authentication via Access Token
- Enable the On-Premise Web Service checkbox.
- Select the SAWS (Secure Access Web Service) that is set up and running.
- Enter the following details:
  - Server Hostname
  - HTTP Path
  - Personal Access Token
- Click Authenticate.
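Before entering these values in DvSum, it can help to confirm that they work on their own. The sketch below assumes the open-source `databricks-sql-connector` Python package is installed and runs a trivial query with the same three values. The helper names are illustrative. Trimming a pasted workspace URL down to the bare hostname is an assumption about what the Server Hostname field expects.

```python
from urllib.parse import urlparse


def normalize_hostname(raw):
    """Reduce a pasted workspace URL to the bare hostname."""
    raw = raw.strip()
    parsed = urlparse(raw if "://" in raw else f"https://{raw}")
    return parsed.hostname or ""


def check_access_token(server_hostname, http_path, access_token):
    """Run SELECT 1 using the Server Hostname, HTTP Path, and token.

    Requires the databricks-sql-connector package, which is imported
    lazily so that normalize_hostname works without it.
    """
    from databricks import sql

    with sql.connect(
        server_hostname=normalize_hostname(server_hostname),
        http_path=http_path.strip(),
        access_token=access_token,
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            return cursor.fetchall()
```

If `check_access_token(...)` returns a row, the hostname, HTTP path, and token are consistent with each other. If it raises an error, the failure lies in the Databricks values rather than in DvSum or SAWS.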
Scenario 2: Authentication via Client Secret
Prerequisites for Configuring Azure Databricks (Service Principal Service):
Refer to the article on configuring Azure Databricks (Service Principal Service).
To authenticate using a Client Secret, enter the following details:
- Server Hostname
- HTTP Path (the same value used for Access Token authentication)
- Azure Subscription Id
- Azure Resource Group
- Azure Workspace
- Azure Tenant Id
- Azure Client Id
- Azure Client Secret
- OAuth Secret
Click the Authenticate button.
Note: The OAuth Secret is a confidential key used to securely authenticate and authorize applications when integrating with external services. To optimize job performance and memory usage, select the OAuth Secret checkbox and enter the secret value. This automatically starts and connects to the clusters before execution.
The OAuth Secret can be generated by the admin from the Service Principal's secret tab.
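To illustrate how the Azure fields above relate to one another, here is a minimal sketch of the standard Microsoft identity platform client-credentials request for a service principal, along with the Azure resource ID composed from the subscription, resource group, and workspace. How DvSum uses these values internally is not described in this article, so treat this as an assumption-based illustration. The function names are hypothetical, and the scope uses the well-known Azure Databricks application ID.

```python
import urllib.parse
import urllib.request

# Application ID of the Azure Databricks resource in Microsoft Entra ID
AZURE_DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def workspace_resource_id(subscription_id, resource_group, workspace):
    """Compose the Azure resource ID that identifies the Databricks workspace."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Databricks/workspaces/{workspace}"
    )


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) a client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": f"{AZURE_DATABRICKS_RESOURCE_ID}/.default",
        }
    ).encode("ascii")
    return urllib.request.Request(url, data=form, method="POST")
```

Sending the request with `urllib.request.urlopen` returns a JSON body containing an `access_token` when the Tenant Id, Client Id, and Client Secret are valid. This is a quick way to rule out credential typos before clicking Authenticate.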
Step 3: Select Database
After successful authentication, the Database section appears below the connection details. Select the database you want to use.
Step 4: Save & Test Connection
- Scroll to the top and click Done.
- Click Save.
- Click Test Connection to validate the setup.
Step 5: Scan the Data Source
- Navigate to Scan History.
- Click Scan Now.
- Wait for the job status to change to Completed.
- Click the Scan Name to view the Scan Summary.
Reviewing Scan Insights
- Navigate to the Data Dictionary.
- Click on Recently Refreshed to view newly discovered tables.
- Click on table names to explore metadata details.