Configure Databricks as a Source

Adding a Databricks source in DvSum:

Azure Databricks is optimized for Azure and tightly integrated with Azure Data Lake Storage, Azure Data Factory, Azure Synapse Analytics, Power BI, and other Azure services, so you can store all your data on a simple, open lakehouse and unify your analytics and AI workloads.

Prerequisite: To authenticate a Databricks source, you need an account on the Azure Databricks portal with a running cluster that is attached to a database. On the Azure Databricks portal, go to the Compute tab and start your cluster if it is stopped.

mceclip0.png

mceclip1.png
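The cluster check in the prerequisite can also be done without opening the portal. The following is a minimal sketch, assuming you have the workspace URL, the cluster ID, and a personal access token. It uses the Databricks Clusters REST endpoint `GET /api/2.0/clusters/get`, whose response includes a `state` field such as `RUNNING` or `TERMINATED`. The function names `cluster_state_request` and `needs_start` are hypothetical helpers for illustration and are not part of DvSum or Databricks.

```python
import urllib.parse
import urllib.request


def cluster_state_request(workspace_url, cluster_id, token):
    """Build (but do not send) the request that fetches a cluster's details."""
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    url = f"{workspace_url.rstrip('/')}/api/2.0/clusters/get?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def needs_start(cluster_info):
    """Return True when the cluster is stopped and must be started first.

    `cluster_info` is the parsed JSON body returned by the endpoint above.
    """
    return cluster_info.get("state") == "TERMINATED"
```

To use it, pass the request to `urllib.request.urlopen`, parse the body with `json.load`, and call `needs_start` on the result. If it returns `True`, start the cluster from the Compute tab as described above.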

Step 1: Go to the Data Sources tab and click the Add Source button. A modal opens asking you to choose the data source to add. Select Databricks, enter a source name, and click Save.

mceclip2.png
mceclip3.png

Step 2: Once the source is saved, you are redirected to the connection settings page of the new Databricks source. First, select the On-premise Web Service checkbox, then select a SAWS that is set up and currently running. The host can then be authenticated with either an Access Token or a Client Secret.

checkbox.png
select.png

Note: By default, the SAWS type is Cloud. For more information, see the Cloud SAWS documentation.

Scenario 1: Authentication using Access Token

To authenticate with an access token, enter the correct Server Hostname, Http path, and Personal Access Token, then click the Authenticate button.

access_tokens.png
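If authentication fails, it can help to confirm the same three values outside DvSum. The sketch below is an illustration under stated assumptions: it uses the open-source `databricks-sql-connector` package, whose `sql.connect` accepts `server_hostname`, `http_path`, and `access_token`. The helpers `connection_kwargs` and `open_connection` are hypothetical names introduced here, and nothing in this sketch describes how DvSum itself connects.

```python
def connection_kwargs(server_hostname, http_path, access_token):
    """Normalize the three form values into arguments for the SQL connector."""
    host = server_hostname.strip()
    # The hostname field expects a bare host, so drop any pasted URL scheme.
    for prefix in ("https://", "http://"):
        if host.lower().startswith(prefix):
            host = host[len(prefix):]
    host = host.rstrip("/")
    path = http_path.strip()
    token = access_token.strip()
    if not host or not path or not token:
        raise ValueError(
            "Server Hostname, Http path, and Personal Access Token are all required"
        )
    return {"server_hostname": host, "http_path": path, "access_token": token}


def open_connection(kwargs):
    """Open a connection; requires `pip install databricks-sql-connector`."""
    # Imported lazily so connection_kwargs works without the package installed.
    from databricks import sql

    return sql.connect(**kwargs)
```

If `open_connection(connection_kwargs(...))` succeeds from the machine that hosts the SAWS, the values are likely correct and the problem lies elsewhere, for example in network access between the SAWS and the workspace.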

Scenario 2: Authentication using Client Secret

To authenticate with a client secret, enter the correct Server Hostname and Http path (the same values used for Access Token authentication), Azure Subscription Id, Azure Resource Group, Azure Workspace, Azure Tenant Id, Azure Client Id, and Azure Client Secret, then click the Authenticate button.

authenticate.png
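For background, a tenant ID, client ID, and client secret are the inputs to the standard Microsoft identity platform client-credentials flow. The sketch below shows how such a token request is typically built for the Azure Databricks resource. It is an assumption about the general mechanism and does not describe how DvSum uses these fields internally. `build_token_request` is a hypothetical helper. The resource ID constant is the well-known application ID of Azure Databricks.

```python
import urllib.parse
import urllib.request

# Well-known application ID of the Azure Databricks resource.
AZURE_DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) a client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": f"{AZURE_DATABRICKS_RESOURCE_ID}/.default",
    }
    data = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Sending the request with `urllib.request.urlopen` returns a JSON body whose `access_token` field holds the bearer token. An error response at this stage usually points to a wrong Azure Tenant Id, Azure Client Id, or Azure Client Secret rather than to the Databricks settings.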

Step 3: After successful authentication, the Database section appears below the connection settings. Select the database you want to scan.

mceclip6.png

Step 4: After the credentials are authenticated and the database is selected, save the source. Scroll to the top of the page and click the “Done” button in the top-right corner.

mceclip8.png

Then click the “Save” button. Once the source is saved, click the “Test Connection” button.

mceclip9.png

Next, go to the Scan History page and click the "Scan Now" button. A job is created, and once its status changes to Completed, the scan of the new Databricks source is finished. Then click the scan name to open its Scan Summary page.

mceclip10.png

The Scan Summary page shows the results of the scan, i.e., how many new tables and columns were fetched from the database selected earlier.

mceclip11.png

For more details about the tables, click "Data Dictionary" in the sidebar. A table listing appears. Click the "Recently Refreshed" tab to see all the tables picked up by the most recent scan. Click a table name to open its detail page.

mceclip12.png
