Adding a Databricks source in DvSum:
Azure Databricks is optimized for Azure and tightly integrated with Azure Data Lake Storage, Azure Data Factory, Azure Synapse Analytics, Power BI, and other Azure services, letting you store all your data in a simple, open lakehouse and unify all your analytics and AI workloads.
Prerequisite: To authenticate a Databricks source, you need an account on the Azure Databricks portal with a running cluster attached to a database. On the Azure Databricks portal, go to the Compute tab and start your cluster if it is in the stopped state.
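The cluster prerequisite can also be checked programmatically. Below is a minimal sketch against the Databricks Clusters REST API (`GET /api/2.0/clusters/get`); the workspace URL, cluster ID, and token shown are hypothetical placeholders, not values from this guide.

```python
import json
import urllib.request

# Hypothetical placeholders -- substitute your own workspace URL and cluster ID.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
CLUSTER_ID = "0123-456789-abcdefgh"


def cluster_state(token: str) -> str:
    """Fetch the cluster's state (e.g. RUNNING, TERMINATED) from the Clusters API."""
    req = urllib.request.Request(
        f"{WORKSPACE_URL}/api/2.0/clusters/get?cluster_id={CLUSTER_ID}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["state"]


def is_ready(state: str) -> bool:
    """A cluster can serve queries only while it is in the RUNNING state."""
    return state == "RUNNING"
```

If `is_ready(cluster_state(token))` is False, start the cluster from the Compute tab before continuing.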
Step 1: Go to the Data Sources tab and click the Add Source button. A modal opens asking you to choose the data source to add. Select the Databricks source, provide a source name, and click Save.
Step 2: Once the source is saved, you are redirected to the connection settings page of the new Databricks source. First, enable the On-premise Web Service checkbox, then select a SAWS that is set up and currently running. The host information can now be authenticated with either an Access Token or a Client Secret.
Note: By default, the SAWS type is Cloud. For more information regarding Cloud SAWS, click here.
Scenario 1: Authentication using Access Token
For authentication using an Access Token, enter the correct Server Hostname, HTTP Path, and Personal Access Token, then click the Authenticate button.
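The same three values can be sanity-checked outside DvSum with the open-source `databricks-sql-connector` package. A minimal sketch, assuming hypothetical credentials; the connector import is deferred so the format checks still work when the package is not installed:

```python
import re


def looks_like_hostname(server_hostname: str) -> bool:
    """Server Hostname should be a bare host such as
    adb-1234567890123456.7.azuredatabricks.net -- no scheme, no trailing slash."""
    return bool(re.fullmatch(r"[\w.-]+\.azuredatabricks\.net", server_hostname))


def check_access_token(server_hostname: str, http_path: str, access_token: str) -> bool:
    """Run SELECT 1 against the cluster; True means all three values authenticate."""
    from databricks import sql  # pip install databricks-sql-connector

    with sql.connect(server_hostname=server_hostname,
                     http_path=http_path,
                     access_token=access_token) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            return cur.fetchone()[0] == 1
```

A common authentication failure is pasting the full `https://...` URL into Server Hostname; the connector and DvSum both expect the bare host name.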
Scenario 2: Authentication using Client Secret
For authentication using a Client Secret, enter the correct Server Hostname and HTTP Path (the same values used for Access Token authentication), along with the Azure Subscription Id, Azure Resource Group, Azure Workspace, Azure Tenant Id, Azure Client Id, and Azure Client Secret. Click the Authenticate button.
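Under the hood, Client Secret authentication is the standard OAuth 2.0 client-credentials flow against Azure AD. A stdlib-only sketch of that exchange, assuming hypothetical tenant, client, and secret values; the Databricks resource ID used in the scope is the well-known Azure AD application ID for Azure Databricks, which you should verify against current Microsoft documentation:

```python
import json
import urllib.parse
import urllib.request

# Well-known Azure AD application ID of the Azure Databricks resource
# (verify against current Microsoft documentation before relying on it).
DATABRICKS_SCOPE = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"


def token_request_body(client_id: str, client_secret: str) -> bytes:
    """OAuth 2.0 client-credentials form body for the Azure AD token endpoint."""
    return urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": DATABRICKS_SCOPE,
    }).encode()


def fetch_aad_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """Exchange the client secret for an Azure AD access token for Databricks."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    req = urllib.request.Request(url, data=token_request_body(client_id, client_secret))
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]
```

If this exchange fails with your Tenant Id, Client Id, and Client Secret, DvSum's Authenticate button will fail for the same reason, so it is a quick way to isolate credential problems from connection problems.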
Step 3: After successful authentication, the Database section appears underneath. Select the required database here.
Step 4: After the credentials are authenticated and the database is selected, save the source: scroll to the top and click the “Done” button in the top-right corner, then click the “Save” button. Once the source is saved successfully, click the “Test Connection” button.
Now go to the Scan History page and click the "Scan Now" button. A job is created, and once its status reaches Completed, the scan of the new Databricks source has finished successfully. After the scan completes, click the Scan Name to open the Scan Summary page for that scan.
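The Scan Now flow above is a poll-until-done loop: trigger the job, then check its status until it reaches a terminal state. A generic sketch of that pattern, where `get_scan_status` is a hypothetical callable standing in for however the scan status is read (DvSum's own API is not documented here):

```python
import time
from typing import Callable

# Terminal statuses assumed for illustration; the guide only names "Completed".
TERMINAL_STATUSES = ("Completed", "Failed")


def wait_for_scan(get_scan_status: Callable[[], str],
                  poll_seconds: float = 5.0,
                  timeout_seconds: float = 600.0) -> str:
    """Poll until the scan reaches a terminal status, or raise on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = get_scan_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("scan did not reach a terminal status in time")
```

The same loop applies whether you watch the Scan History page manually or automate the check.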
The Scan Summary page shows all the insights of the scan, i.e., how many new tables and columns were fetched in this scan from the database selected earlier.
For more detail on the tables, click "Data Dictionary" in the sidebar. A table listing view appears. Click the "Recently Refreshed" tab to see all the tables captured in the most recent scan, then click a table name to open its detail page.