Databricks as a Source

Azure Databricks is optimized for Azure and tightly integrated with Azure Data Lake Storage, Azure Data Factory, Azure Synapse Analytics, Power BI, and other Azure services, letting you store all your data in a simple, open lakehouse and unify all your analytics and AI workloads.

Enabling the Databricks Source in DvSum

Step 1.1: Open the DvSum application, select the Administration tab, and click the 'Manage sources' option. Click 'Add Source' and select the Databricks source. The following error messages will be displayed to 'Owner' and 'Admin' users if the Databricks source is not enabled for the account.

Note: Only the owner is authorized to add a source. 

mceclip0.png

Owner:

mceclip1.png

Admin:

mceclip2.png

Step 1.2: The owner clicks the 'Manage account' link and is redirected to the page from which the source can be enabled. An admin, on the other hand, must ask the owner to enable the source for the account. Click the 'SAWS' tab, select the SAWS, and click the 'Enable source' button.

mceclip3.png
Step 1.3: From the list of available sources, select Databricks and click the 'Upgrade' button, as shown below.

mceclip4.png
Step 1.4: On returning to the SAWS tab, the upgrade takes some time to process. After that, the Databricks icon appears in the 'Enabled sources' column, which means the source has been enabled successfully, as shown below.

mceclip5.png

Scenario 1: SAWS Error

On upgrading, if there is any issue with SAWS, the error message "Please check if your SAWS is working correctly" will be displayed.

Scenario 2:  Pending State

On upgrading, if any jobs are running, the upgrade will go to a 'Pending' state.

Adding a Databricks Source

Step 2.1: Open the DvSum application, select the 'Administration' tab, and click the 'Manage sources' option. Click 'Add Source' and select the Databricks source, as shown below.

mceclip6.png

Step 2.2: In the 'Basic information' section, provide the source name and description, and select the web service (SAWS) on which the Databricks source was enabled. The other fields are optional, as shown below.

mceclip7.png

Step 2.3: To get the Server Hostname, HTTP Path, and Personal Access Token, go to the Databricks dashboard.

mceclip8.png

Step 2.3.1: Click Compute >> Cluster name >> Cluster configuration >> Advanced Options >> JDBC/ODBC. There you will find the 'Server Hostname' and 'HTTP Path'.

mceclip9.png

Step 2.3.2: To get a Personal Access Token, click 'User Settings'. Add a name for the token, set the token's lifetime in days, and click 'Generate'.

Note: Make sure to copy the token now. You won't be able to see it again.

mceclip10.png

mceclip11.png
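
If you prefer to script token creation instead of using the UI (for example, for scheduled rotation), the Databricks Token REST API can generate one. A minimal sketch in Python, assuming you already hold a valid token to authenticate the request; the hostname and token values are placeholders:

# Create a personal access token via the Databricks Token API
# (POST /api/2.0/token/create). Requires an existing valid token.
import requests

SERVER_HOSTNAME = "adb-1234567890123456.7.azuredatabricks.net"  # placeholder
EXISTING_TOKEN = "dapiXXXXXXXXXXXXXXXX"                         # placeholder

resp = requests.post(
    f"https://{SERVER_HOSTNAME}/api/2.0/token/create",
    headers={"Authorization": f"Bearer {EXISTING_TOKEN}"},
    json={
        "lifetime_seconds": 90 * 24 * 3600,    # the days limit set in the UI
        "comment": "DvSum source connection",  # the token name set in the UI
    },
)
resp.raise_for_status()
# As with the UI, token_value is returned only once, so copy it now.
print(resp.json()["token_value"])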

Step 2.3.3: Enter the Server Hostname, HTTP Path, and Personal Access Token in the 'Host information' section and click the 'Authenticate' button.

mceclip12.png
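
Before clicking 'Authenticate', you can sanity-check the three values outside DvSum. A minimal sketch in Python, assuming the databricks-sql-connector package (pip install databricks-sql-connector); all three credential values are placeholders:

# Open a direct connection using the same three values DvSum asks for.
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",     # Step 2.3.1
    http_path="sql/protocolv1/o/1234567890123456/0123-456789-abcde",  # Step 2.3.1
    access_token="dapiXXXXXXXXXXXXXXXX",                              # Step 2.3.2
) as connection:
    with connection.cursor() as cursor:
        # Lists the same databases Step 2.4's dropdown will offer.
        cursor.execute("SHOW DATABASES")
        for (database_name,) in cursor.fetchall():
            print(database_name)

If this script succeeds, the 'Authenticate' button should succeed with the same values.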

Step 2.4: The database names will now be shown. Select a database from the dropdown and click the 'Save' button.

mceclip13.png

Step 2.5: Edit the source and verify the connection using 'Test connection'.

mceclip14.png

Step 2.6: Databricks is now added as a source, and the user can catalog it, profile it, and execute rules against it.

mceclip15.png

Integrating Rules into the batch workflow

1: Executing the rule via API (see the sketch below)

2: Executing the rule API via ADF

Click here for more details on Rules integration into the batch workflow.
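
As a rough illustration of option 1, a batch job can trigger a rule over HTTP. The endpoint, header, and payload below are hypothetical placeholders, not the documented DvSum API; consult the linked article for the actual contract:

# Hypothetical sketch: trigger a DvSum rule from a batch job over HTTP.
# The URL, auth header, and payload fields are illustrative placeholders,
# NOT the documented DvSum API.
import requests

resp = requests.post(
    "https://app.dvsum.com/api/rules/execute",    # hypothetical endpoint
    headers={"X-API-Key": "YOUR_DVSUM_API_KEY"},  # hypothetical auth header
    json={"rule_id": "12345"},                    # hypothetical payload
    timeout=60,
)
resp.raise_for_status()
print(resp.json())

For option 2, the same POST can be issued from an Azure Data Factory pipeline using a Web activity, so the rule runs as one step of the batch workflow.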

 

 
