Connecting Databricks to your Streamline project lets you leverage your existing datasets, analytics tables, and data warehouse connections to power automated workflows. This guide walks you through setting up Databricks as a data source and using it to build reusable datasets for your processes.
Note: This article is specific to Databricks. If you’re using a different data source, check out our Data Source Library for platform-specific setup instructions.
Prerequisites
Before getting started, make sure you have:
- Access to the Databricks workspace and catalog containing the data you want to connect.
- Your Databricks host name or IP address, port number, client ID, client secret, and warehouse name.
- Proper workspace permissions to create integrations and query data from your chosen warehouse or SQL endpoint.
If unsure, ask your Databricks admin to confirm that you have the required credentials and access level.
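Before opening the setup wizard, it can help to confirm you have every credential on hand. The following is a minimal sketch (the field names and placeholder values are illustrative, not part of Streamline's API) that flags any missing connection detail:

```python
# Hypothetical pre-flight check: collect the Databricks connection details
# listed above and flag any that are empty or absent.
REQUIRED_FIELDS = ("host", "port", "client_id", "client_secret", "warehouse_name")

def missing_fields(details: dict) -> list[str]:
    """Return the names of any required connection fields that are blank or missing."""
    return [f for f in REQUIRED_FIELDS if not details.get(f)]

details = {
    "host": "dbc-example.cloud.databricks.com",  # placeholder workspace host
    "port": 443,
    "client_id": "my-service-principal-id",      # placeholder client ID
    "client_secret": "",                         # intentionally left blank
    "warehouse_name": "finance_warehouse",
}

print(missing_fields(details))  # → ['client_secret']
```

If the list is non-empty, ask your Databricks admin for the missing values before continuing.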
Part 1: Connect to a Databricks Data Source
1. Go to the Integrations Page
From your Streamline dashboard, click Integrations in the left-hand menu.
Click + New Connection to begin adding a new data source.
2. Select Databricks
In the connection setup window, choose Databricks from the list of available data sources.
Give your connection a clear name and description.
Example: “Databricks – Customer Analytics” or “Finance Warehouse”.
Tip: Use descriptive names that make the connection’s purpose clear for other users.
3. Enter Databricks Connection Details
You’ll be prompted to provide the following connection information:
- Host: The name or IP address of the database host machine.
- Port: The port number used for the connection.
- Client ID: The unique client identifier used for authentication.
- Client Secret: The corresponding secret key used to securely connect.
- Warehouse Name: The name of the Databricks SQL warehouse you want to connect to.
Once entered, click Next to test the connection. Streamline will verify your credentials and confirm successful access to your Databricks environment.
Firewall Configuration
Ensure the following Elastic IP addresses (EIPs) are permitted through your firewall to allow secure connectivity:
- 54.190.75.99
- 44.242.19.199
- 34.209.186.112
If your organization uses a network allowlist, make sure these IPs are added before testing the connection.
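As a quick sanity check, you can compare your allowlist against the three EIPs above before testing the connection. This is an illustrative local check, not part of Streamline or Databricks tooling:

```python
# Compare a firewall allowlist against the Streamline EIPs listed above
# and report any that still need to be added.
STREAMLINE_EIPS = {"54.190.75.99", "44.242.19.199", "34.209.186.112"}

def missing_eips(allowlist) -> list[str]:
    """Return the Streamline EIPs not yet present in the allowlist, sorted."""
    return sorted(STREAMLINE_EIPS - set(allowlist))

current_allowlist = ["54.190.75.99", "10.0.0.0/8"]  # example allowlist entries
print(missing_eips(current_allowlist))  # → ['34.209.186.112', '44.242.19.199']
```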
4. Choose Database and Schema
After authentication, select the Database and Schema you'd like to use from your Databricks environment.
5. Finalize the Connection
Click Connect to complete the integration.
You’ll now see Databricks listed under your active integrations in the Integrations page.
Tip: You can edit this connection at any time by clicking the three dots on the right-hand side of the integration.
Part 2: Create and Manage a Dataset from Databricks
After connecting Databricks, you can organize your tables or queries into structured datasets to power workflows.
1. Access the Datasets Panel
Click Datasets in the left-hand navigation.
Select + Create New Dataset.
2. Define Dataset Basics
- Name your dataset clearly (for example, “Customer Metrics” or “Marketing Attribution”).
- Select your Databricks connection as the data source.
- Choose the primary entity (table or view) you want to use as the core entity.
- Assign display labels to simplify naming conventions (for example, “Customer Accounts” instead of tbl_customer_accounts).
3. Configure Fields and Access
- Review the available fields returned from your Databricks table or query.
- Click Create to save your dataset.
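The display-label step above can be thought of as a simple mapping from raw table names to friendly names. The sketch below illustrates the idea; the table names, labels, and fallback rule are hypothetical, not Streamline's actual behavior:

```python
# Illustrative mapping from raw Databricks table names to the display labels
# you might assign during dataset creation.
DISPLAY_LABELS = {
    "tbl_customer_accounts": "Customer Accounts",
    "tbl_marketing_attribution": "Marketing Attribution",
}

def display_label(table_name: str) -> str:
    """Use the assigned label, falling back to a title-cased version of the raw name."""
    return DISPLAY_LABELS.get(
        table_name,
        table_name.removeprefix("tbl_").replace("_", " ").title(),
    )

print(display_label("tbl_customer_accounts"))  # → Customer Accounts
print(display_label("tbl_order_items"))        # → Order Items (fallback)
```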
Optional Enhancements
Create a Dataset from the Data Catalog
You can browse Databricks tables from your connected catalog in the Data Catalog view.
Select a table and click New Dataset from Entity to build a dataset directly from that source.
Add Related Entities
If your Databricks tables reference related data (for example, relationships between customers and transactions), you can add them as related entities during dataset creation.
Streamline will auto-detect table relationships where foreign keys are present.
Edit an Existing Dataset
To edit an existing Databricks-based dataset:
- Go to Datasets.
- Find your dataset and click the three dots → Edit.
- Update joins, modify visibility or naming, or add new columns as your schema evolves.
- Click Next → Update to save your changes.
Limitations
The following field or data types may not be supported or may be read-only depending on your Databricks configuration and Streamline’s integration scope.
Not Supported:
- Binary or image-based fields
- Array and Map data types
- Complex nested structures
Read-Only:
- Auto-generated IDs
- Timestamps (created or modified time)
- System-managed metadata fields
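If you want to anticipate which columns will come through, you can pre-filter a table's schema against the unsupported types above. This is a rough sketch (the type list and matching rule are simplified assumptions, and read-only fields still appear, just as non-editable):

```python
# Sketch: filter a Databricks schema down to field types the integration
# supports, per the limitations listed above.
UNSUPPORTED_TYPES = {"BINARY", "ARRAY", "MAP", "STRUCT"}

def supported_columns(schema: dict[str, str]) -> list[str]:
    """Keep columns whose base Databricks type is not in the unsupported set."""
    return [
        name for name, dtype in schema.items()
        if dtype.split("<")[0].upper() not in UNSUPPORTED_TYPES
    ]

schema = {
    "customer_id": "BIGINT",
    "tags": "ARRAY<STRING>",   # array type: not supported
    "profile": "STRUCT<...>",  # complex nested structure: not supported
    "signup_ts": "TIMESTAMP",  # supported, but likely read-only
}
print(supported_columns(schema))  # → ['customer_id', 'signup_ts']
```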
Using Databricks Data in a Workflow
Once your Databricks connection and datasets are configured, you can use your data within Streamline workflows to automate actions and surface insights.
1. Search Data
Use the Search Data step to query data directly from your connected Databricks datasets.
This allows you to retrieve specific rows or records based on criteria you define, such as customer IDs, project statuses, or product categories.
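Conceptually, a Search Data step translates your criteria into a filtered query against the dataset. The sketch below shows one way such a query could be built with parameter placeholders; the function, table name, and `?` placeholder style are illustrative (the actual placeholder syntax depends on the driver), not Streamline internals:

```python
# Hypothetical illustration of turning user-defined criteria into a
# parameterized SQL query (parameters avoid injecting raw values).
def build_search_query(table: str, criteria: dict) -> tuple[str, list]:
    """Return a parameterized SELECT statement and its ordered values."""
    where = " AND ".join(f"{col} = ?" for col in criteria)
    sql = f"SELECT * FROM {table} WHERE {where}"
    return sql, list(criteria.values())

sql, params = build_search_query(
    "customer_accounts", {"status": "active", "region": "EMEA"}
)
print(sql)     # → SELECT * FROM customer_accounts WHERE status = ? AND region = ?
print(params)  # → ['active', 'EMEA']
```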
2. Delivery Data
Use the Delivery Data step to send results from your Databricks dataset to another system or to a document output in your workflow.
Your Databricks connection is now active, and your datasets are ready for use across workflows, reports, and automations in Streamline.
For help with Databricks setup or data formatting, contact your Databricks admin or visit their support documentation.
If you have any Streamline-related questions or need assistance configuring your workflows, please contact our Support team.