Databricks Lineage Connector
The Databricks Lineage Connector allows Bigeye to track query history and lineage from your Databricks instance. This connector requires appropriate permissions and configuration to access query history, metastore information, and system lineage tables.
Prerequisites
Before configuring the connector, ensure you have a Databricks service principal or user account with the required permissions.
Note: See the Databricks setup guide for detailed instructions on creating users or service principals and granting permissions.
Connector Parameters
The Databricks Lineage Connector requires the following parameters in addition to standard database connection parameters:
Parameter | Description |
---|---|
databricks.hostname | Hostname of your Databricks workspace. |
databricks.jersey.api.timeout | Timeout for Databricks REST API calls. |
db.catalog.name.include | Optional list of catalogs to include. |
db.catalog.name.exclude | Optional list of catalogs to exclude. |
db.syscatalogs.include | Optional list of system catalogs to include. |
databricks.jdbc.url | JDBC URL for connecting to your Databricks instance. |
databricks.jdbc.username | Databricks username or service principal for the connector. |
databricks.jdbc.password | Password or personal access token for the user/service principal. |
databricks.jdbc.class | JDBC driver class name for Databricks. |
databricks.source.table.lineage.view | System table/view for table-level lineage. |
databricks.source.column.lineage.view | System table/view for column-level lineage. |
databricks.oauth.clientid | OAuth client ID for authentication (if applicable). |
databricks.oauth.clientsecret.password | OAuth client secret for authentication (if applicable). |
databricks.oauth.tenantid | OIDC tenant ID for authentication (if applicable). |
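
The example configuration uses key=value lines, so a small pre-flight check can catch missing settings before the connector runs. The sketch below is a hypothetical helper, not part of Bigeye. It uses a simplified .properties parser, and it assumes that every key not marked "Optional" or "if applicable" in the table above is required.

```python
# Hypothetical pre-flight check, not part of Bigeye.
# Simplified .properties parser: handles key=value lines, blank lines and
# "#"/"!" comments only (no line continuations or escape sequences).

# Assumption: keys not marked "Optional" or "if applicable" in the table are required.
REQUIRED_KEYS = (
    "databricks.hostname",
    "databricks.jersey.api.timeout",
    "databricks.jdbc.url",
    "databricks.jdbc.username",
    "databricks.jdbc.password",
    "databricks.jdbc.class",
    "databricks.source.table.lineage.view",
    "databricks.source.column.lineage.view",
)


def parse_properties(text: str) -> dict[str, str]:
    """Return key -> value for every key=value line in text."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")  # split on the first "=" only
        if sep:  # lines without "=" are ignored in this sketch
            props[key.strip()] = value.strip()
    return props


def missing_required(props: dict[str, str]) -> list[str]:
    """Return required keys that are absent or have an empty value."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]


if __name__ == "__main__":
    sample = (
        "databricks.hostname=<workspace-hostname>\n"
        "databricks.jdbc.class=com.simba.spark.jdbc.Driver\n"
    )
    print("Missing keys:", missing_required(parse_properties(sample)))
```

If you rely on OAuth rather than a password, adjust REQUIRED_KEYS to match your setup.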
Getting Started
To configure the connector properly, follow these steps:
1. Get Metastore ID
You can get the Databricks metastore ID either via the API or the UI:
- API: Use the Databricks REST API to list metastores and obtain the ID.
- UI: In the Databricks workspace, navigate to Data → Metastores, and copy the Metastore ID.
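For the API route, the sketch below assumes the Unity Catalog list-metastores endpoint (GET /api/2.1/unity-catalog/metastores) and a personal access token sent as a bearer token. Listing metastores may require admin privileges; the current workspace's metastore is also returned by GET /api/2.1/unity-catalog/metastore_summary. Confirm both against the Databricks REST API reference. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: Unity Catalog "list metastores" endpoint, whose response looks like
# {"metastores": [{"metastore_id": "...", "name": "..."}, ...]}.
LIST_METASTORES_PATH = "/api/2.1/unity-catalog/metastores"


def parse_metastore_ids(body: str) -> dict[str, str]:
    """Map metastore ID -> metastore name from a list-metastores response body."""
    data = json.loads(body)
    return {m["metastore_id"]: m.get("name", "") for m in data.get("metastores", [])}


def fetch_metastore_ids(hostname: str, token: str, timeout: float = 30.0) -> dict[str, str]:
    """Call the REST API with a bearer token and return metastore ID -> name."""
    req = urllib.request.Request(
        f"https://{hostname}{LIST_METASTORES_PATH}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_metastore_ids(resp.read().decode("utf-8"))
```

Usage: fetch_metastore_ids("<workspace-hostname>", "<personal-access-token>") returns every visible metastore; pick the one attached to the workspace Bigeye will read from.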
2. Enable System Tables
The following system tables are required for lineage computation:
- system.access.table_lineage
- system.access.column_lineage
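Both tables belong to the access schema of the system catalog. Enable that system schema for your metastore (identified by the metastore ID from step 1) through the Databricks Unity Catalog API, then make sure your Bigeye user is granted access to the tables, as described in the next step.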
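A minimal sketch of enabling the schema over REST, assuming the system-schemas endpoint PUT /api/2.0/unity-catalog/metastores/{metastore_id}/systemschemas/{schema_name} with bearer-token authentication. Verify the version prefix against the REST API reference for your workspace. The schema name is derived from the views configured in databricks.source.table.lineage.view and databricks.source.column.lineage.view; the helper names are hypothetical.

```python
import urllib.request

# Assumption: Unity Catalog system-schemas endpoint; check the API version prefix.
SYSTEM_SCHEMA_PATH = "/api/2.0/unity-catalog/metastores/{metastore_id}/systemschemas/{schema}"


def schemas_for_views(*views: str) -> list[str]:
    """Derive schema names from fully qualified views, e.g. system.access.x -> access."""
    return sorted({view.split(".")[1] for view in views})


def enable_schema_request(
    hostname: str, metastore_id: str, schema: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) the PUT request that enables one system schema."""
    path = SYSTEM_SCHEMA_PATH.format(metastore_id=metastore_id, schema=schema)
    return urllib.request.Request(
        f"https://{hostname}{path}",
        method="PUT",
        headers={"Authorization": f"Bearer {token}"},
    )
```

To send each request, pass it to urllib.request.urlopen. Because system schemas are enabled at the metastore level, this should only be needed once per metastore.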
3. Ensure Read Access
The user or service principal created for Bigeye must have read access to:
- The system tables listed above.
- All catalogs for which lineage will be computed.
This ensures that the connector can extract query lineage and metadata correctly.
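One way to grant this read access is with Unity Catalog GRANT statements, typically run by a metastore admin. The hypothetical helper below only generates the SQL text. The privilege set (USE CATALOG, USE SCHEMA, SELECT) is an assumption about what "read access" means here, so confirm it against the Databricks privileges documentation and your own security policy.

```python
def build_grants(principal: str, catalogs: list[str], lineage_views: list[str]) -> list[str]:
    """Hypothetical helper: GRANT statements for lineage system tables and catalogs."""
    who = f"`{principal}`"  # principals are backtick-quoted in Databricks SQL
    stmts: list[str] = []
    # System tables: USE on the parent catalog and schema, plus SELECT on each table.
    for catalog in sorted({view.split(".")[0] for view in lineage_views}):
        stmts.append(f"GRANT USE CATALOG ON CATALOG {catalog} TO {who}")
    for schema in sorted({".".join(view.split(".")[:2]) for view in lineage_views}):
        stmts.append(f"GRANT USE SCHEMA ON SCHEMA {schema} TO {who}")
    for view in lineage_views:
        stmts.append(f"GRANT SELECT ON TABLE {view} TO {who}")
    # Lineage catalogs: grant at catalog level so schemas and tables inherit it.
    for catalog in catalogs:
        stmts.append(f"GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG {catalog} TO {who}")
    return stmts
```

For example, build_grants("bigeye_user", ["sales"], ["system.access.table_lineage", "system.access.column_lineage"]) yields five statements that can be reviewed and then run in a SQL editor.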
Example Configuration
```properties
# Databricks connection
databricks.hostname=<workspace-hostname>
databricks.jersey.api.timeout=30000
db.catalog.name.include=SALES,MARKETING
db.catalog.name.exclude=TEST
db.syscatalogs.include=SYSTEM
databricks.jdbc.url=jdbc:spark://<databricks-workspace-url>:443/default
databricks.jdbc.username=bigeye_user
databricks.jdbc.password=<personal-access-token>
databricks.jdbc.class=com.simba.spark.jdbc.Driver
databricks.source.table.lineage.view=system.access.table_lineage
databricks.source.column.lineage.view=system.access.column_lineage
# OAuth credentials (if applicable)
databricks.oauth.clientid=<oauth-client-id>
databricks.oauth.clientsecret.password=<oauth-client-secret>
databricks.oauth.tenantid=<oidc-tenant-id>
```
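This page does not spell out how db.catalog.name.include and db.catalog.name.exclude interact. The sketch below illustrates one plausible reading, purely as an assumption: values are comma-separated, an empty include list means all catalogs, exclude takes precedence, and names are compared case-insensitively (the example uses uppercase names, while Unity Catalog generally stores names in lowercase). Bigeye's actual filtering rules may differ, and the function names are hypothetical.

```python
from typing import Optional


# Hypothetical interpretation of the include/exclude settings, not Bigeye's code.
def split_list(value: Optional[str]) -> set[str]:
    """Split a comma-separated setting into a set of lowercase names."""
    return {item.strip().lower() for item in (value or "").split(",") if item.strip()}


def catalogs_to_process(
    available: list[str], include: Optional[str], exclude: Optional[str]
) -> list[str]:
    """Keep catalogs that pass the include list (if any) and are not excluded."""
    inc, exc = split_list(include), split_list(exclude)
    return [
        c for c in available
        if (not inc or c.lower() in inc) and c.lower() not in exc
    ]
```

With the example settings, catalogs_to_process(["sales", "marketing", "test", "hr"], "SALES,MARKETING", "TEST") keeps only sales and marketing.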