Databricks Lineage Connector

The Databricks Lineage Connector allows Bigeye to track query history and lineage from your Databricks instance. This connector requires appropriate permissions and configuration to access query history, metastore information, and system lineage tables.


Prerequisites

Before configuring the connector, ensure you have a Databricks service principal or user account with the required permissions.

Note: See the Databricks setup guide for detailed instructions on creating users or service principals and granting permissions.


Connector Parameters

The Databricks Lineage Connector requires the following parameters in addition to standard database connection parameters:

Parameter                                 Description
databricks.hostname                       Hostname of your Databricks workspace.
databricks.jersey.api.timeout             Timeout for Databricks REST API calls.
db.catalog.name.include                   Optional list of catalogs to include.
db.catalog.name.exclude                   Optional list of catalogs to exclude.
db.syscatalogs.include                    Optional list of system catalogs to include.
databricks.jdbc.url                       JDBC URL for connecting to your Databricks instance.
databricks.jdbc.username                  Databricks username or service principal for the connector.
databricks.jdbc.password                  Password or personal access token for the user/service principal.
databricks.jdbc.class                     JDBC driver class name for Databricks.
databricks.source.table.lineage.view      System table/view for table-level lineage.
databricks.source.column.lineage.view     System table/view for column-level lineage.
databricks.oauth.clientid                 OAuth client ID for authentication (if applicable).
databricks.oauth.clientsecret.password    OAuth client secret/password for authentication.
databricks.oauth.tenantid                 OIDC tenant ID for authentication (if applicable).

Getting Started

To configure the connector properly, follow these steps:

1. Get Metastore ID

You can get the Databricks metastore ID either via the API or the UI:

  • API: Use the Databricks REST API to list metastores and obtain the ID.
  • UI: In the Databricks workspace, navigate to Data → Metastores, and copy the Metastore ID.
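
For the API route, a minimal Python sketch is shown here. It assumes the Unity Catalog "list metastores" endpoint (GET /api/2.1/unity-catalog/metastores) and a personal access token sent as a bearer token; the function names are hypothetical and the response shape (a "metastores" list with "name" and "metastore_id" fields) should be checked against your workspace's API documentation.

```python
import json
import urllib.request


def build_list_metastores_request(hostname, token):
    """Build (but do not send) the request that lists metastores.

    Assumption: the endpoint path below matches your Databricks API version.
    """
    url = f"https://{hostname}/api/2.1/unity-catalog/metastores"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def extract_metastore_ids(payload):
    """Map metastore name -> metastore ID from a decoded JSON response."""
    return {m["name"]: m["metastore_id"] for m in payload.get("metastores", [])}


def fetch_metastore_ids(hostname, token):
    """Send the request and return the name -> ID mapping."""
    request = build_list_metastores_request(hostname, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_metastore_ids(json.load(response))
```

Calling fetch_metastore_ids("<workspace-hostname>", "<personal-access-token>") would return a dictionary from which the metastore ID can be copied.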

2. Enable System Tables

The following system tables are required for lineage computation:

  • system.access.table_lineage
  • system.access.column_lineage

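Enable these tables through the Databricks API. Both belong to the system.access schema, so enabling that schema for your metastore (using the metastore ID from step 1) makes them available. Then grant your Bigeye user access to them.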
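
As a rough illustration of the enabling step, the sketch below builds the request for the Unity Catalog system schemas endpoint. Both required tables live in the system.access schema, so the schema name defaults to "access". The endpoint path (PUT /api/2.0/unity-catalog/metastores/<metastore-id>/systemschemas/<schema-name>) and the bearer-token authentication are assumptions to verify against the Databricks REST API reference; the function names are hypothetical.

```python
import urllib.request
from urllib.parse import quote


def build_enable_system_schema_request(hostname, token, metastore_id, schema_name="access"):
    """Build (but do not send) the request that enables a system schema.

    Assumption: the system schemas endpoint path is as documented by Databricks.
    """
    url = (
        f"https://{hostname}/api/2.0/unity-catalog/metastores/"
        f"{quote(metastore_id, safe='')}/systemschemas/{quote(schema_name, safe='')}"
    )
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="PUT",
    )


def enable_system_schema(hostname, token, metastore_id, schema_name="access"):
    """Send the request; returns the HTTP status code on success."""
    request = build_enable_system_schema_request(hostname, token, metastore_id, schema_name)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

Enabling a system schema typically requires an account or metastore admin, so this call may need different credentials from the ones the connector itself uses.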

3. Ensure Read Access

The user or service principal created for Bigeye must have read access to:

  • The system tables listed above.
  • All catalogs for which lineage will be computed.

This ensures that the connector can extract query lineage and metadata correctly.
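
One way to express the read access above is a set of Unity Catalog GRANT statements. The sketch below only generates the SQL text for review; it does not run anything. The helper name is hypothetical, and the choice of catalog-wide USE SCHEMA and SELECT grants is an assumption about what "read access" should mean in your environment, so narrow the grants if your policy requires it.

```python
def build_read_grants(principal, catalogs):
    """Return GRANT statements giving a principal read access to the
    system.access schema and to each listed catalog.

    Assumption: Unity Catalog privilege names USE CATALOG, USE SCHEMA, SELECT.
    """
    who = f"`{principal}`"
    statements = [
        # Access to the lineage system tables.
        f"GRANT USE CATALOG ON CATALOG `system` TO {who};",
        f"GRANT USE SCHEMA ON SCHEMA `system`.`access` TO {who};",
        f"GRANT SELECT ON SCHEMA `system`.`access` TO {who};",
    ]
    for catalog in catalogs:
        # Catalog-level grants are inherited by the schemas and tables inside.
        statements.append(f"GRANT USE CATALOG ON CATALOG `{catalog}` TO {who};")
        statements.append(f"GRANT USE SCHEMA ON CATALOG `{catalog}` TO {who};")
        statements.append(f"GRANT SELECT ON CATALOG `{catalog}` TO {who};")
    return statements


if __name__ == "__main__":
    for statement in build_read_grants("bigeye_user", ["SALES", "MARKETING"]):
        print(statement)
```

The printed statements can be reviewed and then run by an administrator in a Databricks SQL editor or notebook.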


Example Configuration

# Databricks connection
databricks.hostname=<workspace-hostname>
databricks.jersey.api.timeout=30000
db.catalog.name.include=SALES,MARKETING
db.catalog.name.exclude=TEST
db.syscatalogs.include=SYSTEM
databricks.jdbc.url=jdbc:spark://<databricks-workspace-url>:443/default
databricks.jdbc.username=bigeye_user
databricks.jdbc.password=<personal-access-token>
databricks.jdbc.class=com.simba.spark.jdbc.Driver
databricks.source.table.lineage.view=system.access.table_lineage
databricks.source.column.lineage.view=system.access.column_lineage
databricks.oauth.clientid=<oauth-client-id>
databricks.oauth.clientsecret.password=<oauth-client-secret>
databricks.oauth.tenantid=<oidc-tenant-id>
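
Before handing a file like this to the connector, a quick sanity check can catch simple mistakes. The following sketch parses key=value lines, reports keys with no value, and flags catalogs that appear in both the include and exclude lists. Which keys are treated as mandatory here is an assumption made for illustration, not a statement of what the connector enforces, and the function names are hypothetical.

```python
# Keys assumed (for this sketch only) to be needed in every configuration.
ASSUMED_REQUIRED_KEYS = (
    "databricks.hostname",
    "databricks.jdbc.url",
    "databricks.jdbc.class",
    "databricks.source.table.lineage.view",
    "databricks.source.column.lineage.view",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if separator:
            props[key.strip()] = value.strip()
    return props


def split_list(value):
    """Split a comma-separated value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_config(props):
    """Return a list of human-readable problems (empty if none found)."""
    problems = [
        f"missing value for {key}"
        for key in ASSUMED_REQUIRED_KEYS
        if not props.get(key)
    ]
    included = set(split_list(props.get("db.catalog.name.include", "")))
    excluded = set(split_list(props.get("db.catalog.name.exclude", "")))
    for catalog in sorted(included & excluded):
        problems.append(f"catalog {catalog} is both included and excluded")
    return problems
```

Running check_config(parse_properties(open(path).read())) on a file returns an empty list when none of these checks fail.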