Microsoft Fabric and Databricks Unity Catalog — unraveling the integration scenarios
In a prior blog post, I explored options to write from Databricks to OneLake and from Fabric Spark to ADLS Gen2 (without Unity Catalog-enabled clusters). This blog post aims to uncover various scenarios related to Fabric + Databricks Unity Catalog integration (with Unity Catalog-enabled clusters). For lakehouse scenarios, see this post from Piethein.
- Can I sync UC tables into OneLake catalog? How?
- Can I write from UC-enabled clusters to OneLake?
- Can I integrate UC with OneLake? Can I run federated queries against a SQL endpoint / Fabric Data Warehouse?
Note: This article reflects my personal experiences and viewpoints, not the official stance of Microsoft or Databricks. Additionally, while this blog post outlines potential scenarios, it does not necessarily reflect the Fabric roadmap or intentions. Not all options mentioned may become operational in the future.
[Last update: 10/30/2024]
Databricks Unity Catalog and Microsoft Fabric
The integration scenarios can essentially be grouped by entry point, Fabric or Unity Catalog:
- Accessing Unity Catalog from Fabric (Fabric → Unity Catalog): This functionality could enable users to seamlessly access Unity Catalog catalogs, schemas, and tables from within Fabric.
- Utilizing Fabric from Unity Catalog (DBX/Unity Catalog → Fabric): This feature could offer users the ability to access and use OneLake directly from within Unity Catalog and run federated queries over a SQL endpoint or Fabric Data Warehouse.
Let’s explore these scenarios further.
Fabric → Unity Catalog
Using Unity Catalog from Fabric
If you're in Fabric, here are the current options for accessing Unity Catalog tables. You can also read/write directly from Fabric Spark to ADLS Gen2, as sketched below.
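As a minimal sketch (the storage account, container, and folder names are placeholders, and it assumes your Entra ID identity has the required role on the storage account, e.g., Storage Blob Data Contributor), direct ADLS Gen2 access from a Fabric Spark notebook uses plain abfss paths:
# placeholder ADLS Gen2 path; replace account, container, and folder names
adls_path = "abfss://<container>@<storage_account>.dfs.core.windows.net/<folder>/mytable"
# read an existing Delta table from ADLS Gen2
df = spark.read.format("delta").load(adls_path)
# write it back as Delta to another placeholder location
df.write.format("delta").mode("overwrite").save(adls_path + "_copy")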
Current options
Currently, there are two options for syncing UC tables into Fabric:
- 1) Sync UC tables to OneLake using shortcuts (notebook-based solution): Users can create shortcuts to Unity Catalog (UC) tables either manually (tedious) or semi-automatically via a notebook. The semi-automatic method allows integration of UC Delta tables — both external and managed (with minor code adjustments) — by creating shortcuts (read/write). See instructions on executing the utility notebook.
# configuration
dbx_workspace = "<databricks_workspace_url>"
dbx_token = "<pat_token>"
dbx_uc_catalog = "catalog1"
dbx_uc_schemas = '["schema1", "schema2"]'
fab_workspace_id = "<workspace_id>"
fab_lakehouse_id = "<lakehouse_id>"
fab_shortcut_connection_id = "<connection_id>"
# set to True to keep shortcuts in sync with UC table additions/removals
fab_consider_dbx_uc_table_changes = True
# sync UC tables to lakehouse
import json
sc.addPyFile('https://raw.githubusercontent.com/microsoft/fabric-samples/main/docs-samples/onelake/unity-catalog/util.py')
from util import *
databricks_config = {
    'dbx_workspace': dbx_workspace,
    'dbx_token': dbx_token,
    'dbx_uc_catalog': dbx_uc_catalog,
    'dbx_uc_schemas': json.loads(dbx_uc_schemas)
}
fabric_config = {
    'workspace_id': fab_workspace_id,
    'lakehouse_id': fab_lakehouse_id,
    'shortcut_connection_id': fab_shortcut_connection_id,
    'consider_dbx_uc_table_changes': fab_consider_dbx_uc_table_changes
}
sync_dbx_uc_tables_to_onelake(databricks_config, fabric_config)
- 2) Sync UC tables to OneLake using Mirroring: Mirroring offers an automated, read-only sync of UC external and managed Delta tables to Fabric, using shortcuts (read-only) as an underlying mechanism. This method is ideal for scenarios where ongoing synchronization is needed without direct table manipulation. See instructions on using mirroring for Azure Databricks UC.
Note: Options such as Delta sharing, JDBC/ODBC to Databricks, Fabric data pipeline Databricks activity, and others are not mentioned here.
Databricks/Unity Catalog → Fabric
Using Fabric and OneLake from Databricks/Unity Catalog
Unity Catalog offers different ways to connect to and leverage cloud object storage (e.g., ADLS Gen2), as well as to connect to external data systems and run federated queries (e.g., against Azure Synapse). You can also read/write directly from Databricks to OneLake, as sketched below.
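As a minimal sketch (the workspace and lakehouse names are placeholders, and it assumes the cluster identity can authenticate to OneLake; see the prior post for the authentication options), direct OneLake access from Databricks looks roughly like this:
# placeholder OneLake path; replace the workspace and lakehouse names
onelake_path = (
    "abfss://<workspace_name>@onelake.dfs.fabric.microsoft.com/"
    "<lakehouse_name>.Lakehouse/Tables/mytable"
)
# write a small Delta table to OneLake
df = spark.range(10)
df.write.format("delta").mode("overwrite").save(onelake_path)
# read it back from OneLake
spark.read.format("delta").load(onelake_path).show()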
Current options
Currently, there is one way to connect Unity Catalog to Fabric:
- UC federated queries against a SQL endpoint / Fabric Data Warehouse: Users can establish read-only access to data in a SQL endpoint or Fabric Data Warehouse using a UC foreign catalog (Lakehouse Federation). While the Azure Synapse connector currently relies on username/password authentication, the SQL Server connector also supports SPN authentication, enabling secure connections to Fabric. See instructions for the connection setup; a rough sketch follows below.
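As a rough sketch of that setup (connection, secret, host, and table names below are placeholders; the host is the SQL connection string of the Fabric SQL endpoint or Data Warehouse). The example shows the basic username/password form; for Fabric, the SQL Server connector's SPN authentication would be configured instead, so follow the linked instructions for the exact options:
# create a UC connection to the Fabric SQL endpoint / Data Warehouse (placeholder names)
spark.sql("""
  CREATE CONNECTION IF NOT EXISTS fabric_dwh_connection TYPE sqlserver
  OPTIONS (
    host '<fabric_sql_connection_string>',
    port '1433',
    user secret('<secret_scope>', '<user_key>'),
    password secret('<secret_scope>', '<password_key>')
  )
""")
# expose the warehouse as a read-only foreign catalog in UC
spark.sql("""
  CREATE FOREIGN CATALOG IF NOT EXISTS fabric_dwh
  USING CONNECTION fabric_dwh_connection
  OPTIONS (database '<fabric_warehouse_name>')
""")
# run a federated, read-only query against the Fabric warehouse
spark.sql("SELECT * FROM fabric_dwh.dbo.mytable LIMIT 10").show()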
Note: Creating an external table in Unity Catalog using a OneLake abfss path or mount path currently results in an exception; in other words, you cannot register an external table in UC with OneLake as the underlying storage (see the illustrative attempt below). This limitation leads to the potential future scenarios described next.
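For illustration only, the following kind of statement (hypothetical catalog, schema, and path names) is what currently fails:
# hypothetical example: registering a UC external table on a OneLake path
onelake_table_path = (
    "abfss://<workspace_name>@onelake.dfs.fabric.microsoft.com/"
    "<lakehouse_name>.Lakehouse/Tables/mytable"
)
# this currently raises an exception: UC does not accept OneLake as external table storage
spark.sql(f"""
  CREATE TABLE catalog1.schema1.my_external_table
  USING DELTA
  LOCATION '{onelake_table_path}'
""")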
Potential future options
Similar to ADLS Gen2, additional options may become available in the future for registering OneLake as storage directly in Unity Catalog.
- Use OneLake as default managed storage: Databricks has started to roll out automatic enablement of Unity Catalog, i.e., an automatically provisioned Unity Catalog metastore with Databricks-managed storage (e.g., ADLS Gen2). However, users can also specify their own metastore-level managed storage when creating a Unity Catalog metastore, and in this scenario that storage could point to OneLake. Note: this is not possible yet.
Error: Invalid format: ADLS Gen 2 path
- OneLake as external location: External locations are used to define managed storage locations for catalogs and schemas, and to define locations for external tables and external volumes. For instance, if users are using external tables in Spark, OneLake could be leveraged as an external location. Note: this is not possible yet.
- OneLake for Volumes: Volumes represent a logical volume of storage in a cloud object storage location, adding governance over non-tabular datasets. External and managed volumes could be created on OneLake, just as they can be for ADLS Gen2 today. Note: this is not possible yet.
Note: Other options like ODBC to SQL endpoint from Databricks, Partner Connect (Power BI + Databricks), and streaming options are not mentioned here.
Wait, what about access policies?
It’s still open to explore how Unity Catalog access policies could align with OneLake RBAC and OneSecurity, as well as whether it would be possible to carry over security and access policies from Unity Catalog to Fabric, and vice versa.
References:
- Non-UC scenarios: Databricks and Fabric — writing to OneLake and ADLS Gen2 | by Aitor Murguzur | Medium
- Lakehouse scenarios: Integrating Azure Databricks and Microsoft Fabric | by Piethein Strengholt | Jun, 2024 | Medium
- Integrate Databricks Unity Catalog with OneLake — Microsoft Fabric | Microsoft Learn