Databricks and Fabric — writing to OneLake and ADLS Gen2

Aitor Murguzur
4 min read · Feb 23, 2024


This blog post explains how to write to OneLake from Azure Databricks as well as how to write to ADLS Gen2 from Fabric Spark.

Note: using the same storage layer across different compute engines (for writes) can have unintended consequences, e.g., conflicting concurrent writes to the same Delta table. Make sure you understand the implications before doing so.

Writing to OneLake from Databricks

There are different options for writing to OneLake from Databricks:

1) Credential passthrough
You can connect to OneLake from Azure Databricks using credential passthrough. This lets Azure Databricks clusters authenticate to OneLake automatically with the identity you use to log in to Azure Databricks. For instance, if your user identity has access privileges to the lakehouse, you’ll be able to read from and write to the Files and Tables sections. Refer to Integrate OneLake with Azure Databricks.

workspace_id = "<workspace_id>"
lakehouse_id = "<lakehouse_id>"

# read with creds passthrough
df = spark.read.format("parquet").load(f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Files/data")
df.show(10)

# write
df.write.format("delta").mode("overwrite").save(f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Tables/dbx_delta_credspass")

Note: you can specify OneLake paths using either GUIDs or names. If using names, ensure that the workspace and lakehouse names do not contain special characters or whitespace.

abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables | Files/<path>

abfss://<workspace_name>@onelake.dfs.fabric.microsoft.com/<lakehouse_name>.Lakehouse/Tables | Files/<path>

2) Using Service Principal
You can also access data in OneLake using service principal authentication, for example, from an Azure Databricks notebook.

workspace_name = "<workspace_name>"
lakehouse_name = "<lakehouse_name>"
tenant_id = "<tenant_id>"
service_principal_id = "<service_principal_id>"
service_principal_password = "<service_principal_password>"

spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id", service_principal_id)
spark.conf.set("fs.azure.account.oauth2.client.secret", service_principal_password)
spark.conf.set("fs.azure.account.oauth2.client.endpoint", f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

# read with spn
df = spark.read.format("parquet").load(f"abfss://{workspace_name}@onelake.dfs.fabric.microsoft.com/{lakehouse_name}.Lakehouse/Files/data")
df.show(10)

# write
df.write.format("delta").mode("overwrite").save(f"abfss://{workspace_name}@onelake.dfs.fabric.microsoft.com/{lakehouse_name}.Lakehouse/Tables/dbx_delta_spn")

3) Mount points with Service Principal
Mount points let you attach OneLake storage to the Databricks File System (DBFS). Using dbutils mounts, you can create a local alias under the /mnt directory, and a service principal can be used to mount a OneLake path. Once a OneLake path is mounted (see the example below), you can read and write within the Files and Tables sections of a Fabric lakehouse.

workspace_id = "<workspace_id>"
lakehouse_id = "<lakehouse_id>"
tenant_id = "<tenant_id>"
service_principal_id = "<service_principal_id>"
service_principal_password = "<service_principal_password>"

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": service_principal_id,
    "fs.azure.account.oauth2.client.secret": service_principal_password,
    "fs.azure.account.oauth2.client.endpoint": f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
}

mount_point = "/mnt/onelake-fabric"
dbutils.fs.mount(
    source = f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com",
    mount_point = mount_point,
    extra_configs = configs
)

# read with mount spn
df = spark.read.format("parquet").load(f"/mnt/onelake-fabric/{lakehouse_id}/Files/data")
df.show(10)

# write
df.write.format("delta").mode("overwrite").save(f"/mnt/onelake-fabric/{lakehouse_id}/Tables/dbx_delta_mount_spn")

Alternatively, you can create a mount point to OneLake using credential passthrough as follows:

workspace_id = "<workspace_id>"
lakehouse_id = "<lakehouse_id>"

configs = {
    "fs.azure.account.auth.type": "CustomAccessToken",
    "fs.azure.account.custom.token.provider.class": spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}

mount_point = "/mnt/onelake-fabric-credspass"
dbutils.fs.mount(
    source = f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com",
    mount_point = mount_point,
    extra_configs = configs
)

# read with mount creds pass
df = spark.read.format("parquet").load(f"/mnt/onelake-fabric-credspass/{lakehouse_id}/Files/data")
df.show(10)

# write
df.write.format("delta").mode("overwrite").save(f"/mnt/onelake-fabric-credspass/{lakehouse_id}/Tables/dbx_delta_mount_credspass")

Note: Databricks recommends Unity Catalog rather than mount points, directly configured service principals, or credential passthrough. However, OneLake cannot currently serve as metastore-level managed storage for Unity Catalog, and creating external locations (and volumes) that point to OneLake is not yet possible.

Writing to ADLS Gen2 from Fabric

There are different options for writing to ADLS Gen2 from Fabric Spark:

Note: shortcuts to ADLS Gen2 also allow writing to the target storage path, but they are not included in this list.

1) Credential passthrough
If your user identity has ADLS Gen2 access privileges (e.g., the Storage Blob Data Contributor RBAC role), you can use credential passthrough.

storage_account = "<storage_account>"

# read with creds passthrough
df = spark.read.format("parquet").load(f"abfss://default@{storage_account}.dfs.core.windows.net/data/unmanaged/t_unmanag_parquet")
df.show(10)

# write
df.write.format("delta").mode("overwrite").save(f"abfss://default@{storage_account}.dfs.core.windows.net/data/unmanaged/fab_unmanag_delta_credspass")

2) Using Service Principal
If you prefer a service principal (e.g., to support secret rotation), you can also access ADLS Gen2 from Fabric Spark using service principal authentication.

storage_account = "<storage_account>"
tenant_id = "<tenant_id>"
service_principal_id = "<service_principal_id>"
service_principal_password = "<service_principal_password>"

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", service_principal_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", service_principal_password)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net", f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

# read with spn
df = spark.read.format("parquet").load(f"abfss://default@{storage_account}.dfs.core.windows.net/data/unmanaged/t_unmanag_parquet")
df.show(10)

# write
df.write.format("delta").mode("overwrite").save(f"abfss://default@{storage_account}.dfs.core.windows.net/data/unmanaged/fab_unmanag_delta_spn")

3) Mount points with Account Key or SAS Token
Similar to Databricks’ dbutils, Fabric Spark offers mount options through mssparkutils. You can mount ADLS Gen2 using either an account key or a SAS token.

storage_account = "<storage_account>"
sas_token= "<sas_token>"
mount_point = "/mnt/adls-gen2-sas"

mssparkutils.fs.mount(
f"abfss://default@{storage_account}.dfs.core.windows.net",
mount_point,
{"sasToken":sas_token, "fileCacheTimeout": 120, "timeout": 120}
)

# read with mount sas
path = mssparkutils.fs.getMountPath("/mnt/adls-gen2-sas")
df = spark.read.format("parquet").load(f"file://{path}/data/unmanaged/t_unmanag_parquet")
df.show(10)

# write
df.write.format("delta").mode("overwrite").save(f"file://{path}/data/unmanaged/fab_unmanag_delta_mount_sas")

Note: currently, cloud connections cannot be used directly in Fabric Spark, but you can use shortcuts instead. Use managed private endpoints to connect to ADLS Gen2 accounts deployed with private endpoints, or trusted workspace access with a workspace identity for ADLS Gen2 accounts behind a firewall.
