Quering OneLake Delta Lake Tables from DuckDB CLI

Aitor Murguzur
4 min readDec 2, 2024

--

Inspired by Mehdi Ouazza’s blog on portable DuckDB catalogs and discussions with my colleague Mim (check out his blog), I decided to explore how DuckDB could serve as a catalog for Fabric OneLake Delta tables.

Wait, OneLake Delta tables? Yes, Fabric compute engines (Spark, Data Warehouse, KQL, Power BI, SQL Database) use OneLake to store Delta Lake tables, which can be accessed, for example, through the OneLake API or other external compute engines / clients.

This blog is divided into two parts:

  1. Creating demo Delta tables in OneLake using Fabric compute engines
  2. Querying those Delta tables from DuckDB, including auth setup

Creating Delta Tables in Fabric

Currently, six different Fabric item types can store Delta Lake tables in OneLake: Lakehouse (Spark), Mirrored Database, Warehouse, SQL Database, KQL Database (requires explicit enabling), and Semantic Model (requires explicit enabling).

Where are these Delta tables stored? If you have OneLake Explorer installed or using the OneLake API, you’ll notice that these tables are typically stored in the following path:
<workspace>/<item>/Tables/<schema>/<table>.

In this example, I created some placeholder Delta tables for all those item types, but you can use any tables you already have.

  • Lakehouse (schema enabled): a sample employees table.
  • Mirrored database: a sample territories table.
  • Warehouse: a sample sales table.
  • SQL database: a sample products table.
  • KQL database: a sample locations table.
  • Semantic model: a sample customers table.

Querying OneLake Delta Tables from DuckDB

Once Delta tables created, you can use DuckDB to query those tables. Let see how this works.

Step 1: Create OneLake Secret

The first step in setting up OneLake authentication is to create a secret in DuckDB. DuckDB supports secure integrations through a simple CREATE SECRET statement.

Here’s how you create the secret:

-- Create OneLake secret
CREATE OR REPLACE SECRET onelake (
TYPE AZURE,
PROVIDER ACCESS_TOKEN,
ACCESS_TOKEN '<storage_token>'
);

This command creates or replaces a secret named onelake. Replace <storage_token> with the OneLake bearer token.

After creating the secret, you can validate its creation with querying the duckdb_secrets() function. This will return a list of all the secrets in your current DuckDB session, allowing you to confirm that the OneLake secret is stored correctly.

-- Validate secret creation
FROM duckdb_secrets();

Step 2: Use DuckDB Views Pointing to OneLake Tables

Once you’ve created and validated the OneLake secret, the next step is to set up views in DuckDB that point to your OneLake tables. DuckDB allows you to create views that query external data sources, such as OneLake, so you can seamlessly access your cloud data without moving it.

Here’s an example of how you can create a DuckDB view that points to a OneLake table:

-- KQL database
CREATE VIEW onelake_kql_locations AS SELECT *
FROM delta_scan('abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/d<kqldb_id>/Tables/locations')
;

-- Lakehouse
CREATE VIEW onelake_lakehouse_employees AS SELECT *
FROM delta_scan('abfss://duckdb@onelake.dfs.fabric.microsoft.com/lh_duck.Lakehouse/Tables/dbo/employees')
;

-- Mirrored database
CREATE VIEW onelake_mirrored_sql_territories AS SELECT *
FROM delta_scan('abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<mirroreddb_id>/Tables/dbo/territories')
;

-- SQL database
CREATE VIEW onelake_sql_products AS SELECT *
FROM delta_scan('abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<sqldb_id>/Tables/dbo/products')
;

-- Warehouse
CREATE VIEW onelake_warehouse_customers AS SELECT *
FROM delta_scan('abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<warehouse_id>/Tables/dbo/customers')
;

CREATE VIEW onelake_warehouse_sales AS SELECT *
FROM delta_scan('abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<warehouse_id>/Tables/dbo/sales')
;

Note: The Semantic Model Delta table did not work because it uses column mapping (minReader 2, minWriter 5) by default. This is a known limitation on delta-rs/duckdb.

You can list the view definitions like this:

SELECT sql FROM duckdb_views() where temporary=false;

You can view the tables by:

SHOW TABLES;

Step 3: Run Queries

Once the views are created, you can query those Delta tables using DuckDB. Here’s an example of a query:

Final notes

This was a simple test to query OneLake tables using the DuckDB CLI. The DuckDB file holds metadata, while the actual data resides in OneLake. You can query these tables just like regular tables.

--

--

Aitor Murguzur
Aitor Murguzur

Written by Aitor Murguzur

All things data. Principal PM @Microsoft. PhD in Comp Sci. All views are my own. https://www.linkedin.com/in/murggu/

No responses yet