Quering OneLake Delta Lake Tables from DuckDB CLI
Inspired by Mehdi Ouazza’s blog on portable DuckDB catalogs and discussions with my colleague Mim (check out his blog), I decided to explore how DuckDB could serve as a catalog for Fabric OneLake Delta tables.
Wait, OneLake Delta tables? Yes, Fabric compute engines (Spark, Data Warehouse, KQL, Power BI, SQL Database) use OneLake to store Delta Lake tables, which can be accessed, for example, through the OneLake API or other external compute engines / clients.
This blog is divided into two parts:
- Creating demo Delta tables in OneLake using Fabric compute engines
- Querying those Delta tables from DuckDB, including auth setup
Creating Delta Tables in Fabric
Currently, six different Fabric item types can store Delta Lake tables in OneLake: Lakehouse (Spark), Mirrored Database, Warehouse, SQL Database, KQL Database (requires explicit enabling), and Semantic Model (requires explicit enabling).
Where are these Delta tables stored? If you have OneLake Explorer installed or using the OneLake API, you’ll notice that these tables are typically stored in the following path:<workspace>/<item>/Tables/<schema>/<table>
.
In this example, I created some placeholder Delta tables for all those item types, but you can use any tables you already have.
- Lakehouse (schema enabled): a sample
employees
table. - Mirrored database: a sample
territories
table. - Warehouse: a sample
sales
table. - SQL database: a sample
products
table. - KQL database: a sample
locations
table. - Semantic model: a sample
customers
table.
Querying OneLake Delta Tables from DuckDB
Once Delta tables created, you can use DuckDB to query those tables. Let see how this works.
Step 1: Create OneLake Secret
The first step in setting up OneLake authentication is to create a secret in DuckDB. DuckDB supports secure integrations through a simple CREATE SECRET
statement.
Here’s how you create the secret:
-- Create OneLake secret
CREATE OR REPLACE SECRET onelake (
TYPE AZURE,
PROVIDER ACCESS_TOKEN,
ACCESS_TOKEN '<storage_token>'
);
This command creates or replaces a secret named onelake
. Replace <storage_token>
with the OneLake bearer token.
After creating the secret, you can validate its creation with querying the duckdb_secrets()
function. This will return a list of all the secrets in your current DuckDB session, allowing you to confirm that the OneLake secret is stored correctly.
-- Validate secret creation
FROM duckdb_secrets();
Step 2: Use DuckDB Views Pointing to OneLake Tables
Once you’ve created and validated the OneLake secret, the next step is to set up views in DuckDB that point to your OneLake tables. DuckDB allows you to create views that query external data sources, such as OneLake, so you can seamlessly access your cloud data without moving it.
Here’s an example of how you can create a DuckDB view that points to a OneLake table:
-- KQL database
CREATE VIEW onelake_kql_locations AS SELECT *
FROM delta_scan('abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/d<kqldb_id>/Tables/locations')
;
-- Lakehouse
CREATE VIEW onelake_lakehouse_employees AS SELECT *
FROM delta_scan('abfss://duckdb@onelake.dfs.fabric.microsoft.com/lh_duck.Lakehouse/Tables/dbo/employees')
;
-- Mirrored database
CREATE VIEW onelake_mirrored_sql_territories AS SELECT *
FROM delta_scan('abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<mirroreddb_id>/Tables/dbo/territories')
;
-- SQL database
CREATE VIEW onelake_sql_products AS SELECT *
FROM delta_scan('abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<sqldb_id>/Tables/dbo/products')
;
-- Warehouse
CREATE VIEW onelake_warehouse_customers AS SELECT *
FROM delta_scan('abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<warehouse_id>/Tables/dbo/customers')
;
CREATE VIEW onelake_warehouse_sales AS SELECT *
FROM delta_scan('abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<warehouse_id>/Tables/dbo/sales')
;
Note: The Semantic Model Delta table did not work because it uses column mapping (minReader 2, minWriter 5) by default. This is a known limitation on delta-rs/duckdb.
You can list the view definitions like this:
SELECT sql FROM duckdb_views() where temporary=false;
You can view the tables by:
SHOW TABLES;
Step 3: Run Queries
Once the views are created, you can query those Delta tables using DuckDB. Here’s an example of a query:
Final notes
This was a simple test to query OneLake tables using the DuckDB CLI. The DuckDB file holds metadata, while the actual data resides in OneLake. You can query these tables just like regular tables.
- Performance is not considered here.
- DuckDB considering External Tables 🙏🙏🙏 · duckdb/duckdb · Discussion #14422.
- Used bearer token for simple test. Here an example of SPN authentication.