Using a Shared Hive Metastore Across Azure Synapse, HDInsight, and Databricks
[Kudos to Umesh Pawar and Lee Hilton-Smith for their contributions]
When building a data lake or lakehouse on Azure, you can keep your data in ADLS Gen2 as Delta tables and read it from different compute engines. This makes the data interoperable and lets you switch from one platform to another.
While Delta tables stored in ADLS Gen2 enable unified data access from different compute engines, the table metadata lives in a separate metastore. Sharing that metastore across platforms can be tricky, often requiring metadata migration or rebuilding the metastore from external tables.
In this article, I will demonstrate how an external Hive metastore (HMS) can be shared across three different Spark-based compute engines on Azure — Azure Synapse, HDInsight, and Databricks. Let’s get started!
Architecture

- Compute layer: HDInsight 5.0 (Spark 3.1.3), Databricks Runtime 9.1 LTS (Spark 3.1.2), and Azure Synapse Spark (Spark 3.1.3).
- Storage layer: ADLS Gen2 as the data store and Azure SQL Database as the external Hive metastore (HMS 3.1.0).
Versions & Compatibility
The following table summarizes the versions supported by Azure Synapse, HDInsight, and Databricks. Since all three support Spark 3.1.x (the latest Spark version supported by HDInsight 5.0), that is the combination we will use here: Spark 3.1.x, HMS 3.1.0, and Delta 1.0.x.
+---------------+-------+---------------+-----------+------------+
| Spark Version | HMS   | Synapse Spark | HDInsight | Databricks |
+---------------+-------+---------------+-----------+------------+
| 3.1           | 3.1.x | Yes           | Yes       | Yes        |
| 3.3           | 3.1.x | Yes           | No        | Yes        |
+---------------+-------+---------------+-----------+------------+
> HDInsight plans to release version 5.1, which will include Spark 3.3.
Prerequisites
Follow the steps below to create the required infrastructure on Azure. If you already have it (with the right versions!), you can skip these prerequisites.
- Step 1: Create an HDInsight 5.0 Spark cluster (it comes with Spark 3.1.3) with ADLS Gen2, and select a custom Hive metastore (Azure SQL Database) during cluster creation. This database will be used as the shared HMS.
- Step 2: Create a Synapse workspace with ADLS Gen2 (public endpoints) and Spark pool (Spark 3.1). Use the same ADLS Gen2 created in Step 1 during the Synapse workspace creation.
- Step 3: Create a Premium Databricks workspace (public endpoints) and an all-purpose compute cluster (with 9.1 LTS runtime), enabling credential passthrough for user-level data access.
- Step 4: Access the Azure SQL Database created in Step 1 and validate that the HMS schema was created automatically. Run a sample query to see all databases registered in the Hive catalog (sys, default and information_schema).
SELECT * FROM dbo.DBS;
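To confirm the metastore schema version as well, the stock Hive 3.1 metastore schema keeps it in a VERSION table (table and column names below assume that standard schema):
-- Check the Hive metastore schema version (standard Hive 3.1 schema assumed)
SELECT SCHEMA_VERSION, VERSION_COMMENT FROM dbo.VERSION;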
Once the infrastructure is ready, you need to configure Azure Synapse, HDInsight, and Databricks to connect to the external HMS.
Connecting HDInsight to External HMS
Since the external HMS was already linked during HDInsight cluster creation, you only need to set the metastore.catalog.default property to hive. HDInsight sets that property to spark by default.
To change it, log in to the Ambari UI and update the property in the spark3-hive-site-override configuration.
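In the spark3-hive-site-override, the change comes down to a single key/value pair:
metastore.catalog.default=hive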

After the settings change is saved and the Spark components have been restarted, try listing the HMS databases by running the query below in a Jupyter notebook. You should see the same three databases: sys, default and information_schema.
%%sql
SHOW DATABASES
Connecting Synapse to External HMS
Connecting Synapse Spark pool to an external HMS (Azure SQL Database) involves two steps:
- Set up a linked service to HMS (Azure SQL Database): Create a Linked Service within the Synapse Studio to connect to the HMS (Azure SQL Database). Ensure that connection works.
- Configure a Spark pool to use the external HMS: You can do this at the Spark session level in a Spark notebook (see the example below) or at the Spark pool level using Spark configurations (a sketch of the pool-level equivalent follows the example). See more on Synapse Spark configurations here.
%%configure -f
{
    "conf": {
        "spark.sql.hive.metastore.version": "3.1",
        "spark.hadoop.hive.synapse.externalmetastore.linkedservice.name": "<your linked service name>",
        "spark.sql.hive.metastore.jars": "/opt/hive-metastore/lib-3.1/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-client/*"
    }
}
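To apply the same settings once at the Spark pool level instead of per session, the equivalent pool configuration should look roughly like the key/value pairs below (same properties as above; double-check the exact format against the Synapse Spark configuration documentation):
spark.sql.hive.metastore.version 3.1
spark.hadoop.hive.synapse.externalmetastore.linkedservice.name <your linked service name>
spark.sql.hive.metastore.jars /opt/hive-metastore/lib-3.1/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-client/*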
After applying these settings, check connectivity to the external HMS by running the query below in a Spark notebook.
%%sql
SHOW DATABASES
Connecting Databricks to External HMS
Connecting Databricks to an external HMS is also pretty simple.
- Download the metastore jars and point to them: Synapse provides the jars by default, but to make the external HMS work with Databricks, you need to download the required dependencies and make them available on DBFS. You can use this notebook; a sketch of the copy step is shown after the cluster configuration below.
- Set up the external metastore configuration: You can connect to an existing Hive metastore by setting the required Spark configurations using the UI or an init script. You could also configure the same properties at the Spark session level in a notebook. The code below shows the required cluster configuration.
spark.sql.hive.metastore.version 3.1
spark.hadoop.javax.jdo.option.ConnectionUserName <username>
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:sqlserver://<sql_server_name>.database.windows.net:1433;database=<sql_db_name>
spark.hadoop.javax.jdo.option.ConnectionPassword <password>
spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver
spark.sql.hive.metastore.jars /dbfs/metastore_jars/hive-v3_1/*
spark.databricks.delta.preview.enabled true
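For reference, the copy step from the linked notebook boils down to something like the sketch below: start the cluster once with spark.sql.hive.metastore.jars set to maven so the driver downloads the Hive 3.1 client jars, then copy them to the DBFS path referenced in the configuration above. The source directory here is a placeholder; the actual location is printed in the driver logs or the notebook output.
# Sketch only: copy the downloaded Hive 3.1 metastore jars to DBFS so the
# cluster configuration above can point at /dbfs/metastore_jars/hive-v3_1/*.
# <downloaded_jars_dir> is a placeholder for the directory the driver
# downloaded the jars into (check the driver logs or the helper notebook output).
dbutils.fs.mkdirs("dbfs:/metastore_jars/hive-v3_1")
dbutils.fs.cp("file:<downloaded_jars_dir>", "dbfs:/metastore_jars/hive-v3_1", recurse=True)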
After the cluster is created (make sure credential passthrough is checked), try listing catalog objects by running the query below in a Notebook.
%%sql
SHOW DATABASES
Create Sample Table
HDInsight, Azure Synapse and Databricks are now connected to the external HMS. Let’s create a table and query it using different compute engines!
- Import Notebook in Synapse: Jump to Synapse Studio and import this notebook. Attach it to a Spark pool and run the first two cells.
- Create a Database: Create a sharedhmsdb database.
%%sql
CREATE DATABASE IF NOT EXISTS sharedhmsdb
COMMENT "Test database for shared HMS"
LOCATION "abfss://<container>@<adls_gen2_name>.dfs.core.windows.net/hive/warehouse";
- Create a Table: Create a dummy table from the HDInsight CSV samples.
spark.read.format("csv") \
.load("abfss://<container>@<adls_gen2_name>.dfs.core.windows.net/HdiSamples/HdiSamples/SensorSampleData/building/*csv") \
.write.format("delta") \
.mode("overwrite") \
.saveAsTable("sharedhmsdb.building_summary")
- Validate table creation: Query the table just created in sharedhmsdb.
%%sql
SELECT COUNT(*) FROM sharedhmsdb.building_summary
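Optionally, you can also inspect the table metadata to confirm that it was registered as a Delta table at the expected ADLS Gen2 location:
%%sql
DESCRIBE EXTENDED sharedhmsdb.building_summary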
Swapping Compute Engines
The external HMS is now working with Synapse (it is only accessible from the Spark pool). Let’s move on to HDInsight and Databricks and see if the same database and table exist there.
- Querying from HDInsight: Open Jupyter and import this notebook. Run all cells and you will see that the sharedhmsdb database and the building_summary table are accessible from HDInsight.

- Querying from Databricks: Import this notebook and run all cells. You will see that the same sharedhmsdb database and building_summary table are also accessible (using credential passthrough to access ADLS Gen2).

For any suggestions or questions, feel free to reach out :)
References:
- Use external Hive Metastore for Azure Synapse Spark Pool — Azure Synapse Analytics | Microsoft Learn
- Use external metadata stores — Azure HDInsight | Microsoft Learn
- External Apache Hive metastore — Azure Databricks | Microsoft Learn
- Sharing External Hive-Metastore Across Jupyter on Kubernetes, Databricks and Azure Synapse | by Ziang Jia | Medium
- Shared External Hive Metastore with Azure Databricks and Synapse Spark Pools | by Mikael Hermansson | Towards Data Science