In a recent Synapse Analytics Serverless SQL Pools QnA session, the Serverless SQL Pools team stated that using External Tables rather than Views was preferred if you wanted to restrict user permissions and only provide access to the SQL object rather than the underlying dataset. I wanted to look into this further, so in this blog post we’ll walkthrough user permissions and how we provide access to query data lake files via External Tables and Views.
In this blog post we’ll create an External Table and a View in Serverless SQL Pools then test user permissions
What we’ll see is that the syntax used to create External Tables is different to Views, and therefore has an impact on security. The permissions required to select from External Tables can be more restricted than Views, we’ll see this in the examples.
The scenario we’ll use in this blog post is a requirement to allow a group of Data Analysts to query data in an Azure Data Lake Gen2 storage account using Serverless SQL Pools. The requirement is to allow the Data Analysts to query the data without any access to the underlying data lake storage account itself. We’ll do this by creating an External Table and View and testing the required permissions for both these objects.
The Permissions between Serverless SQL Pools and the Data Lake will be based on the Synapse workspace Managed Identity whilst the user permissions will be assigned to the database objects.
- Create and configure a new Serverless SQL Pools database
- Create credential and data source to connect to an Azure Data Lake Gen2 storage account
- Create a new login and user based on an Active Directory Group
- Create an External Table and assign permissions to user
- Create a View and assign permissions to user
- Test permissions and show differences
Create Serverless SQL Pools Objects
We’ll now walkthrough the solution and test the user permissions by creating a Serverless SQL Pools database, external table and view, then assigning user permissions.
Create and Configure Serverless SQL Pools Database
In the following SQL script we’ll create a new Serverless SQL Pools database and create a connection to an Azure Data Lake Gen2 account using Managed Identity. This means we do not pass any authentication information to the Data Lake other than the Managed Identity.
Please note that the Managed Identity of the Synapse workspace has been added as a Storage Blob Data Reader to the data lake storage account.
--create new database CREATE DATABASE SQLWebTelemetryAnalysis; --switch to new database USE SQLWebTelemetryAnalysis; GO --Create a schema to hold our objects CREATE SCHEMA LDW authorization dbo; GO --enable support for UTF8 ALTER DATABASE SQLWebTelemetryAnalysis COLLATE Latin1_General_100_BIN2_UTF8; --encryption to allow authentication by Managed Identity CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong_password>'; --Create a credential using Managed Identity CREATE DATABASE SCOPED CREDENTIAL DataLakeManagedIdentity WITH IDENTITY='Managed Identity' --Create a data source to point to data lake CREATE EXTERNAL DATA SOURCE ExternalDataSourceDataLakeMI WITH ( LOCATION = 'https://<storage_account>.dfs.core.windows.net/<container>', CREDENTIAL = DataLakeManagedIdentity ); --create file format for use with external table CREATE EXTERNAL FILE FORMAT SynapseParquetFormat WITH ( FORMAT_TYPE = PARQUET );
Create External Table and View
We’ll now create both the External Table and View, both these objects point to the same folder location in the source data lake account. Note that the External Table does not have any date columns defined, hold this thought for a next blog post on External Tables and partition pruning/source folder filtering. It isn’t relevant to this security post.
--Create External Table CREATE EXTERNAL TABLE LDW.WebTelemetryParquet ( UserID INT, EventType VARCHAR(50), ProductID SMALLINT, URL VARCHAR(50), Device VARCHAR(10), SessionViewSeconds INT ) WITH ( LOCATION = 'raw/webtelemetryparquet/**', DATA_SOURCE = ExternalDataSourceDataLakeMI, FILE_FORMAT = SynapseParquetFormat ); --Create view CREATE VIEW LDW.vwWebTelemetryParquet AS SELECT UserID, EventType, ProductID, [URL], Device, SessionViewSeconds, MessageKey, CAST(fct.filepath(1) AS SMALLINT) AS FolderEventYear, CAST(fct.filepath(2) AS TINYINT) AS FolderEventMonth, CAST(fct.filepath(3) AS DATE) AS FolderEventDate FROM OPENROWSET ( BULK 'raw/webtelemetryparquet/EventYear=*/EventMonth=*/EventDate=*/*.parquet', DATA_SOURCE = 'ExternalDataSourceDataLakeMI', FORMAT = 'Parquet' ) WITH ( UserID INT, EventType VARCHAR(50), ProductID SMALLINT, URL VARCHAR(50), Device VARCHAR(10), SessionViewSeconds INT, MessageKey VARCHAR(50) ) AS fct;
We’ll now add a login based on an Active Directory Group containing a user account (it’s recommended practice to configure permissions based on AD Groups). An Active Directory security group called ADGSynapseWebTelemetry has been created and a domain user called andycsynapse has been added to this group.
Create Login and User
We’ll create the login in the master database of the Serverless SQL Pools service, then switch to the database created earlier and create a user based on the login.
--create login in master database from active directory group USE MASTER; CREATE LOGIN [ADGSynapseWebTelemetry] FROM EXTERNAL PROVIDER; --switch to database and create user from login USE SQLWebTelemetryAnalysis; CREATE USER [ADGSynapseWebTelemetry] FROM LOGIN [ADGSynapseWebTelemetry];
Assign Permissions to User
Please note that we’ll be granting permissions to the user directly, we can also create a database role to manage permissions and assign users to this role (we will cover this in a later blog).
--we need to grant references on the MI credential GRANT REFERENCES ON DATABASE SCOPED CREDENTIAL::[DataLakeManagedIdentity] TO [ADGSynapseWebTelemetry]; --grant select on the external table and view GRANT SELECT ON LDW.WebTelemetryParquet TO [ADGSynapseWebTelemetry]; GRANT SELECT ON LDW.vwWebTelemetryParquet TO [ADGSynapseWebTelemetry];
Test User Permissions
Let’s now test the user permissions and we’ll start to see the differences between External Tables and Views, and the issues we have.
Select from External Table and View
If we now run a simple SELECT TOP 5 on both the External Table and View by logging in as the andycsynapse user. We’re using SQL Server Management Studio (SSMS) to connect to the Serverless SQL Pools endpoint and login.
--select from the external table SELECT TOP 5 UserID, EventType, ProductID, URL, Device, SessionViewSeconds FROM LDW.WebTelemetryParquet --select from the view SELECT TOP 5 UserID, EventType, ProductID, URL, Device, SessionViewSeconds FROM LDW.vwWebTelemetryParquet
As you can see, we’re able to see data from both the External Table and the View. However…let’s look at some unwanted access.
Access to Azure Data Lake Gen2 Account
We’ve been able to select data from both the External Table and View successfully, but we have an issue. If we run a SELECT query with the OPENROWSET command on another folder in the data lake and bypass the External Table and View, we see that the query executes successfully. To do this the user would need to know the DATA_SOURCE name that’s configured to access the data lake. This isn’t out of the realms of possibility and we really don’t want to be using a “let’s hope they don’t know the name…” security strategy…
This is because OPENROWSET usage is enabled by default…we need to deny access to this by running the following command to deny usage of the BULK command to the user. Alternatively we can DENY access to the public role, which will ensure all users have this permission denied.
--deny use of the OPENROWSET BULK command USE master; DENY ADMINISTER BULK OPERATIONS TO [ADGSynapseWebTelemetry]; --alternatively deny bulk operations on the public role USE master; DENY ADMINISTER BULK OPERATIONS TO [public]; --we can then grant access to BULK/OPENROWSET on a database-by-database basis USE SQLWebTelemetryAnalysis; GRANT ADMINISTER DATABASE BULK OPERATIONS TO [public];
If we now run the SELECT TOP 5 command, we can see that the DENY command has removed permissions to use the OPENROWSET BULK command.
This now stops the user account being able to run adhoc OPENROWSET queries on the data lake, but it’s now also stopped the View from working as the SQL definition in the view uses OPENROWSET.
In this blog post we’ve looked at how user permissions differ between External Tables and Views in a Serverless SQL Pools database and what we can do to restrict users accessing the underlying data lake data by denying the use of the BULK statement in the OPENROWSET command.