User Permissions in Serverless SQL Pools: External Tables vs Views

In a recent Synapse Analytics Serverless SQL Pools QnA session, the Serverless SQL Pools team stated that using External Tables rather than Views was preferred if you wanted to restrict user permissions and only provide access to the SQL object rather than the underlying dataset. I wanted to look into this further, so in this blog post we’ll walkthrough user permissions and how we provide access to query data lake files via External Tables and Views.

In this blog post we’ll create an External Table and a View in Serverless SQL Pools then test user permissions

What we’ll see is that the syntax used to create External Tables is different to Views, and therefore has an impact on security. The permissions required to select from External Tables can be more restricted than Views, we’ll see this in the examples.

Scenario

The scenario we’ll use in this blog post is a requirement to allow a group of Data Analysts to query data in an Azure Data Lake Gen2 storage account using Serverless SQL Pools. The requirement is to allow the Data Analysts to query the data without any access to the underlying data lake storage account itself. We’ll do this by creating an External Table and View and testing the required permissions for both these objects.

The Permissions between Serverless SQL Pools and the Data Lake will be based on the Synapse workspace Managed Identity whilst the user permissions will be assigned to the database objects.


Steps

  • Create and configure a new Serverless SQL Pools database
  • Create credential and data source to connect to an Azure Data Lake Gen2 storage account
  • Create a new login and user based on an Active Directory Group
  • Create an External Table and assign permissions to user
  • Create a View and assign permissions to user
  • Test permissions and show differences

Create Serverless SQL Pools Objects

We’ll now walkthrough the solution and test the user permissions by creating a Serverless SQL Pools database, external table and view, then assigning user permissions.

Create and Configure Serverless SQL Pools Database

In the following SQL script we’ll create a new Serverless SQL Pools database and create a connection to an Azure Data Lake Gen2 account using Managed Identity. This means we do not pass any authentication information to the Data Lake other than the Managed Identity.

Please note that the Managed Identity of the Synapse workspace has been added as a Storage Blob Data Reader to the data lake storage account.

--create new database
CREATE DATABASE SQLWebTelemetryAnalysis;

--switch to new database
USE SQLWebTelemetryAnalysis;
GO

--Create a schema to hold our objects
CREATE SCHEMA LDW authorization dbo;
GO

--enable support for UTF8
ALTER DATABASE SQLWebTelemetryAnalysis COLLATE Latin1_General_100_BIN2_UTF8;

--encryption to allow authentication by Managed Identity
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong_password>';

--Create a credential using Managed Identity
CREATE DATABASE SCOPED CREDENTIAL DataLakeManagedIdentity
WITH IDENTITY='Managed Identity'

--Create a data source to point to data lake
CREATE EXTERNAL DATA SOURCE ExternalDataSourceDataLakeMI
	WITH (
		LOCATION   = 'https://<storage_account>.dfs.core.windows.net/<container>',
        CREDENTIAL = DataLakeManagedIdentity
	    );

--create file format for use with external table
CREATE EXTERNAL FILE FORMAT SynapseParquetFormat
WITH ( 
        FORMAT_TYPE = PARQUET
     );

Create External Table and View

We’ll now create both the External Table and View, both these objects point to the same folder location in the source data lake account. Note that the External Table does not have any date columns defined, hold this thought for a next blog post on External Tables and partition pruning/source folder filtering. It isn’t relevant to this security post.

--Create External Table
CREATE EXTERNAL TABLE LDW.WebTelemetryParquet
(
    
    UserID INT,
    EventType VARCHAR(50),
    ProductID SMALLINT,
    URL VARCHAR(50),
    Device VARCHAR(10),
    SessionViewSeconds INT
)
WITH (
    LOCATION = 'raw/webtelemetryparquet/**',
    DATA_SOURCE = ExternalDataSourceDataLakeMI,
    FILE_FORMAT = SynapseParquetFormat
);

--Create view
CREATE VIEW LDW.vwWebTelemetryParquet
AS
SELECT 
    UserID,
    EventType,
    ProductID,
    [URL],
    Device,
    SessionViewSeconds,
    MessageKey,
    CAST(fct.filepath(1) AS SMALLINT) AS FolderEventYear,   
    CAST(fct.filepath(2) AS TINYINT) AS FolderEventMonth, 
    CAST(fct.filepath(3) AS DATE) AS FolderEventDate
FROM
 OPENROWSET
(
    BULK 'raw/webtelemetryparquet/EventYear=*/EventMonth=*/EventDate=*/*.parquet',
    DATA_SOURCE = 'ExternalDataSourceDataLakeMI',
    FORMAT = 'Parquet'
) 
WITH
(
    UserID INT,
    EventType VARCHAR(50),
    ProductID SMALLINT,
    URL VARCHAR(50),
    Device VARCHAR(10),
    SessionViewSeconds INT,
    MessageKey VARCHAR(50)
) AS fct;

Setup Permissions

We’ll now add a login based on an Active Directory Group containing a user account (it’s recommended practice to configure permissions based on AD Groups). An Active Directory security group called ADGSynapseWebTelemetry has been created and a domain user called andycsynapse has been added to this group.

Create Login and User

We’ll create the login in the master database of the Serverless SQL Pools service, then switch to the database created earlier and create a user based on the login.

--create login in master database from active directory group
USE MASTER;
CREATE LOGIN [ADGSynapseWebTelemetry] FROM EXTERNAL PROVIDER;

--switch to database and create user from login
USE SQLWebTelemetryAnalysis;
CREATE USER [ADGSynapseWebTelemetry] FROM LOGIN [ADGSynapseWebTelemetry];

Assign Permissions to User

Please note that we’ll be granting permissions to the user directly, we can also create a database role to manage permissions and assign users to this role (we will cover this in a later blog).

--we need to grant references on the MI credential
GRANT REFERENCES ON DATABASE SCOPED CREDENTIAL::[DataLakeManagedIdentity] TO [ADGSynapseWebTelemetry];

--grant select on the external table and view
GRANT SELECT ON LDW.WebTelemetryParquet TO [ADGSynapseWebTelemetry];
GRANT SELECT ON LDW.vwWebTelemetryParquet TO [ADGSynapseWebTelemetry];

Test User Permissions

Let’s now test the user permissions and we’ll start to see the differences between External Tables and Views, and the issues we have.

Select from External Table and View

If we now run a simple SELECT TOP 5 on both the External Table and View by logging in as the andycsynapse user. We’re using SQL Server Management Studio (SSMS) to connect to the Serverless SQL Pools endpoint and login.

--select from the external table
SELECT TOP 5 
    UserID,
    EventType,
    ProductID,
    URL,
    Device,
    SessionViewSeconds
FROM LDW.WebTelemetryParquet

--select from the view
SELECT TOP 5 
    UserID,
    EventType,
    ProductID,
    URL,
    Device,
    SessionViewSeconds
FROM LDW.vwWebTelemetryParquet

As you can see, we’re able to see data from both the External Table and the View. However…let’s look at some unwanted access.

Access to Azure Data Lake Gen2 Account

We’ve been able to select data from both the External Table and View successfully, but we have an issue. If we run a SELECT query with the OPENROWSET command on another folder in the data lake and bypass the External Table and View, we see that the query executes successfully. To do this the user would need to know the DATA_SOURCE name that’s configured to access the data lake. This isn’t out of the realms of possibility and we really don’t want to be using a “let’s hope they don’t know the name…” security strategy…

This is because OPENROWSET usage is enabled by default…we need to deny access to this by running the following command to deny usage of the BULK command to the user. Alternatively we can DENY access to the public role, which will ensure all users have this permission denied.

--deny use of the OPENROWSET BULK command
USE master;
DENY ADMINISTER BULK OPERATIONS TO [ADGSynapseWebTelemetry];

--alternatively deny bulk operations on the public role
USE master;
DENY ADMINISTER BULK OPERATIONS TO [public];

--we can then grant access to BULK/OPENROWSET on a database-by-database basis
USE SQLWebTelemetryAnalysis;
GRANT ADMINISTER DATABASE BULK OPERATIONS TO [public];

If we now run the SELECT TOP 5 command, we can see that the DENY command has removed permissions to use the OPENROWSET BULK command.

This now stops the user account being able to run adhoc OPENROWSET queries on the data lake, but it’s now also stopped the View from working as the SQL definition in the view uses OPENROWSET.


Conclusion

In this blog post we’ve looked at how user permissions differ between External Tables and Views in a Serverless SQL Pools database and what we can do to restrict users accessing the underlying data lake data by denying the use of the BULK statement in the OPENROWSET command.