View Generation in Serverless SQL PoolsDecember 16, 2021
One of the objects that can be created within a Serverless SQL Pools database is a View. A View, along with an External Table, allows us to cast structure over external data in a Data Lake, Cosmos DB, and the Dataverse. A View also allows us to abstract the SQL syntax necessary to connect to an external data source as there can be many options to consider when connecting to differing file types.
In this blog post we’ll go through one option to create a set of views based on user-defined metadata. This enables us to define external storage locations and file formats and then trigger a SQL process to generate the required Views without needing to hard-code the create view syntax.
There are other options available which includes using services like Pipelines to generate metadata, we’ll explorer these options in future posts.
The code for the process below is available on GitHub in the serverlesssqlpooltools repository. Please note this is a work-in-progress and improvements/enhancements will be added over time.
The process uses dynamic SQL to generate the syntax necessary to create the views. A list of metadata including the external location, file format, folder partitioning, options, and view name are used as input to this process. A system stored procedure sp_describe_first_result_set is used to generate the list of fields and data types which is then used to generate the CREATE VIEW syntax.
The following image shows an example result generated when executing the sp_describe_first_result_set with a SELECT statement. In this example, it has identified 11 columns with their relevant data types and properties.
Metadata File and View
As we are not using any other service to generate the Views, E.G Pipelines/Data Factory, we have a metadata file that contains the relevant information required to build the Views.
The DataLakeMetaData.csv file contains 8 columns which define each view and the metadata associated to each view. The creation process will select this data and iterate over the list, creating a view for each line item.
|FileFormat||Parquet or CSV|
|HeaderRow||For CSV. True or False/Blank|
|FieldTerminator||For CSV. E.G pipe delimiter ||
|ViewName||Name of the view to be created|
|ExternalDataSourceDataLake||Name of the appropriate external data source|
|Location||Folder location including partitioning/wildcards|
|FolderHierarchyDepth||Number of folders under the root location. This is used to generate filepath() columns|
|MaxVarcharValue||Override source data defaults and specify a character length|
Once the file has been created with the relevant metadata and placed in an Azure storage account, we can create a View over this file.
CREATE VIEW dbo.vwViewMetadata AS SELECT ROW_NUMBER() OVER (ORDER BY ViewName) AS ViewCreationOrder, FileFormat, HeaderRow, [FieldTerminator], ViewName, ExternalDataSourceDataLake, [Location], FolderHierarchyDepth, MaxVarcharValue FROM OPENROWSET ( BULK 'metadata/DataLakeMetaData.csv', DATA_SOURCE = 'ExternalDataSourceDataLake', FORMAT = 'CSV', PARSER_VERSION = '2.0', HEADER_ROW = TRUE, FIELDTERMINATOR ='|' ) rwt
If we query the view we can see the relevant metadata contained within the file.
There are 2 Stored Procedures that we create in a Serverless SQL Pools database. The spGenerateViews stored procedure contains the logic necessary to identify the source data schema and create the views, the spTriggerGenerateViews stored procedure contains the logic necessary to iterate over the views that need to be created and trigger the spGenerateViews stored procedure.
The SQL code for the spGenerateViews stored procedure is in GitHub here.
This stored procedure contains the code necessary to generate the schema information from the source location and data types specified in the metadata file. The process then generates and executes the CREATE VIEW statement to create the View. The code also creates filepath() columns to expose source folder partitions, the FolderHierarchyDepth parameter in the metadata file controls how many filepath columns are created.
The SQL code for the spTriggerGenerateViews stored procedure is in GitHub here.
This stored procedure triggers the spGenerateViews stored procedure by reading from the vwViewMetadata view and looping through each row, passing in the relevant parameters to enable the identification of schema information and generating the CREATE VIEW statement.
Run the Process
To trigger the process, we can run an EXEC dbo. spTriggerGenerateViews and pass in either 1 or 0 if we wish to drop views before attempted recreation.
EXEC dbo.spTriggerGenerateViews 1
In the blog post we have looked at one way of generating Views within a Serverless SQL Pools database by using a metadata file containing source file formats, location, folder hierarchy depth and intended view name.