Lake Databases And The Importance of Data Types

1

I encountered an issue with Azure Synapse Analytics Lake Databases and I’d like to share the problem and the solution. This really should be filed under “the solution was obvious! face palm!” but it took me a while to realise what the issue was and how to fix it.

There is also a video on my Datahai YouTube channel about this here.

Lake Databases Overview

I’ll provide some context to this blog with a brief overview of Lake Databases. These databases are a specific type of database within Synapse Analytics that can be used by both Spark and Serverless SQL Pools. Lake Databases can be created using the new visual Database Designer and have some great features such as creating relationships between external tables to visualise the data model, and also a simple method of setting up partitioned tables.

There is a getting started blog post I wrote here which walks through creating a new Lake Database.

The Problem

I initially created a Lake Database using the Database Designer and created several external tables over Parquet data stored in a Data Lake Gen2 account. The following shows the Lake Database in the database designer, this was then published, and the tables are visible in the Lake Database area on the left hand side. It’s worth noting that when clicking “preview data” when creating the external tables, the data from the data lake was visible.

So far so good. I then opened a new SQL query and started to write SELECT statements for the 3 tables using the inbuilt Serverless SQL Pool service. DimCustomer and FactWebTelemetry worked fine, I could see a resultset when querying. However, I received an error when querying DimProduct.

Although the external tables had been published successfully and were visible in the Lake database area on the left menu, I couldn’t query the table. I then ran a SQL statement to view system information about the external tables registered in the lake database and saw that the DimProduct external table did not exist. Could this mean that the table itself had not been registered successfully?

The Solution

After a lot of head scratching, Azure support, and more head scratching I noticed that one of the columns in DimProduct in the database designer did not have a data type assigned!! The column ThumbNailPhoto which is a binary data type in the source Parquet file did not show a data type assignment in the Columns list for the DimProduct table.

I then clicked the data type dropdown and selected binary from the options. I then published the changes and queried the sys.external_tables system table again, and could see the table. I could then query the table successfully. It does take a few seconds from when publishing an external table to actually seeing it in the sys.external_tables view.

Conclusion

Well it seemed such an obvious solution, check to ensure the data types are assigned correctly in the database designer before publishing. I would have expected an error when publishing to alert me of the data type issue, one for the Synapse Product Team. I hope this helps anyone else who may have encountered this issue too.

1 thought on “Lake Databases And The Importance of Data Types

Leave a Reply

Your email address will not be published. Required fields are marked *