Select column name dynamically SQL Azure Synapse

If you are developing data pipelines that use SQL queries, you might already design pipelines in such a way that you can configure things without touching the pipeline for every change. In this article, let’s discuss a way to dynamically select column names from a table in case of column additions to that table.

Example : Let’s take you are inserting data to a table using INSERT INTO clause.

INSERT INTO <oldtable> SELECT * FROM <newtable>

The above SQL query will insert the data to oldtable using newtable data.

If you have an identity (auto increment) column, you have to select all the column excluding the identity column because we can’t insert data into identity(auto increment) column It will throw error saying that you are trying to insert data to identity column and to set identity off. You can’t really set identity off because you will lose the order.

INSERT INTO <oldtable> SELECT col1,col2,col3 FROM <newtable>

After excluding the identity column, the query looks like the above example.

If you are working on a table that rarely changes the schema, and you were notified when there is change in schema, and you are okay to lengthy process of approval steps. Then you will be happy with the above query.

If you are lazy and the source team changes the schema as they like, you don’t have any option but come with dynamic solution.

If you are working on SQL Server, there are many examples online with help of FOR XML PATH. But in synapse for xml path is not supported. So let’s work on preparing a dynamic query to select the required column in Synapse.

So let’s start with the inner query, which is the main query of the whole process. To get all the column names in a table, we usually query INFORMATION_SCHEMA.COLUMNS table for any of the tables present in the Database.

SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS   where Table_Name = ‘<tablename>’ AND TABLE_SCHEMA = '<schemaname>' 

The above query will help you get all the queries present in the table, but we need columns excluding the identity column. But we don’t have any information on whether column is identity or not from INFORMATION_SCHEMA.COLUMNS table. For table, we need to query a separate table SYS.IDENTITY_COLUMNS.

SELECT NAME FROM SYS.IDENTITY_COLUMNS WHERE OBJECT_NAME(OBJECT_ID) = '<tablename>' AND OBJECT_SCHEMA_NAME(object_id) = '<schemaname>'

The above query will give you the identity column present in that respective table. If we combine both the queries to dynamically get column names of a table.

SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS   where Table_Name = '<tablename>' AND TABLE_SCHEMA = '<schemaname>' and column_name not in 
(SELECT NAME FROM SYS.IDENTITY_COLUMNS WHERE OBJECT_NAME(OBJECT_ID) = '<tablename>' AND OBJECT_SCHEMA_NAME(object_id) = '<schemaname>')

Now the result set will contain records equivalent to the no of column names. But we want to create a comma separated string just like we write the select statement. We can achieve that using STRING_AGG function. We are one step away from having the query to select column name dynamically.

(SELECT STRING_AGG(COLUMN_NAME, ',') FROM (
SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS   where Table_Name = '<tablename>' AND TABLE_SCHEMA = '<schemaname>' and column_name not in 
(SELECT  NAME FROM SYS.IDENTITY_COLUMNS WHERE OBJECT_NAME(OBJECT_ID) = '<tablename>' AND OBJECT_SCHEMA_NAME(object_id) = '<schemaname>')) A)

Now in the final part we have to combine with insert into statement. That is a little tricky, we need to convert the above result set into string along with INSERT INTO. That can be achieved by converting the whole query to a string and execute as below.

DECLARE @query NVARCHAR(MAX);

SET @query=N' INSERT INTO <oldtable> select '+(SELECT STRING_AGG(COLUMN_NAME, ',') FROM (
SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS   where Table_Name = '<tablename>' AND TABLE_SCHEMA = '<schemaname>' and column_name not in 
(SELECT NAME FROM SYS.IDENTITY_COLUMNS WHERE OBJECT_NAME(OBJECT_ID) = '<tablename>' AND OBJECT_SCHEMA_NAME(object_id) = '<schemaname>')) A)+' from <newtable>';

EXEC Sp_EXECUTESQL @query;

You can tweak the above query as per your scenario.

Feel free to comment below for questions and clarifications.

© 2023 – 2022, Datalieve . All rights reserved.

Leave a Reply

Your email address will not be published. Required fields are marked *

You cannot copy content of this page