How can I inexpensively determine if a column contains only NULL records?

I have a large table with 500 columns and 100M rows. Based on a small sample, I believe that only about 50 columns contain any values and the remaining 450 contain only NULLs. I want to identify the columns that contain no data.

On my current hardware, running the following query for each column would take about 24 hours:

    select count(1) from tab where col_n is not null

Is there a less expensive way to determine if a column is completely empty / NULL?

+6
8 answers

How about this:

    SELECT SUM(CASE WHEN column_1 IS NOT NULL THEN 1 ELSE 0 END) AS column_1_count,
           SUM(CASE WHEN column_2 IS NOT NULL THEN 1 ELSE 0 END) AS column_2_count,
           ...
    FROM table_name

?

You can easily generate this query from the INFORMATION_SCHEMA.COLUMNS view.
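As a rough sketch of that generation step, the batch below builds the SUM(CASE ...) query for every nullable column of one table and runs it. It assumes SQL Server 2017 or later (for STRING_AGG) and uses dbo.tab as a placeholder table name; neither assumption comes from the answer itself.

```sql
-- Sketch only: dbo.tab is a placeholder table name, and STRING_AGG
-- requires SQL Server 2017 or later.
DECLARE @sql NVARCHAR(MAX);

SELECT @sql = N'SELECT '
    + STRING_AGG(
          -- One counter per column; the CAST keeps the aggregated string from
          -- being limited to 4000 characters.
          CAST(N'SUM(CASE WHEN ' + QUOTENAME(COLUMN_NAME)
               + N' IS NOT NULL THEN 1 ELSE 0 END) AS ' + QUOTENAME(COLUMN_NAME)
               AS NVARCHAR(MAX)),
          N', ') WITHIN GROUP (ORDER BY ORDINAL_POSITION)
    + N' FROM dbo.tab;'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = N'dbo'
  AND TABLE_NAME = N'tab'
  AND IS_NULLABLE = 'YES';   -- NOT NULL columns cannot be empty, so skip them

-- @sql stays NULL when the table has no nullable columns.
IF @sql IS NOT NULL
    EXEC sys.sp_executesql @sql;
```

A result of 0 for a column means it contains only NULLs. The whole check still costs one full scan of the table, but only one, instead of one per column.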

EDIT:

Another idea:

SELECT MAX (column_1), MAX (column_2), ..... FROM table_name

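If a result is non-NULL, that column contains at least one value. This still requires a table scan, but a single scan covers all columns. Note that MAX does not accept bit columns, so those need a cast first.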

+13

Try this option -

DDL:

    IF OBJECT_ID ('dbo.test2') IS NOT NULL
        DROP TABLE dbo.test2

    CREATE TABLE dbo.test2
    (
          ID BIGINT IDENTITY(1,1) PRIMARY KEY
        , Name VARCHAR(10) NOT NULL
        , IsCitizen BIT NULL
        , Age INT NULL
    )

    INSERT INTO dbo.test2 (Name, IsCitizen, Age)
    VALUES
        ('1', 1, NULL),
        ('2', 0, NULL),
        ('3', NULL, NULL)

Query 1:

    DECLARE
          @TableName SYSNAME
        , @ObjectID INT
        , @SQL NVARCHAR(MAX)

    SELECT
          @TableName = 'dbo.test2'
        , @ObjectID = OBJECT_ID(@TableName)

    SELECT @SQL = 'SELECT' + CHAR(13) + STUFF((
        SELECT CHAR(13) + ', [' + c.name + '] = ' +
            CASE WHEN c.is_nullable = 0
                THEN '0'
                ELSE 'CASE WHEN ' + totalrows +
                     ' = SUM(CASE WHEN [' + c.name + '] IS NULL THEN 1 ELSE 0 END) THEN 1 ELSE 0 END'
            END
        FROM sys.columns c WITH (NOWAIT)
        CROSS JOIN (
            SELECT totalrows = CAST(MIN(p.[rows]) AS VARCHAR(50))
            FROM sys.partitions p
            WHERE p.[object_id] = @ObjectID
                AND p.index_id IN (0, 1)
        ) r
        WHERE c.[object_id] = @ObjectID
        FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, ' ') + CHAR(13) + 'FROM ' + @TableName

    PRINT @SQL
    EXEC sys.sp_executesql @SQL

Output 1:

    SELECT
          [ID] = 0
        , [Name] = 0
        , [IsCitizen] = CASE WHEN 3 = SUM(CASE WHEN [IsCitizen] IS NULL THEN 1 ELSE 0 END) THEN 1 ELSE 0 END
        , [Age] = CASE WHEN 3 = SUM(CASE WHEN [Age] IS NULL THEN 1 ELSE 0 END) THEN 1 ELSE 0 END
    FROM dbo.test2

Query 2:

    DECLARE
          @TableName SYSNAME
        , @SQL NVARCHAR(MAX)

    SELECT @TableName = 'dbo.test2'

    SELECT @SQL = 'SELECT' + CHAR(13) + STUFF((
        SELECT CHAR(13) + ', [' + c.name + '] = ' +
            CASE WHEN c.is_nullable = 0
                THEN '0'
                ELSE 'CASE WHEN ' +
                     'MAX(CAST([' + c.name + '] AS CHAR(1))) IS NULL THEN 1 ELSE 0 END'
            END
        FROM sys.columns c WITH (NOWAIT)
        WHERE c.[object_id] = OBJECT_ID(@TableName)
        FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, ' ') + CHAR(13) + 'FROM ' + @TableName

    PRINT @SQL
    EXEC sys.sp_executesql @SQL

Output 2:

    SELECT
          [ID] = 0
        , [Name] = 0
        , [IsCitizen] = CASE WHEN MAX(CAST([IsCitizen] AS CHAR(1))) IS NULL THEN 1 ELSE 0 END
        , [Age] = CASE WHEN MAX(CAST([Age] AS CHAR(1))) IS NULL THEN 1 ELSE 0 END
    FROM dbo.test2

Results:

    ID          Name        IsCitizen   Age
    ----------- ----------- ----------- -----------
    0           0           0           1
+1

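Could you check whether a filtered index on the column gives acceptable performance?

    -- UNIQUE only works if the non-NULL values in the column are distinct;
    -- drop the keyword otherwise.
    CREATE UNIQUE NONCLUSTERED INDEX IndexName
        ON dbo.TableName(ColumnName)
        WHERE ColumnName IS NOT NULL;
    GO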
0

See the question "SQL Server query to get a list of columns in a table along with data type restrictions, NOT NULL and PRIMARY KEY".

Run the SQL from the top answer to that question, then use its output to build a new query like the one below.

Select ISNULL(column1,1), ISNULL(column2,1), ISNULL(column3,1) from table

0

500 columns?!
Well, the correct answer to your question is: normalize the table.

Here is what is happening at the moment:

There is no index on the column, so SQL Server has to do a full scan of your very wide table.
SQL Server will certainly read each row in full (that is, every column, even if you are only interested in one).
And since your row most likely exceeds 8 KB ... http://msdn.microsoft.com/en-us/library/ms186981%28v=sql.105%29.aspx

Seriously, normalize your table and, if necessary, split it vertically (move thematically grouped columns into separate tables so that they are read only when you need them).

EDIT: You can rewrite your query as follows:

 select count(col_n) from tab 

and if you want to get all columns at once (better):

 SELECT COUNT(column_1) column_1_count, COUNT(column_2) column_2_count, ... FROM table_name 
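This rewrite works because COUNT(column) skips NULLs, whereas COUNT(*) counts every row. A minimal illustration on a throwaway temp table (#demo and its columns are made up for this example):

```sql
-- Hypothetical three-row demo table; column b is entirely NULL.
CREATE TABLE #demo (a INT NULL, b INT NULL);

INSERT INTO #demo (a, b)
VALUES (1, NULL), (2, NULL), (NULL, NULL);

SELECT COUNT(*) AS total_rows,  -- 3: counts every row
       COUNT(a) AS a_count,     -- 2: the NULL in a is skipped
       COUNT(b) AS b_count      -- 0: b holds only NULLs
FROM #demo;

DROP TABLE #demo;
```

A count of 0 therefore identifies a column that is completely NULL.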
0

You do not need to count all 100M records. If you return from the query with TOP 1 as soon as you hit a row with a non-NULL value in the column, you save a lot of time and get the same information.
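A minimal sketch of such a check, reusing the placeholder names tab and col_n from the question (the dbo schema is an assumption):

```sql
-- Returns one row as soon as a non-NULL value is found,
-- and no rows if col_n contains only NULLs.
SELECT TOP (1) 1 AS has_data
FROM dbo.tab
WHERE col_n IS NOT NULL;
```

Keep in mind that the early exit only helps for columns that actually contain data. For a column that is entirely NULL and has no index, the query still has to read the whole table before it can return nothing.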

0

If most of the records are not NULL, you could combine this with the approaches suggested above (for example, checking only nullable columns) as follows:

    IF EXISTS (SELECT * FROM tab WHERE col_n IS NOT NULL) PRINT 'col_n contains data'

This should speed up the search, because EXISTS stops as soon as the condition is satisfied: in this example, a single non-NULL record is enough to determine the status of the column. If the column is indexed, the check should be almost instantaneous.

Adding TOP 1 to this query is normally unnecessary, because the query optimizer knows that it does not need to retrieve all the matching records.
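To check several columns in one statement, the EXISTS test can be wrapped in CASE expressions. This is only a sketch; dbo.tab, col_1 and col_2 are placeholder names:

```sql
-- One flag per column: 1 if the column has at least one non-NULL value, else 0.
SELECT
    CASE WHEN EXISTS (SELECT 1 FROM dbo.tab WHERE col_1 IS NOT NULL)
         THEN 1 ELSE 0 END AS col_1_has_data,
    CASE WHEN EXISTS (SELECT 1 FROM dbo.tab WHERE col_2 IS NOT NULL)
         THEN 1 ELSE 0 END AS col_2_has_data;
```

Each EXISTS is evaluated as its own probe of the table, so this form pays off mainly when most columns are populated and each probe can stop early.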

0

You can use this stored procedure to do the trick. Pass the name of the table you want to check; if you also pass @exec = 1, the procedure executes the generated SELECT query instead of only returning its text.

    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE PROCEDURE [dbo].[SP_SELECT_NON_NULL_COLUMNS] (
        @tablename varchar(100) = NULL,
        @exec int = 0
    )
    AS
    BEGIN
        SET NOCOUNT ON
        IF @tablename IS NULL
            RAISERROR('CANT EXECUTE THE PROC, TABLE NAME IS MISSING', 16, 1)
        ELSE
        BEGIN
            IF OBJECT_ID('tempdb..#table') IS NOT NULL
                DROP TABLE #table

            DECLARE @i VARCHAR(max) = ''
            DECLARE @sentence VARCHAR(max) = ''
            DECLARE @SELECT VARCHAR(max)
            DECLARE @LocalTableName VARCHAR(50) = '[' + @tablename + ']'

            CREATE TABLE #table (ColumnName VARCHAR(max))

            -- Build one IF EXISTS check per column; each check records the
            -- column name in #table when the column has a non-NULL value.
            SELECT @i += ' IF EXISTS ( SELECT TOP 1 ' + column_name + ' FROM ' + @LocalTableName
                       + ' WHERE ' + column_name + ' ' + 'IS NOT NULL) INSERT INTO #table VALUES ('''
                       + column_name + ''');'
            FROM INFORMATION_SCHEMA.COLUMNS
            WHERE table_name = @tablename

            INSERT INTO #table EXEC (@i)

            -- Concatenate the populated column names into a select list.
            SELECT @sentence = @sentence + ' ' + columnname + ' ,' FROM #table
            DROP TABLE #table

            IF @exec = 0
            BEGIN
                -- Return the generated query text only.
                SELECT 'SELECT ' + LTRIM(LEFT(@sentence, NULLIF(LEN(@sentence) - 1, -1)))
                     + ' FROM ' + @LocalTableName
            END
            ELSE
            BEGIN
                -- Run the generated query.
                SELECT @SELECT = 'SELECT ' + LTRIM(LEFT(@sentence, NULLIF(LEN(@sentence) - 1, -1)))
                               + ' FROM ' + @LocalTableName
                EXEC (@SELECT)
            END
        END
    END

Use it as follows:

 EXEC [dbo].[SP_SELECT_NON_NULL_COLUMNS] 'YourTableName' , 1 
0

Source: https://habr.com/ru/post/946243/

