Updating each row in a table

I have a table with several value columns, a formula column, and a result column.

| rownum | value1 | value2 | value3 | formula              | result |
|--------|--------|--------|--------|----------------------|--------|
| 1      | 11     | 30     | 8      | value1/value2*value3 |        |
| 2      | 43     | 0      | 93     | value1-value2+value3 |        |

I want to populate the result column with the result of the formula.

I am currently doing this with this query:

```sql
DECLARE @v_sql NVARCHAR(MAX)

-- Build one UPDATE per row and concatenate them into a single string
SET @v_sql = CAST((
    SELECT ' UPDATE [table] '
         + ' SET [result] = ' + [table].[formula]
         + ' WHERE [rownum] = ' + CAST([table].[rownum] AS NVARCHAR(255)) + ';'
    FROM [table]
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)') AS NVARCHAR(MAX))

EXEC (@v_sql)
```

The problem is that it takes a very long time. The table will have 5 to 10 million rows.

Is there any way to speed this up? Alternative approaches to this problem?

Thank you very much!

---
6 answers

Thanks for all the answers and ideas. In the end, the problem was solved by storing the formula in the dimension table instead of the fact table. This generates one UPDATE statement per dimension row, applied to all matching fact rows through a WHERE clause, instead of one UPDATE statement per fact row. Processing time dropped from more than 1.5 hours to less than a second.
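A minimal sketch of that approach, assuming hypothetical temp tables `#FormulaDim` (one row per formula) and `#Fact` (the fact rows, which reference their formula by `formula_id`). It loops over the small dimension and runs one set-based UPDATE per formula, so the number of statements grows with the number of distinct formulas rather than with the number of fact rows. The formula text is assumed to be trusted and is spliced into dynamic SQL as-is; the value columns are `float` so that `/` is not integer division.

```sql
-- Hypothetical schema: formulas live in a small dimension table,
-- fact rows reference them by formula_id.
CREATE TABLE #FormulaDim (
    formula_id int           NOT NULL PRIMARY KEY,
    formula    nvarchar(200) NOT NULL  -- trusted SQL expression over the fact columns
);
CREATE TABLE #Fact (
    rownum     int   NOT NULL PRIMARY KEY,
    formula_id int   NOT NULL,
    value1     float NULL,
    value2     float NULL,
    value3     float NULL,
    result     float NULL
);
CREATE INDEX IX_Fact_formula_id ON #Fact (formula_id);

INSERT INTO #FormulaDim VALUES (1, N'value1/value2*value3'), (2, N'value1-value2+value3');
INSERT INTO #Fact (rownum, formula_id, value1, value2, value3)
VALUES (1, 1, 11, 30, 8), (2, 2, 43, 0, 93);

DECLARE @formula_id int, @formula nvarchar(200), @sql nvarchar(max);

-- Iterate over the dimension (a handful of rows), not over the fact table.
DECLARE formula_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT formula_id, formula FROM #FormulaDim;

OPEN formula_cursor;
FETCH NEXT FROM formula_cursor INTO @formula_id, @formula;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- One set-based UPDATE covers every fact row that uses this formula.
    SET @sql = N'UPDATE #Fact SET result = ' + @formula
             + N' WHERE formula_id = @id;';
    EXEC sp_executesql @sql, N'@id int', @id = @formula_id;

    FETCH NEXT FROM formula_cursor INTO @formula_id, @formula;
END

CLOSE formula_cursor;
DEALLOCATE formula_cursor;
```

With an index on `formula_id`, each UPDATE touches only the fact rows for that formula; the dynamic SQL can see `#Fact` because it was created in the calling batch.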

---

Assuming standard operator precedence, and covering only the simple three-value formulas from your example:

```sql
UPDATE [table]
SET [result] =
    -- strip the column names, leaving only the two operators, e.g. '+-'
    case replace(replace(replace([formula], 'value1', ''), 'value2', ''), 'value3', '')
        when '++' then [value1] + [value2] + [value3]
        when '+-' then [value1] + [value2] - [value3]
        when '+*' then [value1] + [value2] * [value3]
        when '+/' then [value1] + [value2] / [value3]
        when '-+' then [value1] - [value2] + [value3]
        when '--' then [value1] - [value2] - [value3]
        when '-*' then [value1] - [value2] * [value3]
        when '-/' then [value1] - [value2] / [value3]
        when '*+' then [value1] * [value2] + [value3]
        when '*-' then [value1] * [value2] - [value3]
        when '**' then [value1] * [value2] * [value3]
        when '*/' then [value1] * [value2] / [value3]
        when '/+' then [value1] / [value2] + [value3]
        when '/-' then [value1] / [value2] - [value3]
        when '/*' then [value1] / [value2] * [value3]
        when '//' then [value1] / [value2] / [value3]
    end
FROM [table]
```
---

Two simple things that come to mind:

  • Make sure there is an index on the rownum column if you update each row separately.

  • If there are only a few distinct formulas, you can update all rows with the same formula in a single UPDATE instead of updating each row individually. In that case, an index on the formula column will help.
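For the two suggestions above, a sketch of the indexes, assuming a table shaped like the one in the question (shown as a hypothetical temp table `#calc`). Note that an `nvarchar(max)` column cannot be an index key column, so the formula column needs a bounded type such as `nvarchar(200)` for the second index to be possible.

```sql
-- Hypothetical table shaped like the one in the question.
CREATE TABLE #calc (
    rownum  int           NOT NULL PRIMARY KEY CLUSTERED,  -- seek target for WHERE [rownum] = N
    value1  float         NULL,
    value2  float         NULL,
    value3  float         NULL,
    formula nvarchar(200) NOT NULL,  -- bounded type: nvarchar(max) cannot be an index key
    result  float         NULL
);

-- Supports one UPDATE per distinct formula: WHERE [formula] = N'...'
CREATE NONCLUSTERED INDEX IX_calc_formula ON #calc (formula);
```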

---

Perhaps a faster option is a bulk update per formula type? This also needs an index on [formula]:

```sql
DECLARE @v_sql NVARCHAR(MAX)

-- Build one UPDATE per distinct formula instead of one per row
SET @v_sql = CAST((
    SELECT ' UPDATE [table] '
         + ' SET [result] = ' + [table].[formula]
         + ' WHERE [formula] = ''' + [table].[formula] + ''';'
    FROM [table]
    GROUP BY [table].[formula]
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)') AS NVARCHAR(MAX))

EXEC (@v_sql)
```
---

Go with the trigger option, but for now, updating in batches will have less impact.

TOP (5000) limits each UPDATE to 5,000 rows that are not processed yet, i.e. WHERE [result] IS NULL OR [result] = ''.

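GO 20000 (the batch-repeat count understood by SSMS and sqlcmd) runs this batch 20,000 times. Each run updates up to 5,000 rows per formula, so 20,000 runs cover 10 million rows even with a single formula. Note that GO does not stop early: once no rows are left, the remaining runs simply update 0 rows.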

```sql
DECLARE @v_sql NVARCHAR(MAX)

-- One batched UPDATE per distinct formula, touching only unprocessed rows
SET @v_sql = CAST((
    SELECT ' UPDATE TOP (5000) [table] '
         + ' SET [result] = ' + [table].[formula]
         + ' WHERE [formula] = ''' + [table].[formula] + ''''
         + ' AND ([result] IS NULL OR [result] = '''');'
    FROM [table]
    GROUP BY [table].[formula]
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)') AS NVARCHAR(MAX))

EXEC (@v_sql)
GO 20000
```
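A variation on the same batching idea, sketched with a WHILE loop instead of GO 20000 so that it stops once the table is exhausted. Instead of TOP it walks [rownum] in ranges of 5,000 (a key-range technique substituted here, which also terminates if some formula evaluates to NULL) and runs the per-formula updates inside each range. The temp table `#calc` and its rows are hypothetical stand-ins for the question's table, and the formulas are assumed to be trusted.

```sql
-- Hypothetical stand-in for the question's table.
CREATE TABLE #calc (
    rownum  int           NOT NULL PRIMARY KEY CLUSTERED,
    value1  float         NULL,
    value2  float         NULL,
    value3  float         NULL,
    formula nvarchar(200) NOT NULL,
    result  float         NULL
);
INSERT INTO #calc (rownum, value1, value2, value3, formula)
VALUES (1, 11, 30, 8, N'value1/value2*value3'),
       (2, 43, 0, 93, N'value1-value2+value3');

DECLARE @sql nvarchar(max), @batch int = 5000,
        @lo int, @hi int, @max int;

-- One UPDATE per distinct formula, restricted to the current rownum range.
-- REPLACE doubles any single quote so the formula text stays a valid literal.
SET @sql = (
    SELECT N'UPDATE #calc SET result = ' + formula
         + N' WHERE formula = N''' + REPLACE(formula, N'''', N'''''') + N''''
         + N' AND rownum BETWEEN @lo AND @hi;'
    FROM #calc
    GROUP BY formula
    FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)');

SELECT @lo = MIN(rownum), @max = MAX(rownum) FROM #calc;

-- Walk the key in fixed-size ranges until the highest rownum is passed.
WHILE @lo <= @max
BEGIN
    SET @hi = @lo + @batch - 1;
    EXEC sp_executesql @sql, N'@lo int, @hi int', @lo = @lo, @hi = @hi;
    SET @lo = @hi + 1;
END
```

With the clustered key on [rownum], each range is a seek; gaps in [rownum] only produce some ranges that update fewer rows.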

After that, create a trigger so that new rows get their result when they are inserted.
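A minimal sketch of such a trigger, assuming a hypothetical permanent table `dbo.calc_demo` shaped like the question's table (DML triggers cannot be created on temp tables). It fires only AFTER INSERT, copies the new keys into a temp table that the dynamic SQL can see, and runs one UPDATE per distinct formula among the inserted rows. Because it is not an UPDATE trigger, its own UPDATE does not fire it again.

```sql
-- Hypothetical permanent table (DML triggers cannot be defined on temp tables).
CREATE TABLE dbo.calc_demo (
    rownum  int           NOT NULL PRIMARY KEY,
    value1  float         NULL,
    value2  float         NULL,
    value3  float         NULL,
    formula nvarchar(200) NOT NULL,
    result  float         NULL
);
GO

CREATE TRIGGER dbo.trg_calc_demo_result
ON dbo.calc_demo
AFTER INSERT  -- not UPDATE, so the UPDATE below does not re-fire this trigger
AS
BEGIN
    SET NOCOUNT ON;
    IF NOT EXISTS (SELECT 1 FROM inserted) RETURN;

    -- Dynamic SQL cannot see "inserted", but it can see a temp table created here.
    SELECT rownum INTO #new_rows FROM inserted;

    DECLARE @sql nvarchar(max);
    SET @sql = (
        SELECT N'UPDATE c SET result = ' + formula
             + N' FROM dbo.calc_demo AS c JOIN #new_rows AS n ON n.rownum = c.rownum'
             + N' WHERE c.formula = N''' + REPLACE(formula, N'''', N'''''') + N''';'
        FROM inserted
        GROUP BY formula
        FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)');

    EXEC sp_executesql @sql;
END
GO
```

Usage: a plain `INSERT INTO dbo.calc_demo (rownum, value1, value2, value3, formula) VALUES (...)` fills [result] for the new rows; dropping the table also drops the trigger.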

---

I created a test table with 5 million rows and the following structure:

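| rn | t1 | t2 | t3 | formula      |
|----|----|----|----|--------------|
| 1  | 80 | 23 | 93 | t1 / t2 * t3 |
| 2  | 80 | 87 | 30 | t1 / t2 * t3 |
| 3  | 92 | 83 | 63 | t1 / t2 * t3 |
| 4  | 68 | 19 | 36 | t1 / t2 * t3 |
| 5  | 65 | 63 | 10 | t1 / t2 * t3 |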

If you are sure that all your formulas are valid and will not cause, for example, division by zero or data type overflows, you can write your own eval() function in SQL Server.

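I wrote a function for formulas with 3 values and the operators '+', '-', '*', '/'. It evaluates strictly left to right and ignores operator precedence, so t1 + t2 * t3 is computed as (t1 + t2) * t3.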

Function Code:

```sql
use db_test;
go

-- Evaluates "a op b op c" strictly left to right (no operator precedence).
-- create or alter requires SQL Server 2016 SP1 or later.
create or alter function dbo.eval(@a varchar(max))
returns float
as
begin
    set @a = replace(@a, ' ', '');

    -- first operand and operator
    declare @pos1 int = PATINDEX('%[+/*-]%', @a);
    declare @t1 float = cast(substring(@a, 1, @pos1 - 1) as float);
    declare @sign1 char(1) = substring(@a, @pos1, 1);
    set @a = substring(@a, @pos1 + 1, len(@a) - @pos1);

    -- second operand and operator
    declare @pos2 int = PATINDEX('%[+/*-]%', @a);
    declare @t2 float = cast(substring(@a, 1, @pos2 - 1) as float);
    declare @sign2 char(1) = substring(@a, @pos2, 1);
    set @a = substring(@a, @pos2 + 1, len(@a) - @pos2);

    -- third operand
    declare @t3 float = cast(@a as float);

    set @t1 = (case @sign1
                   when '+' then @t1 + @t2
                   when '-' then @t1 - @t2
                   when '*' then @t1 * @t2
                   when '/' then @t1 / @t2
               end);
    set @t1 = (case @sign2
                   when '+' then @t1 + @t3
                   when '-' then @t1 - @t3
                   when '*' then @t1 * @t3
                   when '/' then @t1 / @t3
               end);
    return @t1;
end;
```

It works on the following data:

```sql
select dbo.eval('7.6*11.3/4.5') as eval, 7.6*11.3/4.5 as sqlServerCalc;
```

| eval             | sqlServerCalc |
|------------------|---------------|
| 19,0844444444444 | 19.084444     |

After that, you can substitute the column values for the variable names in the formula and evaluate the result:

```sql
with cte as (
    select rn, t1, t2, t3, formula,
           -- substitute the column values for the variable names
           REPLACE(REPLACE(REPLACE(formula,
               't1', cast(t1 as varchar(max))),
               't2', cast(t2 as varchar(max))),
               't3', cast(t3 as varchar(max))) as calc
    from db_test.dbo.loop
)
select rn, t1, t2, t3, formula, db_test.dbo.eval(calc) as result
into db_test.dbo.loop2
from cte;
```

The timing is acceptable for me: it takes 3 minutes on my SQL Server 2016 instance and gives correct results:

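```sql
select top 5 * from db_test.dbo.loop2;
```

| rn | t1 | t2 | t3 | formula      | result           |
|----|----|----|----|--------------|------------------|
| 1  | 80 | 23 | 93 | t1 / t2 * t3 | 323,478260869565 |
| 2  | 80 | 87 | 30 | t1 / t2 * t3 | 27,5862068965517 |
| 3  | 92 | 83 | 63 | t1 / t2 * t3 | 69,8313253012048 |
| 4  | 68 | 19 | 36 | t1 / t2 * t3 | 128,842105263158 |
| 5  | 65 | 63 | 10 | t1 / t2 * t3 | 10,3174603174603 |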

If you have a list of all the operations that can appear in a formula, you can write a general function for more variables. But if the formulas are more complicated, you should use a CLR function.
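A sketch of such a general function for any number of operands, named `dbo.eval_n` here (a hypothetical name). Like `dbo.eval` above it evaluates strictly left to right and ignores operator precedence, and it assumes non-negative numbers without exponent notation: a leading '-' or a value like '1E+5' would be misread as an operator.

```sql
-- Hypothetical generalization of dbo.eval to any number of operands.
-- Evaluates strictly left to right: '2*3+4*5' gives ((2*3)+4)*5 = 50, not 26.
-- CREATE OR ALTER requires SQL Server 2016 SP1 or later.
CREATE OR ALTER FUNCTION dbo.eval_n (@a varchar(max))
RETURNS float
AS
BEGIN
    SET @a = REPLACE(@a, ' ', '');

    -- Position of the next operator; '-' is last in the brackets so it is literal.
    DECLARE @pos int = PATINDEX('%[+/*-]%', @a);
    DECLARE @op char(1), @next float;

    -- First operand: everything up to the first operator (or the whole string).
    DECLARE @acc float = CAST(CASE WHEN @pos = 0 THEN @a ELSE LEFT(@a, @pos - 1) END AS float);

    WHILE @pos > 0
    BEGIN
        SET @op  = SUBSTRING(@a, @pos, 1);
        SET @a   = SUBSTRING(@a, @pos + 1, LEN(@a));  -- drop the operand and operator just read
        SET @pos = PATINDEX('%[+/*-]%', @a);
        SET @next = CAST(CASE WHEN @pos = 0 THEN @a ELSE LEFT(@a, @pos - 1) END AS float);

        SET @acc = CASE @op
                       WHEN '+' THEN @acc + @next
                       WHEN '-' THEN @acc - @next
                       WHEN '*' THEN @acc * @next
                       WHEN '/' THEN @acc / @next
                   END;
    END;

    RETURN @acc;
END;
GO
```

Usage: `select dbo.eval_n('7.6*11.3/4.5');` gives the same value as `dbo.eval` for three operands, and the same call works for longer formulas after the column values are substituted in.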


Source: https://habr.com/ru/post/1271324/
