LAG Functions and NULLs

How can I tell the LAG function to get the last non-null value?

For example, see my table below, where I have several NULL values in columns B and C. I would like to fill the NULLs with the last non-null value. I tried to do this using the LAG function, for example:

case when B is null then lag (B) over (order by idx) else B end as B, 

but this does not work when I have two or more NULLs in a row (see the NULL value in column C of row 3: I would like it to be 0.50, like the original value).

Any idea how I can achieve this? (It does not have to use the LAG function; any other ideas are welcome.)

A few assumptions:

  • The number of rows is dynamic;
  • The first value will always be non-null;
  • Once I have a NULL, it is NULL all the way to the end, so I want to fill it with the last value.

thanks

[Image: example table, not reproduced here]

+5
4 answers

If it is NULL all the way to the end, you can take a shortcut:

 declare @b varchar(20) = (select top 1 b from table where b is not null order by id desc);
 declare @c varchar(20) = (select top 1 c from table where c is not null order by id desc);
 select id, isnull(b, @b) as b, isnull(c, @c) as c from table;
+1

You can do this using the OUTER APPLY operator:

 select t.id, t1.colA, t2.colB, t3.colC
 from table t
 outer apply (select top 1 colA from table
              where id <= t.id and colA is not null
              order by id desc) t1
 outer apply (select top 1 colB from table
              where id <= t.id and colB is not null
              order by id desc) t2
 outer apply (select top 1 colC from table
              where id <= t.id and colC is not null
              order by id desc) t3;

This will work regardless of the number of NULLs or islands of NULLs. You can have values, then NULLs, then values again, then NULLs again. It will still work.
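To see the "islands of NULLs" behaviour concretely, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. It is an analogue rather than the T-SQL above: SQLite has no OUTER APPLY or TOP, so a correlated scalar subquery with LIMIT 1 stands in for them. The table name t, the sample rows, and the alias p are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: NULLs appear as islands, not only at the end.
rows = [
    (1, 0.20, 0.50),
    (2, None, None),
    (3, None, None),
    (4, 0.40, None),
    (5, None, 0.70),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, colB REAL, colC REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# For each row, a correlated subquery picks the most recent non-null value
# at or before that row. This plays the role of OUTER APPLY (SELECT TOP 1 ...).
query = """
SELECT t.id,
       (SELECT p.colB FROM t AS p
         WHERE p.id <= t.id AND p.colB IS NOT NULL
         ORDER BY p.id DESC LIMIT 1) AS colB,
       (SELECT p.colC FROM t AS p
         WHERE p.id <= t.id AND p.colC IS NOT NULL
         ORDER BY p.id DESC LIMIT 1) AS colC
FROM t
ORDER BY t.id
"""
filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```

Row 4 picks up the new colB value (0.4) while still carrying colC forward from row 1, which is the case a single LAG cannot handle.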


If, however, the assumption from your question holds:

Once I have NULL, it is NULL all the way to the end, so I want to fill it with the last value.

then there is a more efficient solution. We only need to find the last non-null value of each column (when ordered by idx). Change the above query by removing where id <= t.id from the subqueries:

 select t.id,
        colA = coalesce(t.colA, t1.colA),
        colB = coalesce(t.colB, t2.colB),
        colC = coalesce(t.colC, t3.colC)
 from table t
 outer apply (select top 1 colA from table
              where colA is not null order by id desc) t1
 outer apply (select top 1 colB from table
              where colB is not null order by id desc) t2
 outer apply (select top 1 colC from table
              where colC is not null order by id desc) t3;
+6

You can change your ORDER BY to force the NULLs to sort first, but that can be expensive...

 lag(B) over (order by CASE WHEN B IS NULL THEN -1 ELSE idx END) 

Or use a helper query to calculate the replacement value once. This is probably cheaper on large sets, but rather clunky.
- It relies on all the NULLs coming at the end
- The LAG version does not rely on this

 COALESCE(
   B,
   (
     SELECT sorted_not_null.B
     FROM (
       SELECT table.B,
              ROW_NUMBER() OVER (ORDER BY table.idx DESC) AS row_id
       FROM table
       WHERE table.B IS NOT NULL
     ) sorted_not_null
     WHERE sorted_not_null.row_id = 1
   )
 )

(This should be faster for large datasets than LAG or OUTER APPLY with correlated subqueries, simply because the value is calculated once. For tidiness, you could calculate and store the [last_known_value] for each column in variables, then just use COALESCE(A, @last_known_A), COALESCE(B, @last_known_B), etc.)
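The "calculate once, then COALESCE" idea can be sketched as follows, again using SQLite from Python rather than SQL Server. The Python variables last_b and last_c play the role of @last_known_B and @last_known_C. The table t, the sample rows, and the helper last_known are hypothetical, and the sketch assumes, as in the question, that the NULLs only occur at the end of each column.

```python
import sqlite3

# Hypothetical sample data: once a column turns NULL it stays NULL to the end.
rows = [
    (1, 0.20, 0.50),
    (2, 0.30, None),
    (3, 0.40, None),
    (4, None, None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (idx INTEGER PRIMARY KEY, B REAL, C REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

def last_known(column):
    """Return the last non-null value of a column, ordered by idx."""
    # The column name is one of the fixed names used below, never user input.
    sql = (
        f"SELECT {column} FROM t "
        f"WHERE {column} IS NOT NULL ORDER BY idx DESC LIMIT 1"
    )
    found = conn.execute(sql).fetchone()
    return found[0] if found else None

# One lookup per column, done once rather than once per row.
last_b = last_known("B")
last_c = last_known("C")

# Fill the trailing NULLs with the stored values.
filled = conn.execute(
    "SELECT idx, COALESCE(B, ?), COALESCE(C, ?) FROM t ORDER BY idx",
    (last_b, last_c),
).fetchall()
for row in filled:
    print(row)
```

If a column had a NULL in the middle followed by later values, this version would fill that gap with the final value rather than the preceding one, which is why it depends on the trailing-NULL assumption.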

+4

 -- MySQL user-variable syntax; visits every row in idx order
 UPDATE table SET B = (@n := COALESCE(B, @n)) ORDER BY idx;
-3

Source: https://habr.com/ru/post/1247858/

