Fill in missing values with the first following non-NULL value in Redshift

Question

I'm on Redshift. Given the following data:

CREATE TABLE test (
id INT,
val1 INT,
val2 INT
);

INSERT INTO test VALUES
(1, 0,  NULL),
(2, 0,  NULL),
(3, 13, 1),
(4, 0,  NULL),
(5, 0,  NULL),
(6, 0,  NULL),
(7, 0,  NULL),
(8, 21, 2),
(9, 0,  NULL),
(10, 143,3)
;

I would like to fill in the missing val2 values with the first following non-NULL value, for example:

   INSERT INTO results VALUES
    (1, 0,  1),
    (2, 0,  1),
    (3, 13, 1),
    (4, 0,  2),
    (5, 0,  2),
    (6, 0,  2),
    (7, 0,  2),
    (8, 21, 2),
    (9, 0,  3),
    (10,143,3)
    ;

What is the best way to do this in Redshift / Postgres 8.0.2?

sql amazon-redshift

Roberto Jul 01 '14 at 23:01

4 answers

Roberto · Answer 1 · 2014-07-01T23:18:13+0000

One way I was able to solve it (taking advantage of the fact that the non-NULL values of val2 are sequential) is below. Performance is terrible, so any better solutions would be more than welcome.

SELECT
  t1.id
  , t1.val1
  , COALESCE(t1.val2, MIN(t2.val2)) as val2
FROM test t2 LEFT JOIN test t1 ON t2.id >= t1.id
WHERE t2.val2 IS NOT NULL
AND t1.val1 IS NOT NULL
GROUP BY 1, 2, t1.val2
ORDER BY t1.id
;

mdahlman · Answer 2 · 2014-07-02T06:20:28+0000

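The COALESCE is not needed. Because the join condition includes the current row (t1.id <= t2.id) and keeps only rows where val2 is not NULL, min(t2.val2) already returns the row's own val2 when it has one and the next val2 otherwise. Like the query above, this relies on the non-NULL values of val2 increasing with id.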

SELECT
  t1.id
  , t1.val1
  , min(t2.val2)
FROM test t1
LEFT OUTER JOIN test t2 on (t1.id <= t2.id and t2.val2 is not null)
GROUP BY t1.id, t1.val1
ORDER BY t1.id
;

Erwin Brandstetter · Answer 3 · 2014-07-02T13:08:00+0000

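This does not rely on ascending values in val2. If no later non-NULL value exists, the result stays NULL (the row is still returned).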

SELECT t1.id, t1.val1, COALESCE(t1.val2, t2.val2) as val2
FROM   test t1
LEFT   JOIN test t2
          ON  t2.id > t1.id
          AND t1.val2 IS NULL
          AND t2.val2 IS NOT NULL
          AND NOT EXISTS (
             SELECT 1
             FROM   test t3
             WHERE  t3.id > t1.id
             AND    t3.id < t2.id
             AND    t3.val2 IS NOT NULL
             )
ORDER  BY t1.id;

It also fixes a bug in your query: the WHERE clause removes trailing rows where val2 IS NULL. You would need to move this condition into the JOIN clause. Details:
Query with LEFT JOIN not returning rows for count of 0

Not sure whether it will be faster than the CROSS JOIN / min() approach in Redshift.
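
Moving the condition into the JOIN clause, as described above, could look like the following sketch. It reuses the self-join from the first answer with t1 as the preserved (left) side. The alias layout and GROUP BY list are assumptions carried over from that answer, and it still depends on the non-NULL values of val2 increasing with id.

```sql
-- Sketch: self-join with the NOT NULL test on t2 moved from WHERE into ON.
-- Assumes non-NULL val2 values increase with id, so MIN() picks the nearest one.
SELECT
  t1.id
  , t1.val1
  , COALESCE(t1.val2, MIN(t2.val2)) AS val2
FROM test t1
LEFT JOIN test t2
  ON  t2.id >= t1.id
  AND t2.val2 IS NOT NULL   -- filtering here keeps t1 rows that have no match
GROUP BY t1.id, t1.val1, t1.val2
ORDER BY t1.id
;
```

With a hypothetical extra row (11, 0, NULL) at the end of the table, this form returns id 11 with val2 NULL, whereas filtering on t2.val2 in WHERE drops that row.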

rom_j · Answer 4 · 2017-05-16T13:15:21+0000

You can avoid JOINs and play with window functions with the following:

SELECT id, val1, val2, 
       COALESCE(val2, LEAD(val2, dist::int) OVER (ORDER BY id)) AS notNullVal2
FROM (
  SELECT id, val1, val2, c,
          ROW_NUMBER() OVER (PARTITION BY c ORDER BY id DESC) AS dist
  FROM (
    SELECT id, val1, val2,
      COUNT(val2) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS c
    FROM test
  )
)
ORDER BY id
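
To see why the LEAD offset lands on the right row, it can help to look at the two helper columns on their own. The sketch below runs only the inner part of the query above against the question's sample data. The subquery alias counted is illustrative and not part of the original answer.

```sql
-- c    : running count of non-NULL val2 values up to and including the current row
-- dist : position of the row counted from the end of its group of equal c values
SELECT id, val2, c,
       ROW_NUMBER() OVER (PARTITION BY c ORDER BY id DESC) AS dist
FROM (
  SELECT id, val2,
         COUNT(val2) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS c
  FROM test
) AS counted  -- illustrative alias
ORDER BY id;

-- Result for the sample data:
--  id | val2 | c | dist
--   1 | NULL | 0 |  2
--   2 | NULL | 0 |  1
--   3 |    1 | 1 |  5
--   4 | NULL | 1 |  4
--   5 | NULL | 1 |  3
--   6 | NULL | 1 |  2
--   7 | NULL | 1 |  1
--   8 |    2 | 2 |  2
--   9 | NULL | 2 |  1
--  10 |    3 | 3 |  1
```

Each group of equal c starts at a row with a non-NULL val2 (or at the top of the table) and runs until the row before the next non-NULL value, so stepping dist rows forward from any row reaches the first row of the next group. For example, id 4 has dist 4, and four rows later is id 8 with val2 = 2.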
