Aggregating a fixed-size JSONB array in PostgreSQL

I am trying to aggregate over a JSONB column in a PostgreSQL database. This is probably easier to explain with an example, so create and populate a table called analysis with two columns ( id and analysis ) as follows:

 create table analysis (
     id serial primary key,
     analysis jsonb
 );

 insert into analysis (id, analysis) values
     (1, '{"category": "news",    "results": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, null, null]}'),
     (2, '{"category": "news",    "results": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, null, 26]}'),
     (3, '{"category": "news",    "results": [31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46]}'),
     (4, '{"category": "sport",   "results": [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66]}'),
     (5, '{"category": "sport",   "results": [71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]}'),
     (6, '{"category": "weather", "results": [91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106]}');

As you can see, the analysis JSONB field always contains two attributes, category and results. The results attribute will always contain an array of fixed length 16. I have tried various functions such as jsonb_array_elements, but what I am trying to do is:

  • Group by analysis → 'category'
  • Compute the average of each element of the results array, position by position

What I want is a query that returns 3 rows grouped by category (i.e. news , sport and weather ), each with a fixed-length array of 16 average values. To complicate matters further, if there are nulls in the array we should ignore them (i.e. we do not simply sum and divide by the number of rows). The result should look something like this:

  category  | analysis_average
 -----------+--------------------------------------------------------------------------------------------------------------
  "news"    | [14.33, 15.33, 16.33, 17.33, 18.33, 19.33, 20.33, 21.33, 22.33, 23.33, 24.33, 25.33, 26.33, 27.33, 45, 36]
  "sport"   | [61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76]
  "weather" | [91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106]

NOTE: the values 45 and 36 in the last two positions of the first array illustrate the nulls being ignored.
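
In case it makes the requirement clearer, here is the behaviour I am after sketched in Python (this is just an illustration of the expected semantics, not the SQL I am looking for; the function name is made up):

```python
# Elementwise average across rows, where None (SQL null) at a position
# is skipped entirely rather than being treated as 0.
def elementwise_avg(rows):
    length = len(rows[0])
    sums = [0] * length
    counts = [0] * length
    for row in rows:
        for i, v in enumerate(row):
            if v is not None:          # nulls do not contribute to sum or count
                sums[i] += v
                counts[i] += 1
    return [round(s / c, 2) if c else None
            for s, c in zip(sums, counts)]

news = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, None, None],
    [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, None, 26],
    [31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46],
]
# elementwise_avg(news) -> [14.33, 15.33, ..., 27.33, 45.0, 36.0]
```

At position 14 only the value 45 is non-null, so the average is 45, not (0 + 0 + 45) / 3.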

I was thinking of creating a view that explodes the array into 16 columns, i.e.

 create view analysis_view as
 select a.*,
        (a.analysis->'results'->>0)::int as result0,
        (a.analysis->'results'->>1)::int as result1
        /* ... etc. for all 16 array entries ... */
 from analysis a;

This feels very clunky to me and throws away the advantages of using an array in the first place, but I could probably hack something together with this approach.

Any pointers or tips will be most appreciated!

Performance is also important here, so the faster the better!

+5
3 answers

This will work for any array length.

 select category,
        array_agg(average order by subscript) as average
 from (
     select a.analysis->>'category' as category,
            subscript,
            avg(v)::numeric(5,2) as average
     from analysis a,
          lateral unnest(
              array(select jsonb_array_elements_text(analysis->'results')::int)
          ) with ordinality s(v, subscript)
     group by 1, 2
 ) s
 group by category;

  category |                                                 average
 ----------+----------------------------------------------------------------------------------------------------------
  news     | {14.33,15.33,16.33,17.33,18.33,19.33,20.33,21.33,22.33,23.33,24.33,25.33,26.33,27.33,45.00,36.00}
  sport    | {61.00,62.00,63.00,64.00,65.00,66.00,67.00,68.00,69.00,70.00,71.00,72.00,73.00,74.00,75.00,76.00}
  weather  | {91.00,92.00,93.00,94.00,95.00,96.00,97.00,98.00,99.00,100.00,101.00,102.00,103.00,104.00,105.00,106.00}

+3

Since the array always has the same length, you can use generate_series instead of typing out the index of each array element yourself. You CROSS JOIN with this generated series so that each index is applied to every row, and you can then fetch the element at position s from the array. After that it is just a matter of aggregating the data with GROUP BY.

Then the query will look like this:

 SELECT category,
        array_agg(val ORDER BY s) AS analysis_average
 FROM (
     SELECT analysis->'category' AS category,
            s,
            AVG((analysis->'results'->>s)::numeric) AS val
     FROM analysis
     CROSS JOIN generate_series(0, 15) s
     GROUP BY category, s
 ) q
 GROUP BY category

The 15 here is the last index of the array (16 - 1).
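
The shape of this query (cross-join every row with the index range, group by (category, index), then regroup per category) can be sketched in Python for illustration, with a shortened 4-element array standing in for the 16-element one:

```python
from collections import defaultdict

# Rows as (category, results) pairs; 4 elements per array for brevity.
rows = [
    ("sport", [51, 52, 53, 54]),
    ("sport", [71, 72, 73, 74]),
]

# CROSS JOIN generate_series(0, 3): pair every row with every index,
# then GROUP BY (category, index), accumulating sum and count for AVG.
acc = defaultdict(lambda: [0, 0])            # (category, s) -> [sum, count]
for category, results in rows:
    for s in range(4):                       # generate_series(0, 3)
        v = results[s]
        if v is not None:                    # AVG ignores nulls
            acc[(category, s)][0] += v
            acc[(category, s)][1] += 1

# Outer GROUP BY category with array_agg(val ORDER BY s).
averages = {}
for (category, s), (total, count) in sorted(acc.items()):
    averages.setdefault(category, []).append(total / count)
# averages["sport"] -> [61.0, 62.0, 63.0, 64.0]
```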

0

This can also be done in a more traditional way with a custom aggregate, for example

 select (t.analysis->'category')::varchar,
        array_math_avg(
            array(select jsonb_array_elements_text(t.analysis->'results')::int)
        )::numeric(9,2)[]
 from analysis t
 group by 1
 order by 1;

but we need to do some preparation:

 create type t_array_math_agg as (
     c int[],     -- per-position counts of non-null values
     a numeric[]  -- per-position running sums
 );

 create or replace function array_math_sum_f(in t_array_math_agg, in numeric[])
 returns t_array_math_agg as $$
 declare
     r t_array_math_agg;
     i int;
 begin
     if $2 is null then
         return $1;
     end if;
     r := $1;
     for i in array_lower($2, 1)..array_upper($2, 1) loop
         -- skip nulls so they contribute to neither the sum nor the count
         if $2[i] is not null then
             r.a[i] := coalesce(r.a[i], 0) + $2[i];
             r.c[i] := coalesce(r.c[i], 0) + 1;
         end if;
     end loop;
     return r;
 end;
 $$ immutable language plpgsql;

 create or replace function array_math_avg_final(in t_array_math_agg)
 returns numeric[] as $$
 declare
     r numeric[];
     i int;
 begin
     if array_lower($1.a, 1) is null then
         return null;
     end if;
     for i in array_lower($1.a, 1)..array_upper($1.a, 1) loop
         r[i] := $1.a[i] / $1.c[i];
     end loop;
     return r;
 end;
 $$ immutable language plpgsql;

 create aggregate array_math_avg(numeric[]) (
     sfunc     = array_math_sum_f,
     finalfunc = array_math_avg_final,
     stype     = t_array_math_agg,
     initcond  = '({},{})'
 );
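
The aggregate follows the usual state-machine pattern: a state type holding per-position counts and sums, a step function folded over each row, and a final function that produces the averages. For illustration only, the same pattern sketched in Python:

```python
# State mirrors t_array_math_agg: (counts, sums) per array position.
def sfunc(state, arr):
    """Step function: fold one row's array into the state, skipping nulls."""
    counts, sums = state
    if arr is None:
        return state
    for i, v in enumerate(arr):
        if v is not None:
            if i >= len(sums):                     # grow state lazily
                sums.extend([0] * (i + 1 - len(sums)))
                counts.extend([0] * (i + 1 - len(counts)))
            sums[i] += v
            counts[i] += 1
    return counts, sums

def finalfunc(state):
    """Final function: turn (counts, sums) into the array of averages."""
    counts, sums = state
    return [s / c if c else None for s, c in zip(sums, counts)]

state = ([], [])                                   # like initcond '({},{})'
for row in ([10, None], [20, 30]):
    state = sfunc(state, row)
# finalfunc(state) -> [15.0, 30.0]  (the None was skipped, not counted as 0)
```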
0

Source: https://habr.com/ru/post/1244587/

