I am trying to aggregate over a JSONB field in a PostgreSQL database. This is probably easier to explain with an example, so suppose you create and populate a table called analysis with two columns (id and analysis) as follows:
create table analysis (
    id serial primary key,
    analysis jsonb
);

insert into analysis (id, analysis) values
    (1, '{"category" : "news", "results" : [1, 2, 3, 4, 5 , 6, 7, 8, 9, 10, 11, 12, 13, 14, null, null]}'),
    (2, '{"category" : "news", "results" : [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, null, 26]}'),
    (3, '{"category" : "news", "results" : [31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46]}'),
    (4, '{"category" : "sport", "results" : [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66]}'),
    (5, '{"category" : "sport", "results" : [71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]}'),
    (6, '{"category" : "weather", "results" : [91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106]}');
As you can see, the analysis JSONB field always contains two attributes, category and results. The results attribute always contains a fixed-length array of 16 elements. I have tried various functions such as jsonb_array_elements, but what I am trying to do is the following:
- Group by analysis->'category'
- Compute the average value at each position of the results array across the rows in each group
What I want is a statement that returns 3 rows grouped by category (i.e. news, sport and weather), each with a fixed-length array of 16 average values. To complicate matters further, if there are nulls in an array they should be ignored (i.e. we cannot simply sum each position and divide by the number of rows). The result should look something like this:
 category  | analysis_average
-----------+--------------------------------------------------------------------------------------------------------------
 "news"    | [14.33, 15.33, 16.33, 17.33, 18.33, 19.33, 20.33, 21.33, 22.33, 23.33, 24.33, 25.33, 26.33, 27.33, 45, 36]
 "sport"   | [61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76]
 "weather" | [91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106]
NOTE: the 45 and 36 in the last 2 array items of row 1 illustrate ignoring nulls.
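To make the null handling concrete, the last two positions of the news row can be checked by hand. This is only a small illustration using inline values copied from the sample data above; the aliases t and v are placeholders of mine. It relies on the standard behaviour of avg(), which skips SQL NULLs.

```sql
-- Position 15 of "news": null, null, 45 -> only one non-null value, so the average is 45
select avg(v) from (values (null::int), (null), (45)) as t(v);

-- Position 16 of "news": null, 26, 46 -> (26 + 46) / 2 = 36, the null is not counted
select avg(v) from (values (null::int), (26), (46)) as t(v);
```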
I was thinking of creating a view that explodes the array into 16 columns, i.e.
create view analysis_view as
select a.*,
       (a.analysis->'results'->>0)::int as result0,
       (a.analysis->'results'->>1)::int as result1
from analysis a;
This seems extremely inelegant to me and defeats the purpose of using an array in the first place, but I could probably hack something together using this approach.
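For comparison, here is a rough, untested sketch of the kind of set-based query I imagine could replace the view. It assumes PostgreSQL 9.5 or later (for jsonb_agg), and that jsonb_array_elements_text returns SQL NULL for JSON null elements so that avg() skips them. The aliases e, val, idx and per_position are placeholders of mine, and the rounding to 2 decimals is only there to resemble the desired output.

```sql
select category,
       -- rebuild one array per category, keeping the original element order
       jsonb_agg(avg_value order by idx) as analysis_average
from (
    select a.analysis->'category' as category,
           e.idx,
           -- avg() ignores NULLs, so null array elements are not counted
           round(avg(e.val::numeric), 2) as avg_value
    from analysis a
    -- unnest the results array, keeping each element's 1-based position
    cross join lateral jsonb_array_elements_text(a.analysis->'results')
         with ordinality as e(val, idx)
    group by a.analysis->'category', e.idx
) per_position
group by category
order by category;
```

I have not measured this, so I do not know how it performs compared with the view. It also unnests every array on every run, which may matter on a large table.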
Any pointers or tips will be most appreciated!
Performance is also important here, so the higher the performance, the better!