I'm looking for an efficient way to store sets of objects that happened together during events, so that I can generate aggregate statistics on them on a daily basis.
To make an example, imagine a system that tracks meetings in an office. For each meeting, we record how many minutes it was and in which room it happened.
I want to get statistics broken by both a person and a room. I donโt need to keep track of individual meetings (so there is no meeting_id or something like that), all I want to know is the daily summary. My real application has hundreds of thousands of events per day, so saving each of them is impossible.
I would like to answer questions such as:
In 2012, how many minutes did Bob, Sam and Julie spend in each conference room (not necessarily together)?
This is probably normal with 3 queries:
>>> query(dates=2012, people=[Bob]) {Board-Room: 35, Auditorium: 279} >>> query(dates=2012, people=[Sam]) {Board-Room: 790, Auditorium: 277, Broom-Closet: 71} >>> query(dates=2012, people=[Julie]) {Board-Room: 190, Broom-Closet: 55}
In 2012, how many minutes did Sam and Julie have a MEETING TOGETHER in each conference room? What about Bob, Sam, and Julia?
>>> query(dates=2012, people=[Sam, Julie]) {Board-Room: 128, Broom-Closet: 55} >>> query(dates=2012, people=[Bob, Sam, Julie]) {Board-Room: 22}
In 2012, how many minutes did each person spend in the Board-Room?
>>> query(dates=2012, rooms=[Board-Room]) {Bob: 35, Sam: 790, Julie: 190}
In 2012, how many minutes were used in the Board-Room?
This is actually quite complicated, since the naive strategy of summing up the number of minutes spent by each person will lead to a serious recalculation. But we can probably solve this problem by storing the number separately as an Anyone meta-person:
>>> query(dates=2012, rooms=[Board-Room], people=[Anyone]) 865
What are some good data structures or databases that I can use to include this kind of query? Since the rest of my application uses MySQL, I am tempted to define a string column containing the (sorted) identifiers of each person in the meeting, but the size of this table will grow quite quickly:
2012-01-01 | "Bob" | "Board-Room" | 2 2012-01-01 | "Julie" | "Board-Room" | 4 2012-01-01 | "Sam" | "Board-Room" | 6 2012-01-01 | "Bob,Julie" | "Board-Room" | 2 2012-01-01 | "Bob,Sam" | "Board-Room" | 2 2012-01-01 | "Julie,Sam" | "Board-Room" | 3 2012-01-01 | "Bob,Julie,Sam" | "Board-Room" | 2 2012-01-01 | "Anyone" | "Board-Room" | 7
What else can I do?