Say I have time-series location data (time on the x axis, coordinates in the y-z plane).
Given a seed set of infected users, I want to find all users within distance d of the seed set's points within time t. This is essentially contact tracing.
What is a smart way to do this?
The naive approach looks something like this:
    points_at_end_of_iteration = []
    for p in seed_set:
        # points recorded within time t of seed point p
        other_ps = find_points_t_time_away(p, t)
        # of those, keep the points within distance d
        points_at_end_of_iteration += find_points_d_distance_away_from_set(other_ps)
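To make the naive approach concrete, here is a minimal brute-force sketch in plain Python. The `(uid, time, (lat, lon))` tuple layout, the names `haversine_m` and `find_contacts`, and treating positions as latitude/longitude in degrees are all my assumptions for illustration, not a settled design.

```python
import math
from datetime import timedelta

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, used by the haversine formula

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def find_contacts(entries, seed_users, d_m, t):
    """Return the users with at least one entry within d_m metres and
    within time t (a timedelta) of any entry of a seed user.

    entries: iterable of (uid, time, (lat, lon)) tuples (assumed layout).
    This compares every entry with every seed entry, so it is O(n * m).
    """
    entries = list(entries)
    seed_points = [e for e in entries if e[0] in seed_users]
    contacts = set()
    for uid, time, pos in entries:
        if uid in seed_users or uid in contacts:
            continue
        for _, seed_time, seed_pos in seed_points:
            if abs(time - seed_time) <= t and haversine_m(pos, seed_pos) <= d_m:
                contacts.add(uid)
                break
    return contacts
```

This only finds direct contacts of the seed set; it does not propagate the infection to the contacts' own contacts, and the quadratic cost is exactly what I would like to avoid.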
What is a more sensible way to do this? Ideally all the data would be held in RAM (although I'm not sure that is feasible). Is pandas a good option? I also looked at Bandicoot, but it doesn't seem to be able to do this for me.
Please let me know if I can improve the question; perhaps it is too broad.
Edit:
I think the algorithm presented above is wrong.
This is better:
    for user, time, pos in infected_set:
        info = get_next_info(user, time)
I think infected_set would actually be a hash map: {user_id: {last_time: ..., last_pos: ...}, user_id2: ...}
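A minimal sketch of that hash map, assuming a plain Python dict keyed by user id. The helper name `mark_infected` and the rule of keeping only the most recent sighting per user are my assumptions.

```python
from datetime import datetime

# user_id -> {"last_time": datetime, "last_pos": (lat, lon)}
infected_set = {}

def mark_infected(user_id, time, pos):
    """Record a sighting of an infected user, keeping only the latest one."""
    current = infected_set.get(user_id)
    if current is None or time > current["last_time"]:
        infected_set[user_id] = {"last_time": time, "last_pos": pos}

mark_infected("uid1", datetime(2015, 5, 1, 5, 22, 25), (12.111, -12.111))
# An older sighting of the same user does not overwrite the newer one.
mark_infected("uid1", datetime(2015, 5, 1, 4, 0, 0), (12.0, -12.0))
```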
One potential problem is that users are processed independently, so the next timestamp for user2 can be hours or days after user1's.
Contact tracing might be easier if I interpolate so that each user has an entry for every time step (for example, every hour), although this would increase the amount of data enormously.
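As a rough sketch of the interpolation idea, assuming pandas and a DataFrame with columns `uid`, `timestamp`, `lat`, `lon` (the column names and the function `resample_hourly` are mine, for illustration), each user's track could be put on an hourly grid like this. Linear interpolation between fixes is itself an assumption about how people move.

```python
import pandas as pd

def resample_hourly(df):
    """Interpolate each user's positions onto whole hours.

    df: DataFrame with columns uid, timestamp (datetime64), lat, lon.
    Returns one row per user per whole hour between that user's first
    and last recorded entries.
    """
    pieces = []
    for uid, group in df.groupby("uid"):
        track = group.set_index("timestamp")[["lat", "lon"]].sort_index()
        track = track[~track.index.duplicated()]  # reindex needs unique timestamps
        grid = pd.date_range(track.index.min().ceil("h"),
                             track.index.max().floor("h"), freq="h")
        # Add the grid timestamps as empty rows, then fill them by
        # interpolating linearly in time between the recorded positions.
        merged = track.reindex(track.index.union(grid)).interpolate(method="time")
        pieces.append(merged.loc[grid].assign(uid=uid))
    return pd.concat(pieces).rename_axis("timestamp").reset_index()
```

With every user on the same grid, contacts at a given hour reduce to a distance query among the rows sharing that timestamp, at the cost of the data growth mentioned above.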
Data Format / Example
    user_id = 123
    timestamp = 2015-05-01 05:22:25
    position = 12.111,-12.111
There is one CSV file with all entries:
    uid1,timestamp1,position1
    uid1,timestamp2,position2
    uid2,timestamp3,position3
There is also a directory of files (same format), one file per user:
    entries/uid1.csv
    entries/uid2.csv
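For reference, a minimal sketch of loading this format with pandas. I am assuming the position is written as two unquoted comma-separated numbers, so each line has four fields, and that there is no header row; the column names and the function `load_entries` are hypothetical.

```python
import io
import pandas as pd

def load_entries(source):
    """Read entries from a path or file-like object into a DataFrame."""
    return pd.read_csv(
        source,
        header=None,                              # assumed: no header row
        names=["uid", "timestamp", "lat", "lon"],  # assumed column layout
        parse_dates=["timestamp"],
    )

# In-memory sample standing in for the real file.
sample = io.StringIO(
    "123,2015-05-01 05:22:25,12.111,-12.111\n"
    "124,2015-05-01 06:00:00,12.112,-12.110\n"
)
entries = load_entries(sample)
```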