Contact tracing in Python - working with time series

Question

Say I have time series data (time on the x axis, coordinates on the yz plane).

Given a seed set of infected users, I want to find all users who come within distance d of the seed set's points within time t. This is essentially contact tracing.

What is the smart way to do this?

The naive approach looks something like this:

    points_at_end_of_iteration = []
    for p in seed_set:
        other_ps = find_points_t_time_away(t)
        points_at_end_of_iteration += find_points_d_distance_away_from_set(other_ps)

What is a smarter way to do this, ideally keeping all the data in RAM (although I'm not sure that is feasible)? Is Pandas a good option? I also looked at Bandicoot, but it doesn't seem able to do this for me.

Please let me know if I can improve the question - perhaps it is too broad.
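For reference, the "within distance d and time t" test that the pseudocode relies on could be sketched as a single predicate. This is only a minimal sketch: it assumes positions are (lat, long) pairs in degrees and timestamps are `datetime` objects, and `haversine_m` and `is_contact` are hypothetical helper names, not functions from Pandas or Bandicoot.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (assumed spherical Earth)


def haversine_m(pos_a, pos_b):
    """Great-circle distance in metres between two (lat, long) pairs in degrees."""
    lat1, lon1 = map(radians, pos_a)
    lat2, lon2 = map(radians, pos_b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def is_contact(time_a, pos_a, time_b, pos_b, max_dist_m, max_dt):
    """True if two observations are close in both time and space."""
    close_in_time = abs(time_a - time_b) <= max_dt
    return close_in_time and haversine_m(pos_a, pos_b) <= max_dist_m
```

Checking the time difference first is cheap and lets the distance computation be skipped for most pairs.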

Edit:

I think the algorithm presented above is wrong.

This is better:

    for user, time, pos in infected_set:
        info = get_next_info(user, time)  # info will be a tuple: (t, pos)
        # intersect if close enough to the user pos/time
        intersecting_users = find_intersecting_users(user, time, delta_t, pos, delta_pos)
        infected_set.add(intersecting_users)
        update_infected_set(user, info)  # change last_time and last_pos (described below)

I think infected_set should actually be a hashmap: {user_id: {last_time: ..., last_pos: ...}, user_id2: ...}

One potential problem is that users are processed independently, so the next timestamp for user2 can be hours or days after user1's.

Contact tracing might be easier if I interpolate so that each user has a position for every moment in time (for example, every hour), although this would greatly increase the amount of data.
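A rough sketch of that interpolation step, using only the standard library. It assumes one user's observations are already sorted by time as `(datetime, (lat, long))` tuples; `interpolate_track` is a hypothetical name, and linear interpolation of lat/long is only a reasonable approximation over short gaps.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def interpolate_track(track, step):
    """Resample one user's track onto a regular time grid.

    track: non-empty list of (datetime, (lat, long)), sorted by time.
    step:  timedelta between grid points (for example, one hour).
    Returns (datetime, (lat, long)) tuples from the first observation up to
    the last one, linearly interpolated between neighbouring observations.
    """
    times = [t for t, _ in track]
    result = []
    current = times[0]
    while current <= times[-1]:
        i = bisect_left(times, current)  # first observation at or after current
        if times[i] == current:
            result.append((current, track[i][1]))
        else:
            (t0, p0), (t1, p1) = track[i - 1], track[i]
            frac = (current - t0).total_seconds() / (t1 - t0).total_seconds()
            lat = p0[0] + frac * (p1[0] - p0[0])
            lon = p0[1] + frac * (p1[1] - p0[1])
            result.append((current, (lat, lon)))
        current += step
    return result
```

The output size depends on the grid step and the time span rather than on the number of raw observations, which is where the data blow-up mentioned above comes from.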

Data Format / Example

    user_id = 123
    timestamp = 2015-05-01 05:22:25
    position = 12.111,-12.111  # lat,long

There is one csv file with all entries:

    uid1,timestamp1,position1
    uid1,timestamp2,position2
    uid2,timestamp3,position3

There is also a directory of files (same format), one file per user:

    entries/uid1.csv
    entries/uid2.csv
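A minimal loading sketch for this format, using the standard `csv` module. It assumes each row has four comma-separated fields (uid, timestamp, lat, long), since the position itself is written as `lat,long`; if the position is quoted as a single field, the unpacking needs adjusting. `load_tracks` is a hypothetical name, and apart from the first example entry the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def load_tracks(csv_file):
    """Read uid,timestamp,lat,long rows into {uid: [(datetime, (lat, long)), ...]}."""
    tracks = defaultdict(list)
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        uid, stamp, lat, lon = row  # assumed column layout
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        tracks[uid].append((when, (float(lat), float(lon))))
    for observations in tracks.values():
        observations.sort()  # chronological order per user
    return dict(tracks)


# Illustrative rows only; a real run would pass open("entries.csv") instead.
sample = io.StringIO(
    "123,2015-05-01 06:00:00,12.2,-12.2\n"
    "123,2015-05-01 05:22:25,12.111,-12.111\n"
    "456,2015-05-01 05:30:00,12.112,-12.112\n"
)
tracks = load_tracks(sample)
```

Grouping by user up front means the per-user files in the directory and the single combined file can feed the same downstream code.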

python python-2.7 pandas time-series

pushkin Nov 30 '15 at 19:21

1 answer

Olivier pellier-cuit · Accepted Answer · 2015-12-10T03:53:43+0000

The first solution with interpolation:

    # I would use a shelf (a persistent, dictionary-like object
    # included with Python).
    import shelve

    # hashmap of clean users indexed by timestamp
    # { timestamp1: {uid1: (lat11, long11), uid2: (lat12, long12), ...},
    #   timestamp2: {uid1: (lat21, long21), uid2: (lat22, long22), ...},
    #   ...
    # }
    clean_users = shelve.open("clean_users.dat")

    # load data into clean_users from the csv (shelve uses the same syntax as a
    # hashmap). You will interpolate the data (only the data for a given timestamp
    # will be in memory at the same time). Note: the timestamp must be a string

    # hashmap of infected users indexed by timestamp (same format as clean_users)
    infected_users = shelve.open("infected_users.dat")

    # for each iteration
    # (N and timestamp_from_iteration() are placeholders to be defined)
    for iteration in range(1, N):

        # compute the current timestamp; because we interpolate, each user has a location
        current_timestamp = timestamp_from_iteration(iteration)

        # get clean users for this iteration (in memory)
        current_clean_users = clean_users[current_timestamp]

        # get infected users for this iteration (in memory)
        current_infected_users = infected_users[current_timestamp]

        # new infected users for this iteration
        new_infected_users = dict()

        # compute the new infected users for this iteration from current_clean_users
        # and current_infected_users, then store the result in new_infected_users

        # remove users in new_infected_users from clean_users

        # add users in new_infected_users to infected_users

    # close the shelves
    infected_users.close()
    clean_users.close()

Second solution without interpolation:

    # I would use a shelf (a persistent, dictionary-like object
    # included with Python).
    import shelve

    # hashmap of clean users indexed by timestamp
    # { timestamp1: {uid1: (lat11, long11), uid2: (lat12, long12), ...},
    #   timestamp2: {uid1: (lat21, long21), uid2: (lat22, long22), ...},
    #   ...
    # }
    clean_users = shelve.open("clean_users.dat")

    # load data into clean_users from the csv (shelve uses the same syntax as a
    # hashmap). Note: the timestamp must be a string

    # hashmap of infected users indexed by timestamp (same format as clean_users)
    infected_users = shelve.open("infected_users.dat")

    # for each iteration (not time-related as in the previous version)
    # could also stop when there are no new infected users in the iteration
    # (N and time_stamp_in_delta() are placeholders to be defined)
    for iteration in range(1, N):

        # new infected users for this iteration
        new_infected_users = dict()

        # get timestamps from infected_users
        for an_infected_timestamp in infected_users.keys():

            # get infected users for this timestamp
            current_infected_users = infected_users[an_infected_timestamp]

            # get relevant timestamps from clean users
            for a_clean_timestamp in clean_users.keys():
                if time_stamp_in_delta(an_infected_timestamp, a_clean_timestamp):

                    # get clean users for this clean timestamp
                    current_clean_users = clean_users[a_clean_timestamp]

                    # compute infected users from current_clean_users and
                    # current_infected_users, then append the result to
                    # new_infected_users

        # remove users in new_infected_users from clean_users

        # add users in new_infected_users to infected_users

    # close the shelves
    infected_users.close()
    clean_users.close()
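The steps left as comments in the second outline could be filled in roughly as follows. This is a sketch under stated assumptions, not the answer's actual implementation: it uses plain dicts keyed by `datetime` instead of shelves (a shelf would need string keys), a haversine distance as the closeness test, and hypothetical names `distance_m` and `spread`. It stops when an iteration finds no new infected users, as the outline suggests.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


def distance_m(pos_a, pos_b):
    """Haversine distance in metres between two (lat, long) pairs in degrees."""
    lat1, lon1 = map(radians, pos_a)
    lat2, lon2 = map(radians, pos_b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * asin(sqrt(h))


def spread(clean_users, infected_users, max_dt, max_dist_m):
    """Move users from clean_users to infected_users until nothing changes.

    Both mappings have the form {timestamp: {uid: (lat, long)}} and are
    modified in place. Returns the set of newly infected uids.
    """
    all_new = set()
    while True:
        new_infected = set()
        for t_inf, inf_positions in infected_users.items():
            for t_clean, clean_positions in clean_users.items():
                if abs(t_inf - t_clean) > max_dt:
                    continue  # timestamps too far apart
                for uid, pos in clean_positions.items():
                    if any(distance_m(pos, p) <= max_dist_m
                           for p in inf_positions.values()):
                        new_infected.add(uid)
        if not new_infected:
            return all_new
        all_new |= new_infected
        # Move every observation of the newly infected users across.
        for t_clean in list(clean_users):
            positions = clean_users[t_clean]
            for uid in new_infected & set(positions):
                infected_users.setdefault(t_clean, {})[uid] = positions.pop(uid)
            if not positions:
                del clean_users[t_clean]
```

Like the outline, this does not check that a contact happens after the infected user actually became infected, and it compares every pair of timestamps, so it is only practical for small data sets without some form of time or spatial indexing.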
