Here is a sample data set. The actual data set may exceed 200,000 records.
A unique person is identified by the combination of name and dob. For example, John A, James, Mark A and Mark B are each unique people. Mark A, however, appears with different id values.
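To make the name / dob rule concrete, here is a minimal sketch that counts distinct id values per person. The small `people` frame is a hypothetical extract of a few rows from the sample data, and the variable names are only for illustration.

```python
import pandas as pd

# Hypothetical extract: a few rows from the sample data.
people = pd.DataFrame({
    "name": ["Mark A", "Mark A", "Mark B", "Mark B", "Jack"],
    "dob": ["1/06/1955", "1/06/1955", "9/12/1984", "9/12/1984", "5/12/1921"],
    "id": [56465, 45456, 55544, 55544, 65655],
})

# A person is one name / dob combination; count the distinct ids each one has.
ids_per_person = people.groupby(["name", "dob"])["id"].nunique()

# People recorded under more than one id.
multiple_ids = ids_per_person[ids_per_person > 1]
print(multiple_ids)
```

In this extract only Mark A ends up in `multiple_ids`, which is why id cannot be used on its own to identify a person.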
I normally use R for this procedure: I generate a list of data frames, one per name / dob combination, and sort each data frame by sample_date. I then use a list apply function to check the difference in date between the first and the last index in each data frame, and return the oldest record if it is less than 8 weeks from the most recent date. It takes forever.
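A rough pandas sketch of that procedure, without building a list of data frames, might look like the following. It rests on assumptions: the tiny frame is a hypothetical extract with zero-padded day-first dates, and the 8-week rule is read as "keep the oldest row of a person when that person's oldest and newest samples are less than 8 weeks apart".

```python
import pandas as pd

# Hypothetical extract; dates are day-first strings.
df = pd.DataFrame({
    "name": ["James", "James", "Mark B", "Mark B"],
    "dob": ["30/08/1958", "30/08/1958", "09/12/1984", "09/12/1984"],
    "sample_date": ["30/04/2012", "29/04/2012", "13/09/2012", "01/01/2012"],
})
df["sample_date"] = pd.to_datetime(df["sample_date"], format="%d/%m/%Y")
df = df.sort_values("sample_date")

# Per-person oldest and newest sample dates, aligned to the original rows.
dates = df.groupby(["name", "dob"])["sample_date"]
oldest = dates.transform("min")
newest = dates.transform("max")

# Keep the oldest row of each person whose samples span less than 8 weeks.
within_8_weeks = (newest - oldest) < pd.Timedelta(weeks=8)
oldest_within_8_weeks = df[within_8_weeks & (df["sample_date"] == oldest)]
print(oldest_within_8_weeks)
```

Because `transform` works on whole columns, this avoids a Python-level loop over groups. Note that under this reading a person with a single sample has a span of zero and would also be returned.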
I would welcome a few pointers on how I might attempt this with python / pandas. I started by creating a MultiIndex on name / dob / id. The structure looks the way I want. What I need now is to apply some of the functions I use in R to select the rows I need. I have tried selecting with df.xs(), but I have not gotten very far.
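For reference, a minimal sketch of the MultiIndex approach and a `df.xs()` selection, using a hypothetical four-row extract of the sample data (the names `indexed`, `mark_a` and `sizes` are only illustrative):

```python
import pandas as pd

# Hypothetical extract of the sample data.
df = pd.DataFrame({
    "name": ["Mark A", "Mark A", "Mark B", "Mark B"],
    "dob": ["01/06/1955", "01/06/1955", "09/12/1984", "09/12/1984"],
    "id": [56465, 45456, 55544, 55544],
    "labno": [12, 13, 14, 15],
})

# Build the name / dob / id MultiIndex and sort it.
indexed = df.set_index(["name", "dob", "id"]).sort_index()

# Cross-section: every row for one person, leaving id as the remaining index.
mark_a = indexed.xs(("Mark A", "01/06/1955"), level=["name", "dob"])
print(mark_a)

# Grouping on index levels gives per-person operations without xs().
sizes = indexed.groupby(level=["name", "dob"]).size()
print(sizes)
```

`xs()` pulls out one person at a time, so for selecting rows across all people a `groupby` on the index levels (or on plain columns) is probably the more natural tool.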
Here is a data dictionary that loads easily into pandas (albeit with a different column order).
{'dob': {0: '12/07/1969', 1: '10/01/1964', 2: '30/08/1958', 3: '30/08/1958', 4: '12/05/1935', 5: '12/07/1969', 6: '12/05/1935', 7: '5/12/1921', 8: '6/08/1986', 9: '4/03/1992', 10: '1/10/1977', 11: '1/06/1955', 12: '1/06/1955', 13: '9/12/1984', 14: '9/12/1984'},
 'id': {0: 12345, 1: 54321, 2: 87878, 3: 45454, 4: 33322, 5: 12345, 6: 33322, 7: 65655, 8: 65459, 9: 41211, 10: 12345, 11: 56465, 12: 45456, 13: 55544, 14: 55544},
 'labno': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 12, 12: 13, 13: 14, 14: 15},
 'location': {0: 'A', 1: 'B', 2: 'A', 3: 'B', 4: 'C', 5: 'A', 6: 'A', 7: 'B', 8: 'A', 9: 'C', 10: 'A', 11: 'C', 12: 'C', 13: 'A', 14: 'A'},
 'name': {0: 'John A', 1: 'John B', 2: 'James', 3: 'James', 4: 'Peter', 5: 'John A', 6: 'Peter', 7: 'Jack', 8: 'Jill', 9: 'Julia', 10: 'Angela', 11: 'Mark A', 12: 'Mark A', 13: 'Mark B', 14: 'Mark B'},
 'sample_date': {0: '12/05/2112', 1: '6/12/2010', 2: '04/30/2012', 3: '29/04/2012', 4: '15/07/2011', 5: '14/05/2012', 6: '23/03/2011', 7: '08/15/2011', 8: '16/02/2012', 9: '15/09/2011', 10: '23/10/2006', 11: '4/04/2011', 12: '04/04/2011', 13: '13/09/2012', 14: '1/01/2012'},
 'sex': {0: 'M', 1: 'M', 2: 'M', 3: 'M', 4: 'M', 5: 'M', 6: 'M', 7: 'M', 8: 'F', 9: 'F', 10: 'F', 11: 'M', 12: 'M', 13: 'M', 14: 'M'}}
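A dictionary of this shape can be passed straight to `pd.DataFrame`. The sketch below uses a trimmed, hypothetical subset (rows 3 and 4, four columns) to keep it short, and assumes the dates are day-first.

```python
import pandas as pd

# Trimmed subset of the dictionary: outer keys become columns,
# inner keys become the row index.
data = {
    "dob": {3: "30/08/1958", 4: "12/05/1935"},
    "id": {3: 45454, 4: 33322},
    "name": {3: "James", 4: "Peter"},
    "sample_date": {3: "29/04/2012", 4: "15/07/2011"},
}

df = pd.DataFrame(data)

# Put the columns in a chosen order rather than relying on the load order.
df = df[["name", "dob", "id", "sample_date"]]

# Convert the date strings so they can be sorted and subtracted.
for col in ["dob", "sample_date"]:
    df[col] = pd.to_datetime(df[col], format="%d/%m/%Y")

print(df.dtypes)
```

A few sample_date entries in the full dictionary look month-first (for example '04/30/2012'), so a single `format` string would fail on them; those values would need checking before conversion.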