I have a dataset of text field interactions from several dozen users of my application, collected over several months. I am trying to calculate the average time between keystrokes in pandas. The data looks something like this:
timestamp      before_text         after_text
1453481138188  NULL                a
1453481138600  a                   ab
1453481138900  ab                  abc
1453481139400  abc                 abcd
1453484000000  Enter some numbers  1
1453484000100  1                   12
1453484000600  12                  123
timestamp contains the Unix time (in milliseconds) at which the user pressed the key, before_text is what the text field contained before the key was pressed, and after_text is what the field contained after the key was pressed.
What is the best way to do this? I know that it is not as simple as doing something like:
(df["timestamp"] - df["timestamp"].shift()).mean()
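That would also count the long pauses between separate entries. I think the solution is to use df.groupby with some grouping function, call it magic_function, something like: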
df.groupby(magic_function).apply(lambda x: x["timestamp"] - x["timestamp"].shift()).mean()
What should magic_function be?