Calculating the time between text field interactions

I have a dataset of text field interactions from several dozen users of my application, covering a few months. I am trying to calculate the average time between keystrokes in pandas. The data looks something like this:

timestamp                before_text     after_text
1453481138188                  NULL               a
1453481138600                     a              ab 
1453481138900                    ab             abc
1453481139400                   abc            abcd
1453484000000    Enter some numbers               1
1453484000100                     1              12
1453484000600                    12             123

timestamp contains the Unix time (in milliseconds) when the user pressed the key, before_text is what the text field contained before the key was pressed, and after_text is what the field looked like after the key was pressed.

What is the best way to do this? I know that it is not as simple as doing something like:

(df["timestamp"] - df["timestamp"].shift()).mean()
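To see the problem concretely, here is a minimal sketch that rebuilds the sample data as a DataFrame (assuming the NULL in before_text is a missing value, None) and applies the naive one-liner. The single pause of about 2.86 million ms between the two interactions swamps the 100-500 ms keystroke gaps.

```python
import pandas as pd

# Sample data from the question; NULL is assumed to be a missing value.
df = pd.DataFrame({
    "timestamp": [1453481138188, 1453481138600, 1453481138900, 1453481139400,
                  1453484000000, 1453484000100, 1453484000600],
    "before_text": [None, "a", "ab", "abc", "Enter some numbers", "1", "12"],
    "after_text": ["a", "ab", "abc", "abcd", "1", "12", "123"],
})

# Naive approach: difference to the previous row, averaged over all rows.
naive = (df["timestamp"] - df["timestamp"].shift()).mean()
print(naive)  # about 477068.67 ms, dominated by the gap between interactions
```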

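That would also count the long pause at the boundary between two separate interactions. I think the solution involves df.groupby with some magic_function that assigns each keystroke to its interaction, so that I could write: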

df.groupby(magic_function).apply(lambda x: x["timestamp"] - x["timestamp"].shift()).mean()

What should magic_function be?
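For reference, groupby accepts any Series of labels aligned with the DataFrame's index, not only column names. A minimal sketch with session ids written by hand (the session_id values are hypothetical; the answers show ways to derive them) illustrates the shape magic_function needs to produce. It uses diff(), which within each group equals x - x.shift().

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": [1453481138188, 1453481138600, 1453481138900, 1453481139400,
                  1453484000000, 1453484000100, 1453484000600],
})

# Hypothetical hand-written session ids: one label per row, aligned on df's index.
session_id = pd.Series([0, 0, 0, 0, 1, 1, 1], index=df.index)

# Within each session, diff() computes timestamp minus the previous timestamp;
# the first row of every session gets NaN, which mean() skips.
gaps = df.groupby(session_id)["timestamp"].diff()
print(gaps.mean())  # 362.4
```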

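I would look at the edit (Levenshtein) distance between before_text and after_text. A single keystroke changes the text by about one character, so a large jump means a new interaction has started.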

I use distance from the Levenshtein package, imported as ld. Install it with pip:

pip install python-levenshtein

Then:

from Levenshtein import distance as ld
import pandas as pd

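# Take just these two columns and transpose, so each original row becomes a column.
# Back-filling then replaces the missing before_text with that row's after_text.
before_after = df[['before_text', 'after_text']].T.bfill()

# Edit distance between before_text and after_text for every original row.
distances = before_after.apply(lambda x: ld(*x))

# threshold is how much distance constitutes an obvious break between sessions.
threshold = 2
# Each break increments the running count, giving one id per session.
magic_function = (distances > threshold).cumsum()

# Time between consecutive keystrokes within each session, averaged.
df.groupby(magic_function) \
  .apply(lambda x: x["timestamp"] - x["timestamp"].shift()) \
  .mean()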

For the sample data this gives 362.4 (milliseconds).
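If installing python-levenshtein is not an option, the same idea works with a small pure-Python edit distance. This sketch is not the answer's code: levenshtein is a hypothetical helper implementing the standard dynamic-programming Levenshtein distance, and the sample data is rebuilt from the question with NULL taken as a missing value. It reproduces the 362.4 result.

```python
import pandas as pd


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute; cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


df = pd.DataFrame({
    "timestamp": [1453481138188, 1453481138600, 1453481138900, 1453481139400,
                  1453484000000, 1453484000100, 1453484000600],
    "before_text": [None, "a", "ab", "abc", "Enter some numbers", "1", "12"],
    "after_text": ["a", "ab", "abc", "abcd", "1", "12", "123"],
})

# Fill a missing before_text from after_text, mirroring the answer's bfill trick.
before = df["before_text"].fillna(df["after_text"])
distances = [levenshtein(b, a) for b, a in zip(before, df["after_text"])]

threshold = 2
# Every edit larger than the threshold starts a new session id.
session_id = (pd.Series(distances, index=df.index) > threshold).cumsum()
mean_gap = df.groupby(session_id)["timestamp"].diff().mean()
print(mean_gap)  # 362.4
```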

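A simpler alternative is to split on the timestamps themselves: if the gap between consecutive timestamps exceeds a threshold, treat that keystroke as the start of a new interaction.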

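thresh = 1e5  # gap in milliseconds (here 100 s) that marks a new interaction
# True where the gap to the previous keystroke exceeds thresh;
# the first row compares NaN > thresh, which is False.
ts = (df['timestamp'] - df['timestamp'].shift()) > thresh
# Running count of breaks gives one group id per interaction, one entry per row.
df['grouper'] = ts.cumsum()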

Then group with grouped = df.groupby('grouper') and take the timestamp differences within each group.
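Putting the threshold approach together on the question's sample data (rebuilt here; thresh = 1e5 ms as in the answer), a minimal sketch of the full pipeline might look like this. It takes the within-group differences with diff(), which is equivalent to x - x.shift() per group.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": [1453481138188, 1453481138600, 1453481138900, 1453481139400,
                  1453484000000, 1453484000100, 1453484000600],
})

thresh = 1e5  # 100 s in milliseconds; tune for your data
# A new interaction starts whenever the gap to the previous keystroke exceeds thresh.
df["grouper"] = (df["timestamp"].diff() > thresh).cumsum()

grouped = df.groupby("grouper")
# Per-group differences; the first keystroke of each group yields NaN.
gaps = grouped["timestamp"].diff()
print(df["grouper"].tolist())  # [0, 0, 0, 0, 1, 1, 1]
print(gaps.mean())             # 362.4
```

The choice of thresh matters: too small and pauses within one interaction split it, too large and separate interactions merge.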

Source: https://habr.com/ru/post/1649577/


All Articles