In the first case, I use a very simple DataFrame to try pandas.cut() for counting the number of unique values in one column within the ranges of another column. The code works as expected: the bins (0, 1], (1, 2] and (2, 3] get 2, 2 and 1 unique user agents, respectively.
However, in the following code, pandas.cut() seems to count the number of unique values incorrectly. I expect the first bin (1462320000, 1462406400] to have 5 unique values, and all other bins, including the last bin (1462752000, 1462838400], to have 0 unique values.
Instead, the code returns 5 unique values in the last bin (1462752000, 1462838400], even though the 2 highlighted values (the rows with time 1462838401) should not be counted at all because they are out of range.
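To double-check which interval each timestamp should land in, here is a small sketch that passes labels=False to pd.cut, which returns the position of the bin each value falls into and NaN for values outside every bin. The variable names bins, times and positions are just illustrative.

```python
import numpy as np
import pandas as pd

# Same bin edges as in Case 2: 7 edges -> 6 one-day intervals, the last one
# ending at 1462838400 (np.arange excludes the stop value 1462924800)
bins = np.arange(1462320000, 1462924800, 86400)

# One in-range timestamp and one of the two suspicious timestamps
times = pd.Series([1462328401, 1462838401])

# labels=False gives the bin position of each value; NaN means "outside all bins"
positions = pd.cut(times, bins=bins, labels=False)

print(bins[0], bins[-1])
print(positions.tolist())
```

If the second position is NaN, then pd.cut itself treats 1462838401 as out of range, and the unexpected count must come from the groupby/nunique step rather than from the binning.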
So can anyone explain why pandas.cut() behaves so differently in these two cases? I would also be very grateful if you could tell me how to fix the code so that it correctly counts the number of unique values in one column within the ranges of values of another column.
More info: please import pandas and numpy to run the code. My pandas version is 0.19.2, and I am using Python 2.7.
For completeness, here are my DataFrames and the code to reproduce the problem:
Case 1:
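    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'No': [1, 1.5, 2, 1, 3, 5, 10],
                       'useragent': ['a', 'c', 'b', 'c', 'b', 'a', 'z']})
    print(type(df))
    print(df)
    # Bins (0, 1], (1, 2], (2, 3]; values 5 and 10 fall outside every bin
    print(df.groupby(pd.cut(df['No'], bins=np.arange(0, 4, 1))).useragent.nunique())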
Case 2:
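    import numpy as np
    import pandas as pd

    # df holds the "Case 2 data" shown below
    print(type(df))
    print(len(df))
    print(df.time.nunique())
    print(df.hash.nunique())
    print(df[['time', 'hash']])
    # Six one-day bins from 1462320000 up to 1462838400
    print(df.groupby(pd.cut(df['time'], bins=np.arange(1462320000, 1462924800, 86400))).hash.nunique())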
Case 2 data:
    time        hash
    1462328401  qo
    1462328401  qQ
    1462838401  q1
    1462328401  q1
    1462328401  qU
    1462328401  qU
    1462328401  qU
    1462328401  qU
    1462328401  qX
    1462838401  qX
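A possible workaround, assuming (not confirmed) that the problem is specific to calling nunique() on a groupby keyed by the categorical that pd.cut returns when some bins are empty: group by the integer bin position instead, drop the out-of-range rows first, and add the empty bins back afterwards with reindex. The names bin_idx, mask and counts are illustrative, not part of any API.

```python
import numpy as np
import pandas as pd

# Case 2 data
df = pd.DataFrame({
    'time': [1462328401, 1462328401, 1462838401, 1462328401, 1462328401,
             1462328401, 1462328401, 1462328401, 1462328401, 1462838401],
    'hash': ['qo', 'qQ', 'q1', 'q1', 'qU', 'qU', 'qU', 'qU', 'qX', 'qX'],
})
bins = np.arange(1462320000, 1462924800, 86400)

# Integer bin position per row (NaN when the value is outside every bin);
# wrapping in pd.Series works whether cut returns an array or a Series
bin_idx = pd.Series(pd.cut(df['time'], bins=bins, labels=False), index=df.index)

# Keep only in-range rows, then count distinct hashes per bin position
mask = bin_idx.notnull()
counts = (df.loc[mask, 'hash']
          .groupby(bin_idx[mask].astype(int))
          .nunique()
          # add the empty bins back with a count of 0
          .reindex(range(len(bins) - 1), fill_value=0))

# Label each count with its interval, e.g. "(1462320000, 1462406400]"
counts.index = ['(%d, %d]' % (lo, hi) for lo, hi in zip(bins[:-1], bins[1:])]
print(counts)
```

Because the grouping key only contains bin positions that actually occur, nunique() never sees an empty group; the zero counts for the empty bins are filled in only after the aggregation.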