Include and exclude in pandas (python)

The code below computes ratings using all users (user_id). I want to count only the ratings from users whose Sname is ALLAN. Main code:

grouped_data = ratings['rating'].groupby(ratings['movie_id'])
average_ratings = grouped_data.mean()
movie_count = ratings.movie_id.value_counts()
higher_than_50_votes = movie_count.index[movie_count > 50]
average_ratings.loc[higher_than_50_votes].sort_values(ascending=False).head(5)

Part of the first table.

 user_id   movie_id   rating
     196        242        3
      22        302        3
      90        377        1
      10         51        2
       2        346        1
       1        474        4
       8        265        2
       4        465        5
       2        451        3
       1        451        5

Part of the second table.

 user_id   Sname
       1   AKERS
       2   other
       3   ALEXANDER
       4   ALBERT
       5   ALBERT
       6   ANSEL
       7   ALLARD
       8   ALLAN
       9   ALLAN
1 answer

A few ways to do this:

1. Merge the Sname column into the ratings data frame on "user_id":

ratings_with_names = ratings.merge(names, on='user_id')

This gives you something like:

    user_id     movie_id    rating  unix_timestamp  Sname
0   6           86          3       883603013       ANSEL
1   6           14          5       883599249       ANSEL
2   6           98          5       883600680       ANSEL
3   6           463         4       883601713       ANSEL 

Now it is easy to select the rows you need with boolean (logical) indexing:

ratings_with_names[ratings_with_names.Sname == 'ALLAN']
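
To get from the filtered rows to the count the question asks for, something like the following might work. It is only a sketch: the two small data frames are made-up sample data shaped like the tables in the question, and the name ratings_per_user is hypothetical.

```python
import pandas as pd

# Hypothetical sample data shaped like the two tables in the question.
ratings = pd.DataFrame({
    "user_id":  [196, 22, 8, 9, 8, 1, 9],
    "movie_id": [242, 302, 265, 265, 451, 451, 474],
    "rating":   [3, 3, 2, 4, 5, 5, 1],
})
names = pd.DataFrame({
    "user_id": [1, 8, 9, 22, 196],
    "Sname":   ["AKERS", "ALLAN", "ALLAN", "other", "ALBERT"],
})

# Attach Sname to every rating row (inner join on user_id).
ratings_with_names = ratings.merge(names, on="user_id")

# Keep only the rows that belong to users named ALLAN.
allan = ratings_with_names[ratings_with_names.Sname == "ALLAN"]

# Count the ratings given by each ALLAN user.
ratings_per_user = allan.groupby("user_id")["rating"].count()
print(ratings_per_user)
```

If the real Sname values carry stray whitespace, the comparison will miss them; cleaning the column first with names["Sname"].str.strip() is one way around that.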

2. Select the user_id values that match the condition in the second data frame, and use them to filter the first data frame:

ratings[ratings.user_id.isin(names.loc[names.Sname == 'ALLAN', 'user_id'])]
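
As a rough end-to-end sketch of this second approach, the filtered frame can be fed back into the pipeline from the question, and the same mask can be negated with ~ to exclude those users instead. The sample data and the min_votes threshold below are assumptions for illustration (the question uses 50 votes on the full data set).

```python
import pandas as pd

# Hypothetical sample data shaped like the two tables in the question.
ratings = pd.DataFrame({
    "user_id":  [196, 22, 8, 9, 8, 1, 9],
    "movie_id": [242, 302, 265, 265, 451, 451, 474],
    "rating":   [3, 3, 2, 4, 5, 5, 1],
})
names = pd.DataFrame({
    "user_id": [1, 8, 9, 22, 196],
    "Sname":   ["AKERS", "ALLAN", "ALLAN", "other", "ALBERT"],
})

# user_id values whose Sname is ALLAN.
allan_ids = names.loc[names.Sname == "ALLAN", "user_id"]

# Include: keep only ratings made by those users.
allan_ratings = ratings[ratings.user_id.isin(allan_ids)]

# Exclude: negate the same mask to drop those users instead.
other_ratings = ratings[~ratings.user_id.isin(allan_ids)]

# Same per-movie pipeline as in the question, restricted to ALLAN's ratings.
average_ratings = allan_ratings.groupby("movie_id")["rating"].mean()
movie_count = allan_ratings.movie_id.value_counts()
min_votes = 1  # assumed threshold for this tiny sample; the question uses 50
popular = movie_count.index[movie_count > min_votes]
top = average_ratings.loc[popular].sort_values(ascending=False).head(5)
print(top)
```

This variant avoids the merge, so the result keeps exactly the columns of the original ratings frame.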

Source: https://habr.com/ru/post/1660129/