Extract a subset of a pandas frame based on values (with repetition)?

Question

Let's say I have the following DataFrame:

import pandas as pd

elements = [1, 1, 1, 1, 1, 2, 3, 4, 5]
df = pd.DataFrame({'elements': elements})
print(df)
   elements
0         1
1         1
2         1
3         1
4         1
5         2
6         3
7         4
8         5

I have a list [1, 1, 2, 3] and I want a subset of the data frame containing these 4 elements, for example the rows at index 0, 1, 5 and 6.

I managed to do this by building a dict that counts the occurrences of each element in the list, then creating a new DataFrame by appending the matching subsets of the original.
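For reference, the dict-based approach described above might look roughly like the sketch below. This is an assumption about what was meant, not the original code; the names wanted, counts, parts and subset are purely illustrative.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'elements': [1, 1, 1, 1, 1, 2, 3, 4, 5]})
wanted = [1, 1, 2, 3]

# Count how many rows are needed for each value: {1: 2, 2: 1, 3: 1}
counts = Counter(wanted)

# For each value, take the first n matching rows of the original frame
parts = [df[df['elements'] == value].head(n) for value, n in counts.items()]

# Stitch the pieces together; the original index labels are preserved
subset = pd.concat(parts)
print(subset)
```

It works, but it filters the frame once per distinct value, which is the part a built-in method could make more elegant.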

Do you know of any DataFrame methods that would give a more elegant solution?

After the comment by @jezrael: I should add that I need to keep track of the original index (as it appears in df).

We can see df (the first dataframe) as a resource repository, and I need to keep track of which rows / indexes are assigned:

For example: from df, rows 0 and 1 are assigned for the two 1s, row 5 for the 2 and row 6 for the 3.

pandas indexing duplicates dataframe subset

tokiloutok 26 . '16 10:54

You can use merge with GroupBy.cumcount:

L = [1,1,2,3]
df1 = pd.DataFrame({'elements':L})

df['g'] = df.groupby('elements')['elements'].cumcount()
df1['g'] = df1.groupby('elements')['elements'].cumcount()

print (df)
   elements  g
0         1  0
1         1  1
2         1  2
3         1  3
4         1  4
5         2  0
6         3  0
7         4  0
8         5  0

print (df1)
   elements  g
0         1  0
1         1  1
2         2  0
3         3  0

print (pd.merge(df,df1, on=['elements', 'g']))
   elements  g
0         1  0
1         1  1
2         2  0
3         3  0

print (pd.merge(df.reset_index(),df1, on=['elements', 'g'])
                  .drop('g', axis=1)
                  .set_index('index')
                  .rename_axis(None))
   elements
0         1
1         1
5         2
6         3
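The steps above can be wrapped in a small helper that leaves the caller's frame untouched (no extra g column is left on df). This is only a sketch: take_by_values is a hypothetical name, and it assumes df has an unnamed index and no existing columns called index or g.

```python
import pandas as pd


def take_by_values(df, values, column='elements'):
    """Return the rows of df matching values (with repetition), keeping df's index."""
    wanted = pd.DataFrame({column: values})
    # Number each occurrence of a value: first 1 -> 0, second 1 -> 1, ...
    wanted['g'] = wanted.groupby(column).cumcount()
    # Same numbering on a copy of df, with the original index kept as a column
    left = df.reset_index().assign(g=df.groupby(column).cumcount().to_numpy())
    # The inner merge keeps only the (value, occurrence) pairs present in both
    merged = left.merge(wanted, on=[column, 'g'])
    return merged.drop(columns='g').set_index('index').rename_axis(None)


df = pd.DataFrame({'elements': [1, 1, 1, 1, 1, 2, 3, 4, 5]})
result = take_by_values(df, [1, 1, 2, 3])
print(result)
```

Because the pairing is done on (value, occurrence number), neither df nor the list needs to be sorted.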

jezrael 26 . '16 11:11

jrjc · Accepted Answer · 2016-07-26T12:17:30+0000

If your Series and your list are both sorted (as they are here), you can do:

L = [1, 1, 2, 3]
df[df.elements.apply(lambda x: x == L.pop(0) if x in L else False)]
       elements
0         1
1         1
5         2
6         3

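list.pop(i) removes the item at index i from the list and returns it. Because elements and L are both sorted, the first item of L (i == 0) is always the next value to match against elements.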

So, as the lambda is applied to each value of elements, L evolves as follows:

| element |       L      |   Output  |
|=========|==============|===========|
|    1    | [1, 1, 2, 3] |    True   |
|    1    |    [1, 2, 3] |    True   |
|    1    |       [2, 3] |   False   |
|    1    |       [2, 3] |   False   |
|    1    |       [2, 3] |   False   |
|    2    |       [2, 3] |    True   |
|    3    |          [3] |    True   |
|    4    |           [] |   False   | 
|    5    |           [] |   False   |
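If mutating L is a concern, the same bookkeeping as in the table can be done without pop. The sketch below is a different technique from the lambda above, not a rewrite of it: it compares each row's occurrence number (GroupBy.cumcount) with the number of times its value appears in L. The names counts, quota, rank and mask are illustrative, and the frame is deliberately shuffled to show that no sorting is assumed.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'elements': [5, 4, 3, 1, 2, 1, 1, 1, 1]})
L = [1, 1, 2, 3]

# How many rows each value is allowed to claim: {1: 2, 2: 1, 3: 1}
counts = dict(Counter(L))

# Per-row quota; values absent from L map to NaN, which becomes 0
quota = df['elements'].map(counts).fillna(0)

# Occurrence number of each row within its value group (0, 1, 2, ...)
rank = df.groupby('elements').cumcount()

# Keep a row only while its value still has quota left
mask = rank < quota
result = df[mask]
print(result)
print(L)  # L is left untouched
```

The mask is built in one pass over the column and the original index labels are kept on the selected rows.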

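As you can see, this only works if both are sorted, and it consumes L along the way, so be careful!

If df.elements is not sorted, you can sort a copy of it, apply the same lambda to the sorted copy, and use the result as a mask on df (the mask is aligned on the index, so the matching rows come out as True):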

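print(df)
   elements
0         5
1         4
2         3
3         1
4         2
5         1
6         1
7         1
8         1
L = [1, 1, 2, 3]  # rebuild L: the previous call emptied it
cp = df.elements.sort_values()  # sorted copy; df itself is left unchanged
# The boolean result is indexed like cp, and df.loc aligns it on df's index
tmp = df.loc[cp.apply(lambda x: x == L.pop(0) if x in L else False)]
print(tmp)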
   elements
2         3
3         1
4         2
5         1
