Iterating over a pandas DataFrame using a list comprehension

I can solve this another way, but I would like to understand why iterating over a pandas DataFrame with a list comprehension does not work. (Here a is the DataFrame.)

    def func(a,seed1,seed2):
        for i in range(0,3):
            # Sum of squares. Results in a series containing 'date' and 'num'
            sorted1 = ((a-seed1)**2).sum(1)
            sorted2 = ((a-seed2)**2).sum(1)
            # This makes a list out of the dataframe.
            a = [a.ix[i] for i in a.index if sorted1[i]<sorted2[i]]
            b = [a.ix[i] for i in a.index if sorted1[i]>=sorted2[i]]
            # The above line throws the exception:
            # TypeError: 'builtin_function_or_method' object is not iterable
            # Throw it back into a dataframe...
            a = pd.DataFrame(a,columns=['A','B','C'])
            b = pd.DataFrame(b,columns=['A','B','C'])
            # Update the seed.
            seed1 = a.mean()
            seed2 = b.mean()
            print a.head()
            print "I'm computing."
1 answer

The problem is that once the first of these two lines has run, a is no longer a DataFrame:

    a = [a.ix[i] for i in a.index if sorted1[i]<sorted2[i]]
    b = [a.ix[i] for i in a.index if sorted1[i]>=sorted2[i]]

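By then a is a list, so a.index is no longer a DataFrame index but the built-in list.index method. The second comprehension tries to iterate over that method, which is why the TypeError mentions a 'builtin_function_or_method'.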
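To see the failure in isolation, here is a minimal sketch. The small DataFrame and the names df and rows are made up for illustration, and .loc is used in place of .ix because .ix was removed in pandas 1.0.

```python
import pandas as pd

# Hypothetical toy data, only to reproduce the error.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Same pattern as in the question: the DataFrame becomes a list of rows.
rows = [df.loc[i] for i in df.index]

# On a list, .index is a method (list.index), not a row index.
index_attr = rows.index
print(type(index_attr).__name__)

# Iterating over that method fails the same way the question's code does.
try:
    for i in rows.index:
        pass
    message = None
except TypeError as exc:
    message = str(exc)
print(message)
```

The type name and the error message printed here should match the wording of the exception quoted in the question.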

One Python trick is to build both lists in a single statement, so that both right-hand sides are evaluated before a is rebound, i.e.:

 a, b = [a.ix[i] for ...], [a.ix[i] for ...] 
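The single-statement form works because Python evaluates the whole right-hand side before it rebinds any name on the left. A small sketch with a plain list standing in for the DataFrame (values, rest and threshold are illustrative names, not from the question):

```python
values = [1, 5, 2, 8]
threshold = 4

# Both comprehensions read the original `values`;
# only after both are built are the names on the left rebound.
values, rest = (
    [v for v in values if v < threshold],
    [v for v in values if v >= threshold],
)

print(values)  # [1, 2]
print(rest)    # [5, 8]
```

Written as two separate assignments, the second comprehension would instead see the already filtered list, which is the same ordering problem as in the question.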

Perhaps the best option, though, is to use a different variable name for the DataFrame (e.g. df), so that it is never overwritten.

As you say, there are better ways to do this in pandas; the obvious one is a boolean mask:

    msk = sorted1 < sorted2
    seed1 = df[msk].mean()
    seed2 = df[~msk].mean()
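Put together, the mask approach might look like the sketch below. It is one possible reading of the question's loop rather than the asker's exact code: the function name two_means, the n_iter parameter and the toy data are assumptions, and the DataFrame is left untouched on every pass instead of being replaced by one of the groups.

```python
import pandas as pd


def two_means(df, seed1, seed2, n_iter=3):
    """Split rows by nearest seed and update both seeds (illustrative sketch)."""
    for _ in range(n_iter):
        # Squared distance of every row to each seed, summed across columns.
        dist1 = ((df - seed1) ** 2).sum(axis=1)
        dist2 = ((df - seed2) ** 2).sum(axis=1)

        # Boolean mask: True where a row is closer to seed1.
        msk = dist1 < dist2

        # Update the seeds from the two groups; df itself is never rebound.
        seed1 = df[msk].mean()
        seed2 = df[~msk].mean()
    return seed1, seed2, msk


# Hypothetical toy data with two obvious groups.
df = pd.DataFrame(
    {
        "A": [0.0, 1.0, 10.0, 11.0],
        "B": [0.0, 1.0, 10.0, 11.0],
        "C": [0.0, 1.0, 10.0, 11.0],
    }
)
seed1, seed2, msk = two_means(df, df.iloc[0], df.iloc[3])
print(seed1.tolist())  # [0.5, 0.5, 0.5]
print(seed2.tolist())  # [10.5, 10.5, 10.5]
print(msk.tolist())    # [True, True, False, False]
```

Note that if one group ends up empty, its mean is NaN and later comparisons against that seed are all False, so real data may need a guard for that case.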

Source: https://habr.com/ru/post/1498044/

