How to plot specific rows of a pandas dataframe?

I have this dataframe example:

       animal gender     name  first  second  third
    0     dog      m      Ben      5       6      3
    1     dog      f    Lilly      2       3      5
    2     dog      m      Bob      3       2      1
    3     cat      f     Puss      1       4      4
    4     cat      m  Inboots      3       6      5
    5    wolf      f     Lady    NaN       0      3
    6    wolf      m   Summer      2       2      1
    7    wolf      m     Grey      4       2      3
    8    wolf      m     Wind      2       3      5
    9    lion      f     Elsa      5       1      4
    10   lion      m    Simba      3       3      3
    11   lion      f     Nala      4       4      2

I suspect I may need hierarchical indexing for this, but I haven't gotten that far in pandas yet. However, I really need to do some (apparently too advanced) things with this data and haven't figured out how. In principle, what I would like to end up with is a plot (probably a scatter plot, although a line plot would be just as good for now).

1) I would like a figure with 4 subplots, one subplot per animal. The title of each subplot should be the animal.

2) In each of the subplots, I would like to plot numbers (for example, the number of births per year), that is, the values “first”, “second” and “third” for a given row, and give each line a label showing the “name” in the legend. Within each subplot (each animal), I would like to distinguish male from female (for example, males in blue and females in red) and also plot the mean for the animal (i.e. the mean of each column for that animal) in black.

3) Note: I would plot against 1, 2, 3, for example, referring to the column number. So, for the first subplot, titled “dog”, I would like to plot something like plt.plot(np.array([1,2,3]), x, 'b', np.array([1,2,3]), y, 'r', np.array([1,2,3]), np.mean([x, y], axis=0), 'k'), where x would be (in the first case) 5, 6, 3, with “Ben” as the legend entry for this blue line; y would be 2, 3, 5, with “Lilly” as the legend entry for the red line; and the black line would be 3.5, 4.5, 4, labelled “mean” in the legend (for each of the subplots).

I hope I made myself clear enough. I understand that without seeing the final figure, it can be difficult to imagine it, but ... well, if I knew how to do this, I would not ask ...

So, in conclusion, I would like to loop through the dataframe at different levels, with the animals in separate subplots and, within each subplot, a comparison of males and females and the mean between them.

My actual dataframe is much larger, so ideally I would like the solution to be robust but easy to understand (for a novice programmer).

To give an idea of what a subplot should look like, this is one produced in Excel:

[Image: rough sketch of the desired plot]

1 answer

I'm not sure I understood correctly what you meant. But I think you need to convert your dataframe to long-form or tidy format, since many of the operations you will perform on it are easier in that format, starting with making plots based on categorical variables.

If df is your dataframe, to convert it to tidy format, just use:

    import pandas as pd

    df2 = pd.melt(df, id_vars=["animal", "gender", "name"])
    df2

       animal gender     name variable  value
    0     dog      m      Ben    first    5.0
    1     dog      f    Lilly    first    2.0
    2     dog      m      Bob    first    3.0
    3     cat      f     Puss    first    1.0
    4     cat      m  Inboots    first    3.0
    ...
    31   wolf      m     Grey    third    3.0
    32   wolf      m     Wind    third    5.0
    33   lion      f     Elsa    third    4.0
    34   lion      m    Simba    third    3.0
    35   lion      f     Nala    third    2.0
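For anyone who wants to try the reshaping step end to end, here is a small self-contained sketch. The frame is rebuilt by hand from the table in the question (the names df and df2 follow the answer, and Lady's missing "first" value is entered as NaN), so treat it as an illustration, not the asker's real data.

```python
import numpy as np
import pandas as pd

# Sample data typed in from the table in the question.
df = pd.DataFrame({
    "animal": ["dog", "dog", "dog", "cat", "cat", "wolf",
               "wolf", "wolf", "wolf", "lion", "lion", "lion"],
    "gender": ["m", "f", "m", "f", "m", "f", "m", "m", "m", "f", "m", "f"],
    "name": ["Ben", "Lilly", "Bob", "Puss", "Inboots", "Lady",
             "Summer", "Grey", "Wind", "Elsa", "Simba", "Nala"],
    "first": [5, 2, 3, 1, 3, np.nan, 2, 4, 2, 5, 3, 4],
    "second": [6, 3, 2, 4, 6, 0, 2, 2, 3, 1, 3, 4],
    "third": [3, 5, 1, 4, 5, 3, 1, 3, 5, 4, 3, 2],
})

# Wide -> long: the three measurement columns are stacked into a
# "variable" column (the old column name) and a "value" column.
df2 = pd.melt(df, id_vars=["animal", "gender", "name"])

print(df2.shape)   # 12 rows x 3 measurement columns = 36 rows
print(df2.head())
```

Each original row turns into three rows, one per measurement, which is what lets a plotting library treat "variable" as the x-axis category.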

Then (almost) everything becomes simple, just use seaborn:

    import seaborn as sns

    # sns.catplot was named sns.factorplot before seaborn 0.9
    g = sns.catplot(
        data=df2,          # from your DataFrame
        col="animal",      # make a subplot in columns for each value in "animal"
        col_wrap=2,        # maximum number of columns per row
        x="variable",      # on the x-axis, use the categories in "variable" (created by the melt operation)
        y="value",         # the corresponding y values
        hue="gender",      # color according to the column "gender"
        kind="strip",      # the kind of plot; the closest to what you want is a strip plot
        legend_out=False,  # keep the legend inside the first subplot
    )

Then you can improve the overall aesthetics:

    g.set_xlabels("year")
    g.set_titles(template="{col_name}")  # otherwise the title is "animal = dog"; now it is just "dog"
    sns.despine(trim=True)               # trim the axes

[Image: seaborn strip plot]

To add the means, I'm afraid you have to do it manually. However, if you have more data, you might also consider a box plot or a violin plot, which you can overlay on the strip plot, by the way.
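As a rough idea of what "manually" could look like, here is a sketch that skips seaborn and draws the figure described in the question with plain matplotlib and a loop over the animals. This is an alternative to the strip plot, not the method from the answer above. The data is typed in from the question, and the colour mapping, the two-column grid and the "mean" label are my own assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to show the figure on screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Sample data typed in from the table in the question.
rows = [
    ("dog", "m", "Ben", 5, 6, 3), ("dog", "f", "Lilly", 2, 3, 5),
    ("dog", "m", "Bob", 3, 2, 1), ("cat", "f", "Puss", 1, 4, 4),
    ("cat", "m", "Inboots", 3, 6, 5), ("wolf", "f", "Lady", np.nan, 0, 3),
    ("wolf", "m", "Summer", 2, 2, 1), ("wolf", "m", "Grey", 4, 2, 3),
    ("wolf", "m", "Wind", 2, 3, 5), ("lion", "f", "Elsa", 5, 1, 4),
    ("lion", "m", "Simba", 3, 3, 3), ("lion", "f", "Nala", 4, 4, 2),
]
value_cols = ["first", "second", "third"]
df = pd.DataFrame(rows, columns=["animal", "gender", "name"] + value_cols)

x = np.arange(1, len(value_cols) + 1)  # plot against 1, 2, 3
colors = {"m": "blue", "f": "red"}     # assumed colour per gender

animals = df["animal"].unique()        # keeps the order of first appearance
ncols = 2
nrows = int(np.ceil(len(animals) / ncols))
fig, axes = plt.subplots(nrows, ncols, sharex=True, sharey=True,
                         figsize=(8, 6), squeeze=False)

for ax, animal in zip(axes.flat, animals):
    group = df[df["animal"] == animal]
    # One line per individual, coloured by gender and labelled by name.
    for _, row in group.iterrows():
        y = row[value_cols].to_numpy(dtype=float)
        ax.plot(x, y, color=colors[row["gender"]], marker="o", label=row["name"])
    # Column-wise mean for this animal in black (pandas skips NaN by default).
    ax.plot(x, group[value_cols].mean().to_numpy(), color="black",
            marker="o", label="mean")
    ax.set_title(animal)
    ax.set_xticks(x)
    ax.set_xticklabels(value_cols)
    ax.legend(fontsize="small")

fig.tight_layout()
# Call fig.savefig(...) or plt.show() here to look at the result.
```

Looping over the unique animals keeps the code readable for a larger frame; with many individuals per animal the legends will get crowded, so you may want to drop the per-name labels in that case.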

I suggest you check out the Seaborn documentation to further improve your plot.

HTH


Source: https://habr.com/ru/post/1236023/

