Side-by-side boxplots of multiple pandas DataFrame columns

One year of sample data:

import pandas as pd
import numpy.random as rnd
import seaborn as sns
n = 365
df = pd.DataFrame(data = {"A":rnd.randn(n), "B":rnd.randn(n)+1},
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))

I want to draw boxplots of this data side by side, grouped by month (that is, two boxes per month, one for A and one for B).

For one column, sns.boxplot(df.index.month, df["A"]) works fine. However, sns.boxplot(df.index.month, df[["A", "B"]]) throws an error (ValueError: cannot copy sequence with size 2 to array axis with dimension 365). Melting the data by the index (pd.melt(df, id_vars=df.index, value_vars=["A", "B"], var_name="column")) in order to use seaborn's hue parameter as a workaround does not work either (TypeError: unhashable type: 'DatetimeIndex').

(The solution does not have to use seaborn if plain matplotlib is easier.)

Edit

Converting the DataFrame to a stacked (long) format first makes this work:

df_stacked = df.stack().reset_index()
df_stacked.columns = ["date", "vars", "vals"]
df_stacked.index = df_stacked["date"]
sns.boxplot(x=df_stacked.index.month, y="vals", hue="vars", data=df_stacked)

Result: side-by-side boxplot of A and B, grouped by month.


Here is a solution using pandas melt and seaborn:

import pandas as pd
import numpy.random as rnd
import seaborn as sns
n = 365
df = pd.DataFrame(data = {"A": rnd.randn(n),
                          "B": rnd.randn(n)+1,
                          "C": rnd.randn(n) + 10, # will not be plotted
                         },
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))
df['month'] = df.index.month
df_plot = df.melt(id_vars='month', value_vars=["A", "B"])
sns.boxplot(x='month', y='value', hue='variable', data=df_plot)
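
To see what seaborn receives in the answer above, it may help to look at the melted frame on its own. The following is a minimal sketch using pandas only; the seeded generator is an assumption made for repeatability and is not part of the original answer.

```python
import numpy as np
import pandas as pd

n = 365
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
df = pd.DataFrame(
    {"A": rng.standard_normal(n), "B": rng.standard_normal(n) + 1},
    index=pd.date_range(start="2017-01-01", periods=n, freq="D"),
)
df["month"] = df.index.month

# Wide -> long: every daily row becomes two rows, one per variable.
df_plot = df.melt(id_vars="month", value_vars=["A", "B"])

print(df_plot.columns.tolist())  # ['month', 'variable', 'value']
print(df_plot.shape)             # (730, 3)
# Number of observations behind each box (month 1 has 31 days, month 2 has 28).
print(df_plot.groupby(["month", "variable"]).size().head(4))
```

Each (month, variable) group is one box in the final plot, which is why hue='variable' gives two boxes per month.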

Here is an approach using plain matplotlib.

1) Split df into 12 DataFrames, one per month:

DFList = []
for group in df.groupby(df.index.month):
    DFList.append(group[1])  # group is a (month, sub-DataFrame) tuple

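2) Plot them one after another:

import matplotlib.pyplot as plt

for i in range(12):
    # One figure per month, with one box-plot panel per column
    DFList[i].plot(kind='box', subplots=True, layout=(2,2), sharex=True, sharey=True, figsize=(7,7))

plt.show()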

3) The result is one figure of boxplots per month.

Alternatively, look at matplotlib's add_subplot.

import matplotlib.pyplot as plt

# Split df into one DataFrame per calendar month
month_dfs = []
for group in df.groupby(df.index.month):
    month_dfs.append(group[1])

plt.figure(figsize=(30,5))
for i,month_df in enumerate(month_dfs):
    # One subplot per month, laid out in a single row
    axi = plt.subplot(1, len(month_dfs), i + 1)
    month_df.plot(kind='box', subplots=False, ax = axi)
    plt.title(str(i+1))
    plt.ylim([-4, 4])

plt.show()

Not quite what you are looking for, but this way the DataFrame stays readable if you add more variables.

You can also easily hide the y-axis on every subplot except the first by adding

    if i > 0:
        y_axis = axi.axes.get_yaxis()
        y_axis.set_visible(False)

inside the loop, before plt.show().
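
If the boxes should share a single axes rather than one subplot per month, one possibility is to call boxplot once per column and shift the positions. This is only a sketch under stated assumptions: the helper name grouped_boxplot and its width parameter are hypothetical, and the sample data is seeded just for repeatability.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def grouped_boxplot(df, columns, width=0.35):
    """Hypothetical helper: one box per column and month on a single Axes."""
    fig, ax = plt.subplots(figsize=(12, 4))
    months = sorted(int(m) for m in df.index.month.unique())
    handles = []
    for j, col in enumerate(columns):
        # One array of values per month for this column.
        data = [df.loc[df.index.month == m, col].to_numpy() for m in months]
        # Shift this column's boxes so the columns sit next to each other.
        offset = (j - (len(columns) - 1) / 2) * width
        positions = [m + offset for m in months]
        bp = ax.boxplot(data, positions=positions, widths=width * 0.9,
                        patch_artist=True)
        for patch in bp["boxes"]:
            patch.set_facecolor(f"C{j}")  # one default-cycle color per column
        handles.append(bp["boxes"][0])
    # boxplot puts ticks at every box position; reset them to the months.
    ax.set_xticks(months)
    ax.set_xticklabels([str(m) for m in months])
    ax.set_xlabel("month")
    ax.legend(handles, columns)
    return ax


n = 365
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
df = pd.DataFrame(
    {"A": rng.standard_normal(n), "B": rng.standard_normal(n) + 1},
    index=pd.date_range(start="2017-01-01", periods=n, freq="D"),
)
ax = grouped_boxplot(df, ["A", "B"])
```

Call plt.show() afterwards to display the figure. Passing more column names adds more boxes per month, although width may then need to be reduced so neighbouring months do not overlap.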


This is pretty simple using Altair:

import altair as alt

alt.Chart(
    df.reset_index().melt(id_vars = ["index"], value_vars=["A", "B"]).assign(month = lambda x: x["index"].dt.month)
).mark_boxplot(
    extent='min-max'
).encode(
    alt.X('variable:N', title=''),
    alt.Y('value:Q'),
    column='month:N',
    color='variable:N'
)

The code above melts the DataFrame and adds a month column. Altair then draws a boxplot for each variable, with one chart column per month.
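
The reshaping step can be checked without Altair. The sketch below, with data seeded only for repeatability, shows that moving the DatetimeIndex into an ordinary column with reset_index lets melt accept it as an id variable, which is what the pd.melt(df, id_vars=df.index, ...) attempt in the question was missing.

```python
import numpy as np
import pandas as pd

n = 365
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
df = pd.DataFrame(
    {"A": rng.standard_normal(n), "B": rng.standard_normal(n) + 1},
    index=pd.date_range(start="2017-01-01", periods=n, freq="D"),
)

long_df = (
    df.reset_index()  # the unnamed DatetimeIndex becomes a column called "index"
      .melt(id_vars=["index"], value_vars=["A", "B"])
      .assign(month=lambda x: x["index"].dt.month)  # month taken from the date column
)

print(long_df.columns.tolist())  # ['index', 'variable', 'value', 'month']
print(long_df.shape)             # (730, 4)
```

The id variable is referred to by its column name here, whereas the failing call passed the index object itself.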


Source: https://habr.com/ru/post/1015707/

