Annotate outliers on a Seaborn jointplot

Plotting the "tips" dataset as a jointplot, I would like to label the top 10 outliers (or top-n outliers) on the graph by their indices from the "tips" data frame. I compute the residual (a point's distance from the regression line) to find the outliers. Please ignore the merits of this outlier detection method. I just want to annotate the graph as specified.

    import pandas as pd
    import seaborn as sns

    sns.set(style="darkgrid", color_codes=True)

    tips = sns.load_dataset("tips")
    model = pd.ols(y=tips.tip, x=tips.total_bill)  # requires pandas < 0.20
    tips['resid'] = model.resid

    # indices to annotate
    tips.sort_values(by=['resid'], ascending=[False]).head(5)


    tips.sort_values(by=['resid'], ascending=[False]).tail(5)


    %matplotlib inline
    g = sns.jointplot("total_bill", "tip", data=tips, kind="reg",
                      xlim=(0, 60), ylim=(0, 12), color="r", size=7)

How can I annotate the top 10 outliers (the 5 largest and the 5 smallest residuals) on the chart, labeling each point with its index value?

1 answer

You can use matplotlib's annotate to create point annotations. The idea is to iterate over the rows of the data frame and place an annotation at the position given by the "total_bill" and "tip" columns.

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    sns.set(style="darkgrid", color_codes=True)

    tips = sns.load_dataset("tips")
    model = pd.ols(y=tips.tip, x=tips.total_bill)  # requires pandas < 0.20
    tips['resid'] = model.resid

    # Newer seaborn versions expect x=..., y=... keywords and height= instead of size=
    g = sns.jointplot("total_bill", "tip", data=tips, kind="reg",
                      xlim=(0, 60), ylim=(0, 12), color="r", size=7)

    # indices to annotate: 5 largest and 5 smallest residuals
    head = tips.sort_values(by=['resid'], ascending=[False]).head(5)
    tail = tips.sort_values(by=['resid'], ascending=[False]).tail(5)

    def ann(row):
        # iterrows() yields (index, Series) tuples
        ind = row[0]
        r = row[1]
        plt.gca().annotate(ind, xy=(r["total_bill"], r["tip"]),
                           xytext=(2, 2), textcoords="offset points")

    for row in head.iterrows():
        ann(row)
    for row in tail.iterrows():
        ann(row)

    plt.show()
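The same idea can also be written against an explicit axes object instead of plt.gca(). The following is only a sketch: the helper name annotate_extremes and its parameters are made up for illustration, and it assumes the data frame already has a residual column.

```python
import pandas as pd
from matplotlib.axes import Axes


def annotate_extremes(ax: Axes, df: pd.DataFrame, x: str, y: str,
                      resid: str = "resid", n: int = 5) -> list:
    """Label the n largest and n smallest residuals with their index value.

    Hypothetical helper, not part of seaborn or matplotlib.
    """
    # Select the rows with the largest and the smallest residuals.
    extremes = pd.concat([df.nlargest(n, resid), df.nsmallest(n, resid)])
    labels = []
    for idx, row in extremes.iterrows():
        labels.append(
            ax.annotate(
                str(idx),                    # text: the row's index value
                xy=(row[x], row[y]),         # the data point being labelled
                xytext=(2, 2),               # shift the label 2 points up and right
                textcoords="offset points",
            )
        )
    return labels
```

With the jointplot from above this could be called as annotate_extremes(g.ax_joint, tips, "total_bill", "tip"), where g.ax_joint is the central scatter axes of the grid returned by sns.jointplot.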



Please note that pandas.ols was removed in pandas version 0.20, so on newer versions the model fit in the code above has to be replaced.
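One possible replacement is a straight-line least-squares fit with numpy.polyfit. This is an assumption for illustration, since the surviving text of the answer does not name a replacement; statsmodels' OLS is another common choice. The helper name add_residuals is hypothetical.

```python
import numpy as np
import pandas as pd


def add_residuals(df: pd.DataFrame, x: str, y: str, out: str = "resid") -> pd.DataFrame:
    """Fit y = slope * x + intercept by least squares and store y minus the fit.

    Hypothetical helper; returns a copy and leaves the input frame unchanged.
    """
    # polyfit returns coefficients from the highest degree down: [slope, intercept]
    slope, intercept = np.polyfit(df[x], df[y], deg=1)
    result = df.copy()
    result[out] = df[y] - (slope * df[x] + intercept)
    return result
```

For the example above the call would be tips = add_residuals(tips, "total_bill", "tip"). The residuals may differ slightly from those of the old pd.ols call.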

Source: https://habr.com/ru/post/1265897/

