Tick mark in scatter chart with Pandas not drawn correctly

Question

I am drawing a scatter plot matrix with Pandas, but the tick labels of the first plot are sometimes drawn correctly and sometimes not. I can't figure out what is going on!

Here is an example:

(image of the resulting scatter plot matrix; not reproduced here)

The code:

    # Older pandas releases exposed this as pandas.tools.plotting.scatter_matrix
    from pandas.plotting import scatter_matrix
    import pylab
    import numpy as np
    import pandas as pd

    def create_scatterplot_matix(X, name):
        """
        Outputs a scatterplot matrix for a design matrix.

        Parameters:
        -----------
        X: a design matrix where each column is a feature and each row is an observation.
        name: the name of the plot.
        """
        pylab.figure()
        df = pd.DataFrame(X)
        axs = scatter_matrix(df, alpha=0.2, diagonal='kde')

        for ax in axs[:, 0]:  # the left boundary
            ax.grid(False, axis='both')  # turn the grid off
            ax.set_yticks([0, .5])

        for ax in axs[-1, :]:  # the lower boundary
            ax.grid(False, axis='both')  # turn the grid off
            ax.set_xticks([0, .5])

        pylab.savefig(name + ".png")

Any ideas, guys?

Edit (Example X):

 X = np.random.randn(1000000, 10)


python pandas

Jack twain Sep 29 '14 at 14:06


2 answers

rwflash · Answer 1 · 2014-10-10T19:09:20+0000

This is the intended behavior. The y-axis tick values on the left belong to the 0th column. The subplot at row 0, column 0 contains a probability density plot. The subplots at row 0, columns 1-3 plot the data used to create the plots on the diagonal.

The example in the Pandas documentation looks the same.

Demonstration:

    # Older pandas releases exposed this as pandas.tools.plotting.scatter_matrix
    from pandas.plotting import scatter_matrix
    import pylab
    import numpy as np
    import pandas as pd

    def create_scatterplot_matix(X, name):
        pylab.figure()
        df = pd.DataFrame(X)
        axs = scatter_matrix(df, alpha=0.2, diagonal='kde')
        pylab.savefig(name + ".png")

    create_scatterplot_matix([[0, 0, 0, 0],
                              [1, 1, 1, 1],
                              [1, 1, 1, 1],
                              [2, 2, 2, 2]], 'test')

In this code example, I used an extremely simple dataset for demo purposes. I also removed the code that sets the x and y ticks.

This is the result (image not reproduced here):

Each diagonal cell holds a probability density plot. Each off-diagonal cell plots the data used to create the plots on the diagonal. The y-axis of the 0th row shows the y-axis of the probability density plot at position (0,0). The y-axes of the 1st, 2nd and 3rd rows show the y-axis of the data at positions (0,1), (0,2) and (0,3), which is the data used to create the diagonal probability density plots.

In our example, the plotted points are [0,0], [1,1] and [2,2]. The point at [1,1] is darker because more points overlap there than at the other two locations.
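
The darkening at [1,1] can be checked with a little arithmetic. The sketch below assumes standard "over" alpha compositing and ignores marker edges; `stacked_alpha` is a hypothetical helper written for this illustration, not part of pandas or matplotlib.

```python
def stacked_alpha(alpha, n):
    """Combined opacity of n identical markers drawn on top of each other,
    assuming standard "over" compositing."""
    return 1.0 - (1.0 - alpha) ** n

# alpha=0.2 is the value passed to scatter_matrix in the demonstration
single = stacked_alpha(0.2, 1)  # one point, as at [0,0] and [2,2]
double = stacked_alpha(0.2, 2)  # two coincident points, as at [1,1]

print(f"single marker opacity: {single:.2f}")
print(f"two stacked markers:   {double:.2f}")
```

Under these assumptions the two coincident rows at [1,1] produce a visibly more opaque marker than the single rows at [0,0] and [2,2].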

What happens with your dataset is that all values lie between 0 and 1, so the 0.5 tick sits exactly in the middle of both axes for the scatter rows and columns. However, the data is strongly skewed toward 0, so the probability density curves rise steeply near 0. The maximum of the probability density plot in the 0th row looks (eyeball test) to be about 8-10.
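
To see why a density curve can peak far above 0.5 even though the data itself stays between 0 and 1, here is a minimal sketch. It uses a hand-rolled Gaussian kernel density estimate with a Scott-style rule-of-thumb bandwidth, which is an assumption and not necessarily what pandas does internally. The beta-distributed sample is hypothetical stand-in data, not the asker's dataset.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each point of `grid`."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
# Hypothetical data: values in [0, 1], heavily concentrated near 0
samples = rng.beta(1.0, 10.0, size=2000)

# Rule-of-thumb bandwidth (an assumption for this sketch)
bandwidth = samples.std(ddof=1) * samples.size ** (-1.0 / 5.0)

grid = np.linspace(0.0, 1.0, 201)
density = gaussian_kde_1d(samples, grid, bandwidth)
peak = density.max()

print(f"data range:   {samples.min():.3f} to {samples.max():.3f}")
print(f"peak density: {peak:.2f}")
```

In a situation like this, y ticks fixed at [0, 0.5] only cover the bottom sliver of the density axis, which is why they look misplaced on the top-left plot.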

Personally, I would change the left-boundary code to something like this:

    autoscale = True  # we want the y-axis of the (0,0) plot to autoscale
    for ax in axs[:, 0]:  # the left boundary
        ax.grid(False, axis='both')  # turn the grid off
        if autoscale:
            # first axis in the column: leave its ticks to matplotlib
            ax.set_autoscale_on(True)
            autoscale = False
        else:
            ax.set_yticks([0, 0.5])

With our example dataset, this method produces the following diagram (image not reproduced here).

amball · Answer 2 · 2014-11-01T00:59:52+0000

This would seem to be a bug in pandas. See https://github.com/pydata/pandas/issues/5662

In the meantime, you can adjust the tick marks manually. First set the tick positions you want, based on the y-range of the kernel density plot.

 axs[0,0].set_yticks([0.24,0.33,0.42])

Then manually change the text of the tick labels.

 axs[0,0].set_yticklabels([0.0, 1.0, 2.0])
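
Rather than picking the positions by eye, they could be computed. The following is a minimal sketch, assuming the data values should be spread linearly across the y-range of the density plot; `map_ticks_to_axis` is a hypothetical helper, and the limits (0.24, 0.42) are taken from the tick positions used above rather than read from a real axis.

```python
def map_ticks_to_axis(data_ticks, data_lim, axis_lim):
    """Linearly map tick values given in data units onto an axis
    whose limits are expressed in different (here: density) units."""
    data_lo, data_hi = data_lim
    axis_lo, axis_hi = axis_lim
    scale = (axis_hi - axis_lo) / (data_hi - data_lo)
    return [axis_lo + (t - data_lo) * scale for t in data_ticks]

labels = [0.0, 1.0, 2.0]  # the values we want to show as tick labels
positions = map_ticks_to_axis(labels, data_lim=(0.0, 2.0), axis_lim=(0.24, 0.42))
print(positions)

# With a real scatter matrix, the result would then be applied as:
#   axs[0, 0].set_yticks(positions)
#   axs[0, 0].set_yticklabels(labels)
```

On a real plot, `axis_lim` could be taken from `axs[0, 0].get_ylim()` and `data_lim` from the range of the first column.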
