Python - Gini coefficient calculation using Numpy

I'm a beginner who has just started learning Python, and I'm trying to write code to calculate the Gini index for a fictional country. I came up with the following:

    GDP = (653200000000)
    A = (0.49 * GDP) / 100  # Poorest 10%
    B = (0.59 * GDP) / 100
    C = (0.69 * GDP) / 100
    D = (0.79 * GDP) / 100
    E = (1.89 * GDP) / 100
    F = (2.55 * GDP) / 100
    G = (5.0 * GDP) / 100
    H = (10.0 * GDP) / 100
    I = (18.0 * GDP) / 100
    J = (60.0 * GDP) / 100  # Richest 10%

    # Divide into quintiles and total income within each quintile
    Q1 = float(A + B)  # lowest quintile
    Q2 = float(C + D)  # second quintile
    Q3 = float(E + F)  # third quintile
    Q4 = float(G + H)  # fourth quintile
    Q5 = float(I + J)  # fifth quintile

    # Calculate the percent of total income in each quintile
    T1 = float((100 * Q1) / GDP) / 100
    T2 = float((100 * Q2) / GDP) / 100
    T3 = float((100 * Q3) / GDP) / 100
    T4 = float((100 * Q4) / GDP) / 100
    T5 = float((100 * Q5) / GDP) / 100
    TR = float(T1 + T2 + T3 + T4 + T5)

    # Calculate the cumulative percentage of household income
    H1 = float(T1)
    H2 = float(T1+T2)
    H3 = float(T1+T2+T3)
    H4 = float(T1+T2+T3+T4)
    H5 = float(T1+T2+T3+T4+T5)

    # Magic! Using numpy to calculate area under Lorenz curve.
    # Problem might be here?
    import numpy as np
    from numpy import trapz

    # The y values. Cumulative percentage of incomes
    y = np.array([Q1,Q2,Q3,Q4,Q5])

    # Compute the area using the composite trapezoidal rule.
    area_lorenz = trapz(y, dx=5)

    # Calculate the area below the perfect equality line.
    area_perfect = (Q5 * H5) / 2

    # Seems to work fine until here.
    # Manually calculated Gini using the values given for the areas above
    # turns out at .58 which seems reasonable?
    Gini = area_perfect - area_lorenz

    # Prints utter nonsense.
    print Gini

The result of Gini = area_perfect - area_lorenz just doesn't make sense. I took the values given by the area variables and did the math by hand, and that came out fine, but when the program does the same calculation it gives me a nonsensical value (-1.7198...). What am I missing? Can someone point me in the right direction?

Thanks!

1 answer

Stardust

The problem is not numpy.trapz. It is (1) your definition of the perfect equality distribution and (2) the normalization of the Gini coefficient.

First, you defined the area under the perfect equality line as Q5*H5/2, which is half the product of the fifth quintile's income and the cumulative percentage (1.0). I'm not sure what that number is supposed to represent.
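To see where the reported value comes from, it can help to express both areas from the question as multiples of GDP. The following is a minimal sketch that reproduces the question's two areas from its own figures; the names `decile_pct`, `quintile_income`, and `trapezoid_area` are mine, introduced only for illustration.

```python
import numpy as np

GDP = 653200000000
# Decile income shares in percent, poorest to richest (values from the question).
decile_pct = np.array([0.49, 0.59, 0.69, 0.79, 1.89, 2.55, 5.0, 10.0, 18.0, 60.0])

# Quintile incomes in currency units (Q1..Q5 in the question).
quintile_income = decile_pct.reshape(5, 2).sum(axis=1) * GDP / 100


def trapezoid_area(y, dx):
    """Composite trapezoidal rule for equally spaced samples."""
    y = np.asarray(y, dtype=float)
    return dx * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)


# The question integrates raw quintile incomes, not cumulative shares.
area_lorenz = trapezoid_area(quintile_income, dx=5)
# Q5 * H5 / 2, with H5 equal to 1.0.
area_perfect = quintile_income[-1] * 1.0 / 2

print(area_lorenz / GDP)                    # several times GDP
print(area_perfect / GDP)                   # a fraction of GDP
print((area_perfect - area_lorenz) / GDP)   # negative, so the "Gini" is negative too
```

Both areas are in currency units and are not built from the same curve, so their difference is a large negative amount of money (roughly -1.72e12, matching the value in the question) instead of a number between 0 and 1.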

Second, you need to normalize by the area under the perfect equality line, i.e.:

gini = (area under perfect equality - area under Lorenz) / (area under perfect equality)

You don't need to worry about this if you define the perfect equality curve so that it has an area of 1, but it is a good safeguard in case the perfect equality curve is defined incorrectly.
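One practical consequence of the normalization is that the choice of dx stops mattering, because it scales both areas equally. Here is a small sketch of that, assuming an evenly spaced ramp from 0 to 1 as the perfect equality curve; `trapezoid_area` and `normalized_gini` are hypothetical helper names, not NumPy functions.

```python
import numpy as np


def trapezoid_area(y, dx):
    """Composite trapezoidal rule for equally spaced samples."""
    y = np.asarray(y, dtype=float)
    return dx * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)


def normalized_gini(y_lorenz, y_equality, dx):
    """(area under equality - area under Lorenz) / (area under equality)."""
    area_lorenz = trapezoid_area(y_lorenz, dx)
    area_equality = trapezoid_area(y_equality, dx)
    return (area_equality - area_lorenz) / area_equality


# Cumulative quintile shares (H1..H5) computed from the question's data.
y = np.cumsum([0.0108, 0.0148, 0.0444, 0.15, 0.78])
# Assumed perfect equality curve: evenly spaced from 0 to 1.
y_pe = np.linspace(0.0, 1.0, len(y))

gini_dx5 = normalized_gini(y, y_pe, dx=5)
gini_dx1 = normalized_gini(y, y_pe, dx=1)
print(gini_dx5, gini_dx1)  # the two values agree up to floating-point rounding
```

Without the division, the same two calls would differ by a factor of five.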

To address both problems, I defined the perfect equality curve using numpy.linspace. The first advantage is that the curve is built from the shape of your own data: whether you use quartiles, quintiles, or deciles, the perfect-equality CDF (y_pe, below) will have the right number of points. The second advantage is that its area is also computed with numpy.trapz, and that bit of parallelism makes the code more readable and guards against calculation mistakes.
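As an illustration of the first point, the same recipe can be wrapped in a function that takes any number of equally sized groups. This is only a sketch of the approach described here, not code from the answer; `gini_from_shares` is a hypothetical name, and the trapezoidal rule is written out by hand so the example does not depend on a particular NumPy version.

```python
import numpy as np


def gini_from_shares(shares):
    """Gini estimate from income shares of equally sized groups, poorest first."""
    shares = np.asarray(shares, dtype=float)
    y = np.cumsum(shares) / shares.sum()      # cumulative income share
    y_pe = np.linspace(0.0, 1.0, len(y))      # perfect equality curve, same length

    def area(v):
        # Composite trapezoidal rule with unit spacing; the spacing cancels
        # in the ratio below.
        return v[0] / 2 + v[1:-1].sum() + v[-1] / 2

    return (area(y_pe) - area(y)) / area(y_pe)


# Decile shares in percent, taken from the question.
deciles = [0.49, 0.59, 0.69, 0.79, 1.89, 2.55, 5.0, 10.0, 18.0, 60.0]
# The same data grouped into quintiles.
quintiles = np.asarray(deciles).reshape(5, 2).sum(axis=1)

gini_deciles = gini_from_shares(deciles)
gini_quintiles = gini_from_shares(quintiles)
print(gini_quintiles, gini_deciles)
```

The two estimates differ because a finer grouping traces the Lorenz curve more closely, but in both cases y_pe automatically has the same length as y.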

    import numpy as np
    from matplotlib import pyplot as plt

    GDP = (653200000000)
    A = (0.49 * GDP) / 100  # Poorest 10%
    B = (0.59 * GDP) / 100
    C = (0.69 * GDP) / 100
    D = (0.79 * GDP) / 100
    E = (1.89 * GDP) / 100
    F = (2.55 * GDP) / 100
    G = (5.0 * GDP) / 100
    H = (10.0 * GDP) / 100
    I = (18.0 * GDP) / 100
    J = (60.0 * GDP) / 100  # Richest 10%

    # Divide into quintiles and total income within each quintile
    Q1 = float(A + B)  # lowest quintile
    Q2 = float(C + D)  # second quintile
    Q3 = float(E + F)  # third quintile
    Q4 = float(G + H)  # fourth quintile
    Q5 = float(I + J)  # fifth quintile

    # Calculate the percent of total income in each quintile
    T1 = float((100 * Q1) / GDP) / 100
    T2 = float((100 * Q2) / GDP) / 100
    T3 = float((100 * Q3) / GDP) / 100
    T4 = float((100 * Q4) / GDP) / 100
    T5 = float((100 * Q5) / GDP) / 100
    TR = float(T1 + T2 + T3 + T4 + T5)

    # Calculate the cumulative percentage of household income
    H1 = float(T1)
    H2 = float(T1+T2)
    H3 = float(T1+T2+T3)
    H4 = float(T1+T2+T3+T4)
    H5 = float(T1+T2+T3+T4+T5)

    # The y values. Cumulative percentage of incomes
    # (note: H1..H5, not the raw incomes Q1..Q5).
    y = np.array([H1,H2,H3,H4,H5])

    # The perfect equality y values. Cumulative percentage of incomes.
    y_pe = np.linspace(0.0,1.0,len(y))

    # Compute the area using the composite trapezoidal rule.
    area_lorenz = np.trapz(y, dx=5)

    # Calculate the area below the perfect equality line the same way.
    area_perfect = np.trapz(y_pe, dx=5)

    # Normalize by the area under the perfect equality line.
    Gini = (area_perfect - area_lorenz)/area_perfect

    print(Gini)

    plt.plot(y,label='lorenz')
    plt.plot(y_pe,label='perfect_equality')
    plt.legend()
    plt.show()
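One detail worth checking in the listing above: y starts at H1, the cumulative share held by the poorest 20%, while y_pe starts at 0.0, so the two curves are not sampled at the same population points. A common convention, which is my assumption and not part of the answer, is to prepend the origin to the Lorenz curve and integrate over population shares from 0 to 1, where the perfect equality line has area 0.5. A minimal sketch of that variant, with the trapezoidal rule written out explicitly:

```python
import numpy as np

# Quintile income shares from the listing above (T1..T5).
shares = np.array([0.0108, 0.0148, 0.0444, 0.15, 0.78])

# Assumption: the Lorenz curve passes through (0, 0) and is sampled at
# population shares 0.0, 0.2, ..., 1.0.
x = np.linspace(0.0, 1.0, len(shares) + 1)
y = np.concatenate(([0.0], np.cumsum(shares)))

# Trapezoidal rule: sum of strip widths times average strip heights.
area_lorenz = np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0)
# Area under the line y = x on [0, 1].
area_perfect = 0.5

gini_with_origin = (area_perfect - area_lorenz) / area_perfect
print(gini_with_origin)
```

On this data the estimate comes out near 0.67 instead of about 0.59, so the choice of convention is worth stating alongside the result. Separately, recent NumPy releases provide np.trapezoid as the replacement for np.trapz, which may matter depending on the installed version.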

Source: https://habr.com/ru/post/1275605/

