I need to plot a Lorenz curve: the cumulative share of a variable against the cumulative share of observations. I want both axes displayed as percentages. For example, the observations are customers and the variable y is the amount each one purchased; the customers are already sorted in descending order of purchase amount, and I want a plot that lets me say: "The top 10% of buyers account for 90% of the total purchase amount." My dataset has several million observations.
What is the best way to do this? Sub-questions:
If I need to add two variables, the cumulative quantile of observations and the cumulative share of total $ purchased (to build the plot from them), what function returns the row number? I tried:
user_quantile <- row(df)/nrow(df)
but I get a matrix with one column for each column of df (user_quantile.1, user_quantile.2), and I need only a single column.
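To make the goal concrete, here is a minimal sketch of the two columns I am after, on toy data. The column name `amount` and the values are made up for illustration, and using `seq_len(nrow(df))` for the row numbers is only one possible way to get a plain vector instead of a matrix.

```r
# Toy data standing in for the real df (hypothetical column name `amount`),
# already sorted in descending order of purchase amount
df <- data.frame(amount = c(500, 300, 100, 60, 40))

# seq_len(nrow(df)) is the row index 1..n as a plain vector, not a matrix
df$user_quantile <- seq_len(nrow(df)) / nrow(df)

# Cumulative share of the total amount purchased
df$amount_share <- cumsum(df$amount) / sum(df$amount)

print(df)
```

Each new column is an ordinary numeric vector of length `nrow(df)`, so it can be passed directly to a plotting function.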
Is there a way to skip adding the percentages as variables and use them only as axis values?
The data has far more points than I need to draw the line. What is the best approach to minimize the computational effort and still get a good plot?
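As a rough sketch of the kind of result I have in mind (not necessarily the best approach), the curve could be computed on the full data, thinned to about a thousand evenly spaced points, and drawn with percentage labels placed only on the axes. The data here is simulated, and the variable names and the choice of 1000 points are assumptions for illustration; only base R is used.

```r
set.seed(1)
n <- 1e6
# Simulated purchase amounts, sorted in descending order (stand-in for real data)
amount <- sort(rexp(n), decreasing = TRUE)

# Keep roughly 1000 evenly spaced row indices; the last index is always n
idx <- unique(round(seq(1, n, length.out = 1000)))
x <- idx / n                            # cumulative share of customers
y <- cumsum(amount)[idx] / sum(amount)  # cumulative share of total amount

# Draw the thinned curve; the percent signs exist only in the axis labels
plot(c(0, x) * 100, c(0, y) * 100, type = "l", axes = FALSE,
     xlab = "Customers (cumulative %)",
     ylab = "Purchase amount (cumulative %)")
ticks <- seq(0, 100, by = 20)
axis(1, at = ticks, labels = paste0(ticks, "%"))
axis(2, at = ticks, labels = paste0(ticks, "%"))
box()
```

The cumulative sum still runs over every row, which is a single fast pass, while the plotting device only receives the thinned points.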
Thanks.