Weight Data with R Part II

Consider the following data frame (called df below):

structure(list(UH6401 = c(1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1), UH6402 = c(1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1), UH6403 = c(1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1), UH6404 = c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1), UH6409 = c(1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0 ), UH6410 = c(1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0 ), UH6411 = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 
1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1 ), UH6412 = c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1 ), UH6503 = c(1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1 ), UH66 = c(1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), UH68 = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), UH6501a = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), UH6405a = c(1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1), UH6407a = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1), weight = c(405.002592353822, 479.360356183825, 526.548105855472, 810.005184707644, 312.321528531308, 930.961115757095, 567.383058387095, 475.323944260643, 1226.91439266118, 517.086839792615, 1200.2669656949, 810.005184707644, 656.723784884795, 605.370463928298, 668.467435759576, 558.112457492436, 793.751055244424, 479.360356183825, 1226.91439266118, 1606.54816212786, 1657.48609449633, 300.803580980276, 605.370463928298, 1140.55078447979, 669.102760422943, 810.005184707644, 1657.48609449633, 305.569853371963, 2994.30343152033, 762.922030382216, 479.360356183825, 1147.36030437824, 668.467435759576, 517.086839792615, 479.360356183825, 399.141865860217, 656.723784884795, 913.364738988386, 312.321528531308, 569.10576379231, 775.630259688922, 1207.22952429547, 1053.09621171094, 1140.55078447979, 314.857225320909, 668.467435759576, 2416.57081451012, 573.680152189121, 396.875527622212, 605.370463928298, 1036.3159447043, 3088.62283807823, 569.10576379231, 1140.55078447979, 2416.57081451012, 1147.36030437824, 762.922030382216, 702.064141140629, 351.032070570315, 629.714450641817, 517.086839792615, 1996.20228768022, 828.743047248167, 475.323944260643, 920.185794495882, 793.751055244424, 796.08788273764, 1197.42559758065, 405.002592353822, 418.584343119327, 300.803580980276, 654.76828203733, 2740.09421696516, 351.032070570315, 1069.6202614693, 2094.91447516374, 399.141865860217, 654.76828203733, 1003.65414063441, 573.680152189121, 851.074587580641, 913.364738988386, 762.922030382216, 1034.17367958523, 573.680152189121, 479.360356183825, 3208.8607844079, 654.76828203733, 908.055695892447, 
328.361892442398, 1036.3159447043, 702.064141140629, 613.457196330588, 601.607161960551, 567.383058387095, 479.360356183825, 306.261087672466, 920.185794495882, 654.76828203733, 828.743047248167)), .Names = c("UH6401", "UH6402", "UH6403", "UH6404", "UH6409", "UH6410", "UH6411", "UH6412", "UH6503", "UH66", "UH68", "UH6501a", "UH6405a", "UH6407a", "weight" ), row.names = c(NA, 100L), class = "data.frame") 

In social science, we often have a weight variable that weights each case (row) by a factor, to correct the sample so that it fits, for example, the population by age class. If the weight variable of a row is "1.6", this means that the row should count as 1.6 observations to fit the base population.

In SPSS, I would write

 WEIGHT BY weight. 

and all procedures after this command will weight the data accordingly.

In R, I can do this for tables using the command

 xtabs(weight ~ UH6401, data=df) 
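As a minimal sketch of what the weighted table does, here is the same call on a hypothetical six-row data frame (`toy` and its values are made up for illustration; only the column names mirror the data above). With a variable on the left-hand side of the formula, xtabs sums that variable within each cell instead of counting rows:

```r
# Hypothetical toy data: one binary item and one weight per case
toy <- data.frame(
  UH6401 = c(1, 1, 0, 0, 1, 0),
  weight = c(0.8, 1.6, 1.2, 0.5, 2.0, 0.9)
)

# Unweighted table: counts rows (3 cases each for 0 and 1)
unweighted <- xtabs(~ UH6401, data = toy)

# Weighted table: sums the weights per category (2.6 for 0, 4.4 for 1)
weighted <- xtabs(weight ~ UH6401, data = toy)

# Weighted shares of each category
weighted_share <- prop.table(weighted)

print(unweighted)
print(weighted)
print(weighted_share)
```

The weighted shares differ from the unweighted 50/50 split because the cases answering 1 carry more weight in this toy sample.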

But what if I want to do an SVD or PCA analysis? Those functions have no way to weight the data the way xtabs does.

So the question is: is there a way to weight data in R the way it is possible in SPSS? The case with integer weights would be easy: with a factor of "2" we would simply duplicate the row. But what about all the decimal factors?


UPDATE:

SVD and PCA were just examples! Take any other statistical procedure. In social science, samples are never perfect. For statistical analysis, the sample should represent the population, but samples usually do not. We therefore try to correct this deficit with weights, so that the sample does represent the population.

+6
4 answers

You probably need to acquaint yourself with the search engines for R: Baron's RSiteSearch and Rseek. This is one of the first hits for "weighted PCA" on the Baron site:

http://finzi.psych.upenn.edu/R/library/aroma.light/html/wpca.matrix.html

Given the clarification in the comments on Joris Meys's answer, the point that comes up frequently is that you need to be clear that you want sampling weights as opposed to other kinds of weighting. Regression with sampling weights is done with the survey package. Lumley's book on survey methods distinguishes three types of weights. (The "weights" in the lm function are variance weights, NOT sampling weights.)
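To illustrate the distinction between the weights in lm and sampling weights, here is a hedged sketch on simulated data (the data frame `toy`, its columns, and the weights are all hypothetical). It assumes the survey package is installed and skips the design-based part otherwise. Both fits solve the same weighted least-squares problem, so the coefficients should agree, while the standard errors are computed differently:

```r
# Simulated data with hypothetical sampling weights
set.seed(1)
n <- 50
toy <- data.frame(x = rnorm(n))
toy$y <- 1 + 2 * toy$x + rnorm(n)
toy$w <- runif(n, 0.3, 3)

# lm() treats 'weights' as inverse-variance (precision) weights
fit_lm <- lm(y ~ x, data = toy, weights = w)
se_lm <- sqrt(diag(vcov(fit_lm)))

# The survey package treats the same numbers as sampling weights
if (requireNamespace("survey", quietly = TRUE)) {
  des <- survey::svydesign(ids = ~1, weights = ~w, data = toy)
  fit_svy <- survey::svyglm(y ~ x, design = des)

  # Same point estimates, different standard errors
  print(cbind(lm = coef(fit_lm), svyglm = coef(fit_svy)))
  print(cbind(lm = se_lm, svyglm = sqrt(diag(vcov(fit_svy)))))
}
```

The `ids = ~1` argument declares a design without clusters, which is an assumption; a real survey may need strata or cluster identifiers as well.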

Note: The survey package includes both PCA and factor analysis (experimental). So perhaps the question behind Dominick's question, about a unified approach to weighting in regression methods, does have one "answer".

+4

First of all, it does not make sense to run a PCA on these data. Secondly, SPSS does not perform PCA but factor analysis, which is something else. I know they call it PCA, but it is not.

WEIGHT BY in SPSS is nothing more than a replication weight, and it amounts to the same thing as repeating your cases with rep(): complete insanity. To refer to your example: FACTOR in SPSS (which is used for the so-called PCA) does not accept fractional weights.
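A small sketch of why replication only works for integer weights (the vectors below are made-up illustration values). With integer weights, repeating cases with rep() reproduces the weighted mean; with fractional weights, rep() truncates the counts towards zero, so cases are silently dropped or under-counted:

```r
x <- c(10, 20, 30)

# Integer weights: replicating cases matches the weighted mean
w_int <- c(2, 1, 3)
replicated <- rep(x, times = w_int)
print(mean(replicated))
print(weighted.mean(x, w_int))

# Fractional weights: 'times' is truncated to 1, 0, 2,
# so the second case disappears entirely
w_frac <- c(1.6, 0.7, 2.2)
truncated <- rep(x, times = w_frac)
print(truncated)

# The replicated mean no longer matches the properly weighted mean
print(mean(truncated))
print(weighted.mean(x, w_frac))
```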

If you want to perform weighted procedures, the only reasonable way to do so is to use the correct method, function, or package for it. In statistics there is no one-size-fits-all procedure, despite what SPSS likes to make you believe.

In your example: a weighted PCA in R is available in FactoMineR and aroma.light. But I highly recommend that you also take a look at the vegan package, as it contains many more useful ordination methods for the kind of data you describe.
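For orientation only, here is a sketch of one way to get a case-weighted PCA with base R alone. It does not use FactoMineR or aroma.light; it swaps in cov.wt() from the stats package to compute a weighted covariance matrix and then takes its eigendecomposition. The matrix `X` and the weights `w` are simulated and purely hypothetical:

```r
# Simulated data: 50 cases, 4 variables, fractional case weights
set.seed(42)
X <- matrix(rnorm(200), ncol = 4,
            dimnames = list(NULL, paste0("v", 1:4)))
w <- runif(nrow(X), 0.3, 3)

# Weighted covariance matrix and weighted column means
# (cov.wt rescales the weights to sum to 1 internally)
wc <- cov.wt(X, wt = w)

# PCA as the eigendecomposition of the weighted covariance matrix
eig <- eigen(wc$cov, symmetric = TRUE)
loadings <- eig$vectors
rownames(loadings) <- colnames(X)

# Component scores: data centred at the weighted means, then rotated
scores <- sweep(X, 2, wc$center) %*% loadings

# Share of weighted variance explained by each component
prop_var <- eig$values / sum(eig$values)
print(round(prop_var, 3))
```

Whether a PCA is appropriate for binary items like those in the question is a separate issue, as noted above.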

+7

I'm not sure whether this suits your needs, but see the R package weights.

0

I just found a post on R-Bloggers that introduces the svydesign() function. As far as I know, this function from the survey package is similar to the SPSS weighting function: it lets you create a weighted data object for further analysis. I believe this is more useful than using various functions from several packages to perform multivariate analysis. I hope you find it useful!
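A minimal sketch of that workflow, assuming the survey package is installed (the data frame `toy` is hypothetical; only the column names mirror the question). The design object is created once, and the svy* functions then apply the weights:

```r
# Hypothetical toy data: one binary item and one weight per case
toy <- data.frame(
  UH6401 = c(1, 1, 0, 0, 1, 0),
  weight = c(0.8, 1.6, 1.2, 0.5, 2.0, 0.9)
)

# Base R reference value for the weighted share of 1s
wm <- weighted.mean(toy$UH6401, toy$weight)
print(wm)

if (requireNamespace("survey", quietly = TRUE)) {
  # Declare the weights once; ids = ~1 assumes no clustering
  des <- survey::svydesign(ids = ~1, weights = ~weight, data = toy)

  # Weighted mean with a design-based standard error
  svy_mean <- survey::svymean(~UH6401, des)
  print(svy_mean)

  # Weighted frequency table, comparable to xtabs(weight ~ UH6401)
  print(survey::svytable(~UH6401, des))
}
```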

Another social science analyst using R;)

0

Source: https://habr.com/ru/post/894833/

