This is admittedly a very simple question that I simply cannot find the answer to.
In R, I have a file with 2 columns: the first holds the category names, and the second holds the count for each category. With a small dataset, I would use the 'untable' function from the 'reshape' package to expand this into a single column with one row per observation and do the analysis that way. The question is how to handle this with a large dataset.
In this case, my data is huge and that approach just won't work.
My question is: how can I tell R to use something like the following as distribution data:
Cat  Count
A    5
B    7
C    1
That is, I give R a histogram as input, and R takes it to mean that there are 5 of A, 7 of B and 1 of C when calculating other information about the data.
To be clear about the input (not the output): I want R to understand that the table describes the same data as the following,
A A A A A B B B B B B B C
With data of a reasonable size, I can expand it myself, but what do you do when the data is very large?
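For reference, here is a minimal sketch of the small-data expansion described above, done in base R with `rep()`, next to a count-weighted tabulation with `xtabs()` that gives the same frequencies without building the long vector. The data frame name `d` and the other object names are only for illustration.

```r
# Toy frequency table from the question (object names are illustrative)
d <- data.frame(Cat = c("A", "B", "C"), Count = c(5, 7, 1))

# Small-data approach: expand to one element per observation
long <- rep(d$Cat, times = d$Count)

# Tabulate the expanded vector
tab_long <- table(long)

# Count-weighted tabulation: same frequencies, no expansion needed
tab_wt <- xtabs(Count ~ Cat, data = d)

# Compare the two sets of frequencies
all.equal(as.vector(tab_long), as.vector(tab_wt))
```

The expanded vector has as many elements as the sum of the counts, which is why this route stops being practical once the total is in the hundreds of millions.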
Edit
The sum of all the counts is 262,916,849.
In terms of what it will be used for:
This is new data, and I am trying to understand the relationship between it and other data. I will need to fit linear regressions and mixed models.
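For the linear regression part, one possible direction is to pass the counts to `lm()` through its `weights` argument instead of expanding the rows. The sketch below uses entirely hypothetical data (the columns `x` and `y` and their values are made up for illustration, not taken from the real dataset) and assumes each row is a distinct combination of values observed `Count` times.

```r
# Hypothetical aggregated data: each row is a distinct (x, y) pair seen Count times
agg <- data.frame(
  x     = c(1, 2, 3, 4),
  y     = c(2.1, 3.9, 6.2, 7.8),
  Count = c(5, 7, 1, 3)
)

# Fit on the aggregated rows, using Count as case weights
fit_wt <- lm(y ~ x, data = agg, weights = Count)

# Fit on the fully expanded rows, only feasible here because the toy data is tiny
expanded <- agg[rep(seq_len(nrow(agg)), times = agg$Count), c("x", "y")]
fit_long <- lm(y ~ x, data = expanded)

# The coefficient estimates agree between the two fits
all.equal(coef(fit_wt), coef(fit_long))

# The residual degrees of freedom do not: lm() derives them from the number
# of rows, not from the sum of the weights, so standard errors will differ
df.residual(fit_wt)
df.residual(fit_long)
```

Only the point estimates carry over directly in this sketch. Standard errors and p-values from the weighted fit would need adjusting before they could be read as if they came from the expanded data, and whether the same idea transfers to a given mixed-model function depends on how that function interprets its weights.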