I have created a TermDocumentMatrix from the tm library in R. It looks something like this:
> inspect(freq.terms)
A document-term matrix (19 documents, 214 terms)
Non-/sparse entries: 256/3810
Sparsity : 94%
Maximal term length: 19
Weighting : term frequency (tf)
Terms
Docs abundant acid active adhesion aeropyrum alternative
1 0 0 1 0 0 0
2 0 0 0 0 0 0
3 0 0 0 1 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
6 0 1 0 0 0 0
7 0 0 0 0 0 0
8 0 0 0 0 0 0
9 0 0 0 0 0 0
10 0 0 0 0 1 0
11 0 0 1 0 0 0
12 0 0 0 0 0 0
13 0 0 0 0 0 0
14 0 0 0 0 0 0
15 1 0 0 0 0 0
16 0 0 0 0 0 0
17 0 0 0 0 0 0
18 0 0 0 0 0 0
19 0 0 0 0 0 1
This is just a small sample of the matrix; in reality I work with 214 terms. On a small scale like this, it is fine. If I want to convert my TermDocumentMatrix into a regular matrix, I would do:
data.matrix <- as.matrix(freq.terms)
However, the data shown above is just a subset of my overall data. My overall data has probably at least 10,000 terms. When I try to create the TDM from the overall data, I run into this error:
> Error cannot allocate vector of size n Kb
So from here, I am looking into alternative, memory-efficient ways of representing my tdm.
I tried turning my tdm into a sparse matrix via the Matrix library, but ran into the same problem.
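For reference, here is a rough sketch of the kind of conversion I mean (this assumes the tm matrix is backed by slam's simple_triplet_matrix, so its i/j/v triplet slots can be passed to Matrix::sparseMatrix() without ever building the dense matrix):

library(Matrix)

# freq.terms is a tm matrix stored as a slam simple_triplet_matrix,
# so its triplet slots map directly onto a sparse Matrix object.
sparse.tdm <- sparseMatrix(i        = freq.terms$i,
                           j        = freq.terms$j,
                           x        = freq.terms$v,
                           dims     = c(freq.terms$nrow, freq.terms$ncol),
                           dimnames = freq.terms$dimnames)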
What are my alternatives at this point? I am thinking I should be looking into the bigmemory/ff packages (although the bigmemory package does not seem to be available for Windows at the moment), or the irlba package for computing a partial SVD of my tdm.
I have experimented with functions from both libraries but cannot seem to arrive at anything substantial. Does anyone know what the best way forward is? I have spent so much time fiddling around with this that I thought I would ask people with much more experience working with big data sets before I waste even more time going in the wrong direction.
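To make the irlba option concrete, this is a sketch of the partial SVD I have in mind (it assumes the tdm has already been converted to a sparse Matrix object such as sparse.tdm above; nv = 50 is just a placeholder for the number of singular vectors):

library(irlba)

# Truncated SVD: compute only the first 50 singular triplets of the
# sparse matrix rather than the full decomposition.
svd.tdm <- irlba(sparse.tdm, nv = 50)

str(svd.tdm$u)  # left singular vectors
str(svd.tdm$d)  # approximations of the 50 largest singular values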
EDIT: changed 10,00 to 10,000. Thanks @nograpes.