I have an R function that generates k-skip-n-grams.
The full function is on GitHub.
The code does produce the expected k-skip-n-grams:
> kSkipNgram("Lorem ipsum dolor sit amet, consectetur adipiscing elit.", n=2, skip=1)
 [1] "Lorem dolor"            "Lorem ipsum"            "ipsum sit"
 [4] "ipsum dolor"            "dolor amet"             "dolor sit"
 [7] "sit consectetur"        "sit amet"               "amet adipiscing"
[10] "amet consectetur"       "consectetur elit"       "consectetur adipiscing"
[13] "adipiscing elit"
However, I would like to generalize and simplify the following switch statement, which hard-codes one extra level of nested for-loops for each value of n:
# x    - should be text, a sentence
# n    - n-gram
# skip - number of skips
###################################
switch(as.character(n),
  "0" = {ngram <- c(ngram, paste(x[i]))},
  "1" = {
    for (j in skip:1) {
      if (i+j <= length(x)) {
        ngram <- c(ngram, paste(x[i], x[i+j]))
      }
    }
  },
  "2" = {
    for (j in skip:1) {
      for (k in skip:1) {
        if (i+j <= length(x) && i+j+k <= length(x)) {
          ngram <- c(ngram, paste(x[i], x[i+j], x[i+j+k]))
        }
      }
    }
  },
  "3" = {
    for (j in skip:1) {
      for (k in skip:1) {
        for (l in skip:1) {
          if (i+j <= length(x) && i+j+k <= length(x) && i+j+k+l <= length(x)) {
            ngram <- c(ngram, paste(x[i], x[i+j], x[i+j+k], x[i+j+k+l]))
          }
        }
      }
    }
  },
  "4" = {
    for (j in skip:1) {
      for (k in skip:1) {
        for (l in skip:1) {
          for (m in skip:1) {
            if (i+j <= length(x) && i+j+k <= length(x) && i+j+k+l <= length(x) && i+j+k+l+m <= length(x)) {
              ngram <- c(ngram, paste(x[i], x[i+j], x[i+j+k], x[i+j+k+l], x[i+j+k+l+m]))
            }
          }
        }
      }
    }
  }
)
# closing braces of the enclosing code (not shown)
}
}
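To make the goal concrete: each extra case in the switch adds one more loop over the same offsets, so the nesting depth could in principle be replaced by recursion. Below is a minimal sketch of that idea, not the function from GitHub. The name `skip_ngrams` and its helper `extend` are hypothetical, and the sketch assumes that `x` is already a character vector of tokens, that `n` is the number of words per gram, and that `skip` is the maximum number of words skipped between two consecutive words of a gram (so the offsets run from `skip + 1` down to 1).

```r
# Hypothetical helper, not the original kSkipNgram().
# Assumptions: x is a character vector of tokens, n is the number of words
# per gram, skip is the maximum number of words skipped between neighbours.
skip_ngrams <- function(x, n, skip) {
  # Offsets to the next word, in descending order like `for (j in skip:1)`.
  steps <- seq(skip + 1, 1)

  # idx holds the token positions chosen so far for one partial gram.
  extend <- function(idx) {
    if (length(idx) == n) {
      # The gram is complete: join its words with single spaces.
      return(paste(x[idx], collapse = " "))
    }
    out <- character(0)
    for (step in steps) {
      nxt <- idx[length(idx)] + step
      # Same bounds check as the nested loops: stay inside the sentence.
      if (nxt <= length(x)) {
        out <- c(out, extend(c(idx, nxt)))
      }
    }
    out
  }

  # Start one partial gram at every token position and collect the results.
  as.character(unlist(lapply(seq_along(x), function(i) extend(i))))
}

tokens <- c("Lorem", "ipsum", "dolor", "sit")
skip_ngrams(tokens, n = 2, skip = 1)
```

The recursion depth equals `n`, so no case has to be written out per value of `n`. Growing the result with `c()` is kept only to mirror the loops above; it may become slow for long inputs.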