I don't know how this is usually done, but I can think of one rough way of defining a notion of correlation that captures word adjacency.
Suppose the text has length N, i.e. it is an array

text[0], text[1], ..., text[N-1]

and suppose the distinct words appearing in it are

word[0], word[1], ..., word[k]
For each word word[i], define a vector X[i] of length N-1 as follows: the jth entry of X[i] is 1 if word[i] is either the jth word or the (j+1)th word of the text, and 0 otherwise. In pseudocode:
// compute the vector X[i]: entry j marks an occurrence of word[i] at position j or j+1
X[i] = array();
for (j = 0; j <= N-2; j++) {
    if (text[j] == word[i] || text[j+1] == word[i])
        X[i][j] = 1;
    else
        X[i][j] = 0;
}
You can then define the correlation of word[a] and word[b] as the dot product of X[a] and X[b] divided by the product of their lengths:

COR(X[a], X[b]) = (X[a] . X[b]) / (|X[a]| * |X[b]|)

Note that the dot product counts the positions j at which both words occur within the pair (text[j], text[j+1]), i.e. essentially the number of times the two words are adjacent. The length |X[a]| is the square root of the number of 1-entries of X[a], which is roughly twice the number of occurrences of word[a], since each occurrence at position p sets entries p-1 and p (up to boundary effects). Clearly COR(X[a], X[a]) = 1, and COR(X[a], X[b]) is larger the more often word[a] and word[b] appear next to each other.
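To make this concrete, here is a minimal Python sketch of the whole computation. The function names (adjacency_vector, cor) and the example sentence are my own illustrative choices, not part of any standard library.

import math

def adjacency_vector(text, w):
    # Entry j is 1 if word w occurs at position j or j+1 of the text
    # (this is the vector X[w] described above).
    return [1 if text[j] == w or text[j + 1] == w else 0
            for j in range(len(text) - 1)]

def cor(text, a, b):
    # Dot product of the two indicator vectors, normalized by their lengths.
    xa = adjacency_vector(text, a)
    xb = adjacency_vector(text, b)
    dot = sum(p * q for p, q in zip(xa, xb))
    na = math.sqrt(sum(xa))  # |X[a]| = sqrt of the number of 1-entries
    nb = math.sqrt(sum(xb))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

text = "the quick brown fox jumps over the lazy brown dog".split()
print(cor(text, "brown", "fox"))    # positive: the words are adjacent once
print(cor(text, "brown", "brown"))  # 1.0, as noted above

In the example, "brown" occurs twice, so X["brown"] has four 1-entries (twice the occurrence count), and the single adjacency with "fox" yields a dot product of 1.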
The notion of "adjacent" can be generalized to other kinds of proximity: for example, instead of windows of 2 consecutive words you could use windows of 3 (or 4, 5, etc.), and you could also weight positions within the window differently, among many other variants. You would have to experiment to find out what, if anything, is useful; a sketch of the window-size generalization follows.
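As one illustration of that generalization, here is a hedged variant of adjacency_vector parameterized by window length; again, the name window_vector and the default size are my own assumptions.

def window_vector(text, w, size=2):
    # Entry j is 1 if word w occurs anywhere in the window text[j : j+size].
    # With size=2 this reduces exactly to adjacency_vector above.
    return [1 if w in text[j:j + size] else 0
            for j in range(len(text) - size + 1)]

Larger sizes measure co-occurrence within a wider window rather than strict adjacency, at the cost of a blurrier signal.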