How to implement the spectrum kernel function in MATLAB?

The spectrum kernel function operates on strings by counting the n-grams that two strings have in common. For example, “tool” has three 2-grams (“to”, “oo” and “ol”), and the similarity between “tool” and “fool” is 2 (“oo” and “ol”).

How can I write a MATLAB function that calculates this metric?

+3
2 answers

The first step is to create a function that generates the n-grams of a given string. One way to vectorize this is with some clever indexing.

function [subStrings, counts] = n_gram(fullString, N)
  if (N == 1)
    [subStrings, ~, index] = unique(cellstr(fullString.'));  % Simple case: one character per cell
  else
    nString = numel(fullString);
    index = hankel(1:(nString-N+1), (nString-N+1):nString);
    [subStrings, ~, index] = unique(cellstr(fullString(index)));
  end
  counts = accumarray(index, 1);
end

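The code above uses HANKEL to build a matrix of indices that selects every N-character substring of the given string. Indexing the string with this matrix produces a character array with one substring of length N per row. CELLSTR then places each row in its own cell of a cell array. UNIQUE removes the repeated substrings, and ACCUMARRAY counts how often each unique substring occurs (in case the counts are needed).

With the n-grams in hand, the number shared by two strings can be found with INTERSECT: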
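To see what the index matrix looks like, here is the HANKEL step from n_gram run by hand on 'tool' with N = 2. This is only an illustration; the variable names s and grams are mine, not part of the function above.

```matlab
s = 'tool';
N = 2;
nString = numel(s);

% First column 1:3, last row 3:4, so each row holds N consecutive positions
index = hankel(1:(nString-N+1), (nString-N+1):nString);
% index is [1 2; 2 3; 3 4]

% Indexing with the matrix gives one 2-gram per row
grams = s(index);
% grams is the 3-by-2 char array ['to'; 'oo'; 'ol']
```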

subStrings1 = n_gram('tool',2);
subStrings2 = n_gram('fool',2);
sharedStrings = intersect(subStrings1,subStrings2);
nShared = numel(sharedStrings);
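If you want everything in one call, the steps can be wrapped up roughly as follows. This is a minimal sketch, not a tested library routine: the names spectrum_similarity and local_n_grams are hypothetical, and it assumes both inputs are char row vectors with at least N characters. It extracts the n-grams with a plain loop instead of CELLSTR, because CELLSTR strips trailing whitespace and would shorten any n-gram that ends in a space. The second output is the count-weighted variant (the inner product of the two n-gram count vectors), which is how the spectrum kernel is usually defined in the literature; for the metric in the question, use the first output.

```matlab
function [nShared, kernelValue] = spectrum_similarity(str1, str2, N)
% Hypothetical helper; save as spectrum_similarity.m.
% Assumes str1 and str2 are char row vectors with at least N characters.
  [grams1, counts1] = local_n_grams(str1, N);
  [grams2, counts2] = local_n_grams(str2, N);

  % i1 and i2 are the positions of the shared n-grams in each unique list
  [~, i1, i2] = intersect(grams1, grams2);

  nShared = numel(i1);                            % distinct shared n-grams
  kernelValue = sum(counts1(i1) .* counts2(i2));  % weighted by occurrences
end

function [grams, counts] = local_n_grams(str, N)
% Unique N-character substrings of str and how often each occurs.
  nGrams = numel(str) - N + 1;
  grams = cell(nGrams, 1);
  for k = 1:nGrams
    grams{k} = str(k:k+N-1);   % direct slicing keeps spaces intact
  end
  [grams, ~, idx] = unique(grams);
  counts = accumarray(idx(:), 1);
end
```

Calling spectrum_similarity('tool', 'fool', 2) gives nShared = 2, matching the example in the question. The two outputs only differ when an n-gram occurs more than once in a string.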
+2

What you are describing sounds like the Hamming distance; for a fuller description, see doc pdist.

A = ['Marcin'; 'Martin'; 'Marsha']  % data

squareform(pdist(A, 'hamming')) returns

         0    0.1667    0.5000
    0.1667         0    0.5000
    0.5000    0.5000         0

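This gives the pairwise distances in matrix form, each as a fraction of the string length. "Marcin" and "Martin" differ in 1 of 6 characters, so their distance is 1/6 = 0.1667; "Marcin" and "Marsha" differ in 3 of 6, so 3/6 = 0.5.
To get the number of differing characters instead of the fraction, multiply the matrix by the string length, size(A, 2).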
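The fractions that the 'hamming' option reports can be checked without pdist (which is part of the Statistics Toolbox). The loop below is just a sketch for equal-length strings stored as rows of a char matrix; D and F are names I made up for the count and fraction matrices.

```matlab
A = ['Marcin'; 'Martin'; 'Marsha'];   % one string per row, equal lengths
n = size(A, 1);

% D(i, j) is the number of positions where strings i and j differ
D = zeros(n);
for i = 1:n
  for j = 1:n
    D(i, j) = sum(A(i, :) ~= A(j, :));
  end
end
% D is [0 1 3; 1 0 3; 3 3 0]

% Dividing by the string length gives the fractions shown above
F = D / size(A, 2);
```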

-1

Source: https://habr.com/ru/post/1713702/

