Batch strfind: finding many strings within many other strings

Question: I have two large cell arrays of strings, A and B. I want to find the fastest way to determine which elements of A contain which elements of B. In particular, can this be done without a loop?

Minimal example (my actual A and B contain 7,000,000 and 22,000 strings respectively):

A = {'one';
     'two';
     'three';
     'four'};
B = {'ee';
     'xx';
     'r'};

The desired result for this example would be

C = [ 0 0 0 ;
      0 0 0 ;
      1 0 1 ;
      0 0 1 ];

where the rows and columns of C correspond to the elements of A and B respectively. For my purposes I only need a true/false answer, but bonus points if C returns the first index at which the string from B occurs in the string from A, for example:

C = [ 0 0 0 ;
      0 0 0 ;
      4 0 3 ;
      0 0 4 ];

A straightforward solution uses nested loops:

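C = zeros(length(A),length(B));   %// preallocate the result
for i = 1:length(A)
    for j = 1:length(B)
        %// 0 if B{j} is not found in A{i}; otherwise the largest start index from strfind
        C(i,j) = max([0,strfind(A{i},B{j})]);
    end
end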

The same computation without an explicit loop, using cellfun:

AA = repmat(A,[1 length(B)]);
BB = repmat(B,[length(A) 1]);
C  = reshape(cellfun(@(a,b) max([0,strfind(a,b)]),AA(:),BB(:)),[length(A),length(B)]);

Timing the cellfun version on random test data:

N=10000; M=200;
A=cellstr(char(randi([97,122],[N,10])));  %// N random length 10 lowercase strings
B=cellstr(char(randi([97,122],[M,4])));   %// M random length 4 lowercase strings

tic;
AA=repmat(A,[1 length(B)]);
BB=repmat(B,[length(A) 1]);
C=reshape(cellfun(@(a,b) max([0,strfind(a,b)]),AA(:),BB(:)),[length(A),length(B)]); 
toc

Elapsed time is 21.91 seconds.

Is there a faster way? Would regexp or ismember help?


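Answer: One approach is to build every (A, B) index pair with ndgrid and let regexp do the matching on the paired cell arrays: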

A = {'one';
     'two';
     'three';
     'four'};
B = {'ee';
     'xx';
     'r'};

%// generate indices
n = numel(A);
m = numel(B);
[xi,yi] = ndgrid(1:n,1:m);

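%// matching: each element of B is used as a regexp pattern; 'once' returns
%// only the first match, so every cell holds one index or is empty
Ax = A(xi);
By = B(yi);
temp = regexp(Ax,By,'start','once');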

%// localize empty cell elements
%// cellfun+@isempty is quite fast
emptyElements = cellfun(@isempty, temp);

%// generate output
out = zeros(n,m);
out(~emptyElements) = [temp{:}];

out =

     0     0     0
     0     0     0
     4     0     3
     0     0     4

Source: https://habr.com/ru/post/1617009/

