Question: I have two large cell arrays of strings, A and B. I want to find the fastest way to determine which elements of A contain which elements of B. In particular, can this be done without a loop?
Minimal example (my actual A and B contain 7,000,000 and 22,000 strings respectively):
A = {'one';
'two';
'three';
'four'};
B = {'ee';
'xx';
'r'};
The desired result for this example would be
C = [ 0 0 0 ;
0 0 0 ;
1 0 1 ;
0 0 1 ];
where the rows and columns of C correspond to the elements of A and B respectively. For my purposes I only need a true/false answer, but bonus points if C returns the first index at which the string from B occurs in the string from A, for example:
C = [ 0 0 0 ;
0 0 0 ;
4 0 3 ;
0 0 4 ];
One option is a nested loop (regexp might be another):
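C = zeros(length(A), length(B)); % preallocate the result
for i = 1:length(A)
    for j = 1:length(B)
        % 0 if B{j} does not occur in A{i}, otherwise the largest match index
        C(i,j) = max([0, strfind(A{i}, B{j})]);
    end
end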
Alternatively, using cellfun:
AA = repmat(A,[1 length(B)]);
BB = repmat(B,[length(A) 1]);
C = reshape(cellfun(@(a,b) max([0,strfind(a,b)]),AA(:),BB(:)),[length(A),length(B)]);
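For comparison, here is a minimal sketch of a partially vectorized variant that loops only over the shorter array B and lets strfind handle the whole of A in one call. It relies on strfind accepting a cell array of strings as its first argument; the variable names hits and found are just illustrative, and it has not been timed on the full-size data.

```matlab
% Small example data from above
A = {'one'; 'two'; 'three'; 'four'};
B = {'ee'; 'xx'; 'r'};

C = zeros(numel(A), numel(B));          % preallocate the result
for j = 1:numel(B)
    hits  = strfind(A, B{j});           % cell array: match indices for every string in A
    found = ~cellfun('isempty', hits);  % logical column: which strings in A contain B{j}
    % first match index where B{j} occurs, 0 elsewhere
    C(found, j) = cellfun(@(h) h(1), hits(found));
end
disp(C)
```

If only the true/false answer is needed, the logical column found can be stored directly and the second cellfun call dropped.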
Timing of the cellfun approach (on a smaller random test case):
N=10000; M=200;
A=cellstr(char(randi([97,122],[N,10]))); %// N random length 10 lowercase strings
B=cellstr(char(randi([97,122],[M,4]))); %// M random length 4 lowercase strings
tic;
AA=repmat(A,[1 length(B)]);
BB=repmat(B,[length(A) 1]);
C=reshape(cellfun(@(a,b) max([0,strfind(a,b)]),AA(:),BB(:)),[length(A),length(B)]);
toc
Elapsed time is 21.91 seconds.
Is there a faster way? regexp? ismember? Something else?