I am working on a data mining algorithm in which I need to tokenize a string using a list of words. I have a separate file that contains all the stop words, and I need to tokenize the input string so that any of these words (a stop word) acts as a delimiter. For instance,
if the file contains stop words such as
this is
and
of
what
and the input line is "a computer cluster consists of a set of loosely coupled computers that work together"
the result should be:
a computer cluster consists of a set
loosely coupled computers
to work together
Checking the string against every stop word, one after another, will be very time-consuming. Is there an efficient way to do this?
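One common approach, offered here as a sketch rather than the definitive answer, is to compile all stop words into a single regular expression and let `re.split` cut the text wherever any of them matches. This assumes the stop-word file holds one word or phrase per line; the function names `load_stop_words`, `build_stopword_pattern` and `split_on_stopwords` are hypothetical. Word boundaries (`\b`) keep "of" from matching inside "offer", and multi-word entries such as "this is" tolerate any run of whitespace between their words.

```python
import re


def load_stop_words(path):
    # Hypothetical file format: one stop word or phrase per line, blank lines ignored.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def build_stopword_pattern(stop_words):
    # Deduplicate and sort longest first so a longer phrase is tried before a shorter one.
    phrases = sorted({w.strip() for w in stop_words if w.strip()}, key=len, reverse=True)
    if not phrases:
        raise ValueError("stop-word list is empty")
    # Escape each word and allow any whitespace between the words of a phrase.
    alternation = "|".join(r"\s+".join(map(re.escape, p.split())) for p in phrases)
    # \b on both sides means only whole words match; matching is case-insensitive.
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def split_on_stopwords(text, pattern):
    # Split wherever any stop word matches, then trim and drop empty chunks.
    return [chunk.strip() for chunk in pattern.split(text) if chunk.strip()]


pattern = build_stopword_pattern(["this is", "and", "of", "what"])
sentence = "a computer cluster consists of a set of loosely coupled computers that work together"
print(split_on_stopwords(sentence, pattern))
```

With only the four stop words listed above, the sample sentence is cut at each "of", giving three chunks. The pattern is compiled once, so each input line is scanned by one regex rather than by a hand-written loop over every stop word; for very large stop-word lists, a token-based lookup in a set or an Aho-Corasick automaton may scale better.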