I have a sorted list of 1,000,000 rows of protein names, each with a maximum length of 256. Each row has an associated identifier. I have another, unsorted list of 4,000,000,000 rows of words from articles, each with a maximum length of 256, and each word has an identifier.
I want to find all the matches between the list of protein names and the list of words from the articles. Which algorithm should I use? Should I use a pre-built API?
It would be nice if the algorithm worked on a regular PC without special equipment.
Estimates of the time required by the algorithm would be good, but not necessary.
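To make the task concrete, here is a minimal sketch of one possible baseline, not a recommendation. It assumes that a "match" means exact string equality between a word and a whole protein name, that protein names are unique, and that the 1,000,000 names fit in memory. The function name `find_matches` and the `(identifier, text)` tuple layout are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple


def find_matches(
    proteins: Iterable[Tuple[str, str]],
    words: Iterable[Tuple[str, str]],
) -> Iterator[Tuple[str, str]]:
    """Yield (protein_id, word_id) for every word equal to a protein name.

    Both inputs are iterables of (identifier, text) pairs (assumed layout).
    """
    # Build the lookup table from the smaller list (assumed to fit in RAM).
    name_to_id: Dict[str, str] = {name: pid for pid, name in proteins}

    # Stream the large list once; each lookup is O(1) on average,
    # so the large list never has to be sorted or held in memory.
    for word_id, word in words:
        pid = name_to_id.get(word)
        if pid is not None:
            yield pid, word_id


# Tiny made-up example data, only to show the shape of the input and output.
proteins = [("P1", "actin"), ("P2", "insulin"), ("P3", "myosin")]
words = [("W1", "the"), ("W2", "insulin"), ("W3", "binds"), ("W4", "actin")]
print(list(find_matches(proteins, words)))
# prints [('P2', 'W2'), ('P1', 'W4')]
```

In this sketch the large list is read sequentially exactly once, so the running time would be dominated by reading the 4,000,000,000 rows from disk. Multi-word protein names would not be found by a word-by-word lookup like this one.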