You can use (f)lex to build a DFA that recognizes all the literals in parallel. This may get complicated if there are too many wildcards, but it works for up to about 100 literals (for a 4-letter alphabet; perhaps more for natural text). You will want to suppress the default action (ECHO) and print only the line and column numbers of the matches.
(I assume grep -F does pretty much the same thing.)
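    %option noyywrap yylineno
    %{
    #include <stdio.h>
    /* A plain (non-reentrant) flex scanner keeps no column counter, so track one here. */
    static int yycolumn = 1;
    /* Runs before every action: advance the column past the current match. */
    #define YY_USER_ACTION yycolumn += yyleng;
    %}
    %%
    "TTGATTCACCAGCGCGTATTGTC" { printf("@%d: %d:%s\n", yylineno, yycolumn - yyleng, "OMG! the TTGA pattern again"); }
    "AGGTATCTGCTTCAATCAGCG"   { printf("@%d: %d:%s\n", yylineno, yycolumn - yyleng, "WTF?!"); }
    ... more lines ...
    [bd-fh-su-z]+ {;}
    \n            { yycolumn = 1; /* new line: restart the column count */ }
    [ \t\r]+      {;}
    .             {;}
    %%
    int main(void)
    {
        /* Call the lexer, then quit. */
        yylex();
        return 0;
    }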
A lexer specification like the one above can be generated from a plain-text list of literals with awk or any other scripting language.
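As a rough sketch of such a generator, here is one way it might look in Python. The function name `make_lex_spec` and the "literal N" message text are made up for illustration. The generated rules report only the line number, and a catch-all rule replaces the default ECHO action.

```python
def make_lex_spec(literals):
    """Build a (f)lex specification with one rule per literal string.

    Hypothetical helper: the name and the message format are illustrative.
    """
    rules = []
    for i, lit in enumerate(literals, start=1):
        # Escape backslashes and double quotes so the literal stays a
        # valid quoted lex pattern.
        quoted = lit.replace("\\", "\\\\").replace('"', '\\"')
        rules.append(
            '"%s" { printf("@%%d: literal %d\\n", yylineno); }' % (quoted, i)
        )
    header = "%option noyywrap yylineno\n%{\n#include <stdio.h>\n%}\n%%\n"
    # Swallow everything else so the default ECHO action never fires.
    catch_all = ".|\\n {;}\n"
    footer = "%%\nint main(void)\n{\n    yylex();\n    return 0;\n}\n"
    return header + "\n".join(rules) + "\n" + catch_all + footer


# Example: emit a specification for the two literals used above.
print(make_lex_spec(["TTGATTCACCAGCGCGTATTGTC", "AGGTATCTGCTTCAATCAGCG"]))
```

The printed text can be saved to a file and passed to flex. In practice the list of literals would be read from the input file, one per line, instead of being hard-coded.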