I have a list of nouns in a .txt file, one noun per line, such as:
hooligan
football
brother
bollocks
... and a separate .txt file containing a series of regular expressions, separated by new lines, for example:
[a-z]+\tNN(S)?
[a-z]+\tJJ(S)?
... I would like to run the regular expressions over every sentence of the corpus. Whenever a regular expression matches a pattern, and that pattern contains one of the nouns from the list, I would like to print the noun and, separated from it by a tab, the regular expression that matched it. Here is an example of what the output could look like:
football [a-z]+NN(S)?\ POS[a-z]+NN(S)?
hooligan [a-z]+NN(S)?,,[a-z]+JJ[a-z]+NN(S)?
hooligan [a-z]+NN(S)?,,[a-z]+JJ[a-z]+NN(S)?
football [a-z]+NN(S)?[a-z]+NN(S)?
brother [a-z]+PP$[a-z]+NN(S)?
bollocks [a-z]+DT[a-z]+NN(S)?
football [a-z]+NN(s)?(be)VBZnotRB
The corpus looks like this (each sentence is enclosed in <s> ... </s> tags):
<s>
Hooligans hooligan NNS 1 4 NMOD
, , , 2 4 P
unbridled unbridled JJ 3 4 NMOD
passion passion NN 4 0 ROOT
- - : 5 4 P
and and CC 6 4 CC
no no DT 7 9 NMOD
executive executive JJ 8 9 NMOD
boxes box NNS 9 4 COORD
. . SENT 10 0 ROOT
</s>
<s>
Hooligans hooligan NNS 1 4 NMOD
, , , 2 4 P
unbridled unbridled JJ 3 4 NMOD
passion passion NN 4 0 ROOT
- - : 5 4 P
and and CC 6 4 CC
no no DT 7 9 NMOD
executive executive JJ 8 9 NMOD
boxes box NNS 9 4 COORD
. . SENT 10 0 ROOT
</s>
<s>
Portsmouth Portsmouth NP 1 2 SBJ
bring bring VVP 2 0 ROOT
something something NN 3 2 OBJ
entirely entirely RB 4 5 AMOD
different different JJ 5 3 NMOD
to to TO 6 5 AMOD
the the DT 7 12 NMOD
Premiership Premiership NP 8 12 NMOD
: : : 9 12 P
football football NN 10 12 NMOD
POS 11 10 NMOD
past past NN 12 6 PMOD
. . SENT 13 2 P
</s>
<s>
This this DT 1 2 SBJ
is be VBZ 2 0 ROOT
one one CD 3 2 PRD
of of IN 4 3 NMOD
Britain Britain NP 5 10 NMOD
POS 6 5 NMOD
most most RBS 7 8 AMOD
ardent ardent JJ 8 10 NMOD
football football NN 9 10 NMOD
cities city NNS 10 4 PMOD
: : : 11 2 P
think think VVP 12 2 COORD
Liverpool Liverpool NP 13 0 ROOT
or or CC 14 13 CC
Newcastle Newcastle NP 15 19 SBJ
in in IN 16 15 ADV
miniature miniature NN 17 16 PMOD
, , , 18 15 P
wound wind VVD 19 13 COORD
back back RB 20 19 ADV
three three CD 21 22 NMOD
decades decade NNS 22 19 OBJ
. . SENT 23 2 P
</s>
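A minimal sketch of how a corpus in this shape could be split into sentences, written in Python rather than Perl purely for illustration. It assumes every token row has six whitespace-separated columns (word form, lemma, POS tag, token id, head id, dependency label) and silently skips rows that do not. The names `read_sentences` and `Token` are hypothetical, not part of any library.

```python
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, str, str]  # (word form, lemma, POS tag)


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of (form, lemma, tag) tuples per <s>...</s> block."""
    sentence: List[Token] = []
    inside = False
    for raw in lines:
        line = raw.strip()
        if line == "<s>":
            sentence, inside = [], True
        elif line == "</s>":
            if inside:
                yield sentence
            inside = False
        elif inside and line:
            fields = line.split()
            # Assumption: a well-formed row has exactly six columns;
            # anything else (e.g. a row with a missing token) is skipped.
            if len(fields) == 6:
                sentence.append((fields[0], fields[1], fields[2]))


sample = """<s>
Hooligans hooligan NNS 1 4 NMOD
, , , 2 4 P
unbridled unbridled JJ 3 4 NMOD
passion passion NN 4 0 ROOT
</s>
""".splitlines()

sentences = list(read_sentences(sample))
print(sentences[0][0])
```

Because the function consumes lines one at a time, it can be fed an open file handle directly, so only one sentence is held in memory at any moment.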
So far I have been attempting this with a Perl script, including a version that uses Tie::File. Could this also be done with Unix tools (for example, cat and grep)?
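To make the desired behaviour concrete, here is a rough sketch of the matching step in Python (not Perl, and not a definitive implementation). It rests on one assumption that the post does not confirm: each sentence is flattened into "lemma<TAB>tag" rows joined by newlines, and the patterns are written against that text, so a multi-token pattern has to spell out the newline between rows. The function name `match_nouns` and the third pattern are hypothetical.

```python
import re
from typing import Iterable, Iterator, List, Set, Tuple


def match_nouns(
    sentences: Iterable[List[Tuple[str, str]]],
    nouns: Set[str],
    patterns: List[str],
) -> Iterator[Tuple[str, str]]:
    """Yield (noun, pattern) for every match that contains a listed noun."""
    compiled = [(p, re.compile(p)) for p in patterns]
    for sentence in sentences:
        # Assumption: patterns target "lemma<TAB>tag" rows, one per token,
        # joined by newlines.
        text = "\n".join(f"{lemma}\t{tag}" for lemma, tag in sentence)
        for source, regex in compiled:
            for m in regex.finditer(text):
                # Check the lemma of every row covered by this match.
                for row in m.group(0).split("\n"):
                    lemma = row.split("\t")[0]
                    if lemma in nouns:
                        yield lemma, source


# One sentence as (lemma, tag) pairs, taken from the corpus sample.
sentences = [[("hooligan", "NNS"), (",", ","), ("unbridled", "JJ"), ("passion", "NN")]]
nouns = {"hooligan", "football", "brother", "bollocks"}
patterns = [
    r"[a-z]+\tNN(S)?",
    r"[a-z]+\tJJ(S)?",
    # Hypothetical multi-token pattern: noun, comma, adjective, noun.
    r"[a-z]+\tNN(S)?\n,\t,\n[a-z]+\tJJ\n[a-z]+\tNN(S)?",
]

results = list(match_nouns(sentences, nouns, patterns))
for noun, pattern in results:
    print(f"{noun}\t{pattern}")
```

Keeping the nouns in a set makes each lookup cheap, and compiling the patterns once avoids recompiling them for every sentence. Note that an unanchored pattern may begin matching in the middle of a row, so anchoring is left to the patterns themselves.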