I have two inputs that are read from the command line: the first is a word (or series of words) that the program I am writing needs to search for, and the second is the file in which those words should be found. For example, at the command prompt I run: perl WebScan.pl word WebPage000.htm
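For reference, Perl places the command-line arguments in `@ARGV`, so with `perl WebScan.pl word WebPage000.htm`, `$ARGV[0]` is `word` and `$ARGV[1]` is `WebPage000.htm`. A minimal sketch of unpacking them, assuming the search words arrive as a single (quoted) argument; `parse_args` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: unpack the two command-line arguments (the word to
# search for and the HTML file to search in). Dies with a usage message
# if the number of arguments is wrong.
sub parse_args {
    my @args = @_;
    die "Usage: perl WebScan.pl WORD FILE\n" unless @args == 2;
    my ($word, $file) = @args;
    return ($word, $file);
}

# In the real script this would be: my ($word, $file) = parse_args(@ARGV);
my ($word, $file) = parse_args('word', 'WebPage000.htm');
print "Searching for '$word' in $file\n";
```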
I have no problem accessing either of these inputs and printing it, but I am having difficulty working with the contents of the web page so that I can run regular expressions on it to remove the HTML tags and get at the content. I understand there is a way to do this without regular expressions that is much more efficient, but I am required to use regular expressions :(.
I can read the HTML file and print it without problems:
open(my $fh, '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!";
my @file = <$fh>;    # one array element per line of the HTML file
close($fh);
print @file;
This prints the entire HTML source of the page, but I cannot apply regular expressions to it to remove the HTML tags. I keep getting an error saying that the array can't be modified in a substitution (s///), pointing at the line with my regular expression. I am not sure how to get around this. I tried converting the array to a scalar, but then I can't access any of the HTML data at all (and no, it isn't just printing the number of elements in the array :P).
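Perl's `s///` operator rewrites a single scalar string, so it cannot be bound to a whole array; `@file =~ s/.../.../` is rejected at compile time. A minimal sketch of two common workarounds, using made-up sample lines in place of the real file and a deliberately simple tag pattern `<[^>]*>` (an assumption, not the asker's actual regex):

```perl
use strict;
use warnings;

# Made-up stand-in for the lines read from the HTML file (my @file = <$fh>;).
my @file = ("<html><body>\n", "<p>Hello <b>world</b></p>\n", "</body></html>\n");

# Option 1: apply s/// to each element. Inside a foreach loop, $_ is an
# alias for the current element, so the substitution changes the array itself.
my @lines = @file;               # work on a copy so @file stays intact
for (@lines) {
    s/<[^>]*>//g;                # strips only tags that open and close on this line
}

# Option 2: join all lines into one scalar first. This also handles tags
# that span several lines, because the class [^>] matches newlines too.
my $html = join('', @file);
(my $text = $html) =~ s/<[^>]*>//g;

print @lines;
print $text;
```

Joining first is usually the safer choice for HTML, since a tag split across two lines is invisible to a per-line substitution.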
How can I access the contents of the array so that I can apply regular expressions to it and refine the result?
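Putting the pieces together, a minimal sketch of the whole search that still relies on regular expressions, as the question requires. It assumes a case-insensitive, whole-word match is wanted, and `count_word_in_html` is a hypothetical name. The tag-stripping pattern is only a rough approximation: it will mishandle HTML comments, `<script>` contents, or a `>` inside an attribute value.

```perl
use strict;
use warnings;

# Hypothetical helper: strip tags from an HTML string with a regex, then
# count whole-word, case-insensitive occurrences of $word in what remains.
# Rough approximation only: comments, <script> bodies and '>' inside
# attribute values are not handled.
sub count_word_in_html {
    my ($html, $word) = @_;
    (my $text = $html) =~ s/<[^>]*>/ /g;          # replace each tag with a space
    my $count = () = $text =~ /\b\Q$word\E\b/gi;  # list assignment in scalar context counts matches
    return $count;
}

# Run as: perl WebScan.pl word WebPage000.htm
if (@ARGV == 2) {
    my ($word, $path) = @ARGV;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $html = do { local $/; <$fh> };            # slurp the whole file into one scalar
    close($fh);
    my $count = count_word_in_html($html, $word);
    print "'$word' appears $count time(s) in $path\n";
}
```

Replacing each tag with a space rather than deleting it keeps words such as `Hello<br>world` from being glued together, and `\Q...\E` makes sure any regex metacharacters in the search word are matched literally.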