After several rounds of rework, this is what I have now. It does everything I want except one thing: I want to keep the spaces in the middle of the input lines so that words stay separate.
    open FILE, "funnywords.txt";

    # Iterate through funnywords.txt
    while ( <FILE> ) {
        chomp;

        # Show initial text from file
        print "In: '$_' -> ";

        my $inputString = $_;

        # $inputString is scoped within a for each loop which dissects
        # unicode characters ( example: "é" splits into "e" and "´" )
        # and throws away accent marks. Also replaces all
        # non-alphanumeric characters with spaces and removes
        # extraneous periods and spaces.
        for ( $inputString ) {
            $inputString = NFD( $inputString ); # decompose/dissect
            s/^\s
Not quite sure why it colors some regular expressions and comments ...
Here are some sample lines from "funnywords.txt":
Kui
22.
? ÉÉíóñúÑ¿¡
[.this? ]
aquí, aLLí
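For reference, here is a minimal sketch of the steps described in the code comments (decompose, drop accent marks, turn non-alphanumerics into spaces, trim), written so that a single space survives between words. It is not my original loop body: the helper name `normalize_line` is hypothetical, and the sample strings are given as `\x{...}` escapes on the assumption that real input would first be decoded from UTF-8 (for example by opening the file with `'<:encoding(UTF-8)'`).

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper, not part of the original script.
# Expects a decoded character string, not raw UTF-8 bytes.
sub normalize_line {
    my ($text) = @_;
    $text = NFD($text);               # decompose: "é" becomes "e" + U+0301
    $text =~ s/\p{Mn}+//g;            # throw away the combining accent marks
    $text =~ s/[^A-Za-z0-9]+/ /g;     # each run of other characters -> one space
    $text =~ s/^ +| +$//g;            # trim leading and trailing spaces
    return $text;
}

# Sample lines from funnywords.txt, written with escapes so the
# script does not depend on the encoding of this source file.
my @samples = (
    "Kui",
    "22.",
    "? \x{C9}\x{C9}\x{ED}\x{F3}\x{F1}\x{FA}\x{D1}\x{BF}\x{A1}",   # ? ÉÉíóñúÑ¿¡
    "[.this? ]",
    "aqu\x{ED}, aLL\x{ED}",                                       # aquí, aLLí
);

print "Out: '", normalize_line($_), "'\n" for @samples;
```

With this sketch, the last sample comes out as `aqui aLLi`: the comma and space between the two words collapse to one space instead of disappearing, and `[.this? ]` comes out as `this`.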