If you control both the input and output sides of this database, you should be able to verify that your data is valid UTF-8 on whichever side you prefer and enforce restrictions where necessary. If you do not control the input side, you will have to check the data after you pull it out and possibly convert it in your language of choice (Perl, it looks like).
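For the "check it after you pull it out" case, here is a minimal sketch. It is in Python rather than Perl purely for illustration, and it assumes the rows arrive as raw bytes paired with an id; the function names `is_valid_utf8` and `split_rows` are made up for this example.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def split_rows(rows):
    """Split (row_id, raw_bytes) pairs into ids of valid and invalid rows.

    The (row_id, raw_bytes) shape is an assumption for this sketch;
    adapt it to whatever your database driver actually returns.
    """
    valid, invalid = [], []
    for row_id, raw in rows:
        if is_valid_utf8(raw):
            valid.append(row_id)
        else:
            invalid.append(row_id)
    return valid, invalid
```

You would then decide what to do with the invalid ids: convert them from whatever encoding you suspect they are in, or flag them for review.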
The database is REALLY good at storage, but it should not be pushed hard into other kinds of work. I think this is one place where you should just let MySQL store the data until you need to do something further with it.
If you want to continue down the path you are on, check out this MySQL manual page: http://dev.mysql.com/doc/refman/5.0/en/regexp.html
Regex syntax is generally VERY similar between languages (in fact, I almost always copy patterns between JavaScript, PHP and Perl with only a few adjustments for the functions that wrap them), so once your pattern works as a regex in one place, you can easily transfer it.
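As a rough illustration of how portable a pattern can be, here is a small Python sketch. The pattern itself (any character outside 7-bit ASCII) is a hypothetical example, not something taken from your question, and the helper names are invented for this sketch. The same character class should carry over to Perl, PHP and JavaScript more or less unchanged; for MySQL's REGEXP you may need to rewrite the escapes, so check the manual page linked above.

```python
import re

# Hypothetical example pattern: any character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def has_non_ascii(text: str) -> bool:
    """Return True if the text contains at least one non-ASCII character."""
    return NON_ASCII.search(text) is not None


def non_ascii_chars(text: str) -> list:
    """Return every non-ASCII character in the text, in order of appearance."""
    return NON_ASCII.findall(text)
```

Getting the pattern right in a scripting language first, where it is easy to test, makes it much simpler to port afterwards.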
GL!
EDIT: Look at this Stack Overflow question. Since you cannot use scripts to process the data, you can use stored procedures instead: Regular expressions in stored procedures
Using stored procedures, you can loop through the data and do a lot of processing without leaving MySQL. That second article points back to the manual page I listed, so I think you should get your regex working first and then look into stored procedures.
Shane