First question: it depends on what exactly goes into the string.
In PHP (up to PHP5, anyway) strings are just sequences of bytes. There is no implied or explicit character set associated with them; that is something the programmer has to keep track of. So, if you put only valid UTF-8 bytes between the quotation marks (quite easy if the file itself is encoded as UTF-8), then the string will be UTF-8, and you can safely use mb_strlen() on it.
Also, if you use the mbstring functions, you need to tell them explicitly which character set your string is in, either with mbstring.internal_encoding or as the last argument to any mbstring function.
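As a rough illustration of the point above, here is a minimal sketch. The sample string is my own, and it is written with explicit byte escapes so the result does not depend on how the source file is saved. mb_internal_encoding() is used here as the runtime counterpart of the mbstring.internal_encoding setting.

```php
<?php
// "héllo" spelled out as bytes: 'é' is the two-byte UTF-8 sequence C3 A9.
$s = "h\xC3\xA9llo";

// strlen() counts bytes, because a PHP string is just a byte sequence.
$bytes = strlen($s);              // 6

// mb_strlen() counts characters, but only if it is told the encoding.
$chars = mb_strlen($s, 'UTF-8');  // 5

// Alternatively, set the default encoding once for all mbstring calls.
mb_internal_encoding('UTF-8');
$charsDefault = mb_strlen($s);    // 5
```

Passing the encoding on every call is more verbose, but it keeps the code independent of whatever the server's configuration happens to be.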
Second question: yes, with caveats.
Two strings that are each independently valid UTF-8 can be safely concatenated byte-wise (for example, with PHP's . operator) and the result is still valid UTF-8. However, you can never be sure, without doing some work yourself, that a POSTed string is valid UTF-8. Database strings are a little easier if you carefully set the connection character set, since most DBMSs will do any conversion for you.
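The "work yourself" for POSTed data could look something like the sketch below. The helper name utf8_or_null and the sample strings are hypothetical, made up for illustration; mb_check_encoding() is the real mbstring function doing the validation.

```php
<?php
// Hypothetical helper: returns the input if it is a valid UTF-8 string,
// otherwise null, so the caller can reject or convert it.
function utf8_or_null($input)
{
    if (is_string($input) && mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }
    return null;
}

// Two independently valid UTF-8 strings, written as explicit bytes.
$a = "caf\xC3\xA9";    // "café"
$b = " na\xC3\xAFve";  // " naïve"

// Byte-wise concatenation of valid UTF-8 is still valid UTF-8.
$joined = $a . $b;
$joinedOk = utf8_or_null($joined) !== null;

// A Latin-1 encoded "café" (E9 alone) is not valid UTF-8.
$bad = "caf\xE9";
$badOk = utf8_or_null($bad) !== null;

// In a request handler this might be used as (field name is an assumption):
// $name = utf8_or_null(isset($_POST['name']) ? $_POST['name'] : null);
```

Whether to reject invalid input outright or try to convert it from a guessed encoding is a design decision; rejecting is the simpler and safer default.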
chazomaticus Feb 17 '09 at 18:26