You do not want to call htmlentities right away; I would apply it to the data as the last step before you save it. One problem you will run into is that people do not always encode their entities properly. Not everyone types &trade;; many just paste the trademark symbol itself. You may be better off adding some logic that catches whatever they put in and encodes it correctly. For instance:
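// Patterns and replacements are paired by position, so keep both arrays in the same order.
$patterns = array();
$patterns[0] = '/—/';
$patterns[1] = '/&nbsp;/';
$patterns[2] = '/®/';
// Numeric entities need the leading '#'.
$replacements = array();
$replacements[0] = '&#8212;';
$replacements[1] = '&#160;';
$replacements[2] = '&#174;';
$ourhtml = preg_replace($patterns, $replacements, $html);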
You can track down all the gotcha characters, such as dashes, curly single quotes and apostrophes, and encode them by hand, while relying on the standard set of entities (named or numeric) for everything else.
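As a rough sketch of that manual approach, something like the following could work. The function name encode_gotchas() and the particular list of characters are only illustrative, and it assumes UTF-8 input and PHP 7+ (for the "\u{...}" escapes):

```php
<?php
// Hypothetical helper: encode_gotchas() is an illustration only.
// Assumes UTF-8 input and PHP 7+ (needed for the "\u{...}" escape syntax).
function encode_gotchas(string $text): string
{
    // Illustrative map of "gotcha" characters to numeric entities; extend as needed.
    $map = array(
        "\u{2014}" => '&#8212;', // em dash
        "\u{2013}" => '&#8211;', // en dash
        "\u{2018}" => '&#8216;', // left single quote
        "\u{2019}" => '&#8217;', // right single quote / apostrophe
        "\u{201C}" => '&#8220;', // left double quote
        "\u{201D}" => '&#8221;', // right double quote
        "\u{2122}" => '&#8482;', // trademark sign
        "\u{00AE}" => '&#174;',  // registered sign
        "\u{00A0}" => '&#160;',  // non-breaking space
    );

    // strtr() with an array replaces all keys in one pass and never
    // re-scans text it has already replaced.
    return strtr($text, $map);
}

echo encode_gotchas("Acme\u{2122} \u{2014} it\u{2019}s here"), "\n";
// prints: Acme&#8482; &#8212; it&#8217;s here
```

A plain lookup table like this keeps each character next to its replacement, so the pairing cannot drift out of order the way two separate arrays can.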
A single, more general regular expression could do the same job and would probably be a more elegant solution. Still, my suggestion is to take the time to filter out what you do not want by hand; then you know your data is prepared exactly the way you like.
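For comparison, here is a minimal sketch of what the more general regular-expression route might look like: one pattern that turns every non-ASCII character into a numeric entity. The name encode_non_ascii() is made up for this example, and it assumes valid UTF-8 input, PHP 7.2+ and the mbstring extension (for mb_ord()):

```php
<?php
// Hypothetical helper: encode_non_ascii() is an illustration only.
// Assumes UTF-8 input, PHP 7.2+ and the mbstring extension (for mb_ord()).
function encode_non_ascii(string $text): string
{
    // The /u modifier makes the pattern match whole UTF-8 characters, not bytes.
    $result = preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            // Turn the matched character into its decimal code point entity.
            return '&#' . mb_ord($m[0], 'UTF-8') . ';';
        },
        $text
    );

    // preg_replace_callback() returns null on failure, e.g. for invalid UTF-8.
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    return $result;
}

echo encode_non_ascii("caf\u{00E9} \u{00AE}"), "\n";
// prints: caf&#233; &#174;
```

The trade-off is the one described above: this catches everything without a hand-maintained list, but it also encodes characters you may have wanted to keep as-is.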