I programmatically clean up some basic grammar in comments and other user-submitted content: the capital letter I, the first letter of each sentence, and so on. The comments and content are mixed with HTML, because users have some options for formatting their text.
This is proving more complex than I expected, especially for someone new to PHP and regex.
Is there a function like ucfirst that ignores HTML tags when capitalizing?
Any links or guides on cleaning up text inside HTML like this would also be appreciated. Please leave anything you think will help in the comments. Thanks!
EDIT: Sample text:
<div><p>i wuz walkin thru the PaRK and found <strong>ur dog</strong>. <br />i hoPe to get a reward.<br /> plz call or text 7zero4 8two8 49 sevenseven</div>
I need it to be (ultimately):
<div><p>I was walking through the park and found <strong>your dog</strong>.</p><p>I hope to get a reward.</p><p>Please call or text (704) 828-4977.</p></div>
I know this goes a little further than the original question, but I planned to work up to it gradually. ucfirst() is just one of many functions I use, each doing a little cleanup per pass. Even if I have to run the text through the filter 100 times, that happens in a cron job when there is no traffic on the site. I would like a discussion forum where this could continue, because there are obviously great ideas about how to take the approach further. If you have thoughts on how to approach this as a community project, please leave a comment.
In the spirit of the question itself: I think ucfirst is not the best function for this, since it cannot accept a list of things to ignore. An IGNORE_HTML flag would be great!
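To make the idea concrete, here is a minimal sketch of what a tag-skipping ucfirst might look like. The name `ucfirst_html` is hypothetical (PHP has no such built-in), and the sketch assumes ASCII letters, PHP 7.4+ for the arrow function, and tags that contain no `>` inside attribute values.

```php
<?php
// Hypothetical helper, not a PHP built-in: uppercase the first letter
// that follows any leading HTML tags and whitespace.
function ucfirst_html(string $html): string
{
    // Group 1: zero or more leading tags, each optionally preceded by
    // whitespace, plus trailing whitespace. Group 2: the first lowercase
    // ASCII letter after them.
    return preg_replace_callback(
        '/^((?:\s*<[^>]*>)*\s*)([a-z])/',
        fn(array $m): string => $m[1] . strtoupper($m[2]),
        $html
    );
}
```

For example, `ucfirst_html('<div><p>i hope to get a reward.</p></div>')` should return `<div><p>I hope to get a reward.</p></div>`. It only touches the very first letter of the string, so it does not handle later sentences or text that does not start with a lowercase letter.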
Given that this is a PHP question, the DOM parser recommended below sounds like the best answer. Thoughts?
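As I understand the DOM approach, it would go roughly like the sketch below: parse the markup with DOMDocument, visit only the text nodes, and leave the tags alone. The function name `capitalize_html_text` is my own invention, and the sketch assumes ASCII text (loadHTML does not treat input as UTF-8 by default). It only capitalizes sentence starts and the pronoun "i"; it does not fix spelling or mixed-case words.

```php
<?php
// Hypothetical helper: capitalize sentence starts and the pronoun "i"
// in text nodes only, so tag names and attributes are never touched.
function capitalize_html_text(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);          // tolerate sloppy user markup
    $doc->loadHTML('<body>' . $html . '</body>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $upper = fn(array $m): string => $m[1] . strtoupper($m[2]);
    $atSentenceStart = true;                   // carried across text nodes

    foreach ($xpath->query('//body//text()') as $node) {
        $text = $node->data;

        // First letter of this node, if the previous text ended a sentence.
        if ($atSentenceStart) {
            $text = preg_replace_callback('/^(\s*)([a-z])/', $upper, $text);
        }
        // Letters that follow sentence-ending punctuation inside the node.
        $text = preg_replace_callback('/([.!?]\s+)([a-z])/', $upper, $text);
        // Standalone pronoun "i" (this also hits cases like "i.e.").
        $text = preg_replace('/\bi\b/', 'I', $text);

        $node->data = $text;

        // Whitespace-only nodes leave the sentence state unchanged.
        $trimmed = rtrim($text);
        if ($trimmed !== '') {
            $atSentenceStart = (bool) preg_match('/[.!?]$/', $trimmed);
        }
    }

    // Serialize only the children of <body>, dropping the wrapper.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

The sentence flag is carried from one text node to the next, so text inside something like `<strong>` in the middle of a sentence is not capitalized by mistake. Note that the parser normalizes the markup on output (for example, `<br />` comes back as `<br>` and unclosed tags get closed).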