One solution (and probably the best I've come up with) is to post-process the pasted content. So, catch the publish event and fix all the crappy HTML: find all the "mso-normal" styles, for example, and delete them. You would have a set of rules for cleaning up content from each source, for example, MS Word.
This is not just a Word problem, though. Paste from one rich text editor into another, and the styles just don't carry over between rich editing environments. This is not so much a technical problem as a logical one.
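As a rough illustration of the rule-based cleanup described above, here is a minimal Python sketch. Everything in it is an assumption for illustration: the rule list, the name `clean_pasted_html`, and the idea that your CMS hands you the content as a string when it is published. The patterns target markup Word commonly emits (classes like `MsoNormal`, `mso-*` style declarations, `<o:p>` tags, conditional comments); a real implementation would probably use an HTML parser rather than regular expressions.

```python
import re

# Hypothetical rule set: each entry is (compiled pattern, replacement).
# Rules run in order, so later rules can tidy up after earlier ones.
WORD_CLEANUP_RULES = [
    # Drop Word conditional comments, e.g. <!--[if gte mso 9]>...<![endif]-->
    (re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE), ""),
    # Drop class attributes whose value starts with "Mso" (e.g. MsoNormal)
    (re.compile(r'\s+class="?Mso\w*"?', re.IGNORECASE), ""),
    # Drop Office namespace tags such as <o:p> and </o:p>
    (re.compile(r"</?o:[^>]*>", re.IGNORECASE), ""),
    # Drop individual mso-* declarations inside style attributes
    (re.compile(r"mso-[\w-]+\s*:[^;\"']*;?\s*", re.IGNORECASE), ""),
    # Drop style attributes that the previous rule left empty
    (re.compile(r"\s+style=(\"\"|'')", re.IGNORECASE), ""),
]


def clean_pasted_html(html: str) -> str:
    """Apply every cleanup rule in order and return the cleaned HTML."""
    for pattern, replacement in WORD_CLEANUP_RULES:
        html = pattern.sub(replacement, html)
    return html
```

You would call `clean_pasted_html(content)` from whatever publish hook your CMS provides. Keeping the rules in a plain list means you can add a separate set for each editor you see content pasted from.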
Update: Someone pointed me to this: Copying attachments to a web CMS. No real solutions there, just confirmation that this is a sticky problem.
Deane