How to get rid of the styling on copied and pasted text in the Ajax HTML editor

I am using the Ajax HTML editor for the news description page. When I copy and paste from Word or the Internet, the editor also copies the styling of that text, its paragraphs, and so on, which overrides the default class style of the HTML editor's text block. What I want is to get rid of the inline style shown below, but not the HTML itself; that is, I want to keep the paragraph structure.

<span id="ContentPlaceHolder1_newsDetaildesc" class="newsDetails"><span style="font-family: arial, helvetica, sans; font-size: 11px; line-height: 14px; color: #000000; "><strong>Lorem Ipsum</strong>&nbsp;is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.<BR /> It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</span></span></p> 

#left_column .newsDetails span[style] { font-family: Arial !important; font-size: small !important; font-weight: normal !important; color: #808080 !important; }

+6
6 answers

First, keep in mind that the HTML you receive when pasting from Word (or any other HTML source) varies wildly depending on the source. Even different versions of Word will give you radically different input. Code that works perfectly on content from the version of MS Word that you have may not work at all on content from another version.

In addition, some sources will paste content that looks like HTML but is actually garbage. When you paste HTML content into a rich text area in your browser, your browser has nothing to do with how that HTML was generated. Do not expect it to be valid by any stretch of the imagination. On top of that, your browser will further mangle the HTML as it is inserted into the DOM of your rich text area.

Because the potential inputs vary so much, and because the acceptable outputs are hard to define, it is difficult to build a proper filter for this sort of thing. You also cannot control how future versions of MS Word will generate their HTML content, so your code will be difficult to future-proof.

However, take heart! If all the world's problems were easy, it would be a rather boring place. There are some potential solutions. You can keep the good parts of the HTML and discard the bad parts.

It looks like your HTML-based RTE works the way most HTML editors do. Specifically, it has an iframe, and on the document inside the iframe it has set designMode to "on".

You want to trap the paste event when it occurs on the <body> element of the document inside that iframe. I am being very specific here because I have to be: do not trap it on the iframe; do not trap it on the iframe's window; do not trap it on the iframe's document. Trap it on the <body> element of the document inside the iframe. This is very important.

    var iframe = your.rich.text.editor.getIframe(), // or whatever
        win = iframe.contentWindow,
        doc = win.document,
        body = doc.body;

    // Use your favorite library to attach events. Don't actually do this
    // yourself. But if you did do it yourself, this is how it would be done.
    if (win.addEventListener) {
        body.addEventListener('paste', handlePaste, false);
    } else {
        body.attachEvent("onpaste", handlePaste);
    }

Note that my code example attaches a function called handlePaste. We will get to that next. The paste event is funny: some browsers fire it before the paste, and some fire it after. You want to normalize this so that you are always dealing with the pasted content after the paste has happened. To do that, use a timeout.

    function handlePaste() {
        window.setTimeout(filterHTML, 50);
    }
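As a variation on the timeout approach above, the sketch below wraps the filter so that a burst of paste events results in a single filter pass. The name `makePasteHandler` and its `schedule`/`cancel` parameters are hypothetical and not part of the original answer; they default to the standard `setTimeout` and `clearTimeout`.

```javascript
// Hypothetical helper (not from the original answer): returns a paste
// handler that runs filterFn once, delayMs after the most recent paste.
function makePasteHandler(filterFn, delayMs, schedule, cancel) {
  schedule = schedule || setTimeout;   // injectable for testing
  cancel = cancel || clearTimeout;
  var pending = null;

  return function handlePaste() {
    // A newer paste supersedes a filter run that has not happened yet.
    if (pending !== null) {
      cancel(pending);
    }
    pending = schedule(function () {
      pending = null;
      filterFn();
    }, delayMs);
  };
}

// Usage, assuming filterHTML and body are defined as in the answer:
//   body.addEventListener('paste', makePasteHandler(filterHTML, 50), false);
```

The 50 ms delay is the same value the answer uses; whether it is long enough for every browser is an assumption worth checking against the browsers you support.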

So, 50 milliseconds after the paste event, the filterHTML function will be called. This is the meat of the job: you need to filter the HTML and remove any unwanted styles or elements. You have a lot to worry about here!

I have personally seen MS Word paste in these elements:

  • meta
  • link
  • style
  • o:p (paragraph in another namespace)
  • shapetype
  • shape
  • Comments, for example <!-- comment --> .
  • font
  • And, of course, the MsoNormal class.

The filterHTML function should remove these wherever appropriate. You may also want to remove other items as you see fit. Here is an example filterHTML that removes the items listed above.

    // Your favorite JavaScript library probably has these utility functions.
    // Feel free to use them. I'm including them here so this example will
    // be library-agnostic.
    function collectionToArray(col) {
        var x, output = [];
        for (x = 0; x < col.length; x += 1) {
            output[x] = col[x];
        }
        return output;
    }

    // Another utility function probably covered by your favorite library.
    function trimString(s) {
        return s.replace(/^\s\s*/, '').replace(/\s\s*$/, '');
    }

    function filterHTML() {
        var iframe = your.rich.text.editor.getIframe(),
            win = iframe.contentWindow,
            doc = win.document,
            // Matches "msonormal" as a whole class name; the leading
            // separator is captured so it can be put back on replace.
            invalidClass = /(^| )msonormal(?= |$)/gi,
            cursor,
            nodes = [];

        // This is a depth-first, pre-order search of the document body.
        // While searching, we want to remove invalid elements and comments.
        // We also want to remove invalid classNames.
        // We also want to remove font elements, but preserve their contents.
        nodes = collectionToArray(doc.body.childNodes);
        while (nodes.length) {
            cursor = nodes.shift();
            switch (cursor.nodeName.toLowerCase()) {

            // Remove these invalid elements.
            case 'meta':
            case 'link':
            case 'style':
            case 'o:p':
            case 'shapetype':
            case 'shape':
            case '#comment':
                cursor.parentNode.removeChild(cursor);
                break;

            // Remove font elements but preserve their contents.
            case 'font':
                // Make sure we scan these child nodes too!
                nodes.unshift.apply(
                    nodes,
                    collectionToArray(cursor.childNodes)
                );
                // Move the children out, last to first, so they end up
                // in their original order right after the font element.
                while (cursor.lastChild) {
                    if (cursor.nextSibling) {
                        cursor.parentNode.insertBefore(
                            cursor.lastChild,
                            cursor.nextSibling
                        );
                    } else {
                        cursor.parentNode.appendChild(cursor.lastChild);
                    }
                }
                // The font element is now empty; drop it.
                cursor.parentNode.removeChild(cursor);
                break;

            default:
                if (cursor.nodeType === 1) {
                    // Remove all inline styles
                    cursor.removeAttribute('style');
                    // OR: remove a specific inline style
                    // cursor.style.fontFamily = '';

                    // Remove invalid class names.
                    invalidClass.lastIndex = 0;
                    if (
                        cursor.className &&
                            invalidClass.test(cursor.className)
                    ) {
                        // Keep the separator ('$1') so that neighbouring
                        // class names are not glued together.
                        cursor.className = trimString(
                            cursor.className.replace(invalidClass, '$1')
                        );
                        if (cursor.className === '') {
                            cursor.removeAttribute('class');
                        }
                    }

                    // Also scan child nodes of this node.
                    nodes.unshift.apply(
                        nodes,
                        collectionToArray(cursor.childNodes)
                    );
                }
            }
        }
    }
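The class-name step of filterHTML can also be written without a regular expression. The following is a minimal sketch of that one step only; `stripClassNames` is a hypothetical helper name, and the list of unwanted classes is an assumption you would adjust to whatever your sources actually paste.

```javascript
// Hypothetical helper: returns className with every unwanted class
// removed, comparing names case-insensitively.
function stripClassNames(className, unwanted) {
  var blocked = unwanted.map(function (name) {
    return name.toLowerCase();
  });
  return className
    .split(/\s+/)
    .filter(function (name) {
      return name !== '' && blocked.indexOf(name.toLowerCase()) === -1;
    })
    .join(' ');
}

// Example call:
//   stripClassNames('newsDetails MsoNormal', ['msonormal'])
```

Splitting on whitespace keeps the remaining class names separated by a single space, so no trimming is needed afterwards. An empty result means the class attribute can be removed entirely.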

You included a sample of the HTML you want to filter, but you did not include a sample of the output you would like to see. If you update your question to show what you want your sample to look like after filtering, I will try to adjust the filterHTML function to match. For now, please treat this function as a starting point for developing your own filters.

Please note that this code makes no attempt to distinguish pasted content from content that existed before the paste. It does not need to; the things it removes are considered invalid wherever they appear.

An alternative solution would be to filter out these styles and contents using regular expressions against the innerHTML of the document body. I have gone down that road, and I advise against it in favor of the solution I present here. The HTML you receive when pasting varies so much that regex-based parsing quickly runs into serious problems.
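To illustrate why the regex route is fragile, here is a small sketch. The function `naiveStripStyle` and the sample strings are made up for this illustration; they are not taken from real Word output.

```javascript
// A deliberately naive filter: strips style="..." attributes written
// in lowercase with double quotes, and nothing else.
function naiveStripStyle(html) {
  return html.replace(/\sstyle="[^"]*"/g, '');
}

// Four ways the same inline style can legitimately be written.
var samples = [
  '<p style="color: red">a</p>',  // double quotes: handled
  "<p style='color: red'>a</p>",  // single quotes: missed
  '<p STYLE="color: red">a</p>',  // uppercase name: missed
  '<p style=color:red>a</p>'      // unquoted value: missed
];

// Collect the samples that still carry a style attribute afterwards.
var missed = samples.filter(function (html) {
  return /style\s*=/i.test(naiveStripStyle(html));
});

console.log(missed.length + ' of ' + samples.length + ' samples slipped through');
```

Each miss can be patched with a more elaborate pattern, but every patch adds another case to maintain, whereas the DOM walk sees the same attribute node regardless of how the markup was spelled.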


Edit:

I think now I see: you are trying to remove the style attributes themselves, right? If so, you can do this during the filterHTML function by including this line:

 cursor.removeAttribute('style'); 

Or you can target individual inline styles for deletion as follows:

 cursor.style.fontFamily = ''; 

I updated the filterHTML function to show where these lines will go.
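If several individual properties need to go, the second option can be generalized. The sketch below assumes the element exposes the standard `style.removeProperty` method and `style.length`; the helper name `removeInlineProperties` and the property list are hypothetical.

```javascript
// Hypothetical helper: removes the named CSS properties from an
// element's inline style and drops the style attribute if nothing
// is left. Property names use CSS spelling, e.g. 'font-family'.
function removeInlineProperties(element, properties) {
  properties.forEach(function (name) {
    element.style.removeProperty(name);
  });
  if (element.style.length === 0) {
    element.removeAttribute('style');
  }
}

// Inside filterHTML this could replace the removeAttribute('style') line:
//   removeInlineProperties(cursor, ['font-family', 'font-size', 'color']);
```

This keeps any inline declarations you did not list, which may or may not be what you want for pasted content.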

Good luck and happy coding!

+8

Here is a potential solution that extracts just the text from the HTML. It works by first copying the HTML into an element (which should probably be hidden, but is shown for comparison in my example). You then read the inner text of that element and put that text into your editor wherever you like. You will need to capture the paste event on the editor, run this sequence to get the text, and then insert the text at the desired position in your editor.

Here's an example showing how to do it: Get the text from HTML
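The sequence described above can be sketched roughly as follows. The function name `htmlToText` is hypothetical, and the `doc` parameter is assumed to be a browser `document` (for example the one inside the editor's iframe).

```javascript
// Hypothetical helper: copies the HTML into a scratch element that is
// never attached to the page, then reads back only its text.
function htmlToText(html, doc) {
  var scratch = doc.createElement('div');
  scratch.innerHTML = html;
  // textContent is the standard property; innerText covers old IE.
  return scratch.textContent || scratch.innerText || '';
}

// Usage in a browser, assuming pastedHtml holds the pasted markup:
//   var text = htmlToText(pastedHtml, document);
```

Be aware that assigning untrusted markup to innerHTML can still trigger image loads and inline event handlers even when the element is detached, so this sketch is only appropriate for content you are already willing to let into the editor.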

+4

If you use Firefox, you can install this extension: https://addons.mozilla.org/en-US/firefox/addon/extended-copy-menu-fix-vers/ . It allows you to copy text from any site without formatting.

+2

Typically, when supporting HTML editing by end users, I choose to use one of several solid client-side HTML editor controls that already have the functionality needed to handle this kind of thing. There are a number of commercial ones, such as Component Art, as well as several free, open source ones, such as CKEditor.

All the good ones have solid paste-from-Word support that strips out or fixes this excessive CSS. I would either just use one (the easy way) or look at how they do it (the hard way).

+2

I run into this kind of problem all the time; it's an interesting one. The way I handle it is very simple: open Notepad in Windows, paste the text into Notepad, then copy it from there into the AJAX text editor. This strips all the formatting from the text.

:)

+1

From what I understand of your question, you are using a WYSIWYG editor, and when you copy and paste text from other web pages or Word documents you end up with ugly HTML full of inline styles and the like.

I would suggest that you not bother with this at all, because solving it across browsers is a mess. If you really do want to fix it, though, I would recommend using TinyMCE, which has exactly the behavior you want.

You can try it in action by visiting http://tinymce.moxiecode.com/tryit/full.php: paste some text into the editor and then submit it to see the generated HTML. It is clean.

TinyMCE is probably the best WYSIWYG editor you will find imo. Therefore, instead of creating something yourself, just use it and customize it according to your specific needs.

+1

Source: https://habr.com/ru/post/888998/

