Extract text from contentEditable div

Question

I have a div set to contentEditable and styled with "white-space:pre" so that it preserves things like line breaks. In Safari, FF, and IE, the div looks and works pretty much the same. All is well. What I want to do is extract the text from this div without losing the formatting, in particular the line breaks.

We are using jQuery, whose text() function basically does a pre-order DFS and glues together all the content in that branch of the DOM into a single lump. This loses the formatting.

I looked at the html() function, but it seems that all three browsers do different things with the actual HTML that is being created behind the scenes in my contentEditable div. Assuming I type this into my div:

 1
 2
 3

Here are the results:

Safari 4:

 1 <div>2</div> <div>3</div>

Firefox 3.6:

 1 <br _moz_dirty=""> 2 <br _moz_dirty=""> 3 <br _moz_dirty=""> <br _moz_dirty="" type="_moz">

IE 8:

 <P>1</P><P>2</P><P>3</P>

Ugh. Nothing is very consistent here. The surprising thing is that MSIE looks the most sane! (Uppercase P tags and all.)

The div will have dynamically set styling (font face, colour, size, and alignment), which is done using CSS, so I'm not sure if I can use a pre tag (which some pages I found through Google alluded to).

Does anyone know of any JavaScript code, jQuery plugin, or anything else that will extract text from a contentEditable div in such a way as to preserve line breaks? I would prefer not to write my own parser if I don't have to.
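To make the comparison concrete, here is a minimal sketch showing that the three serializations above carry the same three lines. `normalizeEditableHtml` is a hypothetical helper, not part of jQuery or of any answer below. It assumes the flat markup shown above (no nested blocks, no HTML entities), and regular expressions are not a general HTML parser.

```javascript
// Hypothetical helper: reduce the flat markup shown above to plain text.
// Assumes only <br>, <div> and <p> appear; not a general-purpose HTML parser.
function normalizeEditableHtml(html) {
  return html
    // Firefox's trailing placeholder <br type="_moz"> is not a line the user typed.
    .replace(/<br\b[^>]*\btype="_moz"[^>]*>/gi, "")
    // Every remaining <br> ends a line.
    .replace(/<br\b[^>]*>/gi, "\n")
    // An opening block tag (Safari's <div>, IE's <P>) starts a new line.
    .replace(/<(div|p)\b[^>]*>/gi, "\n")
    // Closing block tags carry no extra information here.
    .replace(/<\/(div|p)>/gi, "")
    // Drop the single leading or trailing newline the steps above may leave.
    .replace(/^\n/, "")
    .replace(/\n$/, "");
}

var safari = "1<div>2</div><div>3</div>";
var firefox = '1<br _moz_dirty="">2<br _moz_dirty="">3<br _moz_dirty=""><br _moz_dirty="" type="_moz">';
var ie = "<P>1</P><P>2</P><P>3</P>";
```

Under those assumptions, all three sample strings normalize to the same text, with the lines 1, 2, and 3 separated by "\n".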

Update: I cribbed the getText function from jQuery 1.4.2 and modified it to extract the text with whitespace mostly intact (I only changed one line, where I add a newline):

 function extractTextWithWhitespace( elems ) {
     var ret = "", elem;

     for ( var i = 0; elems[i]; i++ ) {
         elem = elems[i];

         // Get the text from text nodes and CDATA nodes
         if ( elem.nodeType === 3 || elem.nodeType === 4 ) {
             ret += elem.nodeValue + "\n";

         // Traverse everything else, except comment nodes
         } else if ( elem.nodeType !== 8 ) {
             ret += extractTextWithWhitespace( elem.childNodes );
         }
     }

     return ret;
 }

I call this function and use its output to assign it to an XML node with jQuery, something like:

 var extractedText = extractTextWithWhitespace($(this));
 var $someXmlNode = $('<someXmlNode/>');
 $someXmlNode.text(extractedText);

The resulting XML is ultimately sent to the server through an AJAX call.

This works well in Safari and Firefox.

In IE, it appears that only the first "\n" is retained. Looking into it more, it seems that jQuery sets the text like this (line 4004 of jQuery-1.4.2.js):

 return this.empty().append( (this[0] && this[0].ownerDocument || document).createTextNode( text ) );

Reading up on createTextNode, it appears that IE's implementation may collapse the whitespace. Is this true, or am I doing something wrong?

+43

javascript jquery html css contenteditable

Shaggy Frog Aug 11 '10 at 6:48

6 answers

Unfortunately, you still have to handle this for the pre case individually per browser (I don't condone browser detection in many cases; use feature detection... but in this case it is necessary). Fortunately, you can take care of them all pretty concisely, like this:

 var ce = $("<pre />").html($("#edit").html());
 if ($.browser.webkit)
     ce.find("div").replaceWith(function() { return "\n" + this.innerHTML; });
 if ($.browser.msie)
     ce.find("p").replaceWith(function() { return this.innerHTML + "<br>"; });
 if ($.browser.mozilla || $.browser.opera || $.browser.msie)
     ce.find("br").replaceWith("\n");

 var textWithWhiteSpaceIntact = ce.text();

You can check it out here. IE in particular is a hassle because of the way it handles &nbsp; and newlines in text conversion, which is why it gets the <br> treatment above to make it consistent; it needs two passes to be handled correctly.

In the code above, #edit is the ID of the contentEditable component, so just change that, or make it a function, for example:

 function getContentEditableText(id) {
     var ce = $("<pre />").html($("#" + id).html());
     if ($.browser.webkit)
         ce.find("div").replaceWith(function() { return "\n" + this.innerHTML; });
     if ($.browser.msie)
         ce.find("p").replaceWith(function() { return this.innerHTML + "<br>"; });
     if ($.browser.mozilla || $.browser.opera || $.browser.msie)
         ce.find("br").replaceWith("\n");

     return ce.text();
 }

You can check it out here. Or, since it is built on jQuery methods anyway, make it a plugin, for example:

 $.fn.getPreText = function () {
     var ce = $("<pre />").html(this.html());
     if ($.browser.webkit)
         ce.find("div").replaceWith(function() { return "\n" + this.innerHTML; });
     if ($.browser.msie)
         ce.find("p").replaceWith(function() { return this.innerHTML + "<br>"; });
     if ($.browser.mozilla || $.browser.opera || $.browser.msie)
         ce.find("br").replaceWith("\n");

     return ce.text();
 };

Then you can just call it with $("#edit").getPreText(); you can check that version here.
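Note that `$.browser` was removed from jQuery core in version 1.9, so the snippets above need the jQuery Migrate plugin on newer versions. As an alternative, here is a rough sketch that avoids browser detection by treating `<br>`, `<div>`, and `<p>` as line boundaries wherever they appear. The name `editableToText` is hypothetical, and the block-tag list is an assumption based only on the three markup styles shown in the question.

```javascript
// Hypothetical, detection-free walker. Works on DOM nodes or on any objects
// exposing nodeType, nodeName, nodeValue and childNodes.
function editableToText(root) {
  var out = "";

  function endsWithNewline() {
    return out.charAt(out.length - 1) === "\n";
  }

  function walk(node) {
    for (var i = 0; i < node.childNodes.length; i++) {
      var child = node.childNodes[i];
      if (child.nodeType === 3) {
        // Text node: copy its text verbatim.
        out += child.nodeValue;
      } else if (child.nodeType === 1) {
        var name = child.nodeName.toUpperCase();
        if (name === "BR") {
          // Skip Firefox's placeholder <br type="_moz">; other <br>s end a line.
          var isPlaceholder = child.getAttribute && child.getAttribute("type") === "_moz";
          if (!isPlaceholder) out += "\n";
        } else {
          var isBlock = name === "DIV" || name === "P";
          // A block starts on its own line unless we are already at a line start.
          if (isBlock && out !== "" && !endsWithNewline()) out += "\n";
          walk(child);
          // A block also ends its line.
          if (isBlock && !endsWithNewline()) out += "\n";
        }
      }
    }
  }

  walk(root);
  // Drop the one trailing newline left by the last block or <br>.
  return out.replace(/\n$/, "");
}
```

In a page this would be called as `editableToText(document.getElementById("edit"))`. It has only been reasoned through against the three structures from the question, so other markup (lists, tables, pasted content) would need more rules.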

+34

Nick Craver Nov 12 '10 at 10:45

See this fiddle.

Or this post:

How to parse editable div text with browser support

It was created after a lot of effort.

+1

user10 Oct 11

I discovered this today in Firefox:

I pass a contenteditable div whose white-space is set to "pre" to this function, and it works well.

I added a line to show how many nodes there are, and a button that puts the output in another PRE to prove that line breaks are not damaged.

It basically says:

 For each child node of the DIV,
     if it contains the 'data' property,
         add the data value to the output
     otherwise
         add an LF (or a CRLF for Windows)
 and return the result.

There is an issue, though. When you hit Enter at the end of any line of the original text, instead of inserting an LF, it inserts a "Â". You can hit Enter again and it inserts an LF, but not the first time. And you have to delete the "Â" (it looks like a space). Go figure; I guess that's a bug.

This does not happen in IE8 (change textContent to innerText). There is a different bug there, though. When you hit Enter, it splits the node into 2 nodes, as it does in Firefox, but the "data" property of each of those nodes becomes "undefined".

I am sure there is much more going on here than meets the eye, so any input on the matter would be enlightening.

 <!DOCTYPE html>
 <html>
 <HEAD>
 <SCRIPT type="text/javascript">
     function htmlToText(elem) {
         var outText = "";
         for (var x = 0; x < elem.childNodes.length; x++) {
             if (elem.childNodes[x].data) {
                 outText += elem.childNodes[x].data;
             } else {
                 outText += "\n";
             }
         }
         alert(elem.childNodes.length + " Nodes: \r\n\r\n" + outText);
         return(outText);
     }
 </SCRIPT>
 </HEAD>
 <body>
 <div style="white-space:pre;" contenteditable=true id=test>Text in a pre element is displayed in a fixed-width font, and it preserves both spaces and line breaks
 </DIV>
 <INPUT type=button value="submit" onclick="document.getElementById('test2').textContent=htmlToText(document.getElementById('test'))">
 <PRE id=test2>
 </PRE>
 </body>
 </html>
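Two details of `htmlToText` may explain part of the behaviour described above. The test `if (node.data)` is a truthiness check, so an empty text node is counted as a line break. Element nodes such as the `<P>` that IE8 inserts (per the question) have no `data` property at all, which is a plausible reason for the "undefined" values. The sketch below is a hypothetical variant, `childNodesToText`, that checks the type of `data` instead. It also maps U+00A0 (non-breaking space) to a plain space, on the unconfirmed assumption that the stray "Â" is a non-breaking space.

```javascript
// Hypothetical variant of htmlToText from the answer above.
// Only looks at direct children, exactly like the original.
function childNodesToText(elem) {
  var out = "";
  for (var i = 0; i < elem.childNodes.length; i++) {
    var data = elem.childNodes[i].data;
    if (typeof data === "string") {
      // Text node (possibly empty): keep its text.
      // Assumption: U+00A0 should be treated as an ordinary space.
      out += data.replace(/\u00A0/g, " ");
    } else {
      // No string data (for example a <br> element): count it as a line break.
      out += "\n";
    }
  }
  return out;
}
```

Like the original, this drops any text nested inside child elements, so it only fits browsers that separate lines with `<br>`.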

+1

alfadog67 May 2, '13 at 23:27

Here's a solution (using underscore and jQuery) that works in iOS Safari (iOS 7 and 8), Safari 8, Chrome 43, and Firefox 36 on OS X, and in IE6-11 on Windows:

 _.reduce($editable.contents(), function(text, node) {
     return text + (node.nodeValue || '\n' +
         (_.isString(node.textContent) ? node.textContent : node.innerHTML));
 }, '')

See the test page here: http://brokendisk.com/code/contenteditable.html

That said, I believe the real answer is that if you are not interested in the markup the browser provides, you should not be using the contenteditable attribute; a text box would be the right tool for the job.
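For readers without underscore or jQuery, the reduce above can be approximated with the built-in `Array.prototype.reduce`. This is a sketch of the same logic, not the answer author's code, and `contentsToText` is a hypothetical name. It takes the element itself and reads `childNodes`, which, like jQuery's `contents()`, includes text nodes.

```javascript
// Hypothetical plain-JavaScript port of the underscore one-liner above.
function contentsToText(element) {
  // childNodes is array-like, so borrow reduce from Array.prototype.
  return Array.prototype.reduce.call(element.childNodes, function (text, node) {
    if (node.nodeValue) {
      // Non-empty text node: append as is.
      return text + node.nodeValue;
    }
    // Anything else starts a new line; fall back to innerHTML where
    // textContent is not available as a string.
    var inner = typeof node.textContent === "string" ? node.textContent : node.innerHTML;
    return text + "\n" + inner;
  }, "");
}
```

As in the original, every direct child that is not a non-empty text node is prefixed with a newline, so markup that begins with a block element produces a leading "\n".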

0

Jon z Feb 25 '15 at 1:21

 this.editableVal = function(cont, opts) {
     if (!cont) return '';
     var el = cont.firstChild;
     var v = '';
     var contTag = new RegExp('^(DIV|P|LI|OL|TR|TD|BLOCKQUOTE)$');
     while (el) {
         switch (el.nodeType) {
             case 3: // text node: turn newlines and non-breaking spaces into single spaces
                 var str = el.data.replace(/^\n|\n$/g, ' ').replace(/[\n\xa0]/g, ' ').replace(/[ ]+/g, ' ');
                 v += str;
                 break;
             case 1: // element node: recurse, then add line breaks around block-level tags
                 var str = this.editableVal(el);
                 if (el.tagName && el.tagName.match(contTag) && str) {
                     if (str.substr(-1) != '\n') {
                         str += '\n';
                     }
                     var prev = el.previousSibling;
                     // Skip whitespace-only text nodes (PHP.trim is an external helper not defined here)
                     while (prev && prev.nodeType == 3 && PHP.trim(prev.nodeValue) == '') {
                         prev = prev.previousSibling;
                     }
                     if (prev && !(prev.tagName && (prev.tagName.match(contTag) || prev.tagName == 'BR'))) {
                         str = '\n' + str;
                     }
                 } else if (el.tagName == 'BR') {
                     str += '\n';
                 }
                 v += str;
                 break;
         }
         el = el.nextSibling;
     }
     return v;
 }

0

Artur Vanesyan Jun 14 '17 at 14:45

Shaggy Frog · Accepted Answer · 2010-11-10 00:16

I forgot about this question until now, when Nico put a bounty on it.

I solved the problem by writing the function I needed myself: I cribbed a function from the existing jQuery codebase and modified it to work as I needed.

I have tested this function with Safari (WebKit), IE, Firefox, and Opera. I did not check any other browsers, since the whole contentEditable feature is non-standard. It is also possible that an update to any browser could break this function if it changes how contentEditable is implemented. So, programmer beware.

 function extractTextWithWhitespace(elems) {
     var lineBreakNodeName = "BR"; // Use <br> as a default
     if ($.browser.webkit) {
         lineBreakNodeName = "DIV";
     } else if ($.browser.msie) {
         lineBreakNodeName = "P";
     } else if ($.browser.mozilla) {
         lineBreakNodeName = "BR";
     } else if ($.browser.opera) {
         lineBreakNodeName = "P";
     }
     var extractedText = extractTextWithWhitespaceWorker(elems, lineBreakNodeName);
     return extractedText;
 }

 // Cribbed from jQuery 1.4.2 (getText) and modified to retain whitespace
 function extractTextWithWhitespaceWorker(elems, lineBreakNodeName) {
     var ret = "";
     var elem;

     for (var i = 0; elems[i]; i++) {
         elem = elems[i];

         if (elem.nodeType === 3     // text node
             || elem.nodeType === 4) // CDATA node
         {
             ret += elem.nodeValue;
         }

         if (elem.nodeName === lineBreakNodeName) {
             ret += "\n";
         }

         if (elem.nodeType !== 8) // comment node
         {
             // Recurse with the same line-break tag instead of re-detecting the browser
             ret += extractTextWithWhitespaceWorker(elem.childNodes, lineBreakNodeName);
         }
     }

     return ret;
 }
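One thing to watch for with this approach is where the newlines end up. The sketch below restates the worker logic with the line-break tag passed in explicitly, so it can run without `$.browser`. The names `extractWithBreakTag` and `trimEdgeNewlines` are hypothetical. Tracing it by hand on object trees that mirror the question's markup, with the contentEditable div itself as the root, gives "\n1\n2\n3" for the Safari and IE structures and "1\n2\n3\n\n" for the Firefox one.

```javascript
// Hypothetical restatement of the worker above, with the tag name as a parameter.
function extractWithBreakTag(elems, lineBreakNodeName) {
  var ret = "";
  for (var i = 0; elems[i]; i++) {
    var elem = elems[i];
    if (elem.nodeType === 3 || elem.nodeType === 4) {
      ret += elem.nodeValue;              // text or CDATA node
    }
    if (elem.nodeName === lineBreakNodeName) {
      ret += "\n";                        // the tag this browser uses per line
    }
    if (elem.nodeType !== 8) {            // skip comment nodes
      ret += extractWithBreakTag(elem.childNodes || [], lineBreakNodeName);
    }
  }
  return ret;
}

// Assumption: newlines at the very start or end are artifacts, not user input.
function trimEdgeNewlines(text) {
  return text.replace(/^\n+|\n+$/g, "");
}
```

Trimming the edges makes the three results equal, but it also discards blank lines the user deliberately typed at the start or end, so whether to apply it depends on the application.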
