I have a div set to contentEditable and styled with "white-space:pre", so it preserves things like line breaks. In Safari, FF, and IE, the div looks and works pretty much the same. All is well. I want to extract the text from this div without losing the formatting, in particular the line breaks.
We use jQuery, whose text() function basically does a pre-order DFS and glues all the text content in that branch of the DOM into a single string. This loses the formatting.
I looked at the html() function, but it seems that all three browsers do different things with the actual HTML that is being created behind the scenes in my contentEditable div. Assuming I type this into my div:
1
2
3
Here are the results:
Safari 4:
1 <div>2</div> <div>3</div>
Firefox 3.6:
1 <br _moz_dirty=""> 2 <br _moz_dirty=""> 3 <br _moz_dirty=""> <br _moz_dirty="" type="_moz">
IE 8:
<P>1</P><P>2</P><P>3</P>
Ugh. Nothing very consistent here. Surprisingly, MSIE looks the most sane! (Uppercase P tag and all.)
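To make the inconsistency concrete, here is a minimal sketch of one way the three outputs could be normalized into the same plain text. It is regex-based rather than a real HTML parser, and the names htmlToTextWithLineBreaks and decodeBasicEntities are hypothetical. It assumes that only div and p act as line containers, that Firefox's br with type="_moz" is a placeholder rather than user input, and that trailing newlines can be trimmed. The sample strings are the outputs above with the spaces between tags removed, on the assumption that those spaces are not part of the real markup.

```javascript
// Decode only the handful of entities likely to appear (assumption: a tiny subset is enough).
function decodeBasicEntities(text) {
  return text
    .replace(/&nbsp;/g, " ")   // non-breaking space mapped to a plain space
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");   // must run last so "&amp;lt;" is not decoded twice
}

// Hypothetical helper: turn the html() output of a contentEditable div into
// plain text, mapping <br> and div/p boundaries to "\n".
function htmlToTextWithLineBreaks(html) {
  // Matches either a tag (open or close) or a run of text.
  var tokenPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>|([^<]+)/g;
  var out = "";
  var match;

  while ((match = tokenPattern.exec(html)) !== null) {
    if (match[4] !== undefined) {
      // Text run: keep it verbatim, since the div uses white-space:pre.
      out += decodeBasicEntities(match[4]);
      continue;
    }

    var tagName = match[2].toLowerCase();
    var attrs = match[3];

    if (tagName === "br") {
      // Skip Firefox's placeholder <br type="_moz"> (assumed not to be user input).
      if (!/type\s*=\s*["']?_moz/i.test(attrs)) {
        out += "\n";
      }
    } else if (tagName === "div" || tagName === "p") {
      // A block boundary starts a new line unless we are already at one.
      if (out !== "" && out.charAt(out.length - 1) !== "\n") {
        out += "\n";
      }
    }
    // Any other tag is treated as inline and ignored.
  }

  // Assumption: trailing newlines are not wanted.
  return out.replace(/\n+$/, "");
}

var samples = [
  '1<div>2</div><div>3</div>',                                   // Safari 4
  '1<br _moz_dirty="">2<br _moz_dirty="">3<br _moz_dirty=""><br _moz_dirty="" type="_moz">', // Firefox 3.6
  '<P>1</P><P>2</P><P>3</P>'                                     // IE 8
];

for (var i = 0; i < samples.length; i++) {
  console.log(JSON.stringify(htmlToTextWithLineBreaks(samples[i]))); // "1\n2\n3" each time
}
```

This does not handle attribute values containing ">", nested block elements with special meaning, or other markup a browser might generate, so it should be read as an illustration of the problem rather than a finished solution.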
The div will have dynamically set styling (font, color, size, and alignment) applied through CSS, so I'm not sure whether I can use a pre tag (which some pages I found through Google alluded to).
Does anyone know of any JavaScript code, jQuery plugin, or similar that will extract text from a contentEditable div in such a way as to preserve line breaks? I would prefer not to write my own parser if I don't need to.
Update: I took the getText function from jQuery 1.4.2 and modified it to extract the text with whitespace intact (I only changed one line, where I add a newline):
function extractTextWithWhitespace( elems ) {
    var ret = "", elem;

    for ( var i = 0; elems[i]; i++ ) {
        elem = elems[i];

        // Get the text from text nodes and CDATA nodes
        if ( elem.nodeType === 3 || elem.nodeType === 4 ) {
            // The one changed line: append a newline after each text node
            ret += elem.nodeValue + "\n";

        // Traverse everything else, except comment nodes
        } else if ( elem.nodeType !== 8 ) {
            ret += extractTextWithWhitespace( elem.childNodes );
        }
    }

    return ret;
}
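Appending "\n" after every text node relies on each typed line ending up in its own text node, which may not hold once a line contains inline markup. A minimal sketch of a variant that keys line breaks on element names instead is below. The names extractTextByBlocks, BLOCK_NAMES, textNode, and element are hypothetical, the block list (DIV, P) is an assumption, and the plain-object nodes only stand in for DOM nodes so the sketch runs outside a browser.

```javascript
// Element names treated as line containers (an assumption, not a complete list).
var BLOCK_NAMES = { DIV: true, P: true };

// Hypothetical stand-ins for DOM nodes, exposing only the properties used below.
function textNode(value) {
  return { nodeType: 3, nodeName: "#text", nodeValue: value, childNodes: [] };
}
function element(name, children) {
  return { nodeType: 1, nodeName: name, nodeValue: null, childNodes: children };
}

// Walk the nodes depth-first, emitting "\n" for <br> and at block boundaries
// instead of after every text node.
function extractTextByBlocks(nodes) {
  var ret = "";

  for (var i = 0; i < nodes.length; i++) {
    var node = nodes[i];

    if (node.nodeType === 3 || node.nodeType === 4) {
      // Text and CDATA nodes contribute their value unchanged.
      ret += node.nodeValue;
    } else if (node.nodeType === 1) {
      var name = node.nodeName.toUpperCase();

      if (name === "BR") {
        ret += "\n";
      } else {
        var isBlock = BLOCK_NAMES[name] === true;

        // Start a new line before a block unless we are already at one.
        if (isBlock && ret !== "" && ret.charAt(ret.length - 1) !== "\n") {
          ret += "\n";
        }
        ret += extractTextByBlocks(node.childNodes);
        // End the line after a block.
        if (isBlock && ret.charAt(ret.length - 1) !== "\n") {
          ret += "\n";
        }
      }
    }
    // Comment nodes and other node types are skipped.
  }

  return ret;
}

// Safari-style structure: bare text followed by one div per line.
var safariNodes = [
  textNode("1"),
  element("DIV", [textNode("2")]),
  element("DIV", [textNode("3")])
];
console.log(JSON.stringify(extractTextByBlocks(safariNodes))); // "1\n2\n3\n"
```

In a browser the same function could be called with the contentEditable div's childNodes. The Firefox placeholder br with type="_moz" is not special-cased here, so it would contribute one extra trailing newline.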
I call extractTextWithWhitespace and assign its output to an XML node with jQuery, something like:
var extractedText = extractTextWithWhitespace($(this));
var $someXmlNode = $('<someXmlNode/>');
$someXmlNode.text(extractedText);
The resulting XML is ultimately sent to the server through an AJAX call.
This works well in Safari and Firefox.
In IE, it seems that only the first "\n" is preserved. Looking into it further, it seems that jQuery sets the text like this (line 4004 of jQuery-1.4.2.js):
return this.empty().append( (this[0] && this[0].ownerDocument || document).createTextNode( text ) );
Reading up on createTextNode, it looks like IE's implementation may collapse the whitespace. Is this true, or am I doing something wrong?