Replacing large amounts of text in a browser

I am trying to develop a Firefox add-on that converts the text on any page into a specific language. It is really just a set of 2D arrays that I iterate over, using this code:

    function escapeRegExp(str) {
        return str.replace(/([.*+?^=!:${}()|\[\]\/\\])/g, "\\$1");
    }

    function replaceAll(find, replace) {
        return document.body.innerHTML.replace(new RegExp(escapeRegExp(find), 'g'), replace);
    }

    function convert2latin() {
        for (var i = 0; i < Table.length; i++) {
            document.body.innerHTML = replaceAll(Table[i][1], Table[i][0]);
        }
    }

It works, and I can ignore HTML tags, since they can only be in English, but the problem is performance, which is of course very bad. Since I have no experience with JS, I tried googling and found that perhaps documentFragment could help.
Maybe I should use a different approach?

2 answers

Based on the comments, it seems you have already been told that the most expensive part is rebuilding the DOM, which happens whenever you replace the entire contents of the page (i.e. when you assign to document.body.innerHTML). You do this for every substitution, which causes Firefox to re-parse and re-render the entire page for each replacement you make. You only need to assign document.body.innerHTML once, after you have completed all the replacements.

The following should be a first pass at speeding things up:

    function escapeRegExp(str) {
        return str.replace(/([.*+?^=!:${}()|\[\]\/\\])/g, "\\$1");
    }

    function convert2latin() {
        // Work on a local string; declared with let so it is not an implicit global.
        let newInnerHTML = document.body.innerHTML;
        for (let i = 0; i < Table.length; i++) {
            newInnerHTML = newInnerHTML.replace(new RegExp(escapeRegExp(Table[i][1]), 'g'), Table[i][0]);
        }
        // Assign to the DOM only once, after all replacements are done.
        document.body.innerHTML = newInnerHTML;
    }

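You note in the comments that there is no real need to use a RegExp for matching, so plain string matching should be even faster. Keep in mind that String.prototype.replace() with a string pattern replaces only the first occurrence, so the version below uses split() and join() to replace every occurrence:

    function convert2latin() {
        let newInnerHTML = document.body.innerHTML;
        for (let i = 0; i < Table.length; i++) {
            // split/join replaces every occurrence of the plain string.
            newInnerHTML = newInnerHTML.split(Table[i][1]).join(Table[i][0]);
        }
        document.body.innerHTML = newInnerHTML;
    }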

If you really do need RegExp matching, and you will perform these exact substitutions several times, you are better off creating all the RegExp objects before first use (for example, when Table is created or modified) and storing them (for example, in Table[i][2]).
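As a rough sketch of that caching idea: the sample Table rows below are made up for illustration (the question does not show the real table), and compileTable is a hypothetical helper name. The RegExp objects are built once and reused on every call.

```javascript
// Hypothetical sample table; each row is [latin, original], as in the question.
const Table = [
  ["a", "а"],
  ["b", "б"],
  ["(c)", "©"]
];

function escapeRegExp(str) {
  // Escape characters that have a special meaning inside a RegExp.
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build each RegExp once and cache it in the third column.
// Call it again whenever Table is created or modified.
function compileTable(table) {
  for (const row of table) {
    row[2] = new RegExp(escapeRegExp(row[1]), "g");
  }
  return table;
}

function convert2latin(text) {
  for (const row of Table) {
    // A replacer function keeps "$" in the replacement text literal.
    text = text.replace(row[2], () => row[0]);
  }
  return text;
}

compileTable(Table);
```

With this in place, convert2latin can be called repeatedly without constructing any new RegExp objects.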

However, assigning document.body.innerHTML is a bad way to do this:

As 8472 mentioned, replacing the entire contents of document.body.innerHTML is a very heavy-handed way to accomplish this task. It has some significant drawbacks, including probably breaking other JavaScript on the page and possible security problems. A better solution is to change only the textContent of the text nodes.

One way to do this is to use a TreeWalker. The code for this might look something like this:

    function convert2latin(text) {
        for (let i = 0; i < Table.length; i++) {
            // split/join replaces every occurrence; replace() with a string
            // pattern would only replace the first one.
            text = text.split(Table[i][1]).join(Table[i][0]);
        }
        return text;
    }

    //Create the TreeWalker
    let treeWalker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT, {
        acceptNode: function(node) {
            if (node.textContent.length === 0
                || node.parentNode.nodeName === 'SCRIPT'
                || node.parentNode.nodeName === 'STYLE'
            ) {
                //Don't include 0 length, <script>, or <style> text nodes.
                return NodeFilter.FILTER_SKIP;
            } //else
            return NodeFilter.FILTER_ACCEPT;
        }
    }, false);

    //Make a list of nodes prior to modifying the DOM. Once the DOM is modified the TreeWalker
    //  can become invalid (i.e. stop after the first modification). Doing so is not needed
    //  in this case, but is a good habit for when it is needed.
    let nodeList = [];
    while (treeWalker.nextNode()) {
        nodeList.push(treeWalker.currentNode);
    }

    //Iterate over all text nodes, changing the textContent of the text nodes
    nodeList.forEach(function(el) {
        el.textContent = convert2latin(el.textContent);
    });

Do not use innerHTML: it will destroy any JavaScript event handlers registered on the DOM nodes and invalidate any references to DOM nodes held by the page's JavaScript. In other words, you can easily break the page this way. And of course, it is inefficient.

You can use a TreeWalker and filter for text nodes only. The traversal can be made incremental by deferring the next step with window.setTimeout every 1000th text node or something like that.
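A minimal sketch of that batching, under some assumptions: processInChunks and translatePageInChunks are hypothetical names, convert stands for whatever replacement function you use, and the schedule parameter exists only so the batching logic can be exercised without a browser.

```javascript
// Process `items` in batches of `chunkSize`, handing control back to the
// event loop between batches. `schedule` defaults to setTimeout.
function processInChunks(items, handler, chunkSize = 1000,
                         schedule = (fn) => setTimeout(fn, 0)) {
  let index = 0;
  function runChunk() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      handler(items[index]);
    }
    if (index < items.length) {
      schedule(runChunk); // defer the next batch
    }
  }
  runChunk();
}

// Browser-only part (assumes a DOM): collect the text nodes first,
// then convert them batch by batch.
function translatePageInChunks(convert) {
  const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
  const nodes = [];
  while (walker.nextNode()) {
    nodes.push(walker.currentNode);
  }
  processInChunks(nodes, (node) => {
    const parent = node.parentNode;
    // Skip nodes that were removed in the meantime, and <script>/<style> text.
    if (!parent || parent.nodeName === "SCRIPT" || parent.nodeName === "STYLE") {
      return;
    }
    node.textContent = convert(node.textContent);
  });
}
```

The first batch runs immediately and later batches run from timers, so the page can repaint and handle input in between. The batch size of 1000 is just the figure suggested above and is worth tuning.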

If you register your add-on script early enough, you can also use a MutationObserver to be notified about text nodes as soon as they are inserted and replace them incrementally, which should make the page feel smoother.
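Here is one way that could look, as a sketch rather than a finished implementation. The names collectTextNodes, handleMutations and observePage are hypothetical, and convert again stands for your replacement function. Only childList changes are observed, so rewriting a text node's content does not re-trigger the observer.

```javascript
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
const TEXT_NODE = 3;    // same value as Node.TEXT_NODE

function isSkippedElement(name) {
  return name === "SCRIPT" || name === "STYLE";
}

// Gather all text nodes at or below `node`, ignoring <script> and <style>.
function collectTextNodes(node, out = []) {
  if (node.nodeType === TEXT_NODE) {
    out.push(node);
  } else if (node.nodeType === ELEMENT_NODE && !isSkippedElement(node.nodeName)) {
    for (const child of node.childNodes) {
      collectTextNodes(child, out);
    }
  }
  return out;
}

// Convert the text of every node added in this batch of mutation records.
function handleMutations(records, convert) {
  for (const record of records) {
    if (isSkippedElement(record.target.nodeName)) {
      continue; // text appended directly to a <script> or <style>
    }
    for (const added of record.addedNodes) {
      for (const textNode of collectTextNodes(added)) {
        const converted = convert(textNode.textContent);
        if (converted !== textNode.textContent) {
          textNode.textContent = converted;
        }
      }
    }
  }
}

// Browser-only part (assumes a DOM): start observing the whole document.
function observePage(convert) {
  const observer = new MutationObserver((records) => handleMutations(records, convert));
  observer.observe(document.documentElement, { childList: true, subtree: true });
  return observer;
}
```

Calling observePage(convert2latin) from a script that runs at document start would then convert nodes as the parser inserts them. Call disconnect() on the returned observer to stop.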


Source: https://habr.com/ru/post/1259672/

