I am going to go into detail about my problem; you can skip to the TL;DR if you do not want to read all of this.
What I am trying to do
I need to save a “file” (a text document) that the user can edit. Say I have a source file (which can be huge):
Lorem ipsum dolor sit amet
and the user changes it to:
Foo ipsum amet_ sit
Basically, I have an original string and a user-edited string. I want to find the differences between them (the "edits") so that I do not have to store a very large string twice: I keep the original plus the edits, then apply the edits to the original to rebuild the modified version. This is similar to data deduplication. The problem is that I do not know in advance what kinds of changes there can be, and I must also be able to apply those changes back to the string.
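To make the goal concrete, here is a minimal sketch of one very simple form an "edit" could take: a single replaced region, found by trimming the common prefix and suffix of the two strings. The names `makeEdit` and `applyEdit` are hypothetical, and this only pays off when the changes are clustered in one place; scattered changes would need a real diff.

```javascript
// Hypothetical helper: describe the whole change as one replaced region.
function makeEdit(original, modified) {
    var max = Math.min(original.length, modified.length),
        start = 0,
        end = 0;
    // Length of the common prefix.
    while (start < max && original[start] === modified[start]) {
        start += 1;
    }
    // Length of the common suffix, never overlapping the prefix.
    while (end < max - start &&
           original[original.length - 1 - end] === modified[modified.length - 1 - end]) {
        end += 1;
    }
    return {
        start: start,                                    // where the change begins
        deleteCount: original.length - start - end,      // characters to drop from the original
        insert: modified.slice(start, modified.length - end) // replacement text
    };
}

// Hypothetical helper: rebuild the modified string from the original and the edit.
function applyEdit(original, edit) {
    return original.slice(0, edit.start) + edit.insert +
           original.slice(edit.start + edit.deleteCount);
}

var edit = makeEdit('Original String of text...', 'OriginalString of text...');
console.log(edit); // { start: 8, deleteCount: 1, insert: '' }
console.log(applyEdit('Original String of text...', edit)); // "OriginalString of text..."
```

Only the small `edit` object has to be stored next to the original. Both loops are linear in the length of the strings and need no extra memory beyond the inserted text.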
Attempts
Since the text can be huge, I wonder what the most efficient way would be to store the edited text without saving two separate versions. My first guess was something like:
var str = 'Original String of text...'.split(' ') || [],
    mod = 'Modified String of text...'.split(' ') || [],
    i,
    edits = [];
for (i = 0; i < str.length; i += 1) {
    // Keep only the words that differ; unchanged positions stay undefined.
    edits.push(str[i] === mod[i] ? undefined : mod[i]);
}
console.log(edits); // ["Modified", undefined, undefined, undefined] (desired output)
then to go back:
for (i = 0; i < str.length; i += 1) {
    str[i] = edits[i] || str[i];
}
str.join(' '); // "Modified String of text..."
Basically, I split the text on spaces into arrays, compare the arrays, and save the differences. Then I apply the differences to recreate the modified version.
Problems
But if the number of spaces changes, problems arise:
str: Original String of text...
mod: OriginalString of text...
Output: OriginalString of text... text...
My desired result: OriginalString of text...
Even if I switched str.length to mod.length and edits.length as follows:
// Get edits
var str = 'Original String of text...'.split(' ') || [],
    mod = 'Modified String of text...'.split(' ') || [],
    i,
    edits = [];
for (i = 0; i < mod.length; i += 1) {
    edits.push(str[i] === mod[i] ? undefined : mod[i]);
}
// Apply edits
var final = [];
for (i = 0; i < edits.length; i += 1) {
    final[i] = edits[i] || str[i];
}
final = final.join(' ');
If a space is removed (for example, mod is ModifiedString of text...), every word after that point shifts by one position, so edits will be: ["ModifiedString", "of", "text..."], which makes the whole idea of saving only the changes useless. It gets even worse when a word is added or deleted. If str became Original String of lots of text..., the output would still be the same.
I can see that my approach has many shortcomings, but I can't think of anything else.
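For comparison, here is a minimal sketch of a word-level diff that aligns the two arrays with a longest-common-subsequence (LCS) table instead of comparing by index, so an added or removed word does not shift everything after it. The names `diffWords` and `applyOps` and the shape of the operation objects are hypothetical. The table needs memory proportional to the product of the two word counts, so as written this is only an illustration for small inputs, not something that would cope with huge texts.

```javascript
// Hypothetical helper: list the insertions/removals that turn array a into array b.
function diffWords(a, b) {
    var n = a.length, m = b.length, i, j, table = [], ops = [];
    // table[i][j] = LCS length of a.slice(i) and b.slice(j).
    for (i = 0; i <= n; i += 1) {
        table[i] = new Array(m + 1).fill(0);
    }
    for (i = n - 1; i >= 0; i -= 1) {
        for (j = m - 1; j >= 0; j -= 1) {
            table[i][j] = a[i] === b[j]
                ? table[i + 1][j + 1] + 1
                : Math.max(table[i + 1][j], table[i][j + 1]);
        }
    }
    // Walk both arrays, recording only what differs; `at` is an index into a.
    i = 0;
    j = 0;
    while (i < n || j < m) {
        if (i < n && j < m && a[i] === b[j]) {
            i += 1;
            j += 1; // unchanged word, nothing to store
        } else if (j < m && (i === n || table[i][j + 1] >= table[i + 1][j])) {
            ops.push({ at: i, insert: b[j] });
            j += 1;
        } else {
            ops.push({ at: i, remove: 1 });
            i += 1;
        }
    }
    return ops;
}

// Hypothetical helper: replay the stored operations against the original words.
function applyOps(a, ops) {
    var out = [], k = 0, i, removed;
    for (i = 0; i <= a.length; i += 1) {
        removed = false;
        while (k < ops.length && ops[k].at === i) {
            if ('insert' in ops[k]) {
                out.push(ops[k].insert);
            } else {
                removed = true;
            }
            k += 1;
        }
        if (i < a.length && !removed) {
            out.push(a[i]);
        }
    }
    return out;
}

var str = 'Original String of text...'.split(' '),
    mod = 'Original String of lots of text...'.split(' '),
    ops = diffWords(str, mod);
console.log(ops.length);                    // 2 (the inserted words "lots" and "of")
console.log(applyOps(str, ops).join(' '));  // "Original String of lots of text..."
```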
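Snippet:
document.getElementById('go').onclick = function() {
    var str = document.getElementById('a').value.split(' ') || [],
        mod = document.getElementById('b').value.split(' ') || [],
        i,
        edits = [],
        final = [];
    // Get edits
    for (i = 0; i < mod.length; i += 1) {
        edits.push(str[i] === mod[i] ? undefined : mod[i]);
    }
    // Apply edits, as in the second attempt above
    for (i = 0; i < edits.length; i += 1) {
        final[i] = edits[i] || str[i];
    }
    console.log(edits, final.join(' '));
};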
Base String: <input id="a"> <br/>Modified String: <input id="b" /> <br/> <button id="go">Second method</button> <button id="go2">First Method</button>
TL;DR:
How do you find the changes between two strings?
I am dealing with large fragments of text, each of which can be about one hundred megabytes. This is done in the browser.