Comparing Word documents programmatically

I need to compare two Office documents, in this case two Word documents, and produce a diff somewhat similar to what SVN shows. It does not have to be identical, but the differences should at least be distinguishable.

I tried using the Office COM interop DLL and got this far:

    object fileToOpen = (object)@"D:\doc1.docx";
    string fileToCompare = @"D:\doc2.docx";
    WRD.Application WA = new WRD.Application();
    Document wordDoc = null;
    wordDoc = WA.Documents.Open(ref fileToOpen,
        Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
        Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
        Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
    wordDoc.Compare(fileToCompare,
        Type.Missing, Type.Missing, Type.Missing, Type.Missing,
        Type.Missing, Type.Missing, Type.Missing);

Any tips on how to take this further? This will be a web application with a lot of hits. Is using the Office COM object the right way to go, or are there other options I should look at?

+6
6 answers

I agree with Joseph about diffing the strings. I would also recommend a purpose-built diff engine (several are listed here: Any decent text diff / merge engine for .NET?), which can help you avoid some of the common pitfalls of diffing.

+1

You should use the Document class to compare the files and open the result in a Word document.

    using OfficeWord = Microsoft.Office.Interop.Word;

    object fileToOpen = (object)@"D:\doc1.docx";
    string fileToCompare = @"D:\doc2.docx";

    // Global.OfficeFile.WordApp is the answerer's own accessor for a Word Application instance.
    var app = Global.OfficeFile.WordApp;

    object missing = Type.Missing; // not declared in the original snippet
    object readOnly = false;
    object AddToRecent = false;
    object Visible = false;

    OfficeWord.Document docZero = app.Documents.Open(fileToOpen, ref missing, ref readOnly, ref AddToRecent, Visible: ref Visible);

    docZero.Final = false;
    docZero.TrackRevisions = true;
    docZero.ShowRevisions = true;
    docZero.PrintRevisions = true;

    // WdCompareTarget.wdCompareTargetNew puts the comparison result in a new document;
    // change this value to change how Word opens the result.
    docZero.Compare(fileToCompare, missing, OfficeWord.WdCompareTarget.wdCompareTargetNew, true, false, false, false, false);
+3

My requirements were that I had to use a .NET library, and I wanted to work with streams rather than actual files.

ZipArchive is located in System.IO.Compression.

What I did, and it worked out pretty well, was to use ZipArchive from .NET and compare the contents entry by entry, skipping the .rels files because they seem to be generated differently every time a file is created. Here is my snippet:

    private static bool AreWordFilesSame(byte[] wordA, byte[] wordB)
    {
        using (var streamA = new MemoryStream(wordA))
        using (var streamB = new MemoryStream(wordB))
        using (var zipA = new ZipArchive(streamA))
        using (var zipB = new ZipArchive(streamB))
        {
            // Guard: without this, indexing zipB below can go out of range.
            if (zipA.Entries.Count != zipB.Entries.Count)
            {
                return false;
            }

            for (int i = 0; i < zipA.Entries.Count; ++i)
            {
                // Assert comes from the unit test framework this helper was written for.
                Assert.AreEqual(zipA.Entries[i].Name, zipB.Entries[i].Name);

                // These are some weird Word files with autogenerated hashes.
                if (zipA.Entries[i].Name.EndsWith(".rels"))
                {
                    continue;
                }

                using (var readerA = new StreamReader(zipA.Entries[i].Open()))
                using (var readerB = new StreamReader(zipB.Entries[i].Open()))
                {
                    var textA = readerA.ReadToEnd();
                    var textB = readerB.ReadToEnd();

                    // An empty entry is treated as a mismatch, as in the original snippet.
                    if (textA != textB || textA.Length == 0)
                    {
                        return false;
                    }
                }
            }

            return true;
        }
    }
+1

You should really extract the text of each document into a string and diff those.

You only care about textual changes and not formatting, right?
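
A minimal sketch of the extraction step, assuming the main text lives in the `word/document.xml` part of the .docx package, as it normally does. The names `DocxText` and `ExtractParagraphs` are hypothetical, and this only uses System.IO.Compression and LINQ to XML rather than any Word library. It ignores headers, footers, footnotes and formatting, and text inside text boxes may be counted twice because their paragraphs are nested.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxText
{
    // WordprocessingML main namespace used by w:p (paragraph) and w:t (text) elements.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one string per paragraph found in word/document.xml.
    public static string[] ExtractParagraphs(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            var entry = zip.GetEntry("word/document.xml");
            if (entry == null)
            {
                throw new InvalidDataException("word/document.xml not found");
            }

            using (var part = entry.Open())
            {
                var xml = XDocument.Load(part);

                // Concatenate all text runs of each paragraph, dropping formatting.
                return xml.Descendants(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))
                    .ToArray();
            }
        }
    }
}
```

The resulting paragraph arrays of the two documents can then be handed to whatever diff engine you choose.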

0

To compare Word documents, you need:

  • A library for working with Word documents, for example to read paragraphs, text, tables, etc. from a Word file. You can try Office Interop, OpenXML, or Aspose.Words for .NET.
  • An algorithm or library to perform the actual comparison on the text extracted from both Word documents. You can write your own or use a library such as DiffMatchPatch.
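
As an illustration of the "write your own" option in the second point, here is a minimal sketch of a paragraph-level diff based on the longest common subsequence. It is not how DiffMatchPatch works internally, and the names `LineDiff` and `Diff` are hypothetical. It uses O(n·m) memory, so it is only meant for small inputs.

```csharp
using System;
using System.Collections.Generic;

public static class LineDiff
{
    // Returns a unified-style listing: "  " unchanged, "- " only in a, "+ " only in b.
    public static List<string> Diff(IReadOnlyList<string> a, IReadOnlyList<string> b)
    {
        int n = a.Count, m = b.Count;

        // lcs[i, j] = length of the longest common subsequence of a[i..] and b[j..].
        var lcs = new int[n + 1, m + 1];
        for (int i = n - 1; i >= 0; i--)
        {
            for (int j = m - 1; j >= 0; j--)
            {
                lcs[i, j] = a[i] == b[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);
            }
        }

        // Walk the table from the start, emitting matches, deletions and insertions.
        var result = new List<string>();
        int x = 0, y = 0;
        while (x < n && y < m)
        {
            if (a[x] == b[y]) { result.Add("  " + a[x]); x++; y++; }
            else if (lcs[x + 1, y] >= lcs[x, y + 1]) { result.Add("- " + a[x]); x++; }
            else { result.Add("+ " + b[y]); y++; }
        }
        while (x < n) { result.Add("- " + a[x]); x++; }
        while (y < m) { result.Add("+ " + b[y]); y++; }
        return result;
    }
}
```

For example, diffing the paragraphs `a`, `b`, `c` against `a`, `x`, `c` yields an unchanged `a`, a removed `b`, an added `x`, and an unchanged `c`.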

This question is old; more solutions are available now, such as GroupDocs.

Compare Documents by Aspose.Words for .NET is an open-source project that uses Aspose.Words and DiffMatchPatch to perform the comparison.

I work at Aspose as a Developer Evangelist.

0

To solve this on a server, without installing Word or using COM, you can use the WmlComparer component of XmlPowerTools.

The documentation is a bit limited, but here is an example usage:

    var expected = File.ReadAllBytes(@"c:\expected.docx");
    var actual = File.ReadAllBytes(@"c:\result.docx");
    var expectedresult = new WmlDocument("expected.docx", expected);
    var actualDocument = new WmlDocument("result.docx", actual);
    var comparisonSettings = new WmlComparerSettings();
    var comparisonResults = WmlComparer.Compare(expectedresult, actualDocument, comparisonSettings);
    var revisions = WmlComparer.GetRevisions(comparisonResults, comparisonSettings);

which will show you the differences between the two documents.

0

Source: https://habr.com/ru/post/902216/

