Are all MS Word documents serialized in XML?

I am trying to understand how Word files are restored when they are opened in Microsoft Word, and in what format they are serialized when changes are saved and the file is closed. Any information you have would be very helpful. Thanks.

+4
2 answers

All .doc files are stored in a binary format. Opening and manipulating them is an exercise in PAIN.

All .docx files are a collection of XML files stored in a ZIP archive. That's right: just change the extension of a .docx, .xlsx or .pptx file to .zip, and you can open it like any other ZIP file. The format is called Office Open XML (OOXML), and MS even has an API for it, the Open XML SDK. Personally, I think that API has a pretty steep learning curve, so when I need to generate Word files or manipulate them in some other way, I just make a sample file, unzip it and edit its internals. IMO the basics of OOXML files are simple enough to work with without the big old API ...
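To make the "it's just a ZIP of XML" point concrete, here is a minimal sketch using only the Python standard library. The function names (`list_parts`, `paragraph_texts`, `make_sample`) are made up for this illustration, and `make_sample` builds a stripped-down stand-in archive that holds only `word/document.xml`; a file saved by Word contains more parts than that.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_parts(source):
    """Return the names of the files stored inside a .docx (path or file object)."""
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()


def paragraph_texts(source):
    """Return the plain text of each paragraph in the main document part."""
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    texts = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph is split into runs; the visible text sits in w:t elements.
        runs = para.iter(f"{{{W_NS}}}t")
        texts.append("".join(run.text or "" for run in runs))
    return texts


def make_sample():
    """Hypothetical stand-in: a ZIP with only the main part, not a complete .docx."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t>, world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer
```

Both functions also accept a path, so `paragraph_texts("report.docx")` should work on a real file (the file name here is just an example). This only reads body text; tables, headers, styles and so on live in other elements and parts.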

+5

Are all MS Word documents serialized in XML format?

The short answer is no.

Long answer: MS changed the document format with almost every release. Word 6.0 - 95 uses one format, Word 97 - 2002 (aka XP) another, Word 2003 yet another, and Word 2007 another still.

Of course, each version can save and open documents in older formats (although new features usually cannot be saved in such old formats).

The formats up to and including 2003 (.doc) are incremental updates of their predecessors and are all binary.

The format introduced in Office 2007 (.docx) is XML-based and was standardized as ISO/IEC 29500:2008, "Office Open XML", although Word itself does not fully comply with this standard. Note that Word 2007 can still save (and open) documents in the older binary formats.
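Since the file extension alone can lie, a rough way to tell the two families apart is to look at the first bytes of the file. The sketch below assumes the common cases only: Word 97-2003 .doc files are OLE2 compound files, and .docx files are ZIP archives. The name `sniff_container` is made up for this example. Note that it identifies the container, not Word specifically: .xls and .ppt are also OLE2, and any ZIP file will report as "zip".

```python
# Signature at the start of an OLE2 compound file (used by binary .doc)
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature at the start of a non-empty ZIP archive
ZIP_MAGIC = b"PK\x03\x04"


def sniff_container(header: bytes) -> str:
    """Guess the container type from the first bytes of a file."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"  # likely a binary .doc (or .xls / .ppt)
    if header.startswith(ZIP_MAGIC):
        return "zip"  # possibly .docx; check for word/document.xml to be sure
    return "unknown"
```

Typical use would be `sniff_container(open(path, "rb").read(8))`, followed by a look inside the archive for `word/document.xml` if the answer is "zip".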

Hope this helps.

+2

Source: https://habr.com/ru/post/1307293/

