How to parse MathML in WordOpenXML output?

I want to read only the XML that describes the equation, which I got from Paragraph.Range.WordOpenXML. However, the part of the output used for the equation does not look like MathML, even though, as I had read, Microsoft equations are stored in MathML.

Do I need a special converter to get the XML I want, or are there other methods?

1 answer

You can use the OMML2MML.XSL file (located under %ProgramFiles%\Microsoft Office\Office15) to convert the Office Math Markup Language (OMML) equations contained in a Word document into MathML.
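The OfficeNN folder depends on the installed Office version, so hard-coding one path is brittle. Below is a minimal sketch of a hypothetical helper (OmmlXslLocator and FindOmml2MmlXsl are illustrative names, not part of any library) that probes a few candidate locations and returns the first OMML2MML.XSL it finds. The version list (12 = Office 2007, 14 = 2010, 15 = 2013, 16 = 2016 and later) and the Click-to-Run layout Microsoft Office\root\OfficeNN are assumptions about typical installs, not something stated in the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class OmmlXslLocator
{
    // Hypothetical helper: for each root and version, checks
    //   <root>\Microsoft Office\Office<NN>\OMML2MML.XSL       (MSI layout)
    //   <root>\Microsoft Office\root\Office<NN>\OMML2MML.XSL  (assumed Click-to-Run layout)
    // and returns the first existing file, or null if none is found.
    public static string FindOmml2MmlXsl(IEnumerable<string> roots, IEnumerable<string> versions)
    {
        foreach (string root in roots)
        {
            if (string.IsNullOrEmpty(root)) continue; // e.g. no 32-bit Program Files folder

            foreach (string version in versions)
            {
                string[] candidates =
                {
                    Path.Combine(root, "Microsoft Office", "Office" + version, "OMML2MML.XSL"),
                    Path.Combine(root, "Microsoft Office", "root", "Office" + version, "OMML2MML.XSL"),
                };

                foreach (string candidate in candidates)
                {
                    if (File.Exists(candidate)) return candidate;
                }
            }
        }
        return null;
    }

    // Default probe: 64-bit and 32-bit Program Files, newest version first.
    public static string FindOmml2MmlXsl()
    {
        string[] roots =
        {
            Environment.GetFolderPath(Environment.SpecialFolder.ProgramFiles),
            Environment.GetFolderPath(Environment.SpecialFolder.ProgramFilesX86),
        };
        return FindOmml2MmlXsl(roots, new[] { "16", "15", "14", "12" });
    }
}
```

The returned path (if not null) can be passed straight to XslCompiledTransform.Load in place of the hard-coded path in the code below.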

The code below shows how to convert the equations in a Word document to MathML using the following steps:

  • Open a Word document using the OpenXML SDK (version 2.5).
  • Create an XslCompiledTransform and load the OMML2MML.XSL file.
  • Convert the document XML by calling the Transform() method on the created XslCompiledTransform instance.
  • Display the result of the conversion (for example, print to the console or write to a file).

I tested the code below with a simple Word document containing two equations, text, and images.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;
using DocumentFormat.OpenXml.Packaging;

public class MathMLConverter
{
    public string GetWordDocumentAsMathML(string docFilePath, string officeVersion = "15")
    {
        string officeML = string.Empty;
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docFilePath, false))
        {
            string wordDocXml = doc.MainDocumentPart.Document.OuterXml;

            XslCompiledTransform xslTransform = new XslCompiledTransform();
            // The OMML2MML.XSL file is located under
            // %ProgramFiles%\Microsoft Office\Office<version>\ (e.g. Office15).
            xslTransform.Load(@"C:\Program Files\Microsoft Office\Office" + officeVersion + @"\OMML2MML.XSL");

            // Load the XML of the main document part.
            using (TextReader tr = new StringReader(wordDocXml))
            using (XmlReader reader = XmlReader.Create(tr))
            using (MemoryStream ms = new MemoryStream())
            {
                XmlWriterSettings settings = xslTransform.OutputSettings.Clone();
                // Write an XML fragment without an XML declaration, as UTF-8
                // without a BOM so it matches the StreamReader below.
                settings.ConformanceLevel = ConformanceLevel.Fragment;
                settings.OmitXmlDeclaration = true;
                settings.Encoding = new UTF8Encoding(false);
                settings.CloseOutput = false; // keep the MemoryStream open

                // Transform the Office MathML (OMML) to MathML. Disposing the
                // writer flushes its buffer into the MemoryStream.
                using (XmlWriter xw = XmlWriter.Create(ms, settings))
                {
                    xslTransform.Transform(reader, xw);
                }

                ms.Seek(0, SeekOrigin.Begin);
                using (StreamReader sr = new StreamReader(ms, Encoding.UTF8))
                {
                    officeML = sr.ReadToEnd();
                    // Console.Out.WriteLine(officeML);
                }
            }
        }
        return officeML;
    }
}
```

To convert only one equation (and not the entire Word document), query for the required Office Math paragraph (m:oMathPara) and use the OuterXml property of that node. The code below shows how to query for the first math paragraph:

```csharp
// Requires "using System.Linq;" for First().
string mathParagraphXml = doc.MainDocumentPart.Document
    .Descendants<DocumentFormat.OpenXml.Math.Paragraph>()
    .First()
    .OuterXml;
```

Pass the returned XML to the StringReader (the TextReader) in place of wordDocXml.
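If you start from the Paragraph.Range.WordOpenXML string mentioned in the question instead of a .docx file, you can pick out the equations with LINQ to XML and skip the OpenXML SDK. This is a minimal sketch, assuming the string is the flat XML package that WordOpenXML returns and that the XslCompiledTransform has already been loaded with OMML2MML.XSL; OmmlFromWordOpenXml, ExtractEquations and TransformEach are hypothetical names. Each m:oMath element (an m:oMathPara wraps one or more of them) is transformed separately and written as an XML fragment, so every equation yields its own MathML string.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

public static class OmmlFromWordOpenXml
{
    // Namespace of Office Math Markup Language (the "m:" prefix in Word XML).
    public static readonly XNamespace M =
        "http://schemas.openxmlformats.org/officeDocument/2006/math";

    // Hypothetical helper: returns every m:oMath element in document order.
    // Note that this also picks up equations from any other parts contained
    // in the package string, not only the main document body.
    public static List<XElement> ExtractEquations(string wordOpenXml)
    {
        XDocument package = XDocument.Parse(wordOpenXml);
        return package.Descendants(M + "oMath").ToList();
    }

    // Hypothetical helper: runs each equation through the loaded XSLT
    // (e.g. OMML2MML.XSL) on its own and returns one string per equation.
    public static List<string> TransformEach(XslCompiledTransform xslt, IEnumerable<XElement> equations)
    {
        var results = new List<string>();
        var settings = new XmlWriterSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment, // output need not be a full document
            OmitXmlDeclaration = true,
        };

        foreach (XElement equation in equations)
        {
            using (var sw = new StringWriter())
            {
                // CreateReader exposes just this element as the input tree.
                using (XmlReader reader = equation.CreateReader())
                using (XmlWriter writer = XmlWriter.Create(sw, settings))
                {
                    xslt.Transform(reader, writer);
                } // disposing the writer flushes it into the StringWriter

                results.Add(sw.ToString());
            }
        }
        return results;
    }
}
```

In an add-in this could be called as TransformEach(xslt, ExtractEquations(paragraph.Range.WordOpenXML)) after loading xslt with the path to OMML2MML.XSL.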


Source: https://habr.com/ru/post/1482815/

