Using OpenXML, can I read the contents of a document by page number?
wordDocument.MainDocumentPart.Document.Body gives me the text of the entire document, but I cannot see a way to get the text of a single page.
public void OpenWordprocessingDocumentReadonly() { string filepath = @"C:\...\test.docx";
MSDN Link
Update 1:
It looks like explicit page breaks are marked up as follows:
<w:p w:rsidR="003328B0" w:rsidRDefault="003328B0"> <w:r> <w:br w:type="page" /> </w:r> </w:p>
So now I need to split the XML at each of these page-break elements and take the InnerText of each part, which will give me the text page by page.
Now the question is, how can I split the XML at the page-break element shown above?
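One possible way to do that split, as a minimal sketch rather than a definitive answer: walk the body's descendants in document order, collect the text of each w:t element, and start a new chunk whenever a w:br with w:type="page" is found. The class name PageSplitter and the method SplitByPageBreaks are hypothetical names made up for this illustration, and the sketch uses System.Xml.Linq on the raw XML instead of the Open XML SDK types. It only sees explicit page breaks.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper, not part of the Open XML SDK.
public static class PageSplitter
{
    // Main WordprocessingML namespace (the "w:" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one string per chunk of text, where a new chunk starts at
    // every explicit <w:br w:type="page"/> found under the given element.
    public static List<string> SplitByPageBreaks(XElement body)
    {
        var pages = new List<string>();
        var current = new StringBuilder();

        // Descendants() enumerates elements in document order.
        foreach (XElement el in body.Descendants())
        {
            if (el.Name == W + "t")
            {
                // Text run content.
                current.Append(el.Value);
            }
            else if (el.Name == W + "br"
                     && (string)el.Attribute(W + "type") == "page")
            {
                // Explicit page break: close the current chunk.
                pages.Add(current.ToString());
                current.Clear();
            }
        }

        // Whatever follows the last break is the final chunk.
        pages.Add(current.ToString());
        return pages;
    }
}
```

The input could be the w:body element loaded from word/document.xml with XDocument, or, assuming the SDK is already in use, something like XElement.Parse(body.OuterXml). As with InnerText, paragraph boundaries are not preserved in the returned strings.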
Update 2:
These page-break elements exist only where an explicit page break was inserted. If text simply flows from one page onto the next, there is no page-break element in the XML, so I am back to the original problem of identifying where each page ends.