There are actually two problems to solve for what you're trying to do. The first is iterating over all the block-level items in the document, in document order. The second is iterating over all the inline items within each block item, in the order they appear.
python-docx does not yet have the features you would need to do this directly. However, for the first problem, there is some example code here that will likely work for you: https://github.com/python-openxml/python-docx/issues/40
I don't know of an existing example that deals with inline items, but I expect you could get pretty far with paragraph.runs. All inline content will be within the paragraph. If you got most of the way there and were just stuck on getting pictures or something, you could go down to the lxml level and decode some of the XML to get what you need. If you get that far and are still keen, post a feature request on the GitHub issues list for something like "feature: Paragraph.iter_inline_items()" and I can probably provide you with some similar code to get what you need.
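As a rough sketch of what going down to the XML level could look like: the function below walks a paragraph's `w:r` elements and yields text and picture references in the order they appear. The name `iter_inline_items` and the `("text", ...)` / `("image", ...)` tuple shape are made up for illustration and are not part of python-docx. It only uses the ElementTree-style `.iter()` and `.get()` calls, so it should work on an lxml element as well as a stdlib one, and it only looks at `w:t`, `w:tab` and `w:drawing` children.

```python
# Namespace URIs used by WordprocessingML, in Clark notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
R = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}"


def iter_inline_items(p_elm):
    """Yield ("text", str) and ("image", rId) items for a `w:p` element.

    Hypothetical helper. `p_elm` is any ElementTree-compatible element
    for a paragraph. Runs nested in other elements (for example
    `w:hyperlink`) are included because `.iter()` visits all descendants.
    """
    for run in p_elm.iter(W + "r"):
        for child in run:
            if child.tag == W + "t":
                yield ("text", child.text or "")
            elif child.tag == W + "tab":
                yield ("text", "\t")
            elif child.tag == W + "drawing":
                # An embedded picture is referenced by a:blip/@r:embed,
                # a relationship id that points at the image part.
                for blip in child.iter(A + "blip"):
                    rid = blip.get(R + "embed")
                    if rid is not None:
                        yield ("image", rid)
```

With python-docx, the element to pass in would be `paragraph._p` (a private attribute, so it may change between versions). The image bytes for a yielded relationship id should then be reachable with something like `paragraph.part.related_parts[rid].blob`.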
This requirement comes up from time to time, so we'll definitely want to add it at some point.
Note that block-level items (paragraphs and tables, largely) can appear recursively, and a general solution will need to account for that. In particular, a paragraph can (and in fact at least one always must) appear in a table cell. A table can also appear in a table cell. So theoretically the nesting can get pretty deep. A recursive function/method is the right approach for getting to all of those.