So you are after the material specified in [MS-ODRAW], i.e. the so-called OfficeDrawings that can be created directly in Word using the Drawing palette?
Unfortunately, POI offers only a little help here. With HWPF (the old binary *.doc format) you can at least get a handle to such data, for example:
    HWPFDocument document;
    OfficeDrawings officeDrawings = document.getOfficeDrawingsMain();
    // OFFSET is a global character offset describing the position of the drawing in question,
    // i.e. document.getRange().getStartOffset() + x
    OfficeDrawing drawing = officeDrawings.getOfficeDrawingAt(OFFSET);
This drawing can then be broken down further into its individual records:
    EscherRecordManager escherRecordManager = new EscherRecordManager(drawing.getOfficeArtSpContainer());
    EscherSpRecord escherSpRecord = escherRecordManager.getSpRecord();
    EscherOptRecord escherOptRecord = escherRecordManager.getOptRecord();
Using the data from all of these records, you can in theory render the original drawing again. But it is pretty painful ...
So far, I have only done this in one case, where I had a lot of simple arrows floating around on the page. They had to be converted to a textual representation (something like: "Positions (x1, y1) and (x2, y2) are connected by an arrow"). Basically, this meant implementing the subset of [MS-ODRAW] that is relevant to these arrows, using the records above. Not a really nice task.
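To make the arrow case a bit more concrete, here is a minimal sketch of the last step only: turning a shape's bounding box into the sentence quoted above. It rests on assumptions that are not part of the original answer: that a straight line shape runs from the top-left to the bottom-right corner of its bounding box unless the horizontal or vertical flip flag is set, and that the arrowhead sits at the end point. The class `ArrowDescriber` and the method `describeArrow` are hypothetical names, and the six parameters stand in for values you would have to read from the drawing and its shape record yourself.

```java
import java.util.Locale;

final class ArrowDescriber {

    /**
     * Builds a textual description of an arrow from its bounding box.
     * Assumption: an unflipped line runs from (left, top) to (right, bottom);
     * flipH swaps the x coordinates, flipV swaps the y coordinates.
     */
    static String describeArrow(int left, int top, int right, int bottom,
                                boolean flipH, boolean flipV) {
        int x1 = flipH ? right : left;
        int x2 = flipH ? left : right;
        int y1 = flipV ? bottom : top;
        int y2 = flipV ? top : bottom;
        return String.format(Locale.ROOT,
                "Positions (%d, %d) and (%d, %d) are connected by an arrow",
                x1, y1, x2, y2);
    }

    public static void main(String[] args) {
        // Unflipped arrow: top-left to bottom-right.
        System.out.println(describeArrow(10, 20, 110, 70, false, false));
        // Horizontally flipped arrow: top-right to bottom-left.
        System.out.println(describeArrow(10, 20, 110, 70, true, false));
    }
}
```

The sketch ignores everything else a real arrow can carry (arrowhead style at either end, rotation, grouping), which is exactly the part that makes a full implementation tedious.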
Fallback: MS Word
If using MS Word itself is an option for you, there is another, more pragmatic way:
- Retrieve all relevant offsets that contain OfficeDrawings using POI.
- Inside Word: navigate through the document using VBA and copy all the drawings at the given offsets to the clipboard.
- Use another application (I chose Visio) to dump the contents of the clipboard into a PNG.
The check for a drawing that step 1 needs is very simple (see below). The rest can be fully automated in Word. If anyone needs it, I can share the appropriate VBA code.
    if (characterRun.isSpecialCharacter()) {
        for (char currentChar : characterRun.text().toCharArray()) {
            if ('\u0008' == currentChar) return true;
        }
    }
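
The check above only answers "does this run contain a drawing?". For step 1 you need the actual offsets, so here is a minimal sketch of how they could be collected. It works on plain strings so that it runs without POI; `DrawingOffsets`, `findDrawingOffsets` and `runStartOffset` are hypothetical names, and the assumption is that you call it once per special character run with that run's text and its global start offset.

```java
import java.util.ArrayList;
import java.util.List;

final class DrawingOffsets {

    /** The character checked for above: the placeholder for a drawing in the text. */
    static final char DRAWING_PLACEHOLDER = '\u0008';

    /**
     * Returns the global offsets of all drawing placeholders in one run.
     *
     * @param runText        text of the run (assumed to be a special character run)
     * @param runStartOffset global character offset of the run's first character
     */
    static List<Integer> findDrawingOffsets(String runText, int runStartOffset) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < runText.length(); i++) {
            if (runText.charAt(i) == DRAWING_PLACEHOLDER) {
                offsets.add(runStartOffset + i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Two placeholders at local positions 2 and 5, run starting at offset 100.
        System.out.println(findDrawingOffsets("ab\u0008cd\u0008", 100)); // prints [102, 105]
    }
}
```

The resulting list is what you would hand over to the VBA side; whether the offsets line up one-to-one with Word's own character positions is something to verify on your documents.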