I am trying to write a Python function that, given the path to a document file, returns the number of words in that document. This is pretty easy to do with .txt files, and there are tools that would let me hack together support for a few more complex document formats, but I want a really comprehensive solution.
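For reference, the plain-text case I mean looks roughly like the sketch below. The function name `count_words_txt` is just my own placeholder, and it assumes UTF-8 input and that a "word" is any whitespace-delimited token, which will not match every word processor's definition.

```python
from pathlib import Path


def count_words_txt(path, encoding="utf-8"):
    """Count whitespace-separated tokens in a plain-text file.

    Assumption: a "word" is any run of non-whitespace characters.
    Undecodable bytes are replaced rather than raising an error.
    """
    total = 0
    # Read line by line so large files are not loaded into memory at once.
    with Path(path).open(encoding=encoding, errors="replace") as handle:
        for line in handle:
            total += len(line.split())
    return total
```

Calling `count_words_txt("notes.txt")` returns an integer count. The hard part is everything that is not plain text.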
Looking at the OpenOffice.org PyUNO scripting interface and its list of supported formats, the ideal approach would be to load each document into a headless OOo instance and call its word count function. However, I cannot find any tutorials or code examples that go beyond basic document generation, and even the code fragments I did find are half a decade out of date and no longer work.
Whether or not I can use OOo and UNO for this, how can I get reliable word counts for documents in different formats?