Reliable character count for .doc files

What is a reliable way to automatically count the characters and/or words in a .doc or .docx file?

The only real requirement is a reasonably accurate and reasonably reliable count.
It should work with documents in non-Latin scripts, so a character count is good enough for most cases.
The count does not have to match Word's, but the closer the better.
Since many applications can generate .doc files, it is acceptable if it does not work with all of them, but that case should be detectable, so that we know the count may be inaccurate. For all other cases, the counter must be, say, at least 99% accurate at least 99% of the time.
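To make the "99% accurate" target concrete, here is a minimal sketch of one way to check a count against a reference such as Word's own number. The function name `is_accurate`, its `tolerance` parameter, and the reading of "accurate" as relative error are assumptions for illustration, not part of the question.

```python
def is_accurate(counted, reference, tolerance=0.01):
    """Return True if `counted` is within `tolerance` (relative) of `reference`.

    Assumption: "99% accurate" is read as a relative error of at most 1%.
    """
    if reference == 0:
        # No reference text: only an empty count can be called accurate.
        return counted == 0
    return abs(counted - reference) / reference <= tolerance
```

Running this over a test set of documents and requiring it to hold for at least 99% of them would cover the second half of the requirement.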

I am open as to the technology involved, but something that works on the *NIX command line would be strongly preferred.

Is there any reasonable solution for this?

+3
4 answers

Here's a link to some Word converters for Linux.

For example, you can use

antiword file.doc | wc

to do the counting.
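The pipeline above can be wrapped in a short script if you want both a word and a character count, plus a detectable failure when the file cannot be read. This is only a sketch: the helper names `count_text` and `count_doc` are made up here, and it assumes antiword is on the PATH and emits UTF-8 (which depends on the locale).

```python
import subprocess
import unicodedata


def count_text(text):
    """Return (words, characters) for already-extracted plain text.

    Words are whitespace-separated runs, as with `wc -w`. Characters are
    Unicode code points, excluding whitespace and control/format characters.
    """
    words = len(text.split())
    chars = sum(
        1 for ch in text
        if not ch.isspace() and unicodedata.category(ch)[0] != "C"
    )
    return words, chars


def count_doc(path):
    """Extract text from a .doc file with antiword and count it.

    Assumption: antiword is installed and writes UTF-8 to stdout.
    check=True raises CalledProcessError if antiword exits with an error,
    so a file it cannot handle is detected instead of silently miscounted.
    """
    result = subprocess.run(["antiword", path], capture_output=True, check=True)
    return count_text(result.stdout.decode("utf-8", errors="replace"))
```

Splitting on whitespace undercounts words in scripts that do not separate words with spaces, which is why the character count is the more dependable number for non-Latin documents.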

Edit:

This link shows that AbiWord has a command-line interface that you can use to convert the .docx format to .txt and then count the words using wc. AbiWord supports the docx format.
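If installing a converter is not an option, a .docx file can also be read directly, since it is a zip archive whose main text lives in word/document.xml. The following is a rough sketch of that alternative using only the Python standard library, not the AbiWord route described above; `docx_text` and `count_docx` are hypothetical names.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each run of text inside a paragraph is stored in a w:t element.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def count_docx(path):
    """Return (words, characters excluding whitespace) for a .docx file."""
    text = docx_text(path)
    return len(text.split()), sum(1 for ch in text if not ch.isspace())
```

This only looks at the main document part, so headers, footers and footnotes are not counted, and tab or line-break elements are not turned into spaces. A file that is not a valid archive raises zipfile.BadZipFile, which at least makes the failure detectable.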

+3

On Mac OS X, with MacRuby:

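# .string extracts the plain text, since countWordsInString expects a string rather than an attributed string
NSSpellChecker.sharedSpellChecker.countWordsInString(NSAttributedString.alloc.initWithURL(fileURL, documentAttributes:nil).string, language:nil)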

[…] docx […] Antiword: antiword | wc -w.

+1

Microsoft Office […] DOC […] Word […]

0

[…] http://allworldphone.com/count-words-characters.htm

[…] 100% […] 99% […] (20-50) […]

0

Source: https://habr.com/ru/post/1733113/