How can I get plain text from a document, excluding all images, tables, and numbers? I want to process these documents and build a list of words from them, so I need only the text part of each document, using C#.
You probably want to look into IFilters. They are how most search indexers on Windows get plain text out of documents. Here's a tutorial and sample project with source code that you can use to extract text from Office documents, PDF files, and so on.
You just need to make sure that the right IFilters are installed on your machine. Microsoft provides a free set of filters for Office documents. Adobe also provides a filter, but it is of poor quality. If you can, try the Foxit IFilter; it is much better.
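Once an IFilter (or any other extractor) has produced plain text, the remaining step from the question, building a list of words without numbers, can be done with a regular expression. This is a minimal sketch, assuming a "word" is a run of Unicode letters optionally joined by an ASCII apostrophe or hyphen; `WordList` and `Extract` are hypothetical names, not part of any IFilter API.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordList
{
    // Hypothetical helper. A word is one or more Unicode letters (\p{L}),
    // optionally joined by an ASCII apostrophe or hyphen ("It's", "e-mail").
    // Digits are not letters, so "12" or "3.5" never produce a match.
    private static readonly Regex WordPattern =
        new Regex(@"\p{L}+(?:['-]\p{L}+)*", RegexOptions.Compiled);

    // Returns every word in the order it appears in the text.
    public static List<string> Extract(string text)
    {
        return WordPattern.Matches(text)
                          .Cast<Match>()
                          .Select(m => m.Value)
                          .ToList();
    }
}
```

For example, `WordList.Extract("Page 12: Hello, world! It's 3.5 e-mail")` yields `Page`, `Hello`, `world`, `It's`, `e-mail`. Deduplicating or lower-casing the result is left to the caller, since the question does not say whether the list should be distinct.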
You have to handle each document format separately; there is no general method for reading every format. For example, Microsoft Word files must be parsed with their own library, unlike OpenOffice document files.
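As an illustration of format-specific handling, a .docx file (Word 2007 and later) is a ZIP package whose main body lives in `word/document.xml`, so its text can be read with only the .NET base library. This is a minimal sketch, assuming .docx input (not legacy binary .doc) and C# 8 or later; `DocxText.Extract` is a hypothetical helper. It concatenates the `w:t` text runs of each paragraph and skips paragraphs nested inside tables (`w:tbl`); images carry no `w:t` text, so they drop out on their own.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class DocxText
{
    // WordprocessingML namespace used by word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns the body text of a .docx stream,
    // one paragraph per line, ignoring paragraphs that sit inside tables.
    public static string Extract(Stream docx)
    {
        // leaveOpen: true so the caller's stream is not disposed here.
        using var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true);
        ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
            ?? throw new InvalidDataException("word/document.xml not found; not a .docx?");

        XDocument doc;
        using (Stream s = entry.Open())
            doc = XDocument.Load(s);

        var sb = new StringBuilder();
        foreach (XElement p in doc.Descendants(W + "p"))
        {
            // Skip table content: any paragraph with a w:tbl ancestor.
            if (p.Ancestors(W + "tbl").Any()) continue;

            // A paragraph's text is split across runs; join their w:t values.
            string line = string.Concat(p.Descendants(W + "t").Select(t => t.Value));
            if (line.Length > 0) sb.AppendLine(line);
        }
        return sb.ToString();
    }
}
```

Headers, footers, and footnotes are stored in separate parts of the package (for example `word/footnotes.xml`) and are not read here; other formats such as OpenOffice .odt or PDF each need their own reader.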
Source: https://habr.com/ru/post/1776521/