LoadIFilter() fails on all PDF files (but MS filtdump.exe does not)

I am trying to write a C# utility that mimics the behavior of filtdump.exe from the Windows Search SDK (since filtdump does not appear to be redistributable). I am running into a combination of conflicting and/or nonexistent documentation and technical problems that I cannot track down. I hope someone can help remove one or another of these obstacles.

According to MSDN, filtdump uses ILoadFilter::LoadIFilter to load an IFilter. I claim that MSDN is lying, since it also claims that ILoadFilter::LoadIFilter exists only on Windows 7, yet filtdump runs fine on earlier versions of Windows. Process Monitor shows that it actually calls LoadIFilter() from query.dll, so that is what I do:

    public static class NativeMethods
    {
        // From Windows SDK v7.1, NTQuery.h
        [DllImport("query.dll", CharSet = CharSet.Unicode)]
        public static extern int LoadIFilter(
            string pwcsPath,
            [MarshalAs(UnmanagedType.IUnknown)] ref object pUnkOuter,
            ref IFilter ppIUnk);
    }

    // In the calling code:
    object iUnknown = null;
    IFilter filter = null;
    var result = NativeMethods.LoadIFilter(args[0], ref iUnknown, ref filter);
    if (result != ResultCodes.S_OK)
    {
        Console.WriteLine("Failed to load an IFilter for {0}: {1}", args[0], result);
        return;
    }

For the most part, this application and filtdump give me the same results: both can open and extract text from a text file, a Word document and an Outlook e-mail, and both fail on the same set of other documents that have no IFilter. PDF files, however, are a problem. filtdump manages to open and extract text from most of the PDFs I give it, but every PDF I try in my own application fails with HRESULT 0x80004005, E_FAIL.
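For matching raw return values against the names mentioned here, a small helper along these lines may help. Note that `ResultCodes` is referenced but never shown in the code above, so this definition is an assumption, and `Describe` is a hypothetical helper. The three constants are the standard COM HRESULT values.

```csharp
using System;

// Hypothetical stand-in for the ResultCodes class the question refers to.
static class ResultCodes
{
    public const int S_OK = 0;
    public const int E_NOTIMPL = unchecked((int)0x80004001);
    public const int E_FAIL = unchecked((int)0x80004005);

    // Formats an HRESULT as hex plus a symbolic name when one is known.
    public static string Describe(int hr)
    {
        string name;
        switch (hr)
        {
            case S_OK: name = "S_OK"; break;
            case E_NOTIMPL: name = "E_NOTIMPL"; break;
            case E_FAIL: name = "E_FAIL"; break;
            default: name = "unknown"; break;
        }
        // "X8" prints a negative Int32 as its two's-complement hex digits.
        return string.Format("0x{0:X8} ({1})", hr, name);
    }
}
```

Printing `ResultCodes.Describe(result)` instead of the raw integer shows the value in the hex form that the documentation uses.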

This is the same error as in this question, but I get it on every PDF file and filtdump does not, so I know the IFilter works on at least some documents. Has anyone done this kind of thing with PDF files before and can see what I am doing wrong?

+6
3 answers

You might want to look at this blog post. In short, the Adobe PDF v10 filter uses a whitelist of applications that are allowed to use the filter, which includes Microsoft diagnostic tools such as filtdump.exe, presumably as a "security measure."

+3

The IFilter fails to load because the Adobe PDF filter is marked as STA, while a C# application runs in an MTA by default, so it cannot load the PDF filter. Try making your application STA and then load the PDF filter.
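A minimal sketch of the STA suggestion, assuming the filter has to be created and used on the same STA thread. `StaRunner` and `RunInSta` are hypothetical names, not part of any SDK. For a console application, marking `Main` with `[STAThread]` should have the same effect for the main thread.

```csharp
using System;
using System.Threading;

static class StaRunner
{
    // Runs the given work on a dedicated STA thread and returns its result.
    public static T RunInSta<T>(Func<T> work)
    {
        T result = default(T);
        Exception error = null;

        var thread = new Thread(() =>
        {
            try { result = work(); }
            catch (Exception ex) { error = ex; }
        });

        // The apartment state must be set before the thread is started.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();

        if (error != null)
            throw new InvalidOperationException("Work on the STA thread failed.", error);
        return result;
    }
}
```

The call to LoadIFilter and the text extraction that follows would both go inside the delegate passed to `RunInSta`, so the filter object never leaves the apartment it was created in.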

Ajax

+1

I would also expect filtdump to use the old Win32 LoadIFilter call, which has been available since Windows 2000.

I saw the same problem as you, and it was solved by running the calling process in a job. fooobar.com/questions/895768/...

I also had a similar problem with a Reader 10.1.5 installation, although there Win32 LoadIFilter() returned E_NOTIMPL, not E_FAIL.

Adobe seems to have broken the standard Win32 LoadIFilter() call by removing the ability to load content into the IFilter through the IStorage interface's load method, even though the object still returns that interface when queried via QueryInterface.

To work around this on Windows 7 and later, you can create a FilterRegistration object, which implements ILoadFilter, and then call ILoadFilter::LoadIFilter() to create the filter COM object. Then obtain IPersistStream from it and call Load() with an IStream containing the contents of the file.

On older versions, you first have to find the CLSID of the filter in the registry, or hard-code the Adobe CLSID as a configuration value if you want to pin it.
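The registry lookup could be sketched roughly as follows. This assumes the usual persistent-handler chain under HKEY_CLASSES_ROOT (the extension's `PersistentHandler` subkey, then `PersistentAddinsRegistered` keyed by the IFilter interface ID) and ignores the fallback through the file type's ProgID. `FilterLookup` and `FindFilterClsid` are hypothetical names.

```csharp
using System;
using Microsoft.Win32;

static class FilterLookup
{
    // Interface ID of IFilter, used as a subkey name under PersistentAddinsRegistered.
    const string IFilterIid = "{89BCB740-6119-101A-BCB7-00DD010655AF}";

    // Reads the default value of a key under HKEY_CLASSES_ROOT, or null if the key is missing.
    static string ReadDefault(string subKey)
    {
        using (RegistryKey key = Registry.ClassesRoot.OpenSubKey(subKey))
        {
            return key == null ? null : key.GetValue(null) as string;
        }
    }

    // Returns the CLSID of the filter registered for an extension such as ".pdf", or null.
    public static Guid? FindFilterClsid(string extension)
    {
        string handler = ReadDefault(extension + @"\PersistentHandler");
        if (handler == null) return null;

        string clsid = ReadDefault(
            @"CLSID\" + handler + @"\PersistentAddinsRegistered\" + IFilterIid);

        Guid parsed;
        return Guid.TryParse(clsid, out parsed) ? parsed : (Guid?)null;
    }
}
```

With the CLSID in hand, something like `Activator.CreateInstance(Type.GetTypeFromCLSID(clsid))` should produce the filter object, on which IPersistStream can then be requested as described above.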

0

Source: https://habr.com/ru/post/895767/

