Using Microsoft.MSHTML in a loop, memory leak

Hey, I'm trying to use the Microsoft.MSHTML library (version 7.0.3300.0) to extract the body text from an HTML string. I've factored this functionality out into a single helper method, GetBody(string).

When it is called in an infinite loop, the process eventually runs out of memory (confirmed by watching Mem Usage in Task Manager). I suspect the problem is that I'm not cleaning up the MSHTML objects correctly. What am I doing wrong?

My current definition of GetBody(string):

public static string GetBody(string html)
{
    mshtml.IHTMLDocument2 htmlDoc = null;
    mshtml.IHTMLElement bodyElement = null;
    string body;

    try
    {
        htmlDoc = new mshtml.HTMLDocumentClass();
        htmlDoc.write(html);
        bodyElement = htmlDoc.body;
        body = bodyElement.innerText;
    }
    catch (Exception ex)
    {
        Trace.TraceError("Failed to use MSHTML to parse HTML body: " + ex.Message);
        body = email.Body;
    }
    finally
    {
        if (bodyElement != null)
            Marshal.ReleaseComObject(bodyElement);
        if (htmlDoc != null)
            Marshal.ReleaseComObject(htmlDoc);
    }

    return body;
}

Edit: The memory leak has been traced back to the code used to populate the value of html, not to GetBody itself. In this case, the culprit was Outlook Redemption.
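
For anyone chasing a similar leak, here is a minimal sketch of how one might narrow it down: run each suspect step in isolation for many iterations and compare how much the process's private memory grows. LeakProbe, MeasureGrowth and the two placeholder actions are hypothetical names for illustration, not part of MSHTML or Redemption; replace the placeholder bodies with your own calls (for example GetBody on a fixed string, and the code that fetches html).

```csharp
using System;
using System.Diagnostics;

static class LeakProbe
{
    // Runs the action repeatedly and returns the change in private bytes.
    public static long MeasureGrowth(Action action, int iterations)
    {
        action(); // warm-up, so one-time allocations are not counted as growth

        long before = PrivateBytes();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        return PrivateBytes() - before;
    }

    static long PrivateBytes()
    {
        // Force a full collection first so that collectable managed garbage
        // is not mistaken for a leak.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        using (Process current = Process.GetCurrentProcess())
        {
            return current.PrivateMemorySize64;
        }
    }

    static void Main()
    {
        const string sampleHtml = "<html><body>Hello</body></html>";

        // Placeholder: parse a fixed string only, e.g. GetBody(sampleHtml).
        Action parseOnly = () => { string unused = sampleHtml.Trim(); };

        // Placeholder: only fetch the HTML from its real source, without parsing.
        Action fetchOnly = () => { string unused = sampleHtml.ToUpperInvariant(); };

        Console.WriteLine("parse only: {0} bytes", MeasureGrowth(parseOnly, 10000));
        Console.WriteLine("fetch only: {0} bytes", MeasureGrowth(fetchOnly, 10000));
    }
}
```

If the "parse only" figure stays roughly flat while the "fetch only" figure keeps climbing as the iteration count goes up, the leak is more likely in whatever produces html than in the MSHTML cleanup. Private bytes are a noisy measure, so treat small differences as inconclusive.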


, mshtml, IHTMLElement2 close? ?

, ?

, , , mshtml , .

EDIT:

, , HTMLDocument2, com, .

, , ReleaseComObject , . com , .


Source: https://habr.com/ru/post/1727568/

