Hey, I'm trying to use the Microsoft.MSHTML library (version 7.0.3300.0) to extract the body text from an HTML string. I've moved this functionality into a single helper method, GetBody(string).
When GetBody is called in an infinite loop, the process eventually runs out of memory (confirmed by watching Mem Usage in Task Manager). I suspect the problem is that I'm not cleaning up the MSHTML objects correctly. What am I doing wrong?
My current definition of GetBody(string):
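using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public static string GetBody(string html)
{
    mshtml.IHTMLDocument2 htmlDoc = null;
    mshtml.IHTMLElement bodyElement = null;
    string body;
    try
    {
        // Load the HTML into an in-memory MSHTML document.
        htmlDoc = new mshtml.HTMLDocumentClass();
        htmlDoc.write(html);

        // innerText returns the text content of <body> without markup.
        bodyElement = htmlDoc.body;
        body = bodyElement.innerText;
    }
    catch (Exception ex)
    {
        Trace.TraceError("Failed to use MSHTML to parse HTML body: " + ex.Message);
        // Parsing failed: fall back to the unparsed input.
        body = html;
    }
    finally
    {
        // Release the runtime callable wrappers in reverse order of acquisition.
        if (bodyElement != null)
            Marshal.ReleaseComObject(bodyElement);
        if (htmlDoc != null)
            Marshal.ReleaseComObject(htmlDoc);
    }
    return body;
}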
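The cleanup in the finally block can be factored into a small helper so that each COM reference is null-checked, released, and cleared in one call. This is a minimal sketch assuming only the standard System.Runtime.InteropServices.Marshal APIs; ComHelper and ReleaseAndClear are hypothetical names, not part of MSHTML. It keeps the question's Marshal.ReleaseComObject (which decrements the wrapper's reference count by one) rather than the more aggressive FinalReleaseComObject.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of MSHTML or the .NET base library.
public static class ComHelper
{
    // Releases `obj` if it is a COM runtime callable wrapper (RCW), then sets the
    // caller's variable to null so the released wrapper cannot be touched again.
    // Returns true only if a COM release was actually performed.
    public static bool ReleaseAndClear<T>(ref T obj) where T : class
    {
        if (obj == null)
            return false;

        bool released = false;
        if (Marshal.IsComObject(obj))
        {
            // Decrements the RCW reference count by one, as in the original code.
            Marshal.ReleaseComObject(obj);
            released = true;
        }

        obj = null;
        return released;
    }
}
```

With this helper, the finally block in GetBody would reduce to ComHelper.ReleaseAndClear(ref bodyElement); followed by ComHelper.ReleaseAndClear(ref htmlDoc);. Interface types such as mshtml.IHTMLElement satisfy the class constraint, so the generic parameter is inferred from the variable.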
Edit: The memory leak turned out to be in the code used to populate the value of html, which in this case was Outlook Redemption.
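One way to trace a leak to a specific stage, as described in the edit, is to run each stage in a loop on its own and compare memory growth. Below is a minimal sketch assuming a .NET runtime where System.Diagnostics.Process is available; LeakCheck and MeasureGrowth are hypothetical names. It reports managed-heap growth separately from process private bytes, because native or COM allocations do not appear in GC.GetTotalMemory but do show up at the process level. The process-level figure is noisy, so treat it as a rough signal rather than an exact count.

```csharp
using System;
using System.Diagnostics;

// Hypothetical diagnostic helper for comparing pipeline stages.
public static class LeakCheck
{
    // Runs `action` `iterations` times and returns how much the managed heap and
    // the process private bytes grew across the loop.
    public static (long managedDelta, long privateDelta) MeasureGrowth(Action action, int iterations)
    {
        // Warm up once so one-time initialization is not counted as growth.
        action();

        long managedBefore = GC.GetTotalMemory(forceFullCollection: true);
        Process proc = Process.GetCurrentProcess();
        proc.Refresh();
        long privateBefore = proc.PrivateMemorySize64;

        for (int i = 0; i < iterations; i++)
            action();

        // Let pending finalizers (for example, for unreleased RCWs) run before measuring.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        long managedAfter = GC.GetTotalMemory(forceFullCollection: true);
        proc.Refresh();
        long privateAfter = proc.PrivateMemorySize64;

        return (managedAfter - managedBefore, privateAfter - privateBefore);
    }
}
```

For example, measuring () => GetBody(cachedHtml) with a fixed, hypothetical cachedHtml string, and then measuring a loop that also fetches the message text from the producing library on every iteration, separates the two stages: if only the second run keeps growing, the leak lies in the producer rather than in the MSHTML cleanup.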