Extract text from an MHT file

I have an MHT file and want to extract all of its text. I find this hard to do with a regular expression because the file contains languages other than English, so the text itself is full of sequences like A7 = A98 = D6 ...

What I need is the same result as opening the file in a browser, selecting all the text, and copying and pasting it into Notepad.

Thanks.

1 answer

Open the file in Internet Explorer and save it as plain text (UTF-8). :) If you need an automated solution, look for an MHT-to-TXT converter for your platform or programming language.
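The sequences like A7 = A98 = D6 mentioned in the question are most likely quoted-printable escapes (`=XX`, one per byte), the transfer encoding MHT files commonly use for their HTML part. A minimal sketch using Python's standard `quopri` module shows how such escapes turn back into readable text; the sample string is made up and assumed to be UTF-8, whereas a real file declares its charset in the part's `Content-Type` header.

```python
import quopri

# Quoted-printable escapes each non-ASCII byte as "=XX" (two hex digits),
# and ends wrapped lines with "=" (a soft line break).
encoded = b"Gr=C3=BC=C3=9Fe aus M=C3=BCnchen, =\r\nsch=C3=B6ne Gr=C3=BC=C3=9Fe"

# Decode the escapes back to raw bytes, then interpret them with the
# charset declared by the part (UTF-8 is an assumption for this sample).
decoded = quopri.decodestring(encoded).decode("utf-8")
print(decoded)  # Grüße aus München, schöne Grüße
```

This is why a regular expression on the raw file is awkward: the bytes must first be un-escaped and then decoded with the right charset before the text is readable.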

Actually, you can also automate this in PowerShell:

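$ie = New-Object -ComObject "InternetExplorer.Application"
$ie.Navigate2("file:///C:/MyFile.mht")
# Wait for the document to finish loading (READYSTATE_COMPLETE = 4)
while ($ie.Busy -or $ie.ReadyState -ne 4) { Start-Sleep -Milliseconds 100 }
$text = $ie.Document.documentElement.innerText
$ie.Quit()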
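Microsoft has retired Internet Explorer, so the COM approach may not work on newer systems. As an alternative, here is a minimal sketch of a converter in Python (not part of the original answer). It relies on the fact that an MHT file is a MIME message, and assumes the page text lives in the first `text/html` part. The names `TextExtractor` and `mht_to_text` are hypothetical, and the small set of tags that start a new line is only a rough approximation of what a browser's `innerText` produces.

```python
import email
import sys
from email import policy
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}
    # Tags treated as line breaks; a rough stand-in for browser layout.
    BLOCK = {"p", "div", "br", "li", "tr", "title", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &amp; etc. into characters
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

    def text(self):
        # Collapse whitespace inside each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self.parts).splitlines())
        return "\n".join(line for line in lines if line)


def mht_to_text(raw: bytes) -> str:
    """Return the visible text of the first text/html part of an MHT file."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes quoted-printable/base64 and applies the charset.
            parser = TextExtractor()
            parser.feed(part.get_content())
            parser.close()
            return parser.text()
    raise ValueError("no text/html part found")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(mht_to_text(f.read()))
```

Usage: `python mht2txt.py C:\MyFile.mht > MyFile.txt` (the script name is arbitrary). Using the standard `email` package means the quoted-printable escapes and the declared charset are handled by the MIME parser, so no regular expressions are needed.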

Source: https://habr.com/ru/post/1708438/

