Invoke-WebRequest response encoding

When I use the Invoke-WebRequest cmdlet on web pages that contain non-English characters, I can't find a way to specify or determine the encoding of the response / page content.

I send a simple GET to http://colours.cz/ucinkujici/ and the artists' names come back corrupted. You can reproduce it with this one line:

Invoke-WebRequest http://colours.cz/ucinkujici 

Is this caused by the design of the cmdlet? Is there any way to specify the encoding? Is there a workaround to parse the response correctly?

1 answer

I think you're right :/

Here is one way to get the right content: first save the response to a file, then read the file into a variable with the correct encoding. Note, however, that you then no longer have an HtmlWebResponseObject to work with:

 Invoke-WebRequest http://colours.cz/ucinkujici -outfile .\colours.cz.txt
 $content = Get-Content .\colours.cz.txt -Encoding utf8 -raw
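The same idea can be sketched without a temporary file, since the response object also exposes its undecoded bytes as RawContentStream. Get-DecodedContent below is a hypothetical helper, not part of PowerShell, and the UTF-8 default is an assumption about the page (the same one the -Encoding utf8 switch makes):

```powershell
# Hypothetical helper: decode a response's raw bytes with an explicitly
# chosen encoding instead of relying on the cmdlet's own decoding.
function Get-DecodedContent {
    param(
        [Parameter(Mandatory)] [IO.MemoryStream] $RawStream,
        [Text.Encoding] $Encoding = [Text.Encoding]::UTF8   # assumption: the page is UTF-8
    )
    # ToArray() copies the whole buffer regardless of the stream position.
    $Encoding.GetString($RawStream.ToArray())
}

# Usage (needs network access):
# $response = Invoke-WebRequest http://colours.cz/ucinkujici
# $content  = Get-DecodedContent $response.RawContentStream
```

As with the file approach, $content is a plain string; the response object itself still carries the mis-decoded Content and ParsedHtml.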

This gets you just as far, because StreamReader decodes the stream as UTF-8 by default:

 # Request the page directly through the .NET classes
 [net.httpwebrequest]$httpwebrequest = [net.webrequest]::create('http://colours.cz/ucinkujici/')
 [net.httpWebResponse]$httpwebresponse = $httpwebrequest.getResponse()
 # Read the whole response body as text
 $reader = new-object IO.StreamReader($httpwebresponse.getResponseStream())
 $content = $reader.ReadToEnd()
 $reader.Close()

If you really want an HtmlWebResponseObject, here is a way to make values taken from it, such as those from ParsedHtml, more or less readable while still using Invoke-WebRequest (compare $bad with $better):

 Invoke-WebRequest http://colours.cz/ucinkujici -outvariable htmlwebresponse
 # Turn the mis-decoded title back into bytes, then decode those bytes as UTF-8
 $bad = $htmlwebresponse.parsedhtml.title
 $better = [text.encoding]::utf8.getstring([text.encoding]::default.GetBytes($bad))
 # Same repair for the outer HTML of one of the links
 $bad = $htmlwebresponse.links[7].outerhtml
 $better = [text.encoding]::utf8.getstring([text.encoding]::default.GetBytes($bad))
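Why the round trip works can be shown offline with a small sketch. It assumes ISO-8859-1 as a stand-in for the single-byte code page that misread the bytes; on a real machine that role is played by [text.encoding]::default, which is the ANSI code page in Windows PowerShell but UTF-8 on .NET Core, where this repair would not apply. The sample word is illustrative only:

```powershell
$utf8   = [Text.Encoding]::UTF8
# Assumption: ISO-8859-1 stands in for the code page that misread the page.
$latin1 = [Text.Encoding]::GetEncoding('iso-8859-1')

# Build the sample from a code point so the script file's own encoding
# does not matter (U+010D is a Czech "c" with caron).
$original = [string][char]0x010D + 'esky'

# Step 1: reproduce the corruption, UTF-8 bytes read as single-byte characters.
$bad = $latin1.GetString($utf8.GetBytes($original))

# Step 2: undo it, back to the original bytes and then decode them as UTF-8.
$better = $utf8.GetString($latin1.GetBytes($bad))

$bad.Length             # 6, because the first letter occupies two bytes in UTF-8
$better -ceq $original  # True
```

The repair is only lossless when the wrong code page maps every byte to a character and back, which is why the result above is described as "more or less" readable.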

Update: here is another approach, now that I know you want to work with ParsedHtml.
Once you have the content (see the first code fragment above, which 1) saves the response to a file and then 2) reads the file back with the correct encoding), you can do this:

 $ParsedHtml = New-Object -com "HTMLFILE"
 $ParsedHtml.IHTMLDocument2_write($content)
 $ParsedHtml.Close()

Et voilà :] For example, $ParsedHtml.title now displays correctly, and I assume everything else does too.


Source: https://habr.com/ru/post/1491937/

