How can I mirror a site with a JavaScript-generated menu?

I'm trying to mirror a site that uses a crazy JavaScript menu generated on the client side. Neither wget nor httrack can download the whole site, because the links simply don't exist until the JS code runs. What can I do?

I tried loading the index page in a browser. That runs the JS code and builds the menu, so I can dump the resulting DOM to an HTML file and mirror from that file. This downloads more files, since the links are now present in the source. But, obviously, mirroring soon breaks down again on the newly downloaded pages, which contain their own uninterpreted JS menus.

I was thinking of replacing the menu portion of each downloaded page with a static version of the menu, but I can't find any wget or httrack option that would let me run the downloaded files through an external command. I could write a simple filtering proxy, but that is starting to sound extreme. Any other ideas?
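As a sketch of the "static menu" idea applied after the download rather than during it, the Perl script below rewrites every `.html`/`.htm` file under a mirror directory in place, swapping the menu-building `<script>` tag for static menu markup (for example, the rendered menu copied out of the browser's DOM dump). It assumes, hypothetically, that the menu is injected by a tag like `<script src="js/menu.js"></script>`; the `menu.js` filename and the helper names `inline_static_menu` and `rewrite_mirror` are illustrative and would need adapting to the real site. Because wget has already finished by the time this runs, the rewritten pages would still have to be fed back for another pass (wget's `-i FILE --force-html --base URL` reads links from a local HTML file), repeating until no new pages turn up.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;

# Hypothetical marker: assumes the JS menu is injected by a tag such as
# <script src="js/menu.js"></script>. Adjust the pattern to the real site.
my $MENU_SCRIPT_RE =
  qr{<script\b[^>]*\bsrc=["'][^"']*menu\.js["'][^>]*>\s*</script>}i;

# Replace the menu-building <script> tag(s) in an HTML string (passed by
# reference) with static markup. Returns the number of replacements (false if none).
sub inline_static_menu {
    my ( $html_ref, $static_menu ) = @_;
    return $$html_ref =~ s/$MENU_SCRIPT_RE/$static_menu/g;
}

# Walk $root and rewrite every *.html / *.htm file in place.
# Files are handled as raw bytes; returns how many files were changed.
sub rewrite_mirror {
    my ( $root, $static_menu ) = @_;
    my $changed = 0;
    find(
        sub {
            # File::Find chdirs into each directory; $_ is the basename.
            return unless -f $_ && /\.html?$/i;
            open my $in, '<:raw', $_ or die "read $File::Find::name: $!";
            my $html = do { local $/; <$in> };
            close $in;
            return unless inline_static_menu( \$html, $static_menu );
            open my $out, '>:raw', $_ or die "write $File::Find::name: $!";
            print {$out} $html;
            close $out;
            $changed++;
        },
        $root
    );
    return $changed;
}

# Usage: perl inline-menu.pl MIRROR_DIR STATIC_MENU_FILE
if ( !caller && @ARGV == 2 ) {
    my ( $root, $menu_file ) = @ARGV;
    open my $fh, '<:raw', $menu_file or die "read $menu_file: $!";
    my $menu = do { local $/; <$fh> };
    close $fh;
    printf "rewrote %d file(s)\n", rewrite_mirror( $root, $menu );
}
```

For example, `perl inline-menu.pl ./mirror static-menu.html` (both paths hypothetical) would patch the whole mirror in one go. Doing it as a separate pass keeps wget/httrack unmodified, at the cost of needing several mirror-then-rewrite rounds.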

2 answers

I've used HtmlUnit with great success, even on sites where the content is obscured by dynamically generated elements.

It doesn't help in my case, but perhaps it will be useful to someone. This is what a simple filtering proxy server looks like in Perl:

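    #!/usr/bin/env perl
    use strict;
    use warnings;
    use HTTP::Proxy;
    use HTTP::Proxy::BodyFilter::complete;
    use HTTP::Proxy::BodyFilter::simple;

    my $proxy = HTTP::Proxy->new( port => 3128 );

    $proxy->push_filter(
        mime     => 'text/html',
        # buffer the whole body so a match never straddles two chunks
        response => HTTP::Proxy::BodyFilter::complete->new,
        # $_[1] is a reference to the body; rewrite it in place
        response => HTTP::Proxy::BodyFilter::simple->new(
            sub { ${ $_[1] } =~ s/foo/bar/g }
        ),
    );

    $proxy->start;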

Source: https://habr.com/ru/post/1488452/
