Embed an HTML section from another site?

Is there a way to embed only part of another site into an HTML page?

Example: I see an answer I want to blog about, so I grab its HTML content, post it somewhere, and show only that answer the way it appears on Stack Overflow. Basically, I want to quote a section of the page with its original styling, if that makes sense. Is this something the source site itself needs to provide, or can I use an iframe and tell it to show only a specific element, or something crazy like that? I'm open to all options, but I want it displayed as HTML, not as an image (that really is a last resort).

If it is possible, are there any security issues I should be aware of?

+4
5 answers

I don't think the image should really be the last resort. You don't have control over the HTML/CSS of the original page, so even if you hack together a solution (perhaps using JavaScript to parse out the desired fragment), there is no guarantee the site won't change its layout tomorrow.

Even Jeff, who controls the layout of stackoverflow.com, still prefers screen captures of the site rather than pulling content live.

Now, if your goal were to automatically update the content, that would be a different story. But even then, unless the site exposes a consistent method of sharing content, such as RSS, your solution will be very fragile.
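To illustrate the RSS route, the Python standard library alone is enough to pull structured items out of a feed. This is just a sketch: the feed document is inlined for illustration, and in practice you would fetch it from the site's feed URL instead.

```python
import xml.etree.ElementTree as ET

# A minimal RSS 2.0 document, inlined here for illustration only.
rss = """<rss version="2.0"><channel>
  <title>Example blog</title>
  <item><title>First post</title><description>Hello</description></item>
  <item><title>Second post</title><description>World</description></item>
</channel></rss>"""

root = ET.fromstring(rss)
# Each <item> carries the content in a stable, documented structure,
# which is exactly what scraping raw HTML lacks.
items = [(i.findtext("title"), i.findtext("description"))
         for i in root.iter("item")]
print(items)  # [('First post', 'Hello'), ('Second post', 'World')]
```

Because the feed format is a published contract, this keeps working even when the site redesigns its pages.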

+7

The concept you are describing roughly corresponds to what is called a "purple include" or "transclusion". There is a library for it, but it is not very actively developed. There are a couple of Ajaxian articles on it.

+2

I would recommend a server-side solution with Python: use urllib2 to request the page, then BeautifulSoup to parse out the bit you need. BeautifulSoup has a very flexible selector API, with which you can craft heuristics for the section you are interested in.

To illustrate:

    soup = BeautifulSoup(html)
    text = soup.find(text="Some text on the page that is unlikely to change")
    print text.parent.prettify()

That way, if the webmaster later changes the layout on the page, your scraping script should still work.
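If you cannot install BeautifulSoup, the same anchor-text heuristic can be sketched with the standard library's parser. This is an assumption-laden sketch, not the answer's original method: the class name and the sample markup here are made up for illustration.

```python
from html.parser import HTMLParser

class AnchorTextFinder(HTMLParser):
    """Record the tag that directly encloses a target text fragment."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.stack = []       # currently open tags
        self.found_in = None  # tag enclosing the target text, once seen

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if self.found_in is None and self.target in data and self.stack:
            self.found_in = self.stack[-1]

# Hypothetical page: find which element holds the stable anchor text.
page = "<html><body><div id='a'><p>stable anchor text</p></div></body></html>"
finder = AnchorTextFinder("stable anchor text")
finder.feed(page)
print(finder.found_in)  # p
```

From the recorded position you would then walk outward to grab the enclosing fragment, much as `text.parent` does in the BeautifulSoup version.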

+1

On the client side, an <iframe> is the only practical option. You can scroll it to the relevant section, but that may stop working in the long run, because it is technically very close to a clickjacking attack.

There is also cross-site XHR, but it requires the destination site to opt in, and at the moment it works only in the newest browsers.

Getting the HTML server-side is very simple (every decent web platform can load a page and parse the HTML, and you can use XPath/XSLT or the DOM to extract the bit you want).

Getting the styles will be tricky: CSS rules may not work with an HTML snippet taken out of context. You would need to parse the CSS and extract and transform the relevant rules, or use a browser and read currentStyle of every node.

Obviously, you will need to filter the extracted HTML aggressively to avoid XSS. This is harder than it sounds.
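A minimal sketch of why whitelist filtering is the usual approach: keep only known-safe tags, drop everything else, and escape all text. This toy sanitizer is illustrative only; a real one must also filter attributes, URLs, and CSS, which is where the difficulty lies.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "blockquote"}

class WhitelistSanitizer(HTMLParser):
    """Keep whitelisted tags only; drop script/style bodies; escape text.
    A sketch: real sanitizers must also vet attributes, URLs and CSS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip = False  # inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in {"script", "style"}:
            self.skip = True
        elif tag in ALLOWED:
            self.out.append("<%s>" % tag)  # attributes dropped entirely

    def handle_endtag(self, tag):
        if tag in {"script", "style"}:
            self.skip = False
        elif tag in ALLOWED:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data))

def sanitize(html_in):
    s = WhitelistSanitizer()
    s.feed(html_in)
    return "".join(s.out)

print(sanitize('<p onclick="evil()">hi <script>alert(1)</script></p>'))
# <p>hi </p>
```

Note how both the `onclick` attribute and the script body disappear; event-handler attributes and inline scripts are the classic XSS vectors a filter must catch.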

If you don't need to automate this, a good HTML + CSS WYSIWYG editor may be able to extract a piece of content together with its styles.

+1

It sounds like IE8 Web Slices would be perfect for this. However, they are only available in IE8, and the origin site has to implement them before you can use them.

0

Source: https://habr.com/ru/post/1286251/
